2014-07-11

Installing Clustered Samba on GlusterFS




1. Prerequisites and Foundation

1. CentOS 6.5
2. GlusterFS
3. CTDB
4. Samba

Terminology
CIFS: Common Internet File System — in short, the protocol behind Windows "Network Neighborhood" style network file sharing
NFS: Network File System
PV: Physical Volume
VG: Volume Group
LV: Logical Volume


1.1. Prepare two machines, each with three network interfaces



# NFS/CIFS access
192.168.18.220  nas1.rickpc gluster01
192.168.18.2  nas2.rickpc gluster02

# CTDB interconnect
192.168.3.101    gluster01c
192.168.3.102    gluster02c

# GlusterFS interconnect
192.168.2.101    gluster01g
192.168.2.102    gluster02g
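
These entries go into /etc/hosts on both nodes. As a quick sanity check (a minimal sketch, not part of the original steps), reachability of each of the three networks can be verified from gluster01:

## Verify the CIFS/NFS, CTDB and GlusterFS networks from gluster01
for host in gluster02 gluster02c gluster02g; do
    ping -c 1 "$host" > /dev/null && echo "$host reachable"
done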


1.2. Partition the physical disk

For the basics of Linux disk and file systems and how to partition a disk with fdisk, see the thorough introduction on VBird's site [3]; only the essential commands are listed below.
## Prepare the physical partitions to create /dev/sdb5
fdisk /dev/sdb
partprobe

Partition the disks on nas1 and nas2; the result is shown below.
The disk used here is 8 GB, but only the following partitions are carved out:
/dev/sdb4 64M
/dev/sdb5 2.1G (to be used as physical volume space)

Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9815603c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               9        1044     8321670    5  Extended
/dev/sdb4               1           8       64228+  83  Linux
/dev/sdb5               9         270     2104483+  83  Linux
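
To double-check that the kernel has picked up the new partition table (a quick check, not from the original write-up):

## Re-read the partition table and confirm sdb4/sdb5 are visible
partprobe /dev/sdb
grep sdb /proc/partitions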

1.3. Create Linux volumes

For a deeper understanding of the concepts behind PV, VG and LV, see the explanation on VBird's site [4]; the author's own knowledge also comes from those tutorials.



## Create physical volume
pvcreate /dev/sdb5

## Create volume group
vgcreate vg_bricks /dev/sdb5

## Create logical volume
lvcreate -n lv_lock -L 64M vg_bricks
lvcreate -n lv_brick01 -L 1.5G vg_bricks

## Install XFS package
yum install -y xfsprogs

## Format the XFS file systems, register them in /etc/fstab, and mount them
mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
mkdir -p /bricks/lock
mkdir -p /bricks/brick01
mount /bricks/lock
mount /bricks/brick01

Create the PV, VG and LVs on both nas1 and nas2; the result is as follows:
[root@nas1 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/vg_bricks/lv_lock
  LV Name                lv_lock
  VG Name                vg_bricks
  LV UUID                rnRNbZ-QFun-pxvS-AS3f-pvn3-dvCY-h3qXgi
  LV Write Access        read/write
  LV Creation host, time nas1.rickpc, 2014-07-04 16:54:20 +0800
  LV Status              available
  # open                 1
  LV Size                64.00 MiB
  Current LE             16
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:2
   
  --- Logical volume ---
  LV Path                /dev/vg_bricks/lv_brick01
  LV Name                lv_brick01
  VG Name                vg_bricks
  LV UUID                BwMD2T-YOJi-spM4-aarC-3Yyj-Jfe2-nsecIJ
  LV Write Access        read/write
  LV Creation host, time nas1.rickpc, 2014-07-04 16:56:11 +0800
  LV Status              available
  # open                 1
  LV Size                1.50 GiB
  Current LE             384
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:3
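
Besides lvdisplay, a shorter summary of the LVM objects and the mounted bricks can be obtained with (a quick check, not part of the original steps):

## Summarize PVs, VGs, LVs and confirm the bricks are mounted
pvs && vgs && lvs
df -h /bricks/lock /bricks/brick01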

1.4. Install GlusterFS and create volumes

To understand how CTDB and GlusterFS work together, and how to install GlusterFS and CTDB, see [5][6].

## Install GlusterFS packages on all nodes
yum install -y rpcbind glusterfs-server
chkconfig rpcbind on
service rpcbind restart
service glusterd restart
# Do not auto start glusterd with chkconfig.
## Configure cluster and create volumes from gluster01
## Add gluster02g to the trusted storage pool
gluster peer probe gluster02g

## Verify the peer relationship
gluster peer status

## Create the volumes: in the GlusterFS architecture, each volume represents an independent virtual file system.
# transport tcp
gluster volume create lockvol replica 2 gluster01g:/bricks/lock gluster02g:/bricks/lock force
gluster volume create vol01 replica 2 gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 force
gluster vol start lockvol
gluster vol start vol01
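
The df output below assumes the two volumes have also been mounted locally on each node (the same mount points are used again by the ctdb_manage script later); a minimal sketch:

## Check the volumes and mount them on each node
gluster volume info
mkdir -p /gluster/lock /gluster/vol01
mount -t glusterfs localhost:/lockvol /gluster/lock
mount -t glusterfs localhost:/vol01 /gluster/vol01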

The GlusterFS volumes are now available on both nas1 and nas2; the result is as follows:

/dev/mapper/vg_bricks-lv_lock
                         60736    3576     57160   6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01
                       1562624  179536   1383088  12% /bricks/brick01
localhost:/lockvol       60672    3584     57088   6% /gluster/lock
localhost:/vol01       1562624  179584   1383040  12% /gluster/vol01

1.5. Install and configure Samba/CTDB

## Install Samba/CTDB packages on all nodes
# samba-3.6.9, samba-client-3.6.9, ctdb-1.0.114.5
yum install -y samba samba-client ctdb

## Install NFS
# rpcbind-0.2.0, nfs-utils-1.2.3
yum install -y rpcbind nfs-utils
chkconfig rpcbind on
service rpcbind start
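
To confirm that rpcbind is running and registered (a quick check, an assumption rather than one of the original steps):

## The portmapper service should be listed
rpcinfo -p localhost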

## Configure CTDB and Samba only on gluster01
mkdir -p /gluster/lock
mount -t glusterfs localhost:/lockvol /gluster/lock

## Edit /gluster/lock/ctdb
CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses 
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS. 
CTDB_MANAGES_SAMBA=yes
# some tunables
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15

## Edit /gluster/lock/nodes
192.168.3.101
192.168.3.102

## Edit /gluster/lock/public_addresses
192.168.18.201/24 eth0
192.168.18.202/24 eth0


## Edit /gluster/lock/smb.conf
[global]
    workgroup = MYGROUP
    server string = Samba Server Version %v
    clustering = yes
    security = user
    passdb backend = tdbsam
[share]
    comment = Shared Directories
    path = /gluster/vol01
    browseable = yes
    writable = yes
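
Before linking the config into place, it can be sanity-checked with Samba's testparm (a suggested check, not in the original article):

## Validate the clustered smb.conf syntax
testparm -s /gluster/lock/smb.conf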


## Create symlink to config files on all nodes
mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
ln -s /gluster/lock/nodes /etc/ctdb/nodes
ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
ln -s /gluster/lock/smb.conf /etc/samba/smb.conf

## Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location
yum install -y policycoreutils-python
semanage permissive -a smbd_t
# We'd better set an appropriate security context, but there's an open issue with using chcon on GlusterFS.
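
Whether the permissive domain was actually added can be confirmed like this (a quick check; semanage permissive -a installs a small policy module named permissive_smbd_t):

## The permissive_smbd_t module should be listed
semodule -l | grep permissive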

## Create the following script to start/stop the services as /usr/local/bin/ctdb_manage
#!/bin/bash
# Run the given command on both cluster nodes in parallel
function runcmd {
        echo "exec on all nodes: $@"
        ssh gluster01 "$@" &
        ssh gluster02 "$@" &
        wait
}
case $1 in
    start)
        runcmd service glusterd start
        sleep 1
        runcmd mkdir -p /gluster/lock
        runcmd mount  -t glusterfs localhost:/lockvol /gluster/lock 
        runcmd mkdir -p /gluster/vol01
        runcmd mount  -t glusterfs localhost:/vol01 /gluster/vol01
        runcmd service ctdb start
        ;;

    stop)
        runcmd service ctdb stop
        runcmd umount /gluster/lock
        runcmd umount /gluster/vol01
        runcmd service glusterd stop
        runcmd pkill glusterfs
        ;;
esac
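
Typical usage then looks like this (assuming password-less root SSH between the nodes, which the script relies on):

## Make the script executable and bring the whole stack up or down
chmod +x /usr/local/bin/ctdb_manage
ctdb_manage start
ctdb_manage stop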

1.6. Start services

## Set the Samba password and check the shared directories via one of the floating IPs.
pdbedit -a -u root

## test samba connection
smbclient -L 192.168.18.201 -U root
smbclient -L 192.168.18.202 -U root

## check Windows connection
ssh gluster01 netstat -aT | grep microsoft
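
The cluster state can also be inspected from either node with CTDB's own tools (the same commands are used in the failover test below):

## Both nodes should report OK and the two public IPs should be assigned
ctdb status
ctdb ip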

2. Testing your clustered Samba

2.1. Client Disconnection

On a Windows PC, map the share as network drive Z: and run the following run_client.bat:
echo off
:LOOP
echo "%time% (^_-) Writing on file in the shared folder...."
echo %time% >> z:/wintest.txt
sleep 2

echo "%time% (-_^) Writing on file in the shared folder...."
echo %time% >> z:/wintest.txt
sleep 2
goto LOOP

The script appends the current timestamp to Z:/wintest.txt every two seconds. Test steps:
1. Run run_client.bat.
2. Disable the network interface on the Windows PC; the script can no longer write to the cluster file system.
3. Re-enable the interface; within a short time the script resumes writing to the cluster file system.


2.2. CTDB Failover

Use ctdb status and ctdb ip to inspect the current state of the cluster file system.
Test steps:
1. Run run_client.bat on the Windows PC.
2. On either cluster node, stop CTDB:
[root@nas2 ~]# ctdb stop
3. Observe that the timestamps on the PC are still written to the cluster file system normally.
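
The failover itself can be watched from the surviving node; a minimal sketch (ctdb continue brings the stopped node back afterwards):

## On nas2: stop this node, watch the public IP move, then resume it
ctdb stop
ctdb status     # this node is now STOPPED
ctdb ip         # both public addresses are hosted by the other node
ctdb continue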


2.3. Cluster Node Crash

Reboot one cluster node and observe the connection from the Windows PC.
Test steps:
1. Run run_client.bat on the Windows PC.
2. Shut down the OS on either cluster node.
3. Observe how the timestamps on the PC change:
"12:16:49.59 (-_^) Writing on file in the shared folder...."
"12:16:51.62 (^_-) Writing on file in the shared folder...."
"12:16:53.66 (-_^) Writing on file in the shared folder...."
"12:16:55.70 (^_-) Writing on file in the shared folder...."
"12:16:57.74 (-_^) Writing on file in the shared folder...."
"12:17:41.90 (^_-) Writing on file in the shared folder...."
"12:17:43.92 (-_^) Writing on file in the shared folder...."
"12:17:45.95 (^_-) Writing on file in the shared folder...."
"12:17:48.00 (-_^) Writing on file in the shared folder...."

The gap between the two highlighted lines (12:16:57.74 and 12:17:41.90) shows that the Windows connection is interrupted for roughly 44 seconds, but it recovers on its own, which still meets a reasonable level of HA recovery.

2.4. Ping_pong for CTDB lock rate

ping_pong [7] is a small tool provided by the Samba project for measuring CTDB's lock rate.
The author modified the source slightly to push the lock rate into Graphite [7], making it easier to watch the lock rate over long periods:
ping_pong.socket.c
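
For reference, a typical run of the unmodified tool looks like this (per the Samba documentation; the test file must live on the clustered file system, and the second argument is usually the number of nodes plus one, i.e. 3 for this two-node cluster):

## Run on both nodes against the same file on the GlusterFS volume
ping_pong /gluster/vol01/ping_pong.dat 3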

3. Reference

[1] NFS Server (NFS 伺服器), VBird (鳥哥)

