1. CentOS 6.5
2. GlusterFS
3. CTDB
4. Samba
# NFS/CIFS access
192.168.18.220 nas1.rickpc gluster01
192.168.18.2 nas2.rickpc gluster02
# CTDB interconnect
192.168.3.101 gluster01c
192.168.3.102 gluster02c
# GlusterFS interconnect
192.168.2.101 gluster01g
192.168.2.102 gluster02g
1.2. Prepare the physical disks
For the basics of Linux disk file systems and how to partition a disk with fdisk, see the excellent introduction on VBird's site [3]; only the essential commands are listed below.
## Prepare physical partitions to create /dev/sdb5
fdisk /dev/sdb
partprobe
Partition the disks on nas1 and nas2; the result is shown below. The disk used here is 8 GB, but only the following partitions are carved out:
/dev/sdb4 64M
/dev/sdb5 2.1G (to be used as physical-volume space)
Disk /dev/sdb: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x9815603c
Device Boot Start End Blocks Id System
/dev/sdb1 9 1044 8321670 5 Extended
/dev/sdb4 1 8 64228+ 83 Linux
/dev/sdb5 9 270 2104483+ 83 Linux
## Create physical volume
pvcreate /dev/sdb5
## Create volume group
vgcreate vg_bricks /dev/sdb5
## Create logical volume
lvcreate -n lv_lock -L 64M vg_bricks
lvcreate -n lv_brick01 -L 1.5G vg_bricks
## Install XFS package
yum install -y xfsprogs
## Format the XFS file systems
mkfs.xfs -i size=512 /dev/vg_bricks/lv_lock
mkfs.xfs -i size=512 /dev/vg_bricks/lv_brick01
echo '/dev/vg_bricks/lv_lock /bricks/lock xfs defaults 0 0' >> /etc/fstab
echo '/dev/vg_bricks/lv_brick01 /bricks/brick01 xfs defaults 0 0' >> /etc/fstab
mkdir -p /bricks/lock
mkdir -p /bricks/brick01
mount /bricks/lock
mount /bricks/brick01
Create the PV, VG, and LVs on nas1 and nas2 respectively; lvdisplay shows:
LV Path /dev/vg_bricks/lv_lock
LV UUID rnRNbZ-QFun-pxvS-AS3f-pvn3-dvCY-h3qXgi
LV Write Access read/write
LV Creation host, time nas1.rickpc, 2014-07-04 16:54:20 +0800
LV Path /dev/vg_bricks/lv_brick01
LV UUID BwMD2T-YOJi-spM4-aarC-3Yyj-Jfe2-nsecIJ
LV Write Access read/write
LV Creation host, time nas1.rickpc, 2014-07-04 16:56:11 +0800
1.4. Install GlusterFS and create volumes
To understand how CTDB and GlusterFS work together and how to install them, see [5][6].
## Install GlusterFS packages on all nodes
yum install -y rpcbind glusterfs-server
chkconfig rpcbind on
service rpcbind restart
service glusterd restart
# Do not auto start glusterd with chkconfig.

## Configure cluster and create volumes from gluster01
## Add gluster02g to the trusted storage pool
gluster peer probe gluster02g
## Verify the peer relationship
gluster peer status
## Create volumes: in the GlusterFS architecture, each volume represents a standalone virtual file system.
# transport tcp
gluster volume create lockvol replica 2 gluster01g:/bricks/lock gluster02g:/bricks/lock force
gluster volume create vol01 replica 2 gluster01g:/bricks/brick01 gluster02g:/bricks/brick01 force
gluster vol start lockvol
gluster vol start vol01
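Once both volumes are started, `gluster volume info` run on either node should report roughly the following (sample output, abridged; only the lockvol entry is shown and the Volume ID is omitted):

```
Volume Name: lockvol
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster01g:/bricks/lock
Brick2: gluster02g:/bricks/lock
```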
The GlusterFS virtual file systems are now built on nas1 and nas2; df shows:
/dev/mapper/vg_bricks-lv_lock
60736 3576 57160 6% /bricks/lock
/dev/mapper/vg_bricks-lv_brick01
1562624 179536 1383088 12% /bricks/brick01
localhost:/lockvol 60672 3584 57088 6% /gluster/lock
localhost:/vol01 1562624 179584 1383040 12% /gluster/vol01
1.5. Install and configure Samba/CTDB
## Install Samba/CTDB packages on all nodes
# samba-3.6.9, samba-client-3.6.9, ctdb-1.0.114.5
yum install -y samba samba-client ctdb
## Install NFS
# rpcbind-0.2.0, nfs-utils-1.2.3
yum install -y rpcbind nfs-utils
chkconfig rpcbind on
service rpcbind start
## Configure CTDB and Samba only on gluster01
mkdir -p /gluster/lock
mount -t glusterfs localhost:/lockvol /gluster/lock
## Edit /gluster/lock/ctdb
CTDB_PUBLIC_ADDRESSES=/gluster/lock/public_addresses
CTDB_NODES=/etc/ctdb/nodes
# Only when using Samba. Unnecessary for NFS.
CTDB_SET_DeterministicIPs=1
CTDB_SET_RecoveryBanPeriod=120
CTDB_SET_KeepaliveInterval=5
CTDB_SET_KeepaliveLimit=5
CTDB_SET_MonitorInterval=15
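The fragment above likely omits two lines a working CTDB setup needs. A hedged sketch follows: the lock-file name is an assumption, and CTDB_MANAGES_SAMBA is my guess at the line the "Only when using Samba" comment refers to:

```
# Recovery lock must live on the shared GlusterFS volume (file name is an assumption)
CTDB_RECOVERY_LOCK=/gluster/lock/lockfile
# Let CTDB start/stop smbd itself; only when using Samba, not needed for NFS
CTDB_MANAGES_SAMBA=yes
```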
## Edit /gluster/lock/nodes
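The nodes file simply lists one private IP per node, one per line, and must be identical on all nodes; using the CTDB interconnect addresses from the hosts file above:

```
192.168.3.101
192.168.3.102
```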
## Edit /gluster/lock/public_addresses
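public_addresses holds the floating IPs that CTDB moves between nodes, written as `IP/prefix interface` pairs. The addresses below match the ones tested with smbclient later in this article; the /24 prefix and the eth0 interface name are assumptions:

```
192.168.18.201/24 eth0
192.168.18.202/24 eth0
```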
## Edit /gluster/lock/smb.conf
server string = Samba Server Version %v
comment = Shared Directories
browseable = yes
writable = yes
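The options above need to sit in the proper smb.conf sections; a fuller sketch is below. The share name and path are assumptions (the path points at the mounted data volume), while `clustering = yes` is the stock Samba option that enables CTDB awareness:

```
[global]
    clustering = yes
    server string = Samba Server Version %v

[share]
    comment = Shared Directories
    path = /gluster/vol01
    browseable = yes
    writable = yes
```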
## Create symlink to config files on all nodes
mv /etc/sysconfig/ctdb /etc/sysconfig/ctdb.orig
mv /etc/samba/smb.conf /etc/samba/smb.conf.orig
ln -s /gluster/lock/ctdb /etc/sysconfig/ctdb
ln -s /gluster/lock/nodes /etc/ctdb/nodes
ln -s /gluster/lock/public_addresses /etc/ctdb/public_addresses
ln -s /gluster/lock/smb.conf /etc/samba/smb.conf
## Set SELinux permissive for smbd_t on all nodes due to the non-standard smb.conf location
yum install -y policycoreutils-python
semanage permissive -a smbd_t
# We'd better set an appropriate security context, but there's an open issue for using chcon with GlusterFS.
## Create the following script to start/stop services as /usr/local/bin/ctdb_manage
#!/bin/sh
# runcmd runs the given command on every node (password-less ssh between nodes is assumed)
runcmd () { echo "exec on all nodes: $@"; for n in gluster01 gluster02; do ssh $n "$@"; done; }
case "$1" in
start)
    runcmd service glusterd start
    runcmd mkdir -p /gluster/lock
    runcmd mount -t glusterfs localhost:/lockvol /gluster/lock
    runcmd mkdir -p /gluster/vol01
    runcmd mount -t glusterfs localhost:/vol01 /gluster/vol01
    runcmd service ctdb start
    ;;
stop)
    runcmd service ctdb stop
    runcmd umount /gluster/lock
    runcmd umount /gluster/vol01
    runcmd service glusterd stop
    ;;
esac
1.6. Start services
## Set the Samba password and check the shared directories via one of the floating IPs.
pdbedit -a -u root
## test samba connection
smbclient -L 192.168.18.201 -U root
smbclient -L 192.168.18.202 -U root
## check Windows connection
ssh gluster01 netstat -aT | grep microsoft
2. Testing your clustered Samba
2.1. Client Disconnection
On a Windows PC, map drive Z: to the share and run the run_client.bat shown below
:loop
echo "%time% (^_-) Writing on file in the shared folder...."
echo %time% >> z:/wintest.txt
timeout /t 2 > nul
echo "%time% (-_^) Writing on file in the shared folder...."
echo %time% >> z:/wintest.txt
timeout /t 2 > nul
goto loop
The script appends the current timestamp to Z:/wintest.txt every two seconds. Test steps:
1. Run run_client.bat
2. Disable the network interface on the Windows PC; the program can no longer write to the cluster file system
3. Re-enable the interface; within a short time the program resumes writing to the cluster file system
2.2. CTDB Failover
Use ctdb status and ctdb ip to inspect the current state of the cluster file system
1. Run run_client.bat on the Windows PC
2. On any one cluster node, stop CTDB with: service ctdb stop
3. Observe that the timestamps on the PC are still written to the cluster file system normally
2.3. Cluster Node Crash
Reboot one cluster node and observe the connection state on the Windows PC
Test steps:
1. Run run_client.bat on the Windows PC
2. Shut down the OS on any one cluster node
3. Observe how the timestamps on the PC change
"12:16:49.59 (-_^) Writing on file in the shared folder...."
"12:16:51.62 (^_-) Writing on file in the shared folder...."
"12:16:53.66 (-_^) Writing on file in the shared folder...."
"12:16:55.70 (^_-) Writing on file in the shared folder...."
"12:16:57.74 (-_^) Writing on file in the shared folder...."
"12:17:41.90 (^_-) Writing on file in the shared folder...."
"12:17:43.92 (-_^) Writing on file in the shared folder...."
"12:17:45.95 (^_-) Writing on file in the shared folder...."
"12:17:48.00 (-_^) Writing on file in the shared folder...."
From the two highlighted lines (12:16:57 and 12:17:41), the Windows connection is interrupted for about 44 seconds, but it recovers automatically, which still meets a reasonable level of HA recovery
2.4. Ping_pong for CTDB lock rate
ping_pong [7] is a small tool provided by the Samba open-source project for measuring CTDB's lock rate.
I made minor changes to the source and added reporting of the lock rate to Graphite [7], which makes it easy to observe how the lock rate changes over long periods
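As a sketch of that Graphite integration (not the author's actual code): ping_pong is typically pointed at a file on the clustered file system with one more process than there are nodes, e.g. `ping_pong /gluster/lock/test.dat 3` for this two-node cluster, and prints a running locks-per-second figure. The wrapper below assumes that output format and a Graphite server reachable at graphite.example.com:2003 (hostname, port, and metric name are all assumptions):

```shell
#!/bin/sh
# Hypothetical wrapper: extract the rate from a ping_pong output line such as
# "   9234 locks/sec" and emit it in Graphite's plaintext protocol.
GRAPHITE_HOST=graphite.example.com   # assumption
GRAPHITE_PORT=2003                   # Graphite's default plaintext port

parse_rate() {
    # Print the first field of any line containing "locks/sec"
    echo "$1" | awk '/locks\/sec/ {print $1}'
}

rate=$(parse_rate "   9234 locks/sec")
# Graphite's plaintext protocol is "metric value timestamp", one line per sample;
# uncomment the nc pipe to actually deliver it:
echo "ctdb.lock_rate $rate $(date +%s)"
# echo "ctdb.lock_rate $rate $(date +%s)" | nc $GRAPHITE_HOST $GRAPHITE_PORT
```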
3. Reference