Storage Cluster Test
Raspberry Pi Setup
Arch Linux is my preferred distro for the Raspberry Pi and is used in this guide.
Arch Linux ARM:
https://archlinuxarm.org/
Repeat the following for all Pis in the cluster.
Fixed IP
Disable networkd in favor of netctl:
systemctl disable systemd-networkd.service
Copy static IP example into place:
cd /etc/netctl
cp examples/ethernet-static .
Edit ethernet-static to have the appropriate fixed IP:
Description='A basic static ethernet connection'
Interface=eth0
Connection=ethernet
IP=static
Address=('192.168.0.71/24')
#Routes=('192.168.0.0/24 via 192.168.1.2')
Gateway='192.168.0.1'
DNS=('192.168.0.1')
Enable the ethernet-static config:
netctl enable ethernet-static
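To switch over without rebooting, you can stop networkd and bring up the profile by hand. This is a rough sketch: netctl may refuse to start the profile while the interface is still up, hence the ip link step.
systemctl stop systemd-networkd.service
ip link set eth0 down
netctl start ethernet-static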
Set Hostname
Edit /etc/hostname:
pi01
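Alternatively, if you prefer not to edit the file directly, hostnamectl should accomplish the same thing on a systemd-based install:
hostnamectl set-hostname pi01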
Overclock the slower Pi B+
Note: holding down the shift key during boot up will disable the overclock for that boot, allowing you to select a lower level.
Edit /boot/config.txt:
arm_freq=1000
sdram_freq=500
core_freq=500
over_voltage=6
temp_limit=75
boot_delay=0
disable_splash=1
gpu_mem=16
Performance Utilities
Read the temperature:
/opt/vc/bin/vcgencmd measure_temp
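vcgencmd can also report the current ARM clock and the throttle state, which is handy for checking whether the overclock is actually in effect (assuming your firmware build includes these sub-commands):
/opt/vc/bin/vcgencmd measure_clock arm
/opt/vc/bin/vcgencmd get_throttled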
SD Write Speed Test:
[root@pi01 ~]# sync; time dd if=/dev/zero of=~/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 68.0134 s, 7.7 MB/s
real 1m8.033s
user 0m0.001s
sys 0m10.100s
real 0m11.875s
user 0m0.001s
sys 0m0.013s
SD Read Speed Test:
[root@pi01 ~]# dd if=~/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 27.4617 s, 19.1 MB/s
USB Write Speed Test:
mkdir USB
mount /dev/sda1 USB
[root@pi01 ~]# sync; time dd if=/dev/zero of=~/USB/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 93.4846 s, 5.6 MB/s
real 1m33.499s
user 0m0.013s
sys 0m12.158s
real 0m16.322s
user 0m0.013s
sys 0m0.001s
USB Read Speed Test:
[root@pi01 ~]# dd if=~/USB/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 38.2419 s, 13.7 MB/s
Storage Setup
All 3 Pis will have 4 USB sticks merged and mounted at /storage. The plan is to use a different method to create the storage on each Pi.
Setup Storage on pi01 with mdadm
For this we will try using mdadm to create a software RAID 10.
Source:
https://www.digitalocean.com/community/tutorials/how-to-create-raid-arrays-with-mdadm-on-ubuntu-16-04
Source:
http://www.ducea.com/2009/03/08/mdadm-cheat-sheet/
Delete the partitions on each USB drive (in fdisk, o creates a new empty DOS partition table and w writes it to disk):
fdisk /dev/sda
o
w
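Alternatively, wipefs from util-linux clears old partition tables and filesystem/RAID signatures in one shot; repeat for sdb, sdc, and sdd:
wipefs -a /dev/sda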
Identify the Component Devices:
lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
Output:
[root@pi01 ~]# lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT
NAME SIZE FSTYPE TYPE MOUNTPOINT
sda 14.8G disk
sdb 14.8G disk
sdc 14.8G disk
sdd 14.8G disk
mmcblk0 7.4G disk
|-mmcblk0p1 200M vfat part /boot
`-mmcblk0p2 7.2G ext4 part /
Create the Array:
mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/sda /dev/sdb /dev/sdc /dev/sdd
Output:
mdadm: layout defaults to n2
mdadm: layout defaults to n2
mdadm: chunk size defaults to 512K
mdadm: partition table exists on /dev/sda
mdadm: partition table exists on /dev/sda but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdb
mdadm: partition table exists on /dev/sdb but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdc
mdadm: partition table exists on /dev/sdc but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdd
mdadm: partition table exists on /dev/sdd but will be lost or
meaningless after creating array
mdadm: size set to 15454208K
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
The mdadm tool will start to configure the array (it actually uses the recovery process to build the array for performance reasons). This can take some time to complete, but the array can be used during this time. You can monitor the progress of the mirroring by checking the /proc/mdstat file:
cat /proc/mdstat
Output:
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md0 : active raid10 sdd[3] sdc[2] sdb[1] sda[0]
30908416 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
[>....................] resync = 3.0% (949376/30908416) finish=62.5min speed=7987K/sec
unused devices: <none>
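To keep an eye on the resync without retyping the command, watch works well:
watch -n 10 cat /proc/mdstat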
Create and Mount the Filesystem
Create a filesystem on the array:
mkfs.ext4 -F /dev/md0
Create mount point and mount filesystem:
mkdir /storage
mount /dev/md0 /storage
Show available space:
df -h -x devtmpfs -x tmpfs
Output:
Filesystem Size Used Avail Use% Mounted on
/dev/mmcblk0p2 7.0G 1.4G 5.3G 21% /
/dev/mmcblk0p1 200M 25M 176M 13% /boot
/dev/md0 29G 45M 28G 1% /storage
Save the Array Layout
To make sure that the array is reassembled automatically at boot, we will have to adjust the /etc/mdadm.conf file (that is where it lives on Arch, rather than /etc/mdadm/mdadm.conf as on Ubuntu). We can automatically scan the active array and append the file by typing:
mdadm --detail --scan | tee -a /etc/mdadm.conf
Afterwards, you can update the initramfs, or initial RAM file system, so that the array will be available during the early boot process. On Arch this would mean adding the mdadm_udev hook to /etc/mkinitcpio.conf and regenerating the initramfs, but since we are not booting from the array it isn't needed for our purposes.
Add the new filesystem mount options to the /etc/fstab file for automatic mounting at boot:
echo '/dev/md0 /storage ext4 defaults,nofail,discard 0 0' | tee -a /etc/fstab
Your RAID 10 array should now automatically be assembled and mounted each boot.
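After a reboot, a quick sanity check that the array came back clean and mounted:
mdadm --detail /dev/md0 | grep State
df -h /storage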
Speed tests with mdadm
RAID 10 Write Speed Test (Looks CPU bound?):
[root@pi01 storage]# time dd if=/dev/zero of=/storage/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 253.617 s, 2.1 MB/s
real 4m14.632s
user 0m0.001s
sys 0m10.434s
real 0m44.117s
user 0m0.000s
sys 0m0.013s
RAID 10 Read Speed Test:
[root@pi01 storage]# dd if=/storage/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 15.0556 s, 34.8 MB/s
Try to break mdadm array
Source:
https://raid.wiki.kernel.org/index.php/Detecting,_querying_and_testing
I just plucked out the middle USB stick on a 3-drive RAID 10 and the system continued to function as normal.
Check status of array:
mdadm --detail /dev/md0
Output:
/dev/md0:
Version : 1.2
Creation Time : Wed Feb 21 22:03:50 2018
Raid Level : raid10
Array Size : 23181312 (22.11 GiB 23.74 GB)
Used Dev Size : 15454208 (14.74 GiB 15.83 GB)
Raid Devices : 3
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Feb 21 23:34:48 2018
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : near=2
Chunk Size : 512K
Name : manjaro-borx01:0 (local to host manjaro-borx01)
UUID : a4ec3c18:da3ba91b:1e3fb75f:d1e61894
Events : 53
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
- 0 0 1 removed
2 8 64 2 active sync /dev/sde
Drive shows as "removed" so I'll skip the remove step. I plugged in the old drive and ran the following:
[root@manjaro-borx01 ~]# mdadm /dev/md0 -a /dev/sdd
mdadm: added /dev/sdd
Recovery looks like it will take almost as long as the original RAID build.
If I needed to remove the drive first I would do the following:
mdadm /dev/md0 -r /dev/sdd
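And if the kernel still considered the drive an active member, it would have to be marked as failed before it could be removed:
mdadm /dev/md0 -f /dev/sdd
mdadm /dev/md0 -r /dev/sdd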
Setup Storage on pi02 with btrfs
Source:
https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
Source:
https://wiki.archlinux.org/index.php/Btrfs
Source:
https://www.howtoforge.com/a-beginners-guide-to-btrfs
Install btrfs user space utilities:
pacman -S btrfs-progs
Use raid10 for both data and metadata:
mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd -f
Output:
btrfs-progs v4.15
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: 77fecf7f-a8d3-43ba-89a1-9448baa463a9
Node size: 16384
Sector size: 4096
Filesystem size: 58.98GiB
Block group profiles:
Data: RAID10 2.00GiB
Metadata: RAID10 2.00GiB
System: RAID10 16.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 4
Devices:
ID SIZE PATH
1 14.75GiB /dev/sda
2 14.75GiB /dev/sdb
3 14.75GiB /dev/sdc
4 14.75GiB /dev/sdd
Mount it to /storage:
mkdir /storage
Once you create a multi-device filesystem, you can use any device in the FS for the mount command:
mount /dev/sda /storage
Add the following to /etc/fstab:
/dev/sda /storage btrfs defaults 0 1
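Since the sdX names can shuffle between boots, a more robust fstab entry uses the filesystem UUID reported by blkid (all member devices of a multi-device btrfs share the same filesystem UUID); substitute the value printed for your array for the placeholder below:
blkid /dev/sda
UUID=<uuid-from-blkid> /storage btrfs defaults 0 1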
Maintenance
Don't forget to periodically scrub:
btrfs scrub start /storage
Check status of scrub:
btrfs scrub status /storage
scrub status for 77fecf7f-a8d3-43ba-89a1-9448baa463a9
scrub started at Sat Mar 3 03:26:43 2018 and finished after 00:00:51
total bytes scrubbed: 1006.91MiB with 0 errors
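It is also worth glancing at the per-device error counters now and then:
btrfs device stats /storage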
Speed tests with btrfs
RAID 10 Write Speed Test:
[root@pi02 ~]# time dd if=/dev/zero of=/storage/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 60.8095 s, 8.6 MB/s
real 1m0.850s
user 0m0.001s
sys 0m4.429s
real 0m25.362s
user 0m0.008s
sys 0m0.103s
RAID 10 Read Speed Test:
[root@pi02 storage]# dd if=/storage/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 23.4445 s, 22.4 MB/s
Try to break btrfs array
I pulled one drive while writing a 1GB file with no negative impact on the array.
Check status (See device 2 is missing):
[root@manjaro-borx01 pi]# btrfs filesystem show
Label: none uuid: d0f8e192-573b-4f4c-b81e-caa229a2c06c
Total devices 4 FS bytes used 2.15GiB
devid 1 size 14.75GiB used 3.01GiB path /dev/sdc
devid 3 size 14.75GiB used 3.01GiB path /dev/sde
devid 4 size 14.75GiB used 3.01GiB path /dev/sdf
*** Some devices missing
Replace missing drive:
btrfs replace start -f 2 /dev/sdg /storage
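The replace runs in the background; its progress can be followed with:
btrfs replace status /storage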
Show status:
[root@manjaro-borx01 pi]# btrfs filesystem show
Label: none uuid: d0f8e192-573b-4f4c-b81e-caa229a2c06c
Total devices 4 FS bytes used 2.15GiB
devid 1 size 14.75GiB used 3.06GiB path /dev/sdc
devid 2 size 14.75GiB used 3.06GiB path /dev/sdg
devid 3 size 14.75GiB used 3.06GiB path /dev/sde
devid 4 size 14.75GiB used 3.06GiB path /dev/sdf
Setup Storage on pi03 with zfs
Source:
https://project.altservice.com/issues/521
Source:
https://wiki.archlinux.org/index.php/ZFS
Prepare the Arch environment:
pacman -Syu
pacman -S base-devel cmake linux-headers
Uncomment the following in your sudoers file:
%wheel ALL=(ALL) ALL
Make sure your normal user is in the wheel group.
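If it isn't, add it (the default Arch Linux ARM user is alarm; substitute your own username):
usermod -aG wheel alarm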
Now manually download spl-dkms and zfs-dkms from the AUR and extract them in a regular user's home directory:
wget https://aur.archlinux.org/cgit/aur.git/snapshot/spl-dkms.tar.gz
wget https://aur.archlinux.org/cgit/aur.git/snapshot/zfs-dkms.tar.gz
tar xvzf spl-dkms.tar.gz
tar xvzf zfs-dkms.tar.gz
Now two things need to be done in PKGBUILD for both packages:
Note: The following takes forever to install on a Raspberry Pi 1. Be patient; it is working. It can take well over an hour.
Now install spl-dkms:
cd spl-dkms
makepkg -csi
Now install zfs-dkms:
cd ../zfs-dkms
makepkg -csi
The above takes so long that you may not be around when it is ready to actually install with pacman.
If you miss the prompt you can run the following:
sudo pacman -U zfs-dkms-0.7.6-1-armv6h.pkg.tar.xz zfs-utils-0.7.6-1-armv6h.pkg.tar.xz
Now reboot to make sure you are running the proper kernel.
Install the zfs kernel module:
depmod -a
modprobe zfs
Check that the zfs modules were loaded:
lsmod
zfs 1229845 0
zunicode 322454 1 zfs
zavl 5993 1 zfs
zcommon 43765 1 zfs
znvpair 80689 2 zfs,zcommon
spl 165409 5 zfs,zavl,zunicode,zcommon,znvpair
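If the module does not end up loading on its own at boot, systemd's modules-load.d can force it (this may be redundant depending on how the zfs units are set up):
echo zfs > /etc/modules-load.d/zfs.conf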
Future kernel updates will break this install and require manual intervention. To gain control over this, block upgrades to the kernel so you can choose when to upgrade. Edit /etc/pacman.conf and add the following line:
# Pacman won't upgrade packages listed in IgnorePkg and members of IgnoreGroup
IgnorePkg = linux*
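When you do decide to take a kernel update, request it explicitly; pacman will prompt because of IgnorePkg, and the dkms hooks should rebuild spl/zfs against the new headers. The package names here assume the Arch ARM Raspberry Pi kernel:
pacman -S linux-raspberrypi linux-raspberrypi-headers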
Let's mount some disks
Enable zfs service:
systemctl enable zfs.target
The ZFS on Linux developers recommend using device IDs when creating ZFS storage pools of fewer than 10 devices. To find the IDs, simply:
ls -lh /dev/disk/by-id/
The ids should look similar to the following:
lrwxrwxrwx 1 root root 13 Feb 24 01:38 mmc-00000_0x89f9628d -> ../../mmcblk0
lrwxrwxrwx 1 root root 15 Feb 24 01:38 mmc-00000_0x89f9628d-part1 -> ../../mmcblk0p1
lrwxrwxrwx 1 root root 15 Feb 24 01:38 mmc-00000_0x89f9628d-part2 -> ../../mmcblk0p2
lrwxrwxrwx 1 root root 9 Feb 24 20:40 usb-Flash_USB_Disk_37270114F12D098819575-0:0 -> ../../sda
lrwxrwxrwx 1 root root 9 Feb 24 20:40 usb-Flash_USB_Disk_37270324074E902919147-0:0 -> ../../sdd
lrwxrwxrwx 1 root root 9 Feb 24 20:40 usb-Flash_USB_Disk_3727064C62C1975323333-0:0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Feb 24 20:40 usb-Flash_USB_Disk_37270929A6E8169419149-0:0 -> ../../sdb
Create a ZFS pool named storage which will mount at /storage:
zpool create -f storage raidz usb-Flash_USB_Disk_37270114F12D098819575-0:0 usb-Flash_USB_Disk_37270324074E902919147-0:0 usb-Flash_USB_Disk_3727064C62C1975323333-0:0 usb-Flash_USB_Disk_37270929A6E8169419149-0:0
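A couple of commonly suggested pool tweaks for slow flash media, left out of the untweaked speed tests below, would be to enable lz4 compression and turn off atime updates:
zfs set compression=lz4 storage
zfs set atime=off storage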
To automatically mount a pool at boot time execute:
zpool set cachefile=/etc/zfs/zpool.cache storage
In order to mount zfs pools automatically on boot you need to enable the following services and targets:
systemctl enable zfs-import-cache
systemctl enable zfs-mount
systemctl enable zfs-import.target
Reboot to test.
Maintenance
Don't forget to periodically scrub the zfs pool:
zpool scrub storage
Check status of scrub:
zpool status
pool: storage
state: ONLINE
scan: scrub repaired 0B in 0h0m with 0 errors on Fri Mar 2 22:20:44 2018
config:
NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
usb-Flash_USB_Disk_3727073D57B5E50068851-0:0 ONLINE 0 0 0
usb-Flash_USB_Disk_37270159CFECB74014160-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001540408112233-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001600408109430-0:0 ONLINE 0 0 0
errors: No known data errors
Speed tests with zfs
First speed tests without tweaking
raidz (RAID5) Write Speed Test:
[root@pi03 ~]# time dd if=/dev/zero of=/storage/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 159.19 s, 3.3 MB/s
real 2m39.212s
user 0m0.014s
sys 0m5.790s
real 0m7.041s
user 0m0.008s
sys 0m0.005s
raidz (RAID5) Read Speed Test:
[root@pi03 ~]# dd if=/storage/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 21.7062 s, 24.2 MB/s
Try to break zfs raidz array
First check status of working raidz array:
[root@manjaro-borx01 storage]# zpool status
pool: storage
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
usb-Flash_USB_Disk_3727073D57B5E50068851-0:0 ONLINE 0 0 0
usb-Flash_USB_Disk_37271220BA47887714528-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001540408112233-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001600408109430-0:0 ONLINE 0 0 0
errors: No known data errors
Now while writing a large file to the array, pluck out a random drive and check status:
[root@manjaro-borx01 storage]# zpool status
pool: storage
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME STATE READ WRITE CKSUM
storage DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
usb-Flash_USB_Disk_3727073D57B5E50068851-0:0 ONLINE 0 0 0
usb-Flash_USB_Disk_37271220BA47887714528-0:0 UNAVAIL 0 0 0
usb-SanDisk_Ultra_4C531001540408112233-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001600408109430-0:0 ONLINE 0 0 0
errors: No known data errors
Insert a new drive and replace UNAVAIL disk:
zpool replace storage usb-Flash_USB_Disk_37271220BA47887714528-0:0 usb-Flash_USB_Disk_37270159CFECB74014160-0:0
Check status:
[root@manjaro-borx01 storage]# zpool status
pool: storage
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Thu Mar 1 23:31:45 2018
1.17G scanned out of 4.66G at 4.77M/s, 0h12m to go
300M resilvered, 25.16% done
config:
NAME STATE READ WRITE CKSUM
storage DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
usb-Flash_USB_Disk_3727073D57B5E50068851-0:0 ONLINE 0 0 0
replacing-1 DEGRADED 0 0 0
usb-Flash_USB_Disk_37271220BA47887714528-0:0 UNAVAIL 0 0 0
usb-Flash_USB_Disk_37270159CFECB74014160-0:0 ONLINE 0 0 0 (resilvering)
usb-SanDisk_Ultra_4C531001540408112233-0:0 ONLINE 0 0 0
usb-SanDisk_Ultra_4C531001600408109430-0:0 ONLINE 0 0 0
Raspberry Pi 3 with same USB Sticks
USB Write Speed Test:
mkdir USB
mount /dev/sda1 USB
[root@pi01 ~]# sync; time dd if=/dev/zero of=~/USB/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 45.4632 s, 11.5 MB/s
real 0m45.473s
user 0m0.001s
sys 0m4.613s
real 0m11.998s
user 0m0.006s
sys 0m0.000s
USB Read Speed Test:
[root@pi01 ~]# dd if=~/USB/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 22.6531 s, 23.1 MB/s
Load average was 3.78 to 4.2 with very minimal CPU usage while initializing the RAID.
- So not CPU bound?
- Slow USB controller?
Load average was 5+ with very minimal CPU usage while running the speed tests on the RAID 10.
RAID 10 Write Speed Test:
[root@pi01 storage]# time dd if=/dev/zero of=/storage/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 156.956 s, 3.3 MB/s
real 2m36.963s
user 0m0.001s
sys 0m3.202s
real 0m51.538s
user 0m0.005s
sys 0m0.001s
RAID 10 Read Speed Test:
[root@pi01 storage]# dd if=/storage/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 5.77702 s, 90.8 MB/s
Core i7 with same USB Sticks
Not a great test on modern hardware; with this much RAM the dd writes complete into the page cache, so only the sync times reflect the actual device speed.
USB Write Speed Test:
mkdir USB
mount /dev/sda1 USB
[root@pi01 ~]# sync; time dd if=/dev/zero of=~/USB/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.416458 s, 1.3 GB/s
real 0m0.436s
user 0m0.000s
sys 0m0.270s
real 0m52.598s
user 0m0.000s
sys 0m0.000s
USB Read Speed Test:
[root@pi01 ~]# dd if=~/USB/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.126265 s, 4.2 GB/s
Creating and resyncing the array is only slightly faster than on the Pi 3.
RAID 10 Write Speed Test:
[root@manjaro-borx01 storage]# time dd if=/dev/zero of=/storage/test.tmp bs=500K count=1024; time sync
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.228207 s, 2.3 GB/s
real 0m0.231s
user 0m0.007s
sys 0m0.227s
real 1m11.674s
user 0m0.003s
sys 0m0.000s
RAID 10 Read Speed Test:
[root@manjaro-borx01 storage]# dd if=/storage/test.tmp of=/dev/null bs=500K count=1024
1024+0 records in
1024+0 records out
524288000 bytes (524 MB, 500 MiB) copied, 0.0977792 s, 5.4 GB/s
GlusterFS Cluster Setup
Source:
https://sysadmins.co.za/setup-a-3-node-replicated-storage-volume-with-glusterfs/
Source:
https://wiki.archlinux.org/index.php/Glusterfs
Source:
http://sumglobal.com/rpi-glusterfs-install/
Source:
https://nickhowell.co.uk/2016/07/23/raspberry-pi-nas-with-gluster/
First thing to do is add all nodes to DNS. Once that is done install glusterfs on all nodes:
pacman -S glusterfs rpcbind
Enable and start glusterd and rpcbind service on each node:
systemctl enable rpcbind.service
systemctl enable glusterd
systemctl start glusterd
Let's call pi01 the master. From the master, probe each peer:
[root@pi01 ~]# gluster peer probe pi01
peer probe: success. Probe on localhost not needed
[root@pi01 ~]# gluster peer probe pi02
peer probe: success.
[root@pi01 ~]# gluster peer probe pi03
peer probe: success.
Clear any test data from /storage and then create a 'brick' subfolder in each /storage folder:
cd /storage ; mkdir brick
List the gluster pool:
[root@pi01 ~]# gluster pool list
UUID Hostname State
426d9109-eb3d-4e87-b116-b1b7327245c2 pi02 Connected
97963c16-0073-491d-ab04-85bf1516294b pi03 Connected
36867961-309b-49e4-900a-b02093dee76d localhost Connected
Let's create our Replicated GlusterFS Volume, named gfs:
gluster volume create gfs replica 3 \
pi01:/storage/brick \
pi02:/storage/brick \
pi03:/storage/brick
volume create: gfs: success: please start the volume to access data
Ensure volume is created correctly:
[root@pi01 ~]# gluster volume info
Volume Name: gfs
Type: Replicate
Volume ID: 944475ff-82ef-4b0c-96f2-cdf946651a95
Status: Created
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: pi01:/storage/brick
Brick2: pi02:/storage/brick
Brick3: pi03:/storage/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
Start volume:
[root@pi01 ~]# gluster volume start gfs
volume start: gfs: success
View the status of our volume:
[root@pi01 ~]# gluster volume status gfs
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick pi01:/storage/brick 49152 0 Y 687
Brick pi02:/storage/brick 49152 0 Y 609
Brick pi03:/storage/brick 49152 0 Y 1533
Self-heal Daemon on localhost N/A N/A Y 709
Self-heal Daemon on pi02 N/A N/A Y 631
Self-heal Daemon on pi03 N/A N/A Y 1643
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks
At the GlusterFS level, any client is allowed to connect by default. To authorize only these 3 nodes to connect to the GlusterFS volume:
[root@pi01 ~]# gluster volume set gfs auth.allow 192.168.0.71,192.168.0.72,192.168.0.73
volume set: success
Then if you would like to remove this rule:
gluster volume set gfs auth.allow *
Now mount the volume on each host:
mkdir -p /mnt/glusterClientMount
mount.glusterfs localhost:/gfs /mnt/glusterClientMount
Verify the Mounted Volume:
[root@pi03 storage]# df -h
Filesystem Size Used Avail Use% Mounted on
dev 232M 0 232M 0% /dev
run 239M 280K 239M 1% /run
/dev/mmcblk0p2 7.0G 1.7G 5.0G 26% /
tmpfs 239M 0 239M 0% /dev/shm
tmpfs 239M 0 239M 0% /sys/fs/cgroup
tmpfs 239M 0 239M 0% /tmp
/dev/mmcblk0p1 200M 25M 176M 13% /boot
storage 43G 128K 43G 1% /storage
tmpfs 48M 0 48M 0% /run/user/1000
localhost:/gfs 43G 435M 43G 1% /mnt/glusterClientMount
Now add a file at /mnt/glusterClientMount on one of the nodes and check that it exists on all 3 nodes in the same location.
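For example, on pi01:
echo 'hello from pi01' > /mnt/glusterClientMount/replication-test.txt
Then on pi02 and pi03:
cat /mnt/glusterClientMount/replication-test.txt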
Make it mount at boot on all nodes!:
echo 'localhost:/gfs /mnt/glusterClientMount glusterfs defaults,_netdev 0 0' >> /etc/fstab
GlusterFS Client Setup
Source:
http://docs.gluster.org/en/latest/Administrator%20Guide/Setting%20Up%20Clients/
Gluster Native Client
Add the FUSE loadable kernel module (LKM) to the Linux kernel:
modprobe fuse
Verify that the FUSE module is loaded:
dmesg | grep -i fuse
Install glusterfs tools on the client:
pacman -S glusterfs
Make sure your client is allowed to connect to the cluster:
[root@pi01 ~]# gluster volume set gfs auth.allow 192.168.0.71,192.168.0.72,192.168.0.73,192.168.0.99
volume set: success
Mount on the client:
mkdir -p /mnt/glusterClientMount
mount -t glusterfs pi01:/gfs /mnt/glusterClientMount
Make it mount at boot:
echo 'pi01:/gfs /mnt/glusterClientMount glusterfs defaults,_netdev 0 0' >> /etc/fstab
Followup
2018-03-13 Status
So pi03 with ZFS basically ate itself. It acted like a single physical drive had failed due to too many write errors, but I was unable to replace the device. It turns out two of the thumb drives are dead. They no longer come up with their correct drive IDs; instead they display as generic. I can partition them, but one doesn't save the partition table and both now report a tiny capacity.
- This looked like it would work but didn't help; I could not get the brick going after rebuilding the storage:
- Created /storage/brick2 and replaced the old brick:
- gluster volume replace-brick gfs pi03:/storage/brick pi03:/storage/brick2/ commit force
- Watch the status of the heal:
- gluster volume heal gfs info
pi01 with its mdadm array doesn't look good. One drive is flashing constantly with the CPU at around 3% and the load average at 8.4. Running mdadm --detail /dev/md0 hangs. After a reboot it looks like two disks died. One is read-only and acting funny; the other is just gone, although its status LED is still lit.
- Bottom left and top right are dead. Bottom left was the one flashing before reboot.
- Added a 3rd drive
- mdadm /dev/md0 -a /dev/sdb
- mdadm --detail /dev/md0
So far the most reliable has been pi02 with btrfs. No issues yet.
- Wear and tear better on the thumb drives?
- Just dumb luck?
- ZFS with the same drives on a real Linux box was still OK.
2018-04-09 Status
pi01 is happy. I'm still missing one drive since I didn't have a spare, but the array is in good shape.
pi02 is unreachable. Even after a power cycle I can't ping or ssh into the machine.
pi03 with ZFS lost another drive. These USB sticks just suck and can't handle the load from ZFS. The machine was also in bad shape because 512MB is not enough RAM for this application. After a restart I could run zpool scrub, and the bad drive showed as FAULTED.
gluster volume status shows pi01 and pi03 are present. As expected, pi02 is MIA.