
Testing Gluster 4.1.5 on CentOS 7

Gluster is an amazing Software Defined Storage (SDS) solution offered as part of the Red Hat solutions portfolio. In this series we push Gluster to see how it performs in different circumstances, to help you understand its benefits and value to your business.

The next article in this series, which focuses on Samba 4 native integration with Gluster, can be found here.

This article demonstrates the installation and configuration process for Gluster 4.1.5 on CentOS 7. The servers are EC2 instances in the Amazon IaaS environment. We compare the performance of replica and disperse volume schemes in both a normal working state and a degraded state. Finally, we recover from the degraded state and present a summary of our findings.

The result is two Gluster volumes, one using the "replica" scheme and one using the "disperse" scheme, each spanning three participating servers.

The following steps were used to prepare the servers and the client.

On all Gluster servers and the client

Put the following in "/etc/hosts" on the servers and the client. Note the IP addresses used for the systems; they are referred to throughout the article.

10.0.1.229 server1
10.0.1.234 server2
10.0.1.177 server3
10.0.1.232 client1

On the Gluster servers

The server IP addresses are listed above. Do the following for each Gluster server.

ssh -l user 10.0.1.229
sudo su -
yum install -y centos-release-gluster wget
yum install -y glusterfs glusterfs-server
systemctl enable glusterd

# Change the "serverX" to be the server name such as "server1", etc...
echo "serverX" > /etc/hostname
reboot

On the Gluster client

The IP address for the client in this demonstration is “10.0.1.232”.

ssh -l user 10.0.1.232
sudo su -
yum install -y centos-release-gluster wget
yum install -y glusterfs glusterfs-fuse
echo "client1" > /etc/hostname
mkdir /mnt/gvol0
mkdir /mnt/gvol1
reboot

On each Gluster server

We need to add two disks to each Gluster server. In this demo we're using loopback devices (file-backed fake disks), but in production you would use a real block device. The following creates a 500MB disk, mounts it at "/mnt/<volume-name>", and creates an "export" directory inside it to act as the brick.

For gvol0, the replica volume:

dd if=/dev/zero of=/gvol0 count=500000 bs=1024
mkfs.xfs /gvol0
mkdir /mnt/gvol0
mount -o loop /gvol0 /mnt/gvol0
mkdir /mnt/gvol0/export

For gvol1, the disperse volume:

dd if=/dev/zero of=/gvol1 count=500000 bs=1024
mkfs.xfs /gvol1
mkdir /mnt/gvol1
mount -o loop /gvol1 /mnt/gvol1
mkdir /mnt/gvol1/export

On server1

We start by creating an SSH key pair to allow easy management of the other servers. Note: a default CentOS 7 server requires changes to the "/etc/ssh/sshd_config" file, and a restart of SSHd, before SSH keys can be set up using the following method.
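
On a stock CentOS 7 EC2 image this usually means temporarily allowing password authentication so that "ssh-copy-id" can push the key. A hypothetical way to do that (assuming the stock "PasswordAuthentication no" line is present; revert it afterwards if you wish):

# Assumes the default "PasswordAuthentication no" line; adjust as needed and revert afterwards
sed -i 's/^PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
systemctl restart sshd

With that change in place, generate the key pair and copy it to the other servers: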

ssh-keygen
ssh-copy-id server2
ssh-copy-id server3

We need to probe the nodes (Gluster servers) that we’re going to include in our trusted pool.

gluster peer probe server2
gluster peer probe server3
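
At this point, running "gluster peer status" on server1 should list server2 and server3 as connected peers:

gluster peer status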

We’re going to create two Gluster volumes.

  1. The first volume is a replica volume with 3 nodes (three servers), which we'll call "gvol0". Files are replicated in full, so they are identical on all three servers.
  2. The second volume is a dispersed (erasure-coded) volume with 3 nodes (three servers), which we'll call "gvol1". With three bricks and "disperse-data 2", one brick's worth of capacity is used for redundancy.

gluster volume create gvol0 replica 3 server{1..3}:/mnt/gvol0/export
gluster volume create gvol1 disperse-data 2 server{1..3}:/mnt/gvol1/export

Now we start both volumes:

gluster volume start gvol0
gluster volume start gvol1

At this stage we have two Gluster volumes: the first (gvol0) is the replica volume, where every file is replicated to all three nodes, and the second (gvol1) is the disperse volume, where file data is spread across the bricks with enough redundancy to remain highly available. With either volume we can expect to lose a single node and still have access to all files on the remaining nodes.
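
If you want to confirm how each volume was laid out, "gluster volume info" shows the volume type, brick list and options for each:

gluster volume info gvol0
gluster volume info gvol1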

On the client

We need to mount both gvol0 and gvol1 on the client system.

mkdir /mnt/gvol0
mkdir /mnt/gvol1
mount -t glusterfs server1:/gvol0 /mnt/gvol0
mount -t glusterfs server1:/gvol1 /mnt/gvol1
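
In production you would normally make these mounts persistent across reboots; hypothetical "/etc/fstab" entries on the client might look like this:

# Hypothetical fstab entries for the Gluster client mounts
server1:/gvol0  /mnt/gvol0  glusterfs  defaults,_netdev  0 0
server1:/gvol1  /mnt/gvol1  glusterfs  defaults,_netdev  0 0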

From the client’s point of view, the disk space available on the two volumes is as follows.

Filesystem      Size  Used  Avail  Use%  Mounted on
server1:/gvol0  485M   30M   456M    7%  /mnt/gvol0
server1:/gvol1  970M   60M   911M    7%  /mnt/gvol1

As you can see, the gvol0 (replica) volume has roughly 500MB in total (every node holds a full copy of the data, like a mirror), while the gvol1 (disperse) volume has roughly 1GB in total (the capacity of two 500MB bricks holds data, while the third brick's worth is used for redundancy).

Speed testing

I wrote a very small script to test the speed of both volumes. The test is very simple: it measures how long it takes to sequentially write 100 files of 1MB each (100MB in total) to the volume through the FUSE client on the client system. The table below shows my results.
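
The original script is not reproduced here, but a minimal sketch of this kind of test might look like the following (the file names, the use of dd with "oflag=sync", and the awk timing are assumptions, not the exact script used):

#!/bin/bash
# speedtest.sh - hypothetical sketch: write 100 files of 1MB each to the
# directory given as the first argument and print the elapsed time in seconds.
# Example: ./speedtest.sh /mnt/gvol0
TARGET="$1"
START=$(date +%s.%N)
for i in $(seq 1 100); do
    dd if=/dev/zero of="${TARGET}/speedtest-${i}" bs=1M count=1 oflag=sync 2>/dev/null
done
END=$(date +%s.%N)
awk -v s="$START" -v e="$END" 'BEGIN { printf "Seconds Taken: %.1f\n", e - s }'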

The lower the “Seconds Taken” value, the faster the writes.

Test Number   Volume Name        Seconds Taken
1             gvol0 / Replica    3.6
2             gvol0 / Replica    3.6
3             gvol0 / Replica    3.8
4             gvol0 / Replica    3.8
5             gvol1 / Disperse   1.9
6             gvol1 / Disperse   2.1
7             gvol1 / Disperse   2.1
8             gvol1 / Disperse   2.3

Degraded testing

The Gluster volumes used in this environment can each tolerate the loss of one of the three nodes. To test this, and to see whether a degraded volume affects write speed, the performance tests above were repeated with one node down. The results are shown in the table below.
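
The degraded state was created by powering off the server2 node, for example from a root shell on server2:

poweroff

Running "gluster peer status" on server1 will then report server2 as disconnected.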

The lower the “Seconds Taken” value, the faster the writes.

Test Number   Volume Name        Seconds Taken
1             gvol0 / Replica    3.7
2             gvol0 / Replica    2.8
3             gvol0 / Replica    3.0
4             gvol0 / Replica    3.3
5             gvol1 / Disperse   1.3
6             gvol1 / Disperse   1.3
7             gvol1 / Disperse   1.4
8             gvol1 / Disperse   1.4

Recovery from the degraded state

The second set of performance tests above was run while both Gluster volumes were in a degraded state: the server2 node was powered off. Once server2 was powered back on, its bricks had to be remounted by hand; in production that would be handled automatically by "/etc/fstab".
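
For the loopback-backed bricks used in this demo, hypothetical "/etc/fstab" entries on each server might look like the following (a real deployment would reference an actual block device rather than a loop-mounted file):

# Hypothetical entries for the loopback-backed bricks
/gvol0  /mnt/gvol0  xfs  loop  0 0
/gvol1  /mnt/gvol1  xfs  loop  0 0

With the bricks on server2 mounted again, enable healing on both volumes: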

gluster volume heal gvol0 enable
gluster volume heal gvol1 enable

The recovery can then be monitored with the following commands:

gluster volume heal gvol0 info summary
gluster volume heal gvol1 info summary

You should see results change from this (a bad state):

Brick server1:/mnt/gvol0/export
Status: Connected
Total Number of entries: 100
Number of entries in heal pending: 100
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick server2:/mnt/gvol0/export
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick server3:/mnt/gvol0/export
Status: Connected
Total Number of entries: 100
Number of entries in heal pending: 100
Number of entries in split-brain: 0
Number of entries possibly healing: 0

To this (a good state):

Brick server1:/mnt/gvol0/export
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick server2:/mnt/gvol0/export
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

Brick server3:/mnt/gvol0/export
Status: Connected
Total Number of entries: 0
Number of entries in heal pending: 0
Number of entries in split-brain: 0
Number of entries possibly healing: 0

With both volumes healed, we’ll finish this article by running the speed test one more time.

The lower the “Seconds Taken” value, the faster the writes.

Test Number   Volume Name        Seconds Taken
1             gvol0 / Replica    3.7
2             gvol0 / Replica    3.6
3             gvol0 / Replica    3.7
4             gvol0 / Replica    3.7
5             gvol1 / Disperse   1.8
6             gvol1 / Disperse   2.4
7             gvol1 / Disperse   1.9
8             gvol1 / Disperse   2.2

Summary

Both the replica and disperse volume schemes remained available while in a degraded state. Disperse provided more usable space (it was more space-efficient) with the same brick numbers and sizes. Both recovered from the degraded state without issues, and disperse performed better in both the normal (non-degraded) and degraded states.

I accept that my tests were unscientific and simplistic, and that they focused on writes only. Keep that in mind when planning your own testing.
