Fully redundant Linux cluster
with shared storage

(Last revised: 25 Jan 2005)

For some time now I have been carrying around ideas about how to build a fully redundant Linux cluster with shared storage. My plan is not a high-performance cluster, but a failover cluster.

Below you will see a picture showing how I have imagined the hardware setup.



First a few explanations about the setup:
The two switches are there to provide redundancy; they should be gigabit switches.
Each storage unit is also a server, with two gigabit network cards, again to provide redundancy.
The two 'frontend' servers require at least 4 network cards: 2 gigabit cards and 2 100Mbit cards. The gigabit cards are used for data transfer, one 100Mbit card is used for the heartbeat link, and the last one is used for the connection to the rest of the world.

Now to explain how this is meant to work. I will start from the bottom with the storage servers, as these are actually the easiest to set up.
The storage servers will export their disk space to the network with the Enhanced Network Block Device (ENBD) module. That is more or less it.
The storage servers should export equally sized chunks of space so that the 'frontend' servers can combine them into a RAID set (more about this later).
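
To give a rough idea of what such an export looks like, here is a minimal sketch. I am showing the classic nbd-server invocation as a stand-in; ENBD's own server is started along the same lines, but check the ENBD documentation for the exact command and flags. The port number and device name are just examples.

    # On each storage server: export a spare partition on TCP port 2000
    # (device name is an example, use whatever disk or partition is set aside for this)
    nbd-server 2000 /dev/sdb1

Since the exports need to be equally sized, the simplest approach is to set aside identically sized partitions on all three storage servers.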
Now to the network cards in the storage servers. There are two gigabit cards in each, and I suggest that they do not get IP addresses in the same subnet. It could look something like this (a configuration example follows the list):

Storage 1 NIC 1: 192.168.1.1 / 255.255.255.0
Storage 1 NIC 2: 172.16.1.1 / 255.255.255.0

Storage 2 NIC 1: 192.168.1.2 / 255.255.255.0
Storage 2 NIC 2: 172.16.1.2 / 255.255.255.0

Storage 3 NIC 1: 192.168.1.3 / 255.255.255.0
Storage 3 NIC 2: 172.16.1.3 / 255.255.255.0
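
Just to make the addressing concrete, bringing up the two cards on Storage 1 could look like this. The interface names eth0/eth1 are assumptions on my part, and the equivalent should of course go into the distribution's network configuration so it survives a reboot.

    # Storage 1: one gigabit card in each subnet
    ifconfig eth0 192.168.1.1 netmask 255.255.255.0 up
    ifconfig eth1 172.16.1.1 netmask 255.255.255.0 up

The other storage servers follow the same pattern with their own addresses.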

That is it for the storage servers: once they export their disk space to the network and have two network cards on which they can be reached, there is no more configuration to do on them. They are just 'stupid' disk storage enclosures.

Now to the tricky part, the 'frontend' servers. These are the ones that are going to do all the work of distributing data and making sure that at least one of them is always online to answer requests from the internet.
To distribute data we again need ENBD, as the plan is to use the three storage servers as one big RAID5 set. For this it seems to be a good idea to also install the fr5 driver, as it should be better at handling resynchronisation of RAID5 sets. Had there been only two storage servers, we could have used them as a mirror instead, but then we would need to install the fr1 driver.
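
To sketch what that looks like in practice on a frontend server: each storage export is imported as a local block device, and the three devices are then combined into one RAID5 set. Note that the device names and the nbd-client/mdadm commands below are stand-ins of my own; with ENBD you would use its own client, and with the fr5 driver loaded the array setup follows the ENBD/fr5 documentation instead, but the idea is the same.

    # Import one block device from each storage server
    # (classic nbd-client syntax shown; device names are examples)
    nbd-client 192.168.1.1 2000 /dev/nda
    nbd-client 192.168.1.2 2000 /dev/ndb
    nbd-client 192.168.1.3 2000 /dev/ndc

    # Combine the three imported devices into a single RAID5 set
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/nda /dev/ndb /dev/ndc

After this, /dev/md0 can carry a filesystem like any local disk, and losing a single storage server costs no data; the point of the fr5 driver is then to handle the resynchronisation more gracefully when that server comes back.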