What is RAID and how to create software RAID in a Linux environment

On a server, the most important thing is the data itself, not the hardware. The basic way to protect your data is to replicate or distribute it across two or more physical drives. This technique is called RAID (Redundant Array of Independent Disks).


There are two ways to create a RAID:

Using a hardware controller, commonly with multiple built-in hard drive interfaces.

In this case the RAID is called hardware RAID. Hardware RAID performs very well and doesn't consume host resources for the needed calculations, because everything is done by the controller (except for some hybrid or pseudo-hardware RAID controllers). Combined with dedicated memory, called cache, hardware RAID is the best solution for applications that require high read/write throughput. The only downside is the cost of the hardware controller.

Configuring the operating system to distribute the data across multiple drives.

In this case the RAID is called software RAID, and local resources are used for the necessary calculations. For that reason performance is lower and resource consumption is higher.



There are different schemes of data distribution that increase redundancy and/or performance, called RAID levels. The most commonly used are RAID 0, 1, 5, 6 and 10.

The idea is to split the data (your files) into stripes, and each stripe into chunks (the chunk size), which are then written to or read from each member hard drive in parallel. Stripe width is the term that describes the number of hard drives excluding parity.

If D is the number of hard drives in the RAID, C is the chunk size and P is the number of parity drives, then each stripe carries C * ( D - P ) of data, so a file splits into roughly its size divided by C * ( D - P ) stripes (this is valid for RAID 0, 5 and 6).
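For example, with D = 4 drives, P = 1 parity drive and a chunk size C of 512 KB (illustrative numbers, not from a specific setup), each stripe carries 512 * ( 4 - 1 ) = 1536 KB, so a 15360 KB file splits into 10 stripes. You can check the arithmetic in the shell:

[root@localhost /]# echo $(( 15360 / (512 * (4 - 1)) ))
10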



RAID 0

The RAID 0 technique can be used with two or more hard drives. Also called striping, this RAID level has high performance but no fault tolerance. If one of the hard drives fails, all the data is lost and cannot be recovered.



RAID 0 illustration

Each portion of the file (each stripe) is split up into chunks and written simultaneously to two or more hard drives.

RAID 1

The RAID 1 technique can be used with two or more hard drives. Also called mirroring, this RAID level has full fault tolerance: if one of the drives fails, the data remains available on the other member(s).



RAID 1 illustration
The size of the whole RAID is the same as the size of one member hard drive.

RAID 5

The RAID 5 technique can be used with three or more hard drives. This level uses single parity (P = 1): if one drive fails, the data is regenerated from the parity information, which is rotated (distributed) across all drives (RAID 3, by contrast, has a dedicated parity drive).
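The parity block is essentially the XOR of the data chunks in a stripe, which is why any single missing chunk can be recomputed from the remaining ones. A tiny sketch with three made-up data bytes (0xA5, 0x3C, 0xF0):

[root@localhost /]# printf '0x%X\n' $(( 0xA5 ^ 0x3C ^ 0xF0 ))   # parity byte
0x69
[root@localhost /]# printf '0x%X\n' $(( 0x69 ^ 0xA5 ^ 0xF0 ))   # rebuilds the lost 0x3C
0x3C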



RAID 5 illustration

RAID 6
The same logic as RAID 5, but this level rotates two sets of parity across all drives. RAID 6 needs a minimum of four hard drives. Even if two hard drives fail, the data can be rebuilt from the double parity.

RAID 10

This level (also called RAID 1+0) is a nested (or hybrid) RAID, a combination of two RAID levels: basically a RAID 0 over two RAID 1 sets (a stripe across two sets of mirrored drives). The minimum number of hard drives required is four.



To create, manage and monitor Linux software RAID you use mdadm. On Linux you can create all the standard RAID levels mentioned above. If mdadm is not available, you have to install it with:

[root@localhost /]# yum install mdadm
Or

[root@localhost /]# dnf install mdadm
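You can then confirm that the tool is available (the version string printed will depend on your distribution):

[root@localhost /]# mdadm -V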
In our testing environment (CentOS 6.7) we have four hard drives (10 GB each):

[root@localhost /]# ls /dev/sd[a-z]
/dev/sda /dev/sdb /dev/sdc /dev/sdd

/dev/sda is the system drive and the other drives are free to use for our tests (RAID 0, 1 and 5).
First you have to check whether these drives are already part of any RAID:

[root@localhost /]# mdadm --examine /dev/sd[b-z]
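If the drives are clean, mdadm will typically report that no md superblock was found, for example:

mdadm: No md superblock detected on /dev/sdb.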
Software RAID 0

Using fdisk, you have to create a RAID partition on each disk:

[root@localhost /]# fdisk /dev/sdb
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1):
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305):
Using default value 1305

Command (m for help): p
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            1        1305    10482381   83  Linux

You can type L to list all known partition types and look for Linux raid autodetect; its standard id is fd.
Then change the partition system id to fd using the t command. Print the partition table (p) just to be sure you have selected the right partition type. To write the table to disk and exit, press w.
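The remaining fdisk dialogue looks roughly like this (the exact prompts may differ slightly between fdisk versions):

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!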

Repeat each step for the next two drives, or just copy the partition layout from this drive with:

[root@localhost /]# sfdisk -d /dev/sdb | sfdisk /dev/sdc
[root@localhost /]# sfdisk -d /dev/sdb | sfdisk /dev/sdd
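To double-check that the layout was copied correctly, you can print each partition table again (output omitted here for brevity):

[root@localhost /]# sfdisk -l /dev/sdc
[root@localhost /]# sfdisk -l /dev/sdd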

Now create the RAID device /dev/md0 using the mdadm command with the parameters -C (--create), -l (--level, the RAID level) and -n (--raid-devices, the number of hard drive members):

[root@localhost /]# mdadm -C /dev/md0 -l raid0 -n 3 /dev/sd[b-d]1
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started

Verify the status of RAID using the command below:

[root@localhost /]# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sdd1[2] sdc1[1] sdb1[0]
      31421952 blocks super 1.2 512k chunks

unused devices: <none>

/proc/mdstat holds a snapshot of the kernel's RAID (md) state. Always check /proc/mdstat for possible failures.
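While an array is initializing or rebuilding, you can follow the progress live; a simple approach (the two-second refresh interval is just an example):

[root@localhost /]# watch -n 2 cat /proc/mdstat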

Also you can use mdadm to examine the RAID:

[root@localhost /]# mdadm --detail /dev/md0
Or

[root@localhost /]# mdadm -E /dev/sd[b-d]1
Now it is easy to create a filesystem on our RAID using mkfs and mount it using the mount command:

[root@localhost /]# mkfs.ext4 /dev/md0
[root@localhost /]# mkdir /storage
[root@localhost /]# mount /dev/md0 /storage
[root@localhost /]# df -h

Do not forget to modify /etc/fstab to auto-mount your RAID and save the configuration:

[root@localhost /]# vi /etc/fstab
Insert the line: /dev/md0 /storage ext4 defaults 0 0
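You can test the new entry without rebooting: unmount the array and let mount read it back from fstab:

[root@localhost /]# umount /storage
[root@localhost /]# mount -a
[root@localhost /]# df -h /storage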

[root@localhost /]# mdadm -E -s -v >> /etc/mdadm.conf
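The appended lines look roughly like this (the UUID and name fields will differ on your system):

ARRAY /dev/md/0 level=raid0 metadata=1.2 num-devices=3 UUID=...
   devices=/dev/sdb1,/dev/sdc1,/dev/sdd1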
The procedure is exactly the same for RAID 1, 5 and 6; you only have to change the RAID level (if you are reusing the same drives, stop the old array first with mdadm --stop /dev/md0):

[root@localhost /]# mdadm -C /dev/md0 -l raid1 -n 3 /dev/sd[b-d]1
[root@localhost /]# mdadm -C /dev/md0 -l raid5 -n 3 /dev/sd[b-d]1
For RAID 6 you will need at least four hard drives due to the double distributed parity (one drive more, /dev/sde here, than in our test setup).

[root@localhost /]# mdadm -C /dev/md0 -l raid6 -n 4 /dev/sd[b-e]1
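mdadm also has a native raid10 level, so the nested RAID 10 described earlier can be created in a single step; a sketch, reusing the same four partitions:

[root@localhost /]# mdadm -C /dev/md0 -l raid10 -n 4 /dev/sd[b-e]1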