Tuesday, June 28, 2016

How to create a software RAID array in Linux | mdadm


In my previous post I explained the most commonly used RAID levels and their write penalties. In this post I will show how you can easily create a software RAID array. This type of solution is very common in a home NAS, and it can also be used as an extension to a hardware RAID solution, since not all RAID controllers support RAID 5+0 (a striped volume of two or more RAID 5 volumes) which increases availability, adds some speed and still delivers a large(r) usable volume.




Home NAS

The one RAID level I mentioned but did not compare against the others (1, 5 and 1+0) was level 0 itself. I did mention that RAID level 0 should not be used in production environments, but I do see an exception when it comes to a home NAS. I use RAID 0 myself as the partition that stores my downloads, which can simply be downloaded again in case of disk failure. Typically these files are large, such as .iso files containing a variety of Linux distributions and backups of my DVD collection. Next to this volume I also have a RAID 1 array to hold backups of my photos, videos and other documents.

mdadm in short

The Linux utility used is called mdadm and stands for Multiple Device Administration. In CentOS 7 minimal (which I used for this demo) this utility is not installed by default, so let's install it first.

sudo yum -y install mdadm

When you issue the mdadm --help command you will find that there are not a million possible options; in fact, the options listed are all you need. For this post I will only focus on how to create an array. You will need to create raid partitions on the drives in order to build an array out of them. But, hey, wait, that means you can actually mix different RAID levels on the same physical disks...

Partitioning the drives

I have a machine with 4 hard drives. Looking at the output of the following fdisk command we can see three empty hard drives, namely /dev/sdb, /dev/sdc and /dev/sdd.
[user@demo ~]$ sudo fdisk -l | grep dev
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    16777215     7875584   8e  Linux LVM
Disk /dev/sdb: 8589 MB, 8589934592 bytes, 16777216 sectors
Disk /dev/sdc: 8589 MB, 8589934592 bytes, 16777216 sectors
Disk /dev/sdd: 8589 MB, 8589934592 bytes, 16777216 sectors
Disk /dev/mapper/centos-root: 7159 MB, 7159676928 bytes, 13983744 sectors
Disk /dev/mapper/centos-swap: 859 MB, 859832320 bytes, 1679360 sectors

In order to create an MD (software raid) device you will need to partition your drives so that they contain so-called raid partitions. These partitions will be the input partitions for the MD device. In this example we have three physical HDDs. We will only use part of the physically available space so that we can add another RAID level next to this first example.

Here we will create a 2GB raid partition on each of the drives. fdisk is still my favorite tool to accomplish this partitioning task.
[user@demo ~]$ sudo fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them. Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xf302fdde.

Command (m for help):

In short you would need to create a new partition using:
  1. n (as in "new") 
  2. p (as in "primary")
  3. 1 (as in partition number 1)
  4. [enter] (as in leave the default first sector)
  5. +2G (as in add a 2GB partition)
  6. t (to change the partition type id)
  7. fd (to set the partition type to "Linux raid auto")
  8. w (as in write the changes to the disk and exit fdisk)

Execute the above steps for all the physical drives. Just change the /dev/sd* name every time you execute the fdisk command.
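If you prefer to script this instead of walking through fdisk three times, a rough sketch (assuming the same three drives and the same 2GB size; double-check the device names first, since this will overwrite partition tables) could look like the loop below. The answers fed to fdisk are exactly the eight steps listed above, including the empty line for the default first sector.

for disk in /dev/sdb /dev/sdc /dev/sdd; do
sudo fdisk "$disk" <<EOF
n
p
1

+2G
t
fd
w
EOF
done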

Now issue the "sudo partprobe" command in order to scan the physical drives for their partitions.

[user@demo ~]$ sudo fdisk -l | grep dev
Disk /dev/sda: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sda1   *        2048     1026047      512000   83  Linux
/dev/sda2         1026048    16777215     7875584   8e  Linux LVM
Disk /dev/sdb: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sdb1            2048     4196351     2097152   fd  Linux raid autodetect
Disk /dev/sdc: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sdc1            2048     4196351     2097152   fd  Linux raid autodetect
Disk /dev/sdd: 8589 MB, 8589934592 bytes, 16777216 sectors
/dev/sdd1            2048     4196351     2097152   fd  Linux raid autodetect
Disk /dev/mapper/centos-root: 7159 MB, 7159676928 bytes, 13983744 sectors
Disk /dev/mapper/centos-swap: 859 MB, 859832320 bytes, 1679360 sectors

As you can see we now have /dev/sdb1, /dev/sdc1 and /dev/sdd1, which are the raid partitions.

Create a RAID 5 array

Using the three raid partitions from above we can now create a RAID 5 volume using mdadm. This process is really simple so I will just show you the command to do this. Please note that this command is on one line.
[user@demo ~]$ sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

The output will be
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.

You can verify the status of the raid array by looking at the /proc/mdstat file.
[user@demo ~]$ sudo cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      4190208 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>
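If you want a bit more detail than /proc/mdstat gives you (array state, rebuild progress, chunk size, UUID and so on), mdadm can report on the array itself; I'll leave the output out here.

sudo mdadm --detail /dev/md0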

 /dev/md0 is your RAID 5 volume and is now available to format using your favorite file system. I will use xfs here since that is the default in CentOS 7.
sudo mkfs.xfs /dev/md0

Create a mount folder
sudo mkdir /raid5

And mount the volume
sudo mount /dev/md0 /raid5/

You can also make the mount persistent by adding this line to /etc/fstab:
/dev/md0        /raid5  xfs     defaults        0 0
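One extra precaution I would suggest (not strictly required for the steps above): on some systems the md device gets renumbered at boot (for example to /dev/md127), so it can be safer to record the array in /etc/mdadm.conf and to mount by filesystem UUID instead of the device name. A sketch, with the UUID left as a placeholder you would take from the blkid output:

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
sudo blkid /dev/md0
# then use the reported UUID in /etc/fstab instead of /dev/md0, e.g.
# UUID=<uuid-from-blkid>   /raid5   xfs   defaults   0 0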


If you look at the output of the df -h command you will find that /raid5 is 4GB in size, which is correct since we created a RAID 5 array: you lose the space of one raid partition (2GB), which is reserved for the parity data.

Other RAID levels

Creating a RAID 0 array is a similar process; instead of --level=5 use --level=0.
[user@demo ~]$ sudo mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

Creating a RAID 1 array (a mirror) uses 2 drives (or raid partitions):
[user@demo ~]$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
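A practical note from my side: /dev/md0 can only exist once, so if you want to try a different level on the same partitions you first have to unmount the file system, stop the array and wipe the raid metadata from the partitions. Roughly:

sudo umount /raid5
sudo mdadm --stop /dev/md0
sudo mdadm --zero-superblock /dev/sdb1 /dev/sdc1 /dev/sdd1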


Have fun!

Tuesday, April 19, 2016

Protecting your data | RAID levels and their write penalties


RAID Levels


Redundant array of independent disks or RAID is commonly used in servers, enterprise storage appliances and even your home NAS (I hope) to protect the data against disk failure.

Usually a RAID array is created using a RAID controller, but software RAID is possible too. In enterprise environments hardware RAID controllers are commonly used since they offer some benefits over a software RAID implementation. In either case, if a disk fails, the data is not lost. In fact, your storage will stay online as long as you did not create a RAID 0 volume, a so-called striped volume.

Three RAID levels are very common, but each has its own characteristics.

  • RAID 1
  • RAID 5
  • RAID 1+0

RAID 0 is not on the list, simply because of its intolerance for disk failure. In other words, if just one disk fails you lose all data on the whole volume. RAID 6 has a high write penalty and therefore I will only focus on the listed RAID levels.

RAID 1

This RAID level is most commonly seen in servers for the OS and application files. RAID 1 is a mirrored pair of disks. Every write is mirrored, so the write speed is equal to that of a single disk; the write penalty on a RAID 1 volume is 2. Reads, however, come from both disks simultaneously. If one disk fails, the other disk will continue to respond to the read and write (I/O) requests. The broken disk can be replaced online and the array will then sync the data from the good disk. This is called a rebuild.

RAID 5

RAID 5 is usually used for larger volumes where write speed is not very important. Three or more disks are members of the array. If you have 3 drives of 900GB the net capacity will be 1800GB: the capacity of one drive is subtracted, and that space is used for parity blocks. Please note that the parity blocks do not all reside on one disk though; they are evenly distributed, as shown in the simple table below.

disk 0   disk 1   disk 2
data     data     parity
data     parity   data
parity   data     data

The write penalty here is determined by what one write I/O causes: the old data and the parity belonging to it must be read, then the new data and the new parity must be written. Counting the I/O actions, 2 reads and 2 writes, leads to 4. That results in a write penalty of 4 on a RAID 5 volume.

RAID 1+0

This RAID level is commonly used in scenarios where you want the most redundancy and the lowest write penalty. Data is mirrored and then striped across the mirrors. Usually database files are stored on this type of volume. However, this results in half the net storage capacity: to get a net 1.8TB volume you would need 4 drives of 900GB.

Performance and write penalty

Now that we have covered the basics of the most commonly used RAID levels I will dive a little further into the performance of each level. As you have read, I touched upon the write penalties. In all three levels (1, 5 and 1+0) reads come from all disks. The write speed depends on the write penalty.

The following table summarizes the RAID levels, the minimum number of disks and their write penalties.

RAID level   Minimum # disks   Write penalty
RAID 1       2                 2
RAID 5       3                 4
RAID 1+0     4                 2

If a disk can deliver 180 I/Os per second (based on the average seek time and the rotational delay) and you have just 1 disk, your maximum random writes (in a 100% write scenario) would be limited to 180 IOPS. If you have a RAID 1 array, and therefore 2 disks with a combined performance of 2*180=360 random IOPS, you need to divide that by the write penalty. So, on a RAID 1 array you will have 360 random IOPS in a 100% read scenario and 180 random IOPS in a 100% write scenario.

With RAID 5 you need at least 3 disks; here it would be 3*180=540 IOPS. Divide that by the RAID 5 write penalty of 4 and you end up with 540/4=135 random write IOPS in a 100% write scenario. But each time you add a disk the performance increases.

At RAID 1+0 the write penalty is the same as for RAID level 1, but you need at least 4 disks. The maximum number of random write IOPS in a 100% write scenario would then be (4*180)/2=360. Now, if you had 8 disks in a RAID 1+0 array you would end up with (8*180)/2=720 random write IOPS.
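If you want to play with these numbers yourself, the formula is simply (number of disks * IOPS per disk) / write penalty. A quick shell sketch, using the RAID 1+0 example with 8 disks from above:

disks=8; iops_per_disk=180; penalty=2
echo $(( disks * iops_per_disk / penalty ))   # prints 720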

To conclude this post, let's compare RAID 5 and RAID 1+0 in the following graph, using the same number of disks.

Quite a big difference...

Perhaps I should write up some examples on how to create a software based RAID in Linux in a future post. Fun!

Saturday, March 26, 2016

Search and replace | Linux command "sed"


SED, search and replace

Next to the commands grep and cut, the sed command is one I often use. Using the sed command you can search for and replace strings in files. I find this tool handy when it comes to configuration files.

You can create template files from configuration files: put a unique placeholder string on the configuration items and then use sed to replace these values.
Lets take the following contents of the file fruit.txt for example.
apple green tree
banana yellow tree
strawberry red plant
Now you could replace the word apple with pear by executing the following command.
sed 's/apple/pear/g' fruit.txt
pear green tree
banana yellow tree
strawberry red plant
In the command above the "s" stands for substitute and the "g" for global, meaning every occurrence on a line is replaced instead of only the first one.

I use this for configuration files which have, for example, the Listen configuration item. I save the file locally as a template with the Listen parameter set to "addrport", and then use a command like this to fill in the real value.
sed 's/addrport/192.168.1.10:80/g' /etc/httpd.conf
...
Listen 192.168.1.10:80
...
The above command with its parameters outputs the result to the standard output (to the shell) so if you want to leave the source file intact you can write the output to another file as well. You can do this like this.
sed 's/addrport/192.168.1.10:80/g' /etc/httpd.conf > /tmp/httpd.conf
And if you want to modify the source file directly (which can be dangerous) you can add the -i parameter, which means in-place.
sed -i 's/addrport/192.168.1.10:80/g' /etc/httpd.conf
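If your template contains more than one placeholder, you can chain several substitutions in a single run with multiple -e expressions. A sketch, using a second, made-up placeholder called servername:

sed -e 's/addrport/192.168.1.10:80/g' -e 's/servername/www.example.com/g' /etc/httpd.conf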
As with most Linux command line tools, sed has a great variety of parameters and options. I mostly use the sed command like in the examples above, but you can find the full manual here.

Tuesday, March 15, 2016

SSD comparison | Consumer vs Enterprise SSD

The storage world got a major upgrade when solid state drives (SSDs) came to market just a few years ago. The market was quickly overwhelmed with many sizes and types of these ultra fast drives: different types of cells (single level and multi level), different cell sizes, different interfaces such as PCI-e, SATA and SAS, and a great variety in performance and endurance.

Many IT colleagues had amazing posts with graphs of the raw throughput in MB/s. And then what happens when we create a striped volume out of a bunch of them! Wow!

But when do you "need" an SSD? And the next questions would be "which type do I need then?" and "SSD has a short lifespan doesn't it?". Often the pricing of an enterprise SSD can result in quite a shock to your manager or purchasing authority.

In this post I will try to explain the high level differences between the four major SSD types as indicated by hardware vendors such as Dell, HP and IBM (and obviously the SSD manufacturers themselves), and when to choose which type of SSD for your application.

So let's kick off with the four major types of SSDs and the models I used for this article.

The table below shows which SSDs I compare in this article. Please note that the prices are derived from pricespy.

Type                   Brand and Model                       Capacity   Interface    Endurance   Warranty   Price
Consumer grade         Intel 730 Series                      480 GB     SATA 600     70GB/day    5 years    € 269
Ent. Read intensive    Seagate 1200.2 Light Endurance        480 GB     SAS 12Gbps   3 DWPD      5 years    € 1054
Ent. Mix use           Seagate 1200.2 Mainstream Endurance   400 GB     SAS 12Gbps   10 DWPD     5 years    € 1286
Ent. Write intensive   Seagate 1200.2 High Endurance         400 GB     SAS 12Gbps   25 DWPD     5 years    € 2000


As you may have noticed I did not split based on single level cell and multi level cell SSDs, because there isn't that much of a difference anymore for 99% of the use cases. Additionally I have included a consumer grade SSD in the equation. The simple reason for this is to provide an answer to the question "Why don't we just drive by the local computer shop and buy some cheap SSDs? They are much cheaper!"

SSD Endurance

The table above also shows that different manufacturers use different notations for endurance; it would be easier to compare if they used the same one. The warranty is usually limited and ties directly into the endurance figures. For example, Intel's 730 480GB drive is rated for 70GB written per day for a duration of 5 years (the warranty), while the Seagate 1200.2 High Endurance is rated for 25 Drive Writes Per Day (DWPD). You can imagine that there is a great difference between the listed SSDs. Let's convert these numbers to Terabytes Written (TBW) and GB per day. TBW is the standard endurance metric proposed by JEDEC (JESD218); they have also created a standard for workload testing (JESD219).
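If you want to do the conversion yourself, it boils down to TBW = capacity in GB * DWPD * 365 * warranty years / 1000, and GB/day = capacity in GB * DWPD. A quick shell sketch, using the mix use drive from the table as an example:

capacity_gb=400; dwpd=10; years=5
echo "TBW:    $(( capacity_gb * dwpd * 365 * years / 1000 ))"   # prints 7300
echo "GB/day: $(( capacity_gb * dwpd ))"                        # prints 4000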

Type                   Brand and Model                       Capacity   Endurance   Warranty   TBW      GB/day
Consumer grade         Intel 730 Series                      480 GB     70GB/day    5 years    127.75   70
Ent. Read intensive    Seagate 1200.2 Light Endurance        480 GB     3 DWPD      5 years    2628     1440
Ent. Mix use           Seagate 1200.2 Mainstream Endurance   400 GB     10 DWPD     5 years    7300     4000
Ent. Write intensive   Seagate 1200.2 High Endurance         400 GB     25 DWPD     5 years    18250    10000

Quite significant, isn't it? Let's put it in a graph and divide the TBW over the 5 year warranty as well to show it visually:
So depending on the workload you are planning to put on an SSD, you must profile how much data is going to be written to it. Don't forget that RAID also impacts the writes, especially RAID 5 or 6, where each write comes with parity; I will cover RAID levels and their characteristics in another post.

SSD Performance

Just to quickly touch upon the speeds of these drives: I'll be very short here since the focus is on endurance. The random writes of the Intel consumer grade SSD are actually higher than those of the enterprise mix use SSD. This could be due to different controllers and/or a different type of cell (triple level, multi level or single level), or perhaps simply because I am comparing two different brands.

Type                   Brand and Model                       Random 4KB read (IOPS)   Random 4KB write (IOPS)
Consumer grade         Intel 730 Series                      89000                    74000
Ent. Read intensive    Seagate 1200.2 Light Endurance        190000                   35000
Ent. Mix use           Seagate 1200.2 Mainstream Endurance   200000                   67000
Ent. Write intensive   Seagate 1200.2 High Endurance         200000                   80000


When to use SSDs

This might be the trickiest question, since there is not one clear answer.

For example, looking only at the price per GB, SSDs are very expensive compared to traditional spinning hard drives. Obviously, if you need terabytes of storage which is infrequently accessed, don't go for SSD.

If your database server is using a lot of swap space (or page file, if you like), rather consider adding more RAM to your box. But if your system is waiting for the disk all the time, then you could consider SSD. You can easily check this by monitoring the wa counter in the output of the Linux top command, or the disk queue length in the Windows performance monitor; ideally they should be (close to) zero. An increasing wa or queue length counter can point to other things as well, among them fragmentation.
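As an illustration, on most Linux distributions you can keep an eye on the wa value from the shell with vmstat (part of the procps package); the wa column shows the percentage of CPU time spent waiting for I/O, sampled here once per second, five times:

vmstat 1 5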

Basically, try to solve the problem first before just throwing an SSD at it. High I/O applications can really benefit from SSDs anyway.

I use them too

Just to give you an idea, I am managing +/- 300 physical servers and only 4 of them are equipped with SSDs. The applications we run on them are two NGINX reverse caching proxies, an SQL server and a Splunk instance. 

NGINX reverse proxy

I chose to put SSDs in the NGINX servers since we wanted to be able to squeeze up to 10Gbit/s out of the cache, which would be around 800GB in size, and since the requests are completely random for small and medium sized files, a lot of random I/O was expected. We use two 800GB read intensive SSDs in RAID 1, since we estimated that no more than 1 write cycle per day would happen.

SQL Server

This SQL server is used for performance metrics which are inserted at a very high rate (+/- 10000 per second). All this data is also queried often. We found that even after indexing was applied, the I/O queue length would increase a lot at every insert batch. We installed write intensive disks in this server since the data is stored in raw format and then aggregated every minute, so a lot of writes happen.

Splunk

We have a Splunk indexer running which receives +/- 4GB of data per day. We use it to query (or search) data over longer periods of time. Even though a search itself is single threaded, the wa counter would increase, making other searches running simultaneously very slow back when we were still using a few spinning disks. In this server I installed 3 read intensive SSDs, providing us 800GB of RAID 5 protected capacity. Just like with the SQL server, if the capacity is going to be exhausted I will add another read intensive SSD.

Conclusion

Yes, enterprise SSDs are more expensive than consumer grade SSDs, but a write intensive disk can handle roughly 143 times the amount of writes while it is not 143 times more expensive. If you are planning to store data and query it many, many times but not delete it, you could go for read intensive disks. If you are using SSDs as caching disks, as one would find in a web proxy, then depending on the content cache lifetime you could use read intensive or mix use. And if you expect your application to write more than ten times the SSD size per day, write intensive SSDs would be a good fit.

Wednesday, March 9, 2016

Using the output of one command as the input for another command | pipe "|"


In my previous two posts I described two commands which I use a lot: grep to search for data and cut to display characters and/or columns from files.
As the title of this post suggests you can use the output of one command as the input of another.

In the following examples I will use the same fruit.txt example file which was used in my previous post about the cut command.
apple green tree
banana yellow tree
strawberry red plant

To send output data to another command you need to use the pipe "|" character between them. For example, if you want to select only the first column of the fruit.txt file and then only the lines containing the word "tree", you would use the following command:
grep "tree" fruit.txt | cut -d' ' -f1
apple
banana

Or the other way around
cut -d' ' -f1 fruit.txt | grep -v "strawberry"
apple
banana

But with the last command you need to know which word to exclude using the -v operator of the grep command. If you want to know how many lines are returned you can use the wc -l command to count the number of lines, like this:
cut -d' ' -f1 fruit.txt | grep -v "strawberry" | wc -l
2
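As a side note, grep can also count matching lines by itself with the -c flag, so for this particular file the following gives the same answer without the extra pipe:

grep -c "tree" fruit.txt
2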

Tuesday, March 8, 2016

Selecting columns from files | Linux command "cut"



CUT, Selecting columns from files
In my previous post I covered the grep command at a high level. Grep is often used to search for data in a file or in multiple files at once. Perhaps the files you are searching are column based, web server logs for instance. The cut command allows you to effectively select characters, columns and ranges of them. As with other commands I write about, I am not going into great detail; some more options are available for this command but these are the ones I use the most. A full manual can be found here.

Cut can be used to do a variety of things. Displaying ranges of characters or displaying columns or fields if you like.

Lets take the following contents of the file fruit.txt for example.
apple green tree
banana yellow tree
strawberry red plant

If you want to display the second character of each line in this file you would use the "-c" operator followed by the character number, "-c2" for the second character of each line, for example.
cut -c2 fruit.txt
p
a
t

You can also use -c1-3 to display characters 1 through to 3.
cut -c1-3 fruit.txt
app
ban
str

If you want to display the second column, or field, use the "-f" operator to select the field number. The "-d" operator allows you to specify a single delimiter, like a space or a comma, between the single quotes.
cut -d' ' -f2 fruit.txt
green
yellow
red

Also with the "-f " you can specify a range or the column numbers. Like this for a range.
cut -d' ' -f2-3 fruit.txt
green tree
yellow tree
red plant

And this for the column numbers.
cut -d' ' -f1,3 fruit.txt
apple tree
banana tree
strawberry plant

Friday, March 4, 2016

Grep still greps! | Linux command "grep"


GREP, search for stuff

"Could you grep that for me?" Somehow grep beats any Windows equivalent text search tool I know of. I even use my Linux based machine to "grep" information in log files from Windows based machines.
Perhaps this is the most famous and well known command among IT geeks.

Grep supports searching multiple files stored in multiple directories on any mount point on your machine. Sometimes you have to find something in a log file, but let's say you don't know in which log file you should start; then the recursive flag offers a solution. I won't go into great detail here, but these are the iterations of the grep command I use the most. Obviously this nifty tool can do a lot more than what I am writing here, regular expressions for example. The full manual can be found here.

Plain grep
grep "{search string}" /some/file

Case insensitive grep
grep -i "{search string}" /some/file

Recursive grep
grep -r "{search string}" /some/directory

Case insensitive and recursive grep
grep -ir "{search string}" /some/directory

Exclude lines with a specific character
grep -v "#" /some/config/file 
