Saturday, April 11, 2015

RAID (Redundant Array of Independent Disks) - Basics

RAID (Redundant Array of Independent Disks) is a disk subsystem that increases performance, provides fault tolerance, or both. RAID uses two or more physical disk drives and a RAID controller, which is plugged into motherboards that do not have RAID circuits. Today, most motherboards have built-in RAID, but not necessarily every RAID configuration (see below). In the past, RAID was also accomplished by software only, but this was much slower. In the late 1980s, the "I" in RAID stood for "inexpensive," but it was later changed to "independent."

In large storage area networks (SANs), floor-standing RAID units with terabytes of storage and huge amounts of cache memory are common. RAID is also used in desktop computers by gamers for speed and by business users for reliability. Following are the various RAID configurations.

The way in which you configure fault tolerance depends on the RAID level you set up. The choice of RAID level depends on how many disks you have in a storage device, how critical drive failover and recovery are to your data needs, and how important it is to maximize performance. A business will generally find it more urgent to keep data intact in case of hardware failure than, for example, a home user will. Different RAID levels represent different configurations aimed at providing different balances between performance optimization and data protection.


 RAID Levels

In most situations you will be using one of the following four popular RAID levels.
  • RAID 0
  • RAID 1
  • RAID 5
  • RAID 10 (also known as RAID 1+0)
However, there are several less common RAID levels that are used only in rare situations. It is good to know what they are, so this article also covers RAID 2, RAID 3, RAID 4, RAID 6, RAID 50 and RAID 60.

This article explains the main differences between these RAID levels, with an easy-to-understand diagram for each.

In all the diagrams mentioned below:
  • A, B, C, D, E and F – represent blocks
  • p1, p2, and p3 – represent parity


RAID Level 0 - Disk Striping for Performance

 

RAID 0 is used to boost a server's performance. It's also known as "disk striping." With RAID 0, data is written across multiple disks. This means the work that the computer is doing is handled by multiple disks rather than just one, which improves disk I/O because several drives are reading and writing data at the same time. A minimum of two disks is required. Both software and hardware RAID support RAID 0, as do most controllers. The downside is that there is no fault tolerance. If one disk fails, the entire array is affected and the data stored on it is likely to be lost or corrupted.
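
To make the striping idea concrete, here is a minimal Python sketch. It is only an illustration under simple assumptions: the "disks" are plain Python lists, and the names `stripe` and `unstripe` are made up for this example, not taken from any RAID tool.

```python
def stripe(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block from each disk in turn."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# Blocks A..F on two disks: disk 0 gets A, C, E and disk 1 gets B, D, F.
disks = stripe(b"ABCDEF", num_disks=2, block_size=1)
print(disks)
print(unstripe(disks))
```

Note that every disk holds only part of the data and there is no spare copy, which is why losing any single disk breaks the whole array.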

 
Following are the key points to remember for RAID level 0.
  • Minimum 2 disks.
  • RAID 0 implements a striped disk array: the data is broken down into blocks and each block is written to a separate disk drive
  • I/O performance is greatly improved by spreading the I/O load across many channels and drives
  • Best performance is achieved when data is striped across multiple controllers with only one drive per controller
  • No parity calculation overhead is involved
  • Very simple design
  • Easy to implement

Disadvantages

  • Not a "True" RAID because it is NOT fault-tolerant
  • The failure of just one drive will result in all data in an array being lost
  • Should never be used in mission critical environments

Recommended Applications


  • Video Production and Editing
  • Image Editing
  • Pre-Press Applications
  • Any application requiring high bandwidth 

 

RAID Level 1 - Mirroring for Fault Tolerance

 

 This is the first mode which actually has redundancy. RAID 1 is a fault-tolerance configuration known as "disk mirroring." With RAID 1, data is copied seamlessly and simultaneously, from one disk to another, creating a replica, or mirror. If one disk gets fried, the other can keep working. It's the simplest way to implement fault tolerance and it's relatively low cost.

The downside is that RAID 1 causes a slight drag on performance. RAID 1 can be implemented through either software or hardware. A minimum of two disks is required for RAID 1 hardware implementations. With software RAID 1, data can instead be mirrored between volumes on a single disk, although that offers no protection if the disk itself fails. One additional point to remember is that RAID 1 cuts total disk capacity in half: if a server with two 1TB drives is configured with RAID 1, then total storage capacity will be 1TB, not 2TB.
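
The mirroring behaviour described above can be sketched in a few lines of Python. This is a toy model, not how any real controller works: the class `Mirror` and its methods are hypothetical names, and each "disk" is just a dictionary of block number to data.

```python
class Mirror:
    """Toy RAID 1 volume: every block is written to all member disks."""

    def __init__(self, num_disks: int = 2):
        # Each disk maps block number -> data; None marks a failed disk.
        self.disks = [dict() for _ in range(num_disks)]

    def write(self, block_no: int, data: bytes) -> None:
        for disk in self.disks:
            if disk is not None:
                disk[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Any surviving mirror can serve the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block_no]
        raise IOError("all mirrors have failed")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace(self, index: int) -> None:
        # Rebuilding is a plain copy from a surviving mirror.
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)


volume = Mirror()
volume.write(0, b"payroll")
volume.fail(0)                 # one disk gets fried
print(volume.read(0))          # the other keeps working
volume.replace(0)              # copy data onto the replacement disk
```

Because both disks store the same blocks, two 1TB drives still give only 1TB of usable space.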


 Following are the key points to remember for RAID level 1.
  • Minimum 2 disks.
  • Twice the Read transaction rate of single disks, same Write transaction rate as single disks
  • 100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk
  • Transfer rate per block is equal to that of a single disk
  • Under certain circumstances, RAID 1 can sustain multiple simultaneous drive failures
  • Simplest RAID storage subsystem design

Disadvantages

  • Highest disk overhead of all RAID types (100%) - inefficient
  • Typically the RAID function is done by system software, loading the CPU/Server and possibly degrading throughput at high activity levels. Hardware implementation is strongly recommended
  • May not support hot swap of failed disk when implemented in "software"

Recommended Applications


  • Accounting
  • Payroll
  • Financial
  • Any application requiring very high availability

RAID Level 5 - Speed and Fault Tolerance


RAID 5 is by far the most useful RAID mode when one wishes to combine a larger number of physical disks, and still maintain some redundancy for business servers and enterprise NAS devices.
RAID-5 can be (usefully) used on three or more disks, with zero or more spare-disks.  

This RAID level provides better performance than mirroring as well as fault tolerance. With RAID 5, data and parity (which is additional data used for recovery) are striped across three or more disks. If a disk gets an error or starts to fail, its data is recreated from the remaining data and parity blocks, seamlessly and automatically. Essentially, the system stays operational even when one disk kicks the bucket, until you can replace the failed drive.
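
Parity in RAID 5 is usually computed as a byte-wise XOR of the data blocks in a stripe. The following is a minimal sketch of that idea, assuming one stripe with three data blocks; the helper name `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of everything that is
# left (the other data blocks and the parity) gives the lost block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The same trick works whichever single block of the stripe is lost, which is why RAID 5 survives exactly one disk failure.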

Another benefit of RAID 5 is that it allows many NAS and server drives to be "hot-swappable," meaning that if a drive in the array fails, it can be swapped with a new drive without shutting down the server or NAS and without interrupting users who may be accessing it. It's a great solution for fault tolerance because as drives fail (and they eventually will), the data can be rebuilt onto new disks as failing disks are replaced.

Both read and write performance usually increase, but it can be hard to predict by how much. Reads are almost as fast as RAID-0 reads. The write efficiency depends heavily on the amount of memory in the machine and the usage pattern of the array. Heavily scattered writes are bound to be more expensive.
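
One common reason scattered writes cost more on RAID 5 is the "read-modify-write" cycle: to change a single block, the old data and old parity are read first so the parity can be updated. The sketch below shows that arithmetic; the function names are hypothetical and the I/O count is the textbook figure for a small write, not a measurement.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one block of a stripe without reading the other data blocks."""
    # Remove the old block's contribution from the parity, then add the new one.
    new_parity = xor(xor(old_parity, old_data), new_data)
    # Read old data, read old parity, write new data, write new parity.
    io_operations = 4
    return new_parity, io_operations


stripe = [b"\x01", b"\x02", b"\x04"]
parity = xor(xor(stripe[0], stripe[1]), stripe[2])

new_block = b"\x08"
new_parity, ios = small_write(stripe[1], parity, new_block)
stripe[1] = new_block
print(new_parity.hex(), ios)
```

A single logical write therefore turns into four disk operations, while a full-stripe write can compute the parity from the new data alone.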


 Following are the key points to remember for RAID level 5.
  • Minimum 3 disks.
  • Highest Read data transaction rate
  • Medium Write data transaction rate
  • Low ratio of ECC (Parity) disks to data disks means high efficiency
  • Good aggregate transfer rate
  • Most cost-effective option providing both performance and redundancy. Use this for databases that are heavily read-oriented; write operations will be slow.

Disadvantages

  • Disk failure has a medium impact on throughput
  • Most complex controller design
  • Difficult to rebuild in the event of a disk failure (as compared to RAID level 1)
  • Individual block data transfer rate same as single disk

Recommended Applications


  • File and Application servers
  • Database servers
  • Web, E-mail, and News servers
  • Intranet servers
  • Most versatile RAID level

 RAID Level 10 

RAID 10 is a combination of RAID 1 and 0 and is often denoted as RAID 1+0. It combines the mirroring of RAID 1 with the striping of RAID 0. It's the RAID level that gives the best performance, but it is also costly: because every disk is mirrored, it needs twice as many disks as a plain striped array of the same capacity, with a minimum of four.

This is the RAID level ideal for highly utilized database servers or any server that's performing many write operations. RAID 10 can be implemented as hardware or software, but the general consensus is that many of the performance advantages are lost when you use software RAID 10.


 Following are the key points to remember for RAID level 10.
  • Minimum 4 disks.
  • RAID 10 is implemented as a striped array whose segments are RAID 1 arrays
  • RAID 10 has the same fault tolerance as RAID level 1
  • RAID 10 has the same overhead for fault-tolerance as mirroring alone
  • High I/O rates are achieved by striping RAID 1 segments
  • Under certain circumstances, RAID 10 array can sustain multiple simultaneous drive failures
  • Excellent solution for sites that would have otherwise gone with RAID 1 but need some additional performance boost.
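
The "certain circumstances" under which RAID 10 survives more than one failure can be checked with a short sketch. It assumes a hypothetical four-disk array in which disks 0 and 1 form one mirror pair and disks 2 and 3 form the other; the function name is made up for this example.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs) -> bool:
    """The array survives as long as no mirror pair has lost both of its disks."""
    failed = set(failed)
    return all(not set(pair) <= failed for pair in mirror_pairs)


mirror_pairs = [(0, 1), (2, 3)]
two_disk_failures = list(combinations(range(4), 2))
survivable = [f for f in two_disk_failures if raid10_survives(f, mirror_pairs)]

print(len(survivable), "of", len(two_disk_failures), "two-disk failures are survivable")
print(survivable)
```

In this layout four of the six possible two-disk failures are survivable; the two fatal cases are the ones where both halves of the same mirror fail.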

Disadvantages

  • Very expensive / High overhead
  • All drives must move in parallel to the proper track, lowering sustained performance
  • Very limited scalability at a very high inherent cost

Recommended Applications


  • Database server requiring high performance and fault tolerance

Other RAID Levels: These RAID levels are used for specific cases. Here are some short descriptions of each:

RAID Level 2




  • RAID 2 is similar to RAID 5, but instead of block-level striping with parity, striping occurs at the bit level and error correction codes are used.
  • In the above diagram b1, b2, b3 are bits. E1, E2, E3 are error correction codes.
  • You need two groups of disks. One group of disks is used to write the data, and another group is used to write the error correction codes.
  • This uses Hamming error correction code (ECC), and stores this information in the redundancy disks.
  • When data is written to the disks, it calculates the ECC code for the data on the fly, and stripes the data bits to the data-disks, and writes the ECC code to the redundancy disks.
  • When data is read from the disks, it also reads the corresponding ECC code from the redundancy disks, and checks whether the data is consistent. If required, it makes appropriate corrections on the fly.
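
To show how a Hamming code can both detect and correct a flipped bit, here is a small sketch using the textbook Hamming(7,4) code: 4 data bits plus 3 check bits, which in RAID 2 terms would mean 4 data disks and 3 ECC disks. This particular code size is an assumption chosen for illustration; the function names are made up.

```python
def hamming_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_decode(code: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    error_position = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
codeword = hamming_encode(word)
damaged = list(codeword)
damaged[4] ^= 1                      # one "disk" returns a wrong bit
print(hamming_decode(damaged))
```

The three check bits together point at the position of the bad bit, so it can be flipped back on the fly during a read.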

Disadvantages

  • Very high ratio of ECC disks to data disks with smaller word sizes - inefficient
  • Entry level cost is very high - requires a very high transfer rate requirement to justify it
  • Transaction rate is equal to that of a single disk at best (with spindle synchronization)
  • No commercial implementations exist / not commercially viable

 

RAID Level 3 


 

  • RAID 3 is also similar to RAID 5, but it uses byte-level striping, i.e. instead of striping blocks across the disks, it stripes bytes across the disks.
  • In the above diagram B1, B2, B3 are bytes. p1, p2, p3 are parities.
  • Uses multiple data disks, and a dedicated disk to store parity.
  • The disks have to spin in sync to get to the data.
  • Sequential read and write will have good performance.
  • Random read and write will have poor performance.
  • Controller design is fairly complex
  • Very difficult and resource intensive to do as a "software" RAID
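
Byte-level striping with a dedicated parity disk can be sketched as follows. This is a simplified model: the function names are hypothetical, the disks are `bytearray` objects, and the data is padded with zero bytes to fill the last row.

```python
def raid3_write(data: bytes, num_data_disks: int):
    """Stripe bytes across the data disks; keep XOR parity on a dedicated disk."""
    padding = (-len(data)) % num_data_disks
    padded = data + b"\x00" * padding
    data_disks = [bytearray() for _ in range(num_data_disks)]
    parity_disk = bytearray()
    for start in range(0, len(padded), num_data_disks):
        row = padded[start:start + num_data_disks]
        parity = 0
        for disk, byte in zip(data_disks, row):
            disk.append(byte)
            parity ^= byte
        parity_disk.append(parity)
    return data_disks, parity_disk


def raid3_read(data_disks, length: int) -> bytes:
    """Every read touches all data disks, one byte from each per row."""
    out = bytearray()
    for row in zip(*data_disks):
        out.extend(row)
    return bytes(out[:length])


data_disks, parity_disk = raid3_write(b"HELLO!", num_data_disks=3)
print([bytes(d) for d in data_disks])
print(raid3_read(data_disks, 6))
```

Since even a tiny read needs a byte from every data disk, all the disks have to work together on each request, which fits the good sequential and poor random performance noted above.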

Recommended Applications


  • Video Production and live streaming
  • Image Editing
  • Video Editing
  • Prepress Applications
  • Any application requiring high throughput

 

RAID Level 4

 

  • This uses block level striping.
  • In the above diagram B1, B2, B3 are blocks. p1, p2, p3 are parities.
  • Uses multiple data disks, and a dedicated disk to store parity.
  • Minimum of 3 disks (2 disks for data and 1 for parity)
  • Good random reads, as the data blocks are striped.
  • Bad random writes, as for every write, it has to write to the single parity disk.
  • It is somewhat similar to RAID 3 and RAID 5, but slightly different from both.
  • This is just like RAID 3 in having a dedicated parity disk, but it stripes blocks instead of bytes.
  • This is just like RAID 5 in striping the blocks across the data disks, but parity is kept on a single dedicated disk instead of being distributed across all disks.
  • This is not commonly used.
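
The difference between RAID 4 and RAID 5 comes down to where the parity block of each stripe lives. The sketch below prints both layouts for a three-disk array. The rotation rule used for RAID 5 here is just one possible scheme picked for illustration (real implementations differ), and the function names are made up.

```python
def parity_disk(stripe: int, num_disks: int, level: int) -> int:
    """Return the index of the disk holding parity for a given stripe."""
    if level == 4:
        return num_disks - 1                       # always the last disk
    if level == 5:
        return (num_disks - 1 - stripe) % num_disks  # rotates stripe by stripe
    raise ValueError("only RAID 4 and RAID 5 are modelled here")


def layout(num_stripes: int, num_disks: int, level: int) -> list[list[str]]:
    """Build a table with 'P' for parity blocks and 'D' for data blocks."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks, level)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows


for level in (4, 5):
    print(f"RAID {level}")
    for row in layout(num_stripes=3, num_disks=3, level=level):
        print("  " + " ".join(row))
```

With RAID 4 every write has to update the same parity disk, so that disk becomes the bottleneck; RAID 5 spreads those parity updates over all the disks.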

 

RAID Level 6

 

  • RAID 6 is essentially an extension of RAID level 5 which allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity)
  • In the above diagram A, B, C are blocks. p1, p2, p3 are parities.
  • This creates two parity blocks for each stripe of data blocks.
  • Data is striped on a block level across a set of drives, just like in RAID 5, and a second set of parity is calculated and written across all the drives; RAID 6 provides for an extremely high data fault tolerance and can sustain multiple simultaneous drive failures
  • RAID 6 protects against multiple bad block failures while non-degraded
  • RAID 6 protects against a single bad block failure while operating in a degraded mode
  • Perfect solution for mission critical applications
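
Dual parity is often built from a plain XOR parity (usually called P) plus a second Reed-Solomon style parity (Q) computed in the Galois field GF(2^8). The sketch below shows that idea on one byte per data disk. The field polynomial 0x11D and generator 2 are a commonly used choice but are assumptions here, as are all the function names; the article's sources do not specify them.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)      # a^255 == 1 for any non-zero a


def pq_parity(data: list[int]) -> tuple[int, int]:
    """P is the XOR of all bytes; Q weights byte i by 2^i before XOR-ing."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[int], x: int, y: int, p: int, q: int) -> tuple[int, int]:
    """Rebuild the bytes of two failed data disks x and y (their entries are ignored)."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d
            qxy ^= gf_mul(gf_pow(2, i), d)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    return dx, pxy ^ dx


data = [0x41, 0x42, 0x43, 0x44]
p, q = pq_parity(data)
damaged = [0x41, 0, 0x43, 0]           # disks 1 and 3 are gone
print(recover_two(damaged, 1, 3, p, q))
```

Two independent equations (P and Q) allow two unknown blocks to be solved for, which is why RAID 6 can lose any two drives.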

Disadvantages

  • More complex controller design
  • Controller overhead to compute parity addresses is extremely high
  • Write performance can be brought on par with RAID Level 5 by using a custom ASIC for computing Reed-Solomon parity
  • Requires N+2 drives to implement because of dual parity scheme

Recommended Applications


  • File and Application servers
  • Database servers
  • Web and E-mail servers
  • Intranet servers
  • Excellent fault-tolerance with the lowest overhead

 

RAID Level 50 - High I/O Rates and Data Transfer Performance

 

 

  • RAID 50 is a striped (RAID level 0) array whose segments are RAID 5 arrays. Implementations from the mid-90s were instead built as a striped array whose segments were RAID 3 arrays, and so should really have been called "RAID 03"
  • The most common current RAID 50 implementation is illustrated above
  • RAID 50 is more fault tolerant than RAID 5 but has twice the parity overhead
  • High data transfer rates are achieved thanks to its RAID 5 array segments
  • High I/O rates for small requests are achieved thanks to its RAID 0 striping
  • May be a good solution for sites that would otherwise have gone with RAID 5 but need some additional performance boost.

Disadvantages

  • Very expensive to implement
  • In implementations built on RAID 3 segments, all disk spindles must be synchronized, which limits the choice of drives
  • Failure of two drives in one of the RAID 5 segments renders the whole array unusable
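
The last point can be checked by enumerating failures. The sketch below assumes a hypothetical six-drive RAID 50 made of two three-drive RAID 5 segments; the function name and the drive numbering are made up for this example.

```python
from itertools import combinations


def nested_array_survives(failed, segments, tolerance: int = 1) -> bool:
    """A striped set of segments survives only if every segment survives.

    tolerance is the number of failed drives each segment can absorb
    (1 for RAID 5 segments).
    """
    failed = set(failed)
    return all(len(failed & set(segment)) <= tolerance for segment in segments)


segments = [(0, 1, 2), (3, 4, 5)]
two_drive_failures = list(combinations(range(6), 2))
survivable = [f for f in two_drive_failures if nested_array_survives(f, segments)]

print(len(survivable), "of", len(two_drive_failures), "two-drive failures are survivable")
```

Here 9 of the 15 possible two-drive failures are survivable: the ones where the failed drives sit in different segments. A plain six-drive RAID 5 would survive none of them.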

RAID Level 60

RAID 60 is two (or more) RAID 6 groups striped together. The Sun Fire X4150 we tested had eight drives on two channels, so it nicely split into two RAID 6 groups striped together to make one RAID 60. This further reduces the risk of losing data to drive failures. With two groups, RAID 60 can survive the loss of up to four drives (two in each group), compared with the two-drive limit in RAID 6.
However, this added redundancy comes at the cost of additional parity disks. Furthermore, because RAID 60 is a striped collection of RAID 6 arrays, each RAID 6 array must remain consistent. Otherwise, data will be lost.
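
The trade-off between extra redundancy and extra parity disks can be put into numbers. The sketch below compares a single RAID 6 with a RAID 60 built from the same eight drives; the function names are hypothetical and capacity is counted in whole drives.

```python
def raid6_usable_drives(drives: int) -> int:
    """RAID 6 spends two drives' worth of space on parity."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    return drives - 2


def raid60_usable_drives(drives: int, groups: int) -> int:
    """Each RAID 6 group in the stripe gives up two drives to parity."""
    if groups < 2 or drives % groups != 0:
        raise ValueError("drives must divide evenly into two or more groups")
    return groups * raid6_usable_drives(drives // groups)


def raid60_failure_limits(groups: int) -> tuple[int, int]:
    """Return (failures always survived, failures survived in the best case)."""
    return 2, 2 * groups


print("RAID 6,  8 drives:", raid6_usable_drives(8), "drives of usable space")
print("RAID 60, 8 drives:", raid60_usable_drives(8, groups=2), "drives of usable space")
print("RAID 60 failure limits:", raid60_failure_limits(2))
```

With eight drives, RAID 6 leaves six drives of usable space while a two-group RAID 60 leaves four. In return, RAID 60 can survive four failures in the best case, but a third failure inside a single group is still fatal.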

RAID Level 7

RAID 7 is a proprietary level of RAID owned by the now-defunct Storage Computer Corporation.


                                                       ************************************************
### Here I have tried to provide precise and important basic information about all the RAID levels. In the future I'll try to provide detailed information about how each RAID level works and about the RAID calculation methods. So, stay in touch :)
And thanks for reading. ###
                                                       ************************************************



