Redundant Array of Inexpensive (or Independent) Disks

Basic Idea: Combine several disks to provide increased performance and reliability

History

Up until the 1980s, hard disks (like just about everything else in computers) weren't really mass-produced. A ``large'' production run would number in hundreds; even things like disks for minicomputers would be sold in quantities of thousands to tens of thousands.

All that changed when IBM introduced their Winchester disk technology, which featured an aerodynamic read/write head that would ``fly'' over the disk, and a hermetically sealed drive. Both innovations allowed the head to be much closer to the disk media than in previous designs: the aerodynamic head allowed the height to be more precisely controlled than a completely mechanically positioned arm could, and the seal kept out dust particles which would otherwise be larger than the gap. The Winchester drive required the use of clean-room techniques that lent themselves well to automation; this in turn made it possible to sell large numbers of drives for a much lower cost per bit. Of course, personal computers coming on the market in the early 1980s provided the market for these disks.

In 1988, a paper was published which made the case that using a number of small, inexpensive disks could give higher performance and reliability than using a large expensive disk, at a lower cost. The paper was by Patterson (yes, one of the authors of your textbook), Gibson, and Katz, titled ``A Case for Redundant Arrays of Inexpensive Disks (RAID),'' published in the Proceedings of ACM SIGMOD (Special Interest Group on Management of Data) '88.

Striping

First, we need to talk about striping. The idea here is that we can break up the data we're going to put on the disks into chunks, putting the first chunk on the first disk, the second chunk on the second, and so forth. The chunks can be as small as a bit or as large as a track (or larger, really, but a point of diminishing returns is reached).

The implication of striping is that we can get much higher disk transfer rates: we can pull data off of (or put data on to) multiple disks at once. If we use small chunks in the stripes this will make transfers for individual processes much faster; if we use sector-size striping the system throughput can be much better, or individual processes making really big transfers can be helped.
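To make the round-robin placement concrete, here is a minimal sketch in Python. The function names (locate_chunk, stripe) and the tiny chunk size are made up for illustration; a real array would work on sectors, not two-byte chunks.

```python
def locate_chunk(chunk_index, num_disks):
    """Map a logical chunk number to (disk, chunk offset on that disk)."""
    disk = chunk_index % num_disks      # chunks are dealt round-robin across disks
    offset = chunk_index // num_disks   # how far down that disk the chunk sits
    return disk, offset


def stripe(data, num_disks, chunk_size):
    """Split data into chunks and deal them out to the disks in turn."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        chunk_index = start // chunk_size
        disk, _ = locate_chunk(chunk_index, num_disks)
        disks[disk].append(data[start:start + chunk_size])
    return disks


if __name__ == "__main__":
    layout = stripe(b"ABCDEFGHIJKL", num_disks=3, chunk_size=2)
    for number, chunks in enumerate(layout):
        print("disk", number, chunks)
```

Consecutive chunks land on different disks, which is why a large transfer can keep all the disks busy at once.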

Hardware and Software RAID

RAID can be implemented either by the operating system (software RAID) or by dedicated RAID controllers (hardware RAID).

In the case of software RAID, the disks making up the RAID array are visible to the OS as separate disks; the OS makes them appear as a single disk to application programs.

In the case of hardware RAID, a dedicated RAID controller offloads this function from the host to a RAID processor. The RAID array appears to the host as a single IDE or SCSI disk drive; the RAID functions are completely hidden from it. However, in modern implementations, the disks used in the array are still conventional drives with built-in controllers, not raw disks (this will become relevant when we talk about RAID 2).

Hardware RAID controllers will also typically provide some extra functionality that is not strictly part of RAID, such as extra-large disk caches, or hot-swappability. A friend of mine told me a story a while back of someone noticing the power light was off on one of the drives making up a RAID array where he worked. It turned out the drive wasn't fully seated; he pushed it into place. It spun up, and for several minutes was very busy as the hardware controller rebuilt the data that should have been on it. The host CPU never knew a thing (why the system wasn't configured to be screaming about a drive missing from the array wasn't clear from the story...).

RAID Levels

It is an unfortunate bit of terminology that Patterson et al chose the term ``levels'' to describe the different types of RAID they defined: it gives the impression that each level is in some sense built upon the prior levels. While that's true in some cases, in others the various levels are simply different points in the design space. It would probably have been better to name them ``RAID types'' or something; regardless, they didn't.

In surveying the different RAID levels, it's also important to remember that technology has changed somewhat in the intervening years, and some of the RAID levels make assumptions which are no longer valid. In particular, some of the RAID levels assume that the disks will not have integrated controllers, and a shared controller will handle all of them. This is no longer common; instead disks come with integrated controllers. This is the case for both IDE and SCSI drives (this shouldn't be confused with ``hardware RAID,'' as described above).

There are six types of RAID (numbered zero through five), along with proprietary extensions to the concept. I've probably seen a total of over a dozen; I just want to talk about the original six here.

RAID 0
RAID Level 0 is just striping, with no redundancy (in spite of the name). It lets you distribute writes across all the disks for higher throughput, but doesn't do a thing to help reliability. In fact, reliability is reduced compared to using the same number of disks with no striping, since losing any one disk means you'll probably be unable to reconstruct your file system (if you weren't using striping, you'd have a file system per disk, so if you lost one disk you'd lose only the data on that disk: half your data, with two disks).
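A rough back-of-the-envelope version of that argument, as a Python sketch. It assumes equal-sized disks that fail independently, and the failure probability of 0.05 is invented purely for illustration; the function names are mine.

```python
def striped_loss_probability(p_fail, num_disks):
    """Chance a striped (RAID 0) array loses everything: any one failure is fatal."""
    return 1 - (1 - p_fail) ** num_disks


def unstriped_expected_loss(p_fail, num_disks):
    """Expected fraction of data lost with one file system per disk.

    Each disk holds 1/num_disks of the data and fails with chance p_fail,
    so the expectation works out to p_fail whatever the disk count.
    """
    return sum(p_fail / num_disks for _ in range(num_disks))


if __name__ == "__main__":
    p = 0.05  # assumed per-disk failure probability over some period
    for n in (1, 2, 4, 8):
        print(n, "disks:",
              "striped, lose all with probability", round(striped_loss_probability(p, n), 4),
              "| unstriped, expected loss fraction", round(unstriped_expected_loss(p, n), 4))
```

Under these assumptions the chance of losing the whole striped array grows with every disk you add, while the unstriped expected loss stays put.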

RAID 1
RAID Level 1 is exactly orthogonal to RAID 0: it is mirroring with no striping. This can give better read throughput (since you can be reading from all the disks at once), but no improvement on writing, since all writes have to be mirrored to all disks. It's more expensive than other ways of improving reliability (though nice if you've got exactly two disks). Today it also lends itself very well to a typical PC motherboard: one drive can be put on each IDE channel. This is what I do at home and at the theatre, incidentally. At home I've got a single machine I use as a file server for the three machines on the network; this machine has a pair of drives configured as a RAID 1 array. A separate machine has a single big drive, and backups are run automatically every night. At the theatre the sound system computer also has a pair of drives configured as RAID 1, so if one fails the machine is still up. When I take that machine home to work on the sound software, it gets backed up by my backup machine.

RAID 2
RAID Level 2 stripes data across some number of disks, and then uses more disks to maintain an error correcting code for the data on the disks. On a read, the data is read simultaneously from all the disks and the ECC checked; on a write, all the data has to be written across all the disks, and new ECC written as well.

RAID Level 2 makes sense with disks that don't have an integrated controller: the disks return the analog signal from the disk head, and the controller works out what the data is. That way if a disk returns bad data, the correct data can be recovered from the ECC. In general, if you want to have n data disks, RAID 2 will require log2(n) + 1 ECC disks.
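The log2(n) + 1 figure is a rule of thumb. As a sketch, assuming the ECC is a plain single-error-correcting Hamming code (that's my assumption here), the exact count is the smallest r with 2^r >= n + r + 1. The function names below are made up for illustration.

```python
import math


def hamming_check_disks(data_disks):
    """Smallest r such that 2**r >= data_disks + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


def rule_of_thumb(data_disks):
    """The floor(log2(n)) + 1 estimate for the number of ECC disks."""
    return math.floor(math.log2(data_disks)) + 1


if __name__ == "__main__":
    for n in (4, 8, 10, 12, 25):
        print(n, "data disks:", hamming_check_disks(n), "check disks (Hamming),",
              rule_of_thumb(n), "by the rule of thumb")
```

The two counts agree for many array sizes but not all of them, so treat the rule of thumb as an approximation.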

Because modern disks include substantial error detection and correction facilities, RAID 2 isn't widely used: a disk drive will very seldom return bad data. It will return correct data, or it will return an error. When it returns an error we know exactly which disk's data is missing, and in that case simple parity is sufficient to reconstruct the data which has been lost (parity can't in general recover from bad data; it can only tell us that a bit is bad, not which one it is. But if we know which bit is bad, we can reconstruct what it should have been).
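Here is a small Python sketch of that last point: parity is just the XOR of the corresponding bytes on the data disks, and once we know which disk is gone, XORing everything that's left gives its contents back. The helper names and the sample bytes are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(blocks, failed_index):
    """Recover the block at failed_index from all the other blocks.

    blocks holds the data blocks plus the parity block; the failed one is
    ignored, as if the drive had reported an error for it.
    """
    survivors = [b for i, b in enumerate(blocks) if i != failed_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # three data disks
    array = data + [xor_blocks(data)]                 # plus one parity disk
    for lost in range(len(array)):
        assert rebuild(array, lost) == array[lost]
    print("any single known-bad disk can be rebuilt")
```

Note that the rebuild only works because failed_index is known; if a disk silently returned wrong bytes, parity alone could not say which one to distrust.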

RAID 3
The important thing to realize here is that modern disks almost never return bad data: the error detection measures built into the disk mean that the disk either returns correct data or reports an error. In the event that an error occurs, it is possible to reconstruct the data on the bad sector from the remaining disks.

So, RAID 3 has one extra disk, used for parity. The data is striped at the bit level (some recent descriptions will say byte level; originally, it was bit level). Notice that RAID 3 provides excellent availability (since when a disk fails its data can be reconstructed from the remaining disks) and good read performance (since data is read from all the data disks simultaneously), but no improved write performance (since every write to the disk array requires writing to the parity disk as well). It also makes substantially more efficient use of disk space than RAID 1. It would appear to also make more efficient use of space than RAID 2; it actually uses fewer drives (one extra drive rather than roughly log2(n) + 1 extra drives), but more space (since the drives themselves use extra space to maintain error detection and correction information internally).
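To put numbers on the drive-count comparison, here is a quick Python sketch. It counts whole drives only (it ignores the ECC space inside each drive), assumes two-way mirroring for RAID 1, and takes the number of RAID 2 ECC disks as an input; the four-data-disk example and the function name are made up.

```python
from fractions import Fraction


def usable_fraction(data_disks, extra_disks):
    """Fraction of the drives in the array that hold user data."""
    return Fraction(data_disks, data_disks + extra_disks)


if __name__ == "__main__":
    n = 4  # data disks in this made-up example
    print("RAID 1:", usable_fraction(n, n))   # every data disk has a mirror
    print("RAID 2:", usable_fraction(n, 3))   # log2(4) + 1 = 3 ECC disks
    print("RAID 3:", usable_fraction(n, 1))   # a single parity disk
```

With four data disks that works out to 1/2, 4/7 and 4/5 of the drives holding data, and the gap widens in RAID 3's favor as the array grows.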

Attempting to implement RAID 3 in software is inefficient, since every sector must be scattered across the disks.

RAID 4
RAID 4 is like RAID 3, except the striping is at the level of sectors. This provides good performance for large numbers of applications, and makes a software implementation practical, but the parity disk is still a bad write bottleneck.
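One way to see the bottleneck is to sketch a single-sector write. The approach below (read the old data and old parity, then write new parity = old parity XOR old data XOR new data) is a common technique, but it is my assumption here rather than something spelled out above; the names and the little write counter are for illustration only.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length sectors."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, sector, new_data, writes):
    """Overwrite one sector and patch the matching parity sector."""
    old_data = data_disks[disk][sector]
    old_parity = parity_disk[sector]
    parity_disk[sector] = xor(xor(old_parity, old_data), new_data)
    data_disks[disk][sector] = new_data
    writes[disk] += 1          # one write to the data disk that was targeted
    writes["parity"] += 1      # and always one to the single parity disk


# Three data disks of two sectors each, initially all zero, plus a parity disk.
data_disks = [[bytes(2), bytes(2)] for _ in range(3)]
parity_disk = [bytes(2), bytes(2)]
writes = {0: 0, 1: 0, 2: 0, "parity": 0}

small_write(data_disks, parity_disk, 0, 0, b"\x01\x02", writes)
small_write(data_disks, parity_disk, 1, 0, b"\x10\x20", writes)
small_write(data_disks, parity_disk, 2, 1, b"\xff\x00", writes)
print(writes)
```

The three writes went to three different data disks, so in principle they could proceed in parallel, but every one of them also had to touch the parity disk.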

It should be noted that some manufacturers using RAID 4 systems refer to them as RAID 3.

RAID 5
By distributing parity across all of an array's member disks, RAID Level 5 reduces (but does not eliminate) the write bottleneck inherent in RAID Level 4. As with RAID Level 4, the result is asymmetrical performance, with reads outperforming writes but not as badly as with RAID 4. To reduce or eliminate this intrinsic asymmetry, RAID level 5 is often augmented with techniques such as caching and parallel multiprocessors. Here's a figure showing how a RAID 5 system with three disks works:

[Figure: RAID 5 with three disks]

The three columns represent disks (so there are three disks); each of the four boxes in a column represents a sector on the disk. Sector 0 is on disk 0, sector 1 is on disk 1, and a sector constructed by taking the parity of corresponding bits of sectors 0 and 1 is on disk 2. Similarly, sector 2 is on disk 0, sector 3 is on disk 2, and the parity of sectors 2 and 3 is on disk 1.
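The placement just described can be generated with a few lines of Python. The first two rows follow the description above; the rule that the parity sector keeps moving one disk to the left on each later row (wrapping around) is my assumption about how the pattern continues, and the function name is made up.

```python
def raid5_layout(num_disks, num_stripes):
    """Return one row per stripe; each entry is a sector number or 'P' for parity."""
    rows = []
    sector = 0
    for stripe in range(num_stripes):
        # Parity starts on the last disk and moves one disk left per stripe.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(str(sector))
                sector += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("disk0 disk1 disk2")
    for row in raid5_layout(num_disks=3, num_stripes=4):
        print("  ".join(cell.center(4) for cell in row))
```

Because the parity sector lands on a different disk in each row, parity updates get spread over all three disks instead of piling up on one.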

Among the original set of RAID levels, RAID 5 is the one that is most commonly used today, followed by RAID 1. In general, if an environment will have at least three disks RAID 5 is the better way to go; if it's only feasible to have two disks, you can use mirroring to protect yourself from head crashes.

A lot of this was taken from the following two web pages:

http://www.mylex.com/documents/raid_levels.htm

http://www.oreilly.com/reference/dictionary/terms/R/Redundant_Array_of_Inexpensive_Disks.htmy

and, of course, from

Patterson, D. A., G. Gibson, and R. H. Katz, "A case for redundant arrays of inexpensive disks (RAID)" in Proceedings of ACM SIGMOD International Conference on Management of Data, pages 109-116, 1-3 June 1988.


Last modified: Wed Mar 16 11:21:23 MST 2005