DCSE Self-Study Training Materials
RAID Theory
RAID Concepts
The overall goal of RAID (redundant array of independent disks) is to provide
better performance and reliability than a single disk drive by combining a number
of physical drives into a logical array of drives. As the name implies, the array
stores redundant information about the data to protect the data in the event of
disk failure.
RAID addresses the problems associated with high-volume storage
requirements. When the volume of data requires a large number of drives for
storage, there is an increased chance that one of the drives will fail. Without
redundancy, a drive failure makes the data on that drive unavailable until it is
restored from a backup tape. Also, because backups are not typically made on a
continuous basis, recently created data may not be recoverable.
In theory, RAID works by creating redundant information about the
data in the disk array and then spreading the original data and redundant parity
data across the disk array. The spreading of data ensures that only a small
part of any given file is lost if a single drive fails. The parts of the file
on the other disks in the array plus the parity data are sufficient to reconstruct
the data lost on the failed drive.
The possibility of data loss is no longer tied to the failure of
a single disk. Common forms of RAID do not lose data until two disks in the
array fail at the same time, which is much less likely than a single drive failure.
Fundamental Concepts
Some fundamental concepts must be understood before the details of RAID
are presented. These concepts are described within the context of how RAID
is applied on Dell systems. Definitions may vary from those used elsewhere in
the industry or by other manufacturers. These concepts include:
Array: A disk array is a collection of disk drives that are linked together
logically so that they appear to act as a unit. The array controller defines the
characteristics of the array. Dell’s current array controller is the PowerEdge
Expandable RAID Controller 3 (PERC 3).
Channel: A channel is any path used for the transfer of
data and control information between storage devices and a storage controller.
Drive size: All drives in an array are treated as having the same
capacity. If the physical drives differ in size, the controller logically resizes
each drive to the capacity of the smallest drive in the array so that all sizes match.
Stripe: A stripe is the area on a particular disk drive
where some part of the data from a file is stored. A stripe can be as small
as a sector (512 bytes) or as big as a few megabytes. The controller adjusts
the size of a stripe when it configures the array. The nature of the application
using the array determines the size of the stripe.
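The way stripes spread a file over several drives can be illustrated with a minimal sketch. It assumes a simple round-robin layout with no parity; the function name `stripe_layout` and its parameters are hypothetical and do not come from any Dell controller interface.

```python
def stripe_layout(data: bytes, num_drives: int, stripe_size: int) -> list[list[bytes]]:
    """Cut data into stripe-sized chunks and deal them round-robin to the drives."""
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        # Chunk number modulo the drive count selects the target drive.
        drives[(offset // stripe_size) % num_drives].append(chunk)
    return drives


# A 10-byte "file" on three drives with a 2-byte stripe (illustrative sizes only).
layout = stripe_layout(b"ABCDEFGHIJ", num_drives=3, stripe_size=2)
for index, stripes in enumerate(layout):
    print(f"drive {index}: {stripes}")
```

Each drive ends up holding only part of the file, which is why the loss of one drive affects only a fraction of any given file.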
Spanning: Spanning is the combining of disk arrays connected
to different SCSI backplanes into a logical unit.
Mirroring: Mirroring is the practice of maintaining a duplicate of one disk drive
on another disk drive. Mirroring can be combined with striping. Mirroring is the
same as RAID 1.
Redundancy: Redundancy is a technique for providing greater
data integrity in a RAID array by storing extra information about the data in
the array. This extra information describes the data well enough that the array
can be rebuilt after a drive failure. The larger the amount of redundant
information that is stored, the easier it is to rebuild the array in the event
of drive failure.
Storage Capacity: Storage capacity is the total amount
of disk space in the array. If an array has four 9-GB drives, the total storage
capacity is 36 GB.
Storage Efficiency: Storage efficiency is the percentage
of the total storage capacity that can contain unique data. For mirroring, storage
efficiency is 50 percent because one byte of data occupies a byte of storage
space on each of two drives in an array. In an array of four 9-GB disks with
disk mirroring, only 18 GB of space is available for storage. Therefore, the
storage efficiency is 50 percent. Generally, the greater the redundancy, the
lower the efficiency of the array.
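The capacity and efficiency arithmetic above can be checked with a short sketch. The function names are hypothetical, and the halving applies to mirroring only; the `min` call reflects the drive-size rule that every drive is resized to the smallest one in the array.

```python
def array_capacity_gb(drive_sizes_gb: list[float]) -> float:
    """Total storage capacity, with every drive resized to the smallest drive."""
    return min(drive_sizes_gb) * len(drive_sizes_gb)


def mirrored_usable_gb(drive_sizes_gb: list[float]) -> float:
    """Space left for unique data when every byte is stored on two drives."""
    return array_capacity_gb(drive_sizes_gb) / 2


def storage_efficiency_percent(usable_gb: float, total_gb: float) -> float:
    """Percentage of the total capacity that can contain unique data."""
    return 100 * usable_gb / total_gb


drives = [9, 9, 9, 9]                      # four 9-GB drives, as in the text
total = array_capacity_gb(drives)          # 36 GB
usable = mirrored_usable_gb(drives)        # 18 GB
efficiency = storage_efficiency_percent(usable, total)  # 50 percent
print(total, usable, efficiency)
```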
Parity: Parity is a mathematical method that allows a small amount of data
to provide “backup” or protection for a large amount of data. It is based on a
simple Boolean logic operation over the data stored across stripes in a RAID.
The result of the parity calculation is the redundant information used to restore
data when a drive failure occurs. Parity is calculated whenever data is written
to a RAID. If a disk fails, then the same parity operation calculated on the remaining
data and parity stripes in the array reconstructs the data on the failed disk’s
stripe.
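A minimal sketch of the parity idea follows. It assumes the Boolean operation is exclusive OR (XOR), which is the usual choice for single-parity RAID; the stripe contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data stripes, one per drive (illustrative values).
data_stripes = [b"\x0f\xa0", b"\x33\x55", b"\xc1\x07"]

# Parity is calculated when the data is written.
parity = xor_blocks(data_stripes)

# Simulate the failure of the second drive: the same XOR operation over the
# surviving data stripes and the parity stripe reconstructs the lost stripe.
surviving = [data_stripes[0], data_stripes[2], parity]
rebuilt = xor_blocks(surviving)
print(rebuilt == data_stripes[1])
```

One parity stripe protects all three data stripes, which is why parity costs less storage than keeping a full mirror copy.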
RAID Implementations: Hardware Versus Software
A hardware RAID delivers better performance than a software RAID
because the RAID controller’s processor takes over the RAID work, which frees
the system processor for other tasks.
Figure 1: Hardware RAID with a Dedicated RAID Controller
A hardware RAID does not require the intervention of the operating
system. This allows applications to run faster. The system processor communicates
with the RAID controller as if it were a normal disk drive. The processor in
the RAID controller acts like a parallel processor dedicated to I/O.
Because the dedicated processor on the RAID controller handles the
striping information, these commands are executed more quickly. They do not have
to be passed from the system processor to the drive controller.
Figure 2: Software RAID with a Standard Drive Controller
A software RAID requires the system processor to perform all the RAID functions.
This slows down the system’s overall performance. The processor must generate
the striping information along with the data being accessed in the drive array.
RAID Levels
The RAID Advisory Board (RAB) is responsible for issuing definitive
statements regarding RAID levels. Most of the industry still uses these numbered
RAID levels. The Dell PowerEdge Expandable RAID Controller (PERC) supports RAID
Levels 0, 1, 3, 5, and the combined Level 10. The Dell PowerEdge Expandable
RAID Controller 3 (PERC 3) supports RAID Levels 0, 1, 5, and the combined Level
10.