What is RAID, and why does it, in effect, reduce the reported specs?
To be concise, RAID (except RAID level 0) is a set of methods that eliminate the loss of data in case of a single disk failure in a multiple disk storage array. Thus, with RAID "on", if a disk drive fails, the system is able to remain operational and no data is lost. The failed drive can be replaced (usually automatically with hot spares) and will be rebuilt by the RAID system in the background while operations continue, though, perhaps, at a degraded speed.
The original description of RAID described RAID levels 1 through 5. The most commonly used RAID levels are 1, 3, and 5. RAID 1 is simply disk mirroring, in which all data is duplicated on two separate disks. RAID 1 is very safe, but it doubles the cost of disk storage.
RAID 3 is like RAID 4, in that it uses a single parity disk, but stripes in RAID 3 are so small that each individual read or write operation must access all disks in the array. For instance, the first byte in a block of data might be on the first disk, the second byte on the second disk, and so on. RAID 3 systems often keep the disk heads synchronized to reduce latency. RAID 3 is a good fit for applications that require a very high data rate for a single large file, such as super-computing and graphics processing. It performs poorly with multi-user applications that generate many unrelated disk operations in parallel because each operation generates traffic on each disk in the array. By contrast, each data disk in a RAID 4 array can satisfy a separate user request at the same time.
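The single parity disk described above can be illustrated with a short sketch. It assumes the parity block is the byte-wise XOR of the data blocks, which is the usual scheme for single-parity RAID; the function names and the sample blocks are hypothetical and are not taken from the NetApp paper.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors plus parity."""
    return xor_parity(surviving_blocks + [parity])

# Hypothetical stripe: three data disks and one parity disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data)

# Pretend the second disk failed and rebuild its block.
recovered = rebuild([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, the same routine that computes parity also rebuilds a lost block, which is why an array with one parity disk can survive exactly one disk failure.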
RAID 5 is like RAID 4, but instead of keeping all parity blocks on a single disk, it cycles parity among all of the disks in the array: parity for the first stripe is on the first disk, parity for the second stripe on the second disk, etc. The primary advantage of RAID 5 is that it prevents the parity drive from becoming a bottleneck. (See "Eliminating the Parity Disk Bottleneck" below for how WAFL avoids this bottleneck for the filer.) The primary disadvantage is that it is not practical to add a single disk to a RAID 5 array; to add new disks easily, a whole new array must be added. Thus, if a RAID 5 implementation uses 7 disks in each array, then disks must normally be added 7 at a time.
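The difference between a dedicated parity disk and rotating parity can be sketched as a simple layout map. This is only an illustration: the rotation follows the description above (stripe 1 on disk 1, stripe 2 on disk 2, and so on), placing RAID 4 parity on the last disk is an assumption, and real RAID 5 implementations use several different rotation orders. The function names are hypothetical.

```python
def parity_disk(stripe, num_disks, rotate=True):
    """Index of the disk holding parity for a given stripe (both 0-based)."""
    if rotate:
        return stripe % num_disks   # RAID 5 style: parity cycles across disks
    return num_disks - 1            # RAID 4 style: assumed fixed on the last disk

def layout(num_stripes, num_disks, rotate=True):
    """Return one row per stripe, marking parity as 'P' and data as 'D'."""
    rows = []
    for stripe in range(num_stripes):
        p = parity_disk(stripe, num_disks, rotate)
        rows.append(["P" if disk == p else "D" for disk in range(num_disks)])
    return rows

for row in layout(4, 4, rotate=True):
    print(" ".join(row))
```

With `rotate=False` every row puts `P` in the same column, which is the single disk that every write has to touch.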
Some people have used the term RAID 0 to refer to disk striping, which is basically RAID 4 without the parity disk. Since there is no redundancy in disk striping, applying the term RAID to it is misleading. (Source: netapp.com)
RAID adds overhead to an I/O system, slowing down the response time compared to that same system with RAID "off". Thus, to get the best published benchmark performance numbers, vendors often run their industry standard benchmark tests with no RAID. That is what EMC has done. NetApp always has RAID "on". It can't be turned off. It is integral to the system. NetApp's performance numbers are very good, even with RAID on, because of the way that NetApp has implemented RAID.
5.2. Eliminating the Parity Disk Bottleneck
Most vendors of RAID peripherals for UNIX and Windows have avoided RAID 4 because with general-purpose file systems, the parity disk becomes a bottleneck. The WAFL file system, on the other hand, uses the flexibility of its "write anywhere" layout to write blocks to locations that are efficient for RAID 4...
Since the Berkeley FFS doesn't understand the underlying RAID 4 layout, it tends to generate requests that are scattered over the data disks, causing the parity disk to seek excessively. WAFL writes blocks in a pattern designed to minimize seeks on the parity disk. Figure 4(b) shows how WAFL allocates these same blocks to make RAID 4 operate efficiently. WAFL always writes blocks to stripes that are near each other, eliminating long seeks on the parity disk. WAFL also writes multiple blocks to the same stripe whenever possible, further reducing traffic on the parity disk. Notice that FFS uses six separate stripes in Figure 4(a), so six parity blocks must be updated. In Figure 4(b), WAFL uses only 3 stripes, so only 3 parity blocks are updated and they are all near each other.
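The stripe counts in this comparison can be reproduced with a toy calculation. Each stripe that receives at least one written block needs one parity block update, so the number of parity updates equals the number of distinct stripes touched. The stripe numbers below are hypothetical and are not the layout in Figure 4; they are chosen only so that six blocks land in six stripes in one case and three stripes in the other.

```python
def parity_updates(block_stripes):
    """One parity block is rewritten per distinct stripe written to."""
    return len(set(block_stripes))

def parity_seek_span(block_stripes):
    """Distance between the lowest and highest stripe touched on the parity disk."""
    return max(block_stripes) - min(block_stripes)

# Hypothetical stripe numbers for six written blocks.
scattered = [2, 9, 15, 21, 30, 44]    # each block in its own, distant stripe
clustered = [10, 10, 11, 11, 12, 12]  # two blocks per stripe, adjacent stripes

print(parity_updates(scattered), parity_seek_span(scattered))
print(parity_updates(clustered), parity_seek_span(clustered))
```

The clustered placement halves the parity writes and, more importantly, keeps them within a narrow band of the parity disk, which is the seek reduction the paper emphasizes.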
As a WAFL file system becomes full, it uses more stripes to write a given number of blocks, which increases the number of parity blocks that need to be updated. Even in a very full file system, however, a small range of cylinders contains many free blocks, so the more important benefit of reducing seeks on the parity disk remains. Like FFS, WAFL reserves 10% of disk space to improve performance.
(Same URL as above.)
This is another example of why NTAP's WAFL is a discontinuous innovation. Filers dis-integrate the file system from the general-purpose Unix and Windows platforms and put those file systems on a more efficient (and patented, standards-compliant, proprietary) file system designed from the ground up for high-capacity, high-availability, high-demand data serving in a multi-server environment.