Network Appliance—The Case for a Gorilla (Part III—NTAP’s NAS as a Disruptive Technology)

This is the third installment of my argument for Network Appliance as the Gorilla in the Network Attached Storage (NAS) market. The first two installments, recapping the Bowling Alley phase of market development and the question of Open Proprietary Architecture, can be found at:
I will make the case that NTAP’s NAS solution is a disruptive technology for all other storage solutions which use Unix and Windows file systems.
First, what products use traditional Unix and Windows file systems? The answer is “everybody” who supports Unix and/or Windows, including Sun, H-P, IBM, EMC, et al. NTAP is the only significant player to offer a compatible alternative to the Unix and/or Windows “native” file systems. (I use the qualifier “significant” simply because there might be some unknown software company with an alternative. I just don’t know about it.)
Why is NTAP’s architecture disruptive? First, let me tell you a bit about that architecture. NTAP’s founders, Dave Hitz and James Lau, started the company based on a simple idea. That idea sprang from many years of experience by these two as software engineers working on Unix systems. They were intimately familiar with the inner workings, strengths, and shortcomings of the Unix file system, including the Sun-built/published Network File System (NFS). Dave and James both worked for several years at Auspex, arguably the first NAS company, where the shortcomings of Unix file systems were attacked with more CPU power, more cache, more RAM, and a re-engineering of the UNIX OS so that file operations were handled on a separate CPU from the computing tasks.
Dave and James left Auspex to start their own software business in an unrelated field. That business failed rather quickly. On his way to his high school class reunion, Dave had a moment of enlightenment. Wisely, he pulled to the side of the road and captured on paper his idea for a file system based on a “Write Anywhere File Layout”. This new method of managing inodes and data blocks seemed to overcome the overhead and file management issues inherent in the Unix method of managing inodes.
(For a very good description of WAFL and SNAPSHOT, written in 1995, see netapp.com.)
Dave and James worked on the details of the concept and realized that WAFL would reduce the head movement required to write data, obviate the need to create frequent server backups, be able to use the simpler RAID 4 parity and striping scheme, and, with the addition of non-volatile RAM, be able to recover all data transactions in even the most catastrophic failures.
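One of those benefits, RAID 4, is worth a quick illustration. RAID 4 dedicates one disk to parity, where the parity block is simply the XOR of the corresponding data blocks on the other disks, so any single failed disk can be rebuilt from the survivors. This is a minimal sketch of the arithmetic only; it assumes nothing about NetApp's actual implementation, and the function names are mine:

```python
# Sketch of RAID 4 parity (illustrative only): one dedicated parity disk
# holds the XOR of the data disks, so any single lost disk can be rebuilt
# by XOR-ing the surviving disks with the parity disk.

def parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

def rebuild(surviving_blocks, parity_block):
    """Recover a lost data block from the survivors plus the parity block."""
    return parity(surviving_blocks + [parity_block])

data = [b"disk0data", b"disk1data", b"disk2data"]
p = parity(data)
# simulate losing disk 1, then rebuild its block from the others
assert rebuild([data[0], data[2]], p) == data[1]
```

Because WAFL can choose where to write, it tends to write full stripes, which sidesteps the parity-disk bottleneck that made RAID 4 unattractive for update-in-place file systems.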
Below are the abstracts for the patents which were filed in the 1995 timeframe to cover these new concepts:
---------------------------------------------------------------------------------------------------------------------
Patent for SNAPSHOT: Oct. 6, 1998 / May 31, 1995 / US1995000454921
A method is disclosed for maintaining consistent states of a file system. The file system progresses from one self-consistent state to another self-consistent state. The set of self-consistent blocks on disk that is rooted by a root inode is referred to as a consistency point. The root inode is stored in a file system information structure. To implement consistency points, new data is written to unallocated blocks on disk. A new consistency point occurs when the file system information structure is updated by writing a new root inode into it. Thus, as long as the root inode is not updated, the state of the file system represented on disk does not change. The method also creates snapshots that are user-accessible read-only copies of the file system. A snapshot uses no disk space when it is initially created. It is designed so that many different snapshots can be created for the same file system. Unlike prior art file systems that create a clone by duplicating an entire inode file and all indirect blocks, the method of the present invention duplicates only the inode that describes the inode file. A multi-bit free-block map file is used to prevent data referenced by snapshots from being overwritten on disk.
Patent for accelerated RAID recovery using NVRAM: Sept. 7, 1999 / June 5, 1995 / US1995000471218
A method is disclosed for providing error correction for an array of disks using non-volatile random access memory (NV-RAM). Non-volatile RAM is used to increase the speed of RAID recovery from a disk error(s). This is accomplished by keeping a list of all disk blocks for which the parity is possibly inconsistent. Such a list of disk blocks is much smaller than the total number of parity blocks in the RAID subsystem. The total number of parity blocks in the RAID subsystem is typically in the range of hundreds of thousands of parity blocks. Knowledge of the number of parity blocks that are possibly inconsistent makes it possible to fix only those few blocks, identified in the list, in a significantly smaller amount of time than is possible in the prior art. The technique for safely writing to a RAID array with a broken disk is complicated. In this technique, data that can become corrupted is copied into NV-RAM before the potentially corrupting operation is performed.
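The core idea in that abstract is small enough to sketch: record a stripe in (battery-backed) NVRAM before writing it, clear the record once its parity is safely on disk, and after a crash recompute parity only for the stripes still on the list rather than scanning the whole array. A toy model, with a plain Python set standing in for NVRAM and all names my own:

```python
# Toy model of the NVRAM dirty-parity list: stripes whose parity might be
# inconsistent are tracked before each write and removed when the write
# completes, so crash recovery repairs only the stripes still listed.

class NvramDirtyList:
    def __init__(self):
        self._dirty = set()   # stands in for battery-backed NVRAM

    def begin_stripe_write(self, stripe):
        self._dirty.add(stripe)          # log before touching the disks

    def complete_stripe_write(self, stripe):
        self._dirty.discard(stripe)      # parity is now consistent on disk

    def stripes_to_repair(self):
        """After a crash, only these stripes need parity recomputation."""
        return sorted(self._dirty)

nv = NvramDirtyList()
nv.begin_stripe_write(7)
nv.begin_stripe_write(9)
nv.complete_stripe_write(7)   # write to stripe 7 finished cleanly
# crash here: only stripe 9 was in flight
assert nv.stripes_to_repair() == [9]
```

The payoff is the ratio: a handful of in-flight stripes versus the hundreds of thousands of parity blocks a full-array scan would otherwise have to recompute.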
Additional patent for SNAPSHOT: Oct. 5, 1999 / June 30, 1998 / US1998000108022
The present invention provides a method for keeping a file system in a consistent state and for creating read-only copies of a file system. Changes to the file system are tightly controlled. The file system progresses from one self-consistent state to another self-consistent state. The set of self-consistent blocks on disk that is rooted by the root inode is referred to as a consistency point. To implement consistency points, new data is written to unallocated blocks on disk. A new consistency point occurs when the fsinfo block is updated by writing a new root inode for the inode file into it. Thus, as long as the root inode is not updated, the state of the file system represented on disk does not change. The present invention also creates snapshots that are read-only copies of the file system. A snapshot uses no disk space when it is initially created. It is designed so that many different snapshots can be created for the same file system. Unlike prior art file systems that create a clone by duplicating the entire inode file and all of the indirect blocks, the present invention duplicates only the inode that describes the inode file. A multi-bit free-block map file is used to prevent data from being overwritten on disk.
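Both snapshot abstracts describe the same copy-on-write trick: new data always goes to an unallocated block, the root inode is the last thing updated, and a snapshot duplicates only the root rather than the whole inode tree. Here is a deliberately minimal sketch of that behavior; the class and its methods are my invention for illustration, not NetApp's code:

```python
# Minimal copy-on-write sketch of the consistency-point/snapshot idea:
# writes never overwrite a block in place, and a snapshot copies only the
# root pointer, so it consumes no data-block space until the live file
# system diverges from it.

class Filesystem:
    def __init__(self):
        self.blocks = {}     # block number -> data
        self.root = {}       # stand-in "root inode": file name -> block number
        self.next_block = 0

    def write(self, name, data):
        # write-anywhere: always allocate a fresh block, then swing the root
        self.blocks[self.next_block] = data
        self.root = dict(self.root, **{name: self.next_block})
        self.next_block += 1

    def snapshot(self):
        return dict(self.root)   # duplicate only the root, never the data

    def read(self, root, name):
        return self.blocks[root[name]]

fs = Filesystem()
fs.write("a.txt", b"version 1")
snap = fs.snapshot()               # instant, no data copied
fs.write("a.txt", b"version 2")    # lands in a new block
assert fs.read(snap, "a.txt") == b"version 1"   # snapshot still sees v1
assert fs.read(fs.root, "a.txt") == b"version 2"
```

The multi-bit free-block map in the abstracts plays the role my `blocks` dict quietly dodges: it keeps a block from being reused while any snapshot still references it.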
Patent for Write Anywhere File Layout (WAFL): March 14, 2000 / May 31, 1995 / US1995000464591
The present invention is a method for integrating a file system with a RAID array that exports precise information about the arrangement of data blocks in the RAID subsystem. The file system examines this information and uses it to optimize the location of blocks as they are written to the RAID system. Thus, the system uses explicit knowledge of the underlying RAID disk layout to schedule disk allocation. The present invention uses separate current-write location (CWL) pointers for each disk in the disk array where the pointers simply advance through the disks as writes occur. The algorithm used has two primary goals. The first goal is to keep the CWL pointers as close together as possible, thereby improving RAID efficiency by writing to multiple blocks in the stripe simultaneously. The second goal is to allocate adjacent blocks in a file on the same disk, thereby improving read back performance. The present invention satisfies the first goal by always writing on the disk with the lowest CWL pointer. For the second goal, a new disk is chosen only when the algorithm starts allocating space for a new file, or when it has allocated N blocks on the same disk for a single file. A sufficient number of blocks is defined as all the buffers in a chunk of N sequential buffers in a file. The result is that CWL pointers are never more than N blocks apart on different disks, and large files have N consecutive blocks on the same disk.
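The allocation algorithm in that abstract is concrete enough to sketch directly: one advancing CWL pointer per data disk, always allocate on the disk with the lowest pointer, and switch disks for a file only after N consecutive blocks. This is my reading of the abstract only, with hypothetical names and an arbitrary N:

```python
# Sketch of the current-write-location (CWL) allocator as described in the
# patent abstract: lowest-pointer disk wins, but a file keeps N consecutive
# blocks on one disk before switching, so stripes stay full (write
# efficiency) and file blocks stay adjacent (read-back efficiency).

N = 4  # blocks of one file kept on the same disk before switching disks

class CwlAllocator:
    def __init__(self, num_disks):
        self.cwl = [0] * num_disks   # next free block number on each disk
        self.current_disk = None     # disk the current file is filling
        self.run_length = 0          # consecutive blocks placed on that disk

    def new_file(self):
        self.current_disk = None     # next allocation picks a fresh disk
        self.run_length = 0

    def allocate(self):
        if self.current_disk is None or self.run_length == N:
            # choose the disk with the lowest CWL pointer
            self.current_disk = min(range(len(self.cwl)),
                                    key=self.cwl.__getitem__)
            self.run_length = 0
        disk = self.current_disk
        block = self.cwl[disk]
        self.cwl[disk] += 1
        self.run_length += 1
        return disk, block

alloc = CwlAllocator(num_disks=3)
alloc.new_file()
placements = [alloc.allocate() for _ in range(6)]
# the first N blocks land on one disk; the next run moves to another disk,
# keeping the CWL pointers within N of each other
assert placements[:4] == [(0, 0), (0, 1), (0, 2), (0, 3)]
assert placements[4][0] != 0
```

Note how the two goals trade off through the single constant N: a larger N favors sequential reads of big files, a smaller N keeps the stripe front tighter.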
Patent for a method to make RAID writes more efficient: Sept. 7, 1999 / Feb. 28, 1997 / US1997000808396
The invention provides a method and system for performing XOR operations without consuming substantial computing resources. A specialized processor is coupled to the same bus as a set of disk drives; the specialized processor reviews data transfers to and from the disk drives and performs XOR operations on data transferred to and from the disk drives without requiring separate transfers. The specialized processor maintains an XOR accumulator which is used for XOR operations, which records the result of XOR operations, and which is read out upon command of the processor. The XOR accumulator includes one set of accumulator registers for each RAID stripe, for a selected set of RAID stripes. A memory (such as a contents-addressable memory) associates one set of accumulator registers with each selected RAID stripe.
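In software terms, that hardware watches blocks as they cross the bus and folds each one into a per-stripe accumulator register, so the parity is ready without a second pass over the data. A toy software model of the accumulator (the hardware does this on the bus; the class below and its names are mine):

```python
# Toy model of the per-stripe XOR accumulator: each observed block transfer
# is XOR-ed into the register for its RAID stripe, so parity accumulates as
# a side effect of the data moving past, with no separate read-back pass.

class XorAccumulator:
    def __init__(self, block_size):
        self.block_size = block_size
        self.registers = {}   # stripe number -> accumulated XOR register

    def observe_transfer(self, stripe, data):
        """Called once per data block as it moves to or from the disks."""
        reg = self.registers.setdefault(stripe, bytearray(self.block_size))
        for i, b in enumerate(data):
            reg[i] ^= b

    def read_out(self, stripe):
        """Read the accumulated parity for a stripe and clear its register."""
        return bytes(self.registers.pop(stripe))

acc = XorAccumulator(block_size=4)
for block in (b"\x01\x02\x03\x04", b"\x10\x20\x30\x40"):
    acc.observe_transfer(stripe=0, data=block)
assert acc.read_out(0) == b"\x11\x22\x33\x44"
```

The content-addressable memory mentioned in the abstract corresponds to the dictionary lookup here: it maps a stripe number to its register set without a linear search.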
NTAP’s first products were strictly for UNIX environments (during the “bowling pin” phase). Since then, with the growing popularity of Windows NT Server, NTAP has delivered support for the Windows file service protocol, the Common Internet File System (CIFS). NTAP also developed a cross-protocol security solution which allows a single file to be securely accessed from a Unix host via NFS and from a Windows host via CIFS. This is very significant, as not only are the security models of the two systems dissimilar, but NFS is a “stateless” protocol while CIFS is a “stateful” protocol. I believe, but am not positive, that NTAP has filed for patents on these schemes. (If anyone has a site for searching patent applications that have not yet been granted, please let me know.)
Many features of NTAP’s software depend on these concepts, including its Cluster Failover technology, which lets two active filers serve as failover partners for one another; remote copy, which allows any filer to keep a Snapshot copy of any other filer’s file system over long or short distances; and non-disruptive backup, which allows file backups to be executed without interrupting service.
Why is this technology disruptive? It is disruptive because users of this technology enjoy faster, simpler, more reliable file services, compared to traditional UNIX and Windows file service solutions. These advantages are a direct result of NTAP’s patented innovations and can only be met by competitors who create similar innovations.
As an example, here are two SPEC SFS benchmarks, one from NTAP and one from SUNW. I picked these two particular results because the performance of the two machines is roughly equivalent: the Sun 3500 gave a throughput of 4295 NFS ops/sec with a response time of 9.6 msec, and the NTAP F760 gave a throughput of 4380 NFS ops/sec with a response time of 9.0 msec.
spec.org spec.org
What is revealing is to compare the configurations of the two systems that gave these similar results. The details are on the URL pages cited above, but here is a recap:
NetApp F760
Number of CPUs: 1 (Alpha)
Memory size: 1 GB
NVRAM: 32 MB
Network: Gigabit Ethernet
Number of network controllers: 2
Number of disk controllers: 1
Number of disks: 56
Number of file systems: 1
File system config: 2 RAID groups of 28 disks
Sun 3500 ES
Number of CPUs: 4
Memory size: 3072 MB
NVRAM: N/A
Network: 100 Mbit Ethernet
Number of network controllers: 6
Number of disk controllers: 3
Number of disks: 116
Number of file systems: 112
As you can see, Sun configured a very fat 3500 to hit these numbers: four CPUs, more network connections (6 controllers), twice as many disk drives, and an unrealistic number of file systems (roughly one per drive). Sun’s configuration did not include any RAID, as that would have penalized its performance significantly. I do not have price comparisons, but based on my experience I am safe in saying that the F760 probably costs less than half as much as the Sun 3500.
I am presenting these numbers to illustrate a point: the NTAP NAS architecture is a disruptive technology. It represents a dis-integration of the application server model, with the application running on the platform designed for that purpose (the general-purpose computer) and the file system running on a platform designed for that purpose (the NAS filer).
It is worth repeating that the NAS offerings from virtually all of NTAP’s competitors may not be disruptive technologies simply because they do not replace the file systems of UNIX and Windows. It isn’t just NAS that is disruptive—it is the specialized OS and file system technology that makes the NTAP NAS filer provide price/performance/reliability beyond what is possible using general purpose operating systems.
A word about the ongoing arguments about SCSI versus FC versus Ethernet: many argue, validly, that NTAP’s products do not scale in terms of storage capacity and that Ethernet, compared to FC or SCSI, is a poor medium for moving large amounts of data in applications such as file copies. NTAP is approaching this market from the bottom up, as most disruptive technology innovators do. Each of these issues is independent of the nature of NTAP’s disruptive technology. Scalability and I/O channel throughput can be addressed by adding FC, Gigabit Ethernet (GbE), and other state-of-the-art technologies to the NTAP architecture; likewise, storage capacity can be addressed. NTAP has new innovations in its laboratory now that address all of these scalability issues.
Update, 11/18/00: NTAP has re-engineered memory management in the ONTAP OS, upgraded its CPU architecture to Intel’s Coppermine, added support for larger disk drives, and tweaked performance to support up to 12 TB on a single filer.
Direct Access File System with Virtual Interface (DAFS/VI) patents and talent from a recent acquisition are being implemented now for release in late 2001. These capabilities will overcome the packetizing and OS overhead inherent in TCP/IP. The result will be the elimination of the heretofore limiting characteristics of network data transfers between filers and hosts, removing another scalability issue. GbE adds even more strength to the NAS architecture.
Future clustering capabilities, as part of the DAFS/VI architecture, will provide for remote SNAPSHOT mirroring, failover, and other synergistic filer relationships that will, to the astute customer, place NTAP head and shoulders above EMC in terms of scalability.
Future low-end products will complete the final phase of NTAP’s “end-to-end content management” architecture. Purchases of software companies are part of NTAP’s effort to add a layer of “content network management” (my words) software to schedule, monitor, and optimize the distribution of content from the central filers to the edge, including caching.