Long article fueling the SAN v NAS debate - a discussion which inevitably leads to comparisons of EMC and NTAP. The link to this article is provided at the end of this post.
Part One - Written 4/18/00: By Eric Jhonsa
"All good solutions are obvious. That's the nature of solutions. If it isn't obvious, then it probably isn't a solution."
- Po Bronson
I know, I know, this issue's been discussed to death, but I think that I've reached a few conclusions related to this subject that, to the best of my knowledge, haven't been stated before, so it might be worth reading on nonetheless.
First, a primer, just to bring those not familiar with this topic up to speed (if you've been an active follower of an EMC or Network Appliance message board, you can probably skip the next couple of paragraphs):
Over the past decade, the price of a megabyte of storage has dropped by an average of 50% per year. Yet over that same period, the enterprise storage market (the market for the boxes in which corporations store data within their LANs) has grown tremendously, both in volume shipments and in dollars, and the growth shows no signs of slowing down. The reason: corporate storage capacity needs are growing at a rate that makes Moore's Law look like a joke in comparison. The causes of this ever-increasing demand for capacity are numerous. There are the huge amounts of data generated by enterprise applications from companies such as Oracle, Microsoft, SAP, and Ariba, along with annual increases in both the number of these applications deployed and the average size of the files they generate (it's a nice little cycle: more processing power leads to more bloated software, which leads to demand for more processing power, which leads to...all of us cursing Bill Gates and Andy Grove). And there are the truckloads of information being retrieved from the internet, a trend that will only grow in importance as companies come to rely more and more on the internet as a tool, and as cheaper bandwidth allows more data to be downloaded within a given time frame.
To augment this trend of soaring capacity needs, there's also the demand created by the internet itself. After all, all those web pages, e-mails, e-commerce transaction files, and pirated MP3s have to be stored somewhere. Although I'm sure everyone not named Gary Kildall (whose CP/M famously lost the PC operating system deal to Microsoft's DOS in the early '80s) can see the importance of this trend, its magnitude can often be overlooked. For example, consider how an article stated that Excite (only the portal, not the cable ISP) had purchased more storage capacity in the past year than Merrill Lynch. That article was written over a year ago. Now, as more and more internet users get broadband connections, this explosion looks set to intensify even further, not only because demand for more bandwidth-intensive content leads to larger files being stored, but also because soaring internet traffic in general (see my 2nd Update on AOL for more on this subject) creates a need to store a given piece of content in multiple locations, via the load balancing and caching techniques enabled by the products and services of the likes of F5, Mirror Image (Xcelera), and Akamai.
As of right now, the majority of all this data, whether it happens to be stored on a corporate network or at an internet hosting center, can be found on storage boxes attached to the back end of general-purpose servers running UNIX, Windows NT, Novell Netware, or Linux, the servers themselves being attached to ethernet-based LANs. In such setups, these servers are generally also used to run the applications I mentioned previously, so that they can be used either by employees on their PCs or by internet users accessing dynamic content. The storage boxes, meanwhile, are managed via specialized software installed on these servers, software made by companies such as Veritas (NASDAQ: VRTS) and Legato (NASDAQ: LGTO).
In the last paragraph, the phrase "as of right now" is worth remembering, as nearly everyone in this industry, from the server manufacturers to the software vendors to the storage device companies, is calling for the end of the directly-attached server/storage era by doing their best to get their customers to ditch such storage architectures for ones built around storage-area-networks (SANs). In a SAN, all data requests are still routed through a server, the disks within the storage boxes are still treated as peripherals to the server (i.e. the same way the hard drive in your computer is considered a peripheral to its microprocessor), and the disks are still accessed via a high-throughput data-transfer protocol known as fibre channel. However, the storage boxes themselves are completely detached from any individual server, and may literally be miles away. Fibre channel adapter cards, made primarily by QLogic (NASDAQ: QLGC) and Emulex (NASDAQ: EMLX) (these two have a duopoly of sorts in this field), are installed within the application servers to communicate over the SAN, and high-speed fibre channel switches, made by the likes of Brocade (NASDAQ: BRCD) and Ancor (NASDAQ: ANCR) (Brocade's the proverbial "800 lb. gorilla" in this burgeoning market), are used to route a given data packet to the server through which it was requested. When compared with the direct-attachment setup described previously, the benefits of SANs are three-fold:
1. They keep individual servers from getting overloaded with data requests. If the only way to access a number of extremely popular files is through a single server, it doesn't take a genius to realize that the server will quickly be swamped with traffic, leading both to slow response times for requests and to a degradation in the performance of applications running on the server. In a truly interoperable SAN, by contrast, the data requests can be balanced across a number of servers.
2. All data traffic between storage boxes (i.e. moving a library of files related to a given application from one box to another), between a storage box and a server that requests a file for use with an application it's running, and related to file backup is offloaded from the primary ethernet-based network. Soaring internet traffic on corporate LANs is creating bandwidth shortages not only at companies' internet connections, but also within their internal networks. For these companies, offloading all of this traffic would definitely help alleviate the problem.
3. The storage boxes are consolidated in such a manner that any application, regardless of the operating system it's running on, can be used to access any piece of data stored on the SAN (this promise hasn't been made real yet, but it should be within a couple of years). Right now, if a storage pool is connected to the back-end of a UNIX server, the only applications that can make use of the files within that box are those running on UNIX servers. Likewise, if a storage box was managed by an NT server, the only applications that could...well, you get the idea. Via their offerings, SAN vendors hope to allow their customers to put an end to such OS-related interoperability issues in the near future.
A number of critics, ranging from leading industry publications to market research firms, have stated that while SANs will flourish over the long run, the networks themselves will primarily use high-speed versions of ethernet rather than fibre channel as their disk access protocol of choice. Such critics point out that ethernet-based networks are far cheaper and easier to install, and that the technology is scaling toward 10 Gb/s connections. However, while fibre channel doesn't differ much from ethernet in terms of the raw bandwidth a connection will allow, it does have an edge over the latter: fibre channel can move an enormous amount of data (more than nearly any individual file you can think of) as a single, hardware-managed transfer, while ethernet, being a general-purpose network protocol, caps each packet at a mere 1.5 KB. To see why this matters, suppose that a client on a corporate LAN attempted to access a 5 MB AVI video clip in a storage box two miles away in terms of cabling. If his company used ethernet connections for the SAN, then upon coming off of the storage pool and onto the ethernet-based SAN, the video clip would immediately have to be broken up into thousands of packets before reaching the server where the application using the file resides. Were a fibre channel-based SAN in place instead, the clip could travel over those two miles as one essentially unbroken 5 MB transfer, leaving much less processing for both the server and the storage box. It's easy to see in which of the two scenarios the client is better served. This performance advantage related to reduced processing will only grow in importance given the rate at which storage demand growth (and thus growth in the network traffic related to it) is outpacing Moore's Law.
Granted, this benefit is fairly inconsequential if the file being requested happened to be something along the lines of a 20 KB Microsoft Word document, but as the previously-mentioned trend of ever-increasing file sizes continues...
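To put a rough number on the fragmentation overhead described above, here's a back-of-the-envelope sketch (the 1,500-byte figure is the typical ethernet payload limit; this is an illustration of the arithmetic, not a measurement of any real network):

```python
# Back-of-the-envelope: how many packets does a 5 MB file become
# on an ethernet link whose maximum payload is about 1.5 KB?
import math

FILE_SIZE = 5 * 1024 * 1024   # the 5 MB AVI clip, in bytes
ETHERNET_MTU = 1500           # typical ethernet payload limit, in bytes

packets = math.ceil(FILE_SIZE / ETHERNET_MTU)
print(packets)  # 3496 packets, each needing per-packet processing
```

Each of those thousands of packets has to be built, routed, received, and reassembled, which is exactly the per-packet processing burden the fibre channel approach avoids.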
When the enterprise storage market first began to explode, the market for the storage boxes themselves was dominated by IBM (NYSE: IBM). Of course, expecting IBM to hold onto a lead in a high-growth business is like expecting a James Bond movie to end with the villain taking over the world and getting the girl. As most of you already know, over the past decade, through more robust offerings, superior software, and first-rate customer service, an upstart known as EMC slowly but steadily took IBM's position as the king of the storage industry, earning the company a $130 billion market cap as of this moment, and putting CEO Michael Ruettgers' name in the same breath as those of Bill Gates, Larry Ellison, John Chambers, etc. Along its road to glory, EMC's also flown past a number of other entrenched competitors such as Hitachi and Sun Microsystems (NASDAQ: SUNW).
Unfortunately for EMC, it appears that history has a good chance of repeating itself. No, I'm not saying that the company may lose its crown to another SAN-based storage vendor with a superior product, but rather to Network Appliance (NASDAQ: NTAP), a competitor with a radically different set of offerings altogether. For the five readers that don't already know this, Network Appliance is by far (60% market share and growing at the expense of also-rans such as Sun, Auspex (NASDAQ: ASPX), and EMC) the leading vendor in the market for storage-specific servers, better known as network-attached storage (NAS). NAS servers, unlike their UNIX and NT-based counterparts, fully integrate the "server" (i.e. CPU, DRAM, operating system, etc.) and the storage pools being managed into a single box. Thus with NAS, the server manufacturer is also the storage device vendor, and since the only purpose of a NAS device is to manage and serve files, every hardware and software component placed within the server is there with the intention of creating the best possible device for serving files. In the case of Network Appliance, this specialization has been taken to an extreme, as the company has also created from scratch a proprietary operating system called Data ONTAP to run its NAS boxes, better known as Netapp file servers. Meanwhile, they've also created a series of applications for Data ONTAP to aid in its storage management capabilities. Previously, NAS boxes were relegated to the low-end and mid-range markets, but recently, Network Appliance has begun to scale its offerings into the multi-terabyte level, making its products head-to-head competitors with EMC's high-end offerings.
If you were to compare and contrast two storage offerings, one being a UNIX server (in my opinion, the most robust server OS) hooked up to a fibre channel SAN containing EMC Symmetrix storage boxes (EMC's high-end product) on the back end, and the other being a standalone Netapp file server with internal fibre channel connections to the hard drives within it, even if SANs make good on their promise of complete interoperability in terms of file sharing/access, something Network Appliance achieved years ago with its WAFL (Write Anywhere File Layout) architecture, the latter could still be considered superior to the former for the following reasons:
1. Netapp filers are much easier to set up, and require much less time to do so. This is due to the fact that unlike an EMC Symmetrix box, you don't have to configure it with a general-purpose UNIX, NT, Netware, or Linux server. All you have to do is hook it up to an ethernet-based network and perhaps fine-tune a few settings. Having a proprietary, "built for storage" operating system helps out in easing this process as well.
2. Since the filer was built from the ground up for storage management, it's much faster than a Symmetrix box. In high-bandwidth environments, the largest bottleneck in regards to speed is often the time it takes to access a file off of a server, making this a major issue that grows in importance with each passing month.
3. The fact that Data ONTAP was built for storage management benefits the filer not only in terms of speed when compared to the EMC box, but also in terms of increased functionality and ease of use, both in the OS itself and in the applications that Network Appliance has developed to run on it. For example, Data ONTAP integrates a patented feature known as Snapshot, which automatically creates one or more read-only file backups within a given file server, backups that take only a fraction of the disk space of the originals. Thus if a file, or a series of files, within the box becomes corrupted or inadvertently deleted, the files can be quickly and easily restored. With EMC's box, you'd have to go through the trouble of accessing the tape backup to get the files you need; and if the files were only recently modified, and a backup hadn't been done since then, well, tough luck. Meanwhile, using the applications running on Data ONTAP for tasks such as replicating the contents of a given disk on another file server for backup (SnapMirror) and restoring damaged or deleted files via Snapshot (SnapRestore) usually makes for a much more user-friendly experience than using the Veritas and Legato programs that do the same thing via the general-purpose servers that manage SANs. Such proprietary software-based benefits, of course, also give Network Appliance a huge edge when competing against other NAS vendors.
4. In spite of its speed advantage, and ignoring the costs related to setting up the SAN itself, the Netapp filer will still be well over 50% cheaper than the combined UNIX/Symmetrix solution. Talk about efficiency.
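The reason a Snapshot takes only a fraction of the original's disk space is the copy-on-write idea behind it: a snapshot just records which blocks the live files pointed to at snapshot time, and blocks are only duplicated when they're later modified. Here's a toy sketch of that general technique (this is my illustration, not Data ONTAP's actual on-disk implementation):

```python
# Toy copy-on-write snapshot: a snapshot is a copy of the name->block
# map, not of the data itself. Writes always go to fresh blocks, so
# old blocks survive untouched and the snapshot stays valid.
class ToyFS:
    def __init__(self):
        self.blocks = {}   # block_id -> data
        self.files = {}    # file name -> block_id
        self.next_id = 0

    def write(self, name, data):
        # New data lands in a fresh block; the old block is never reused.
        self.blocks[self.next_id] = data
        self.files[name] = self.next_id
        self.next_id += 1

    def snapshot(self):
        # Nearly free: only the pointer table is copied.
        return dict(self.files)

    def read(self, name, snap=None):
        table = snap if snap is not None else self.files
        return self.blocks[table[name]]

fs = ToyFS()
fs.write("report.doc", "v1")
snap = fs.snapshot()            # cheap point-in-time copy
fs.write("report.doc", "v2")    # live file moves on
print(fs.read("report.doc"))        # v2
print(fs.read("report.doc", snap))  # v1, still recoverable
```

This is why restoring an accidentally deleted or corrupted file from a snapshot is fast: the old blocks are still sitting on disk, so there's no tape to spin up.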
For these reasons, companies have been quick to embrace Netapp filers for their internet hosting and caching needs, as the devices fit perfectly into a market where speed, both in terms of file access and service deployment, are crucial. Pure-play internet companies, such as Yahoo!, AOL, and Lycos, have especially been voracious buyers of Network Appliance's products, as the needs described previously become amplified for them, given that a faster service combined with a shorter deployment time could become a major differentiating point between them and their competitors. Internet-related sales already constitute over 40% of Network Appliance's revenues, a number that's growing fast, making the company yet another member of that long list of businesses falling under the well-known "provides the picks and shovels for the internet gold rush" cliche.
Now it's that time again for me to ask some rhetorical questions that attempt to bring up flaws in my arguments. I've found that I'm now using this technique in just about every other article I write in order to strengthen my claims, probably a subconscious by-product of my reading a couple of the works of Plato, who did basically the same thing. Well, now that I'm conscious of this fact, I think it's best for me to do as Plato did and create a fictional character to voice these concerns instead. After all, the concerns really aren't my own. I've decided to name this fictional character Bob the Skeptic. Here's Bob's first installment:
Bob the Skeptic: Ok, I agree that Netapp filers are the way to go when it comes to internet storage; but what about the enterprise market? A large percentage of the files accessed from web hosting centers end up coming off of the network anyway, so there's no point in creating a dedicated, back-end network for storage access. However, in corporate settings, with a SAN, the accessed file goes from the back-end storage pool to the application server, and stays there, never touching the corporate ethernet. Using a standalone Netapp filer attached to the ethernet would give up that benefit for both file access and backup; and unlike the web hosting market, regardless of the type of storage solution they use, corporations still have to buy those UNIX and NT servers to run their business applications. Also, wouldn't an EMC solution be faster in this case, given that accessing a file off of a SAN would be similar to a server accessing its own hard drive, while using a Netapp file server would be akin to using a whole other server? Furthermore, while I know that an end-to-end SAN solution costs far more than one based on NAS, given that they're buying application servers anyway, wouldn't it be cheaper for enterprises to buy just a storage pool (i.e. a Symmetrix box) rather than an entire server-storage solution such as a Netapp filer? These arguments would also hold true for accessing a web site's back-end databases and transaction-related content, since any request for such data first has to go through the software that either manages the database or processes a transaction, software that runs on application servers.
And here's the response I'm giving to my fictional critic:
1. There's nothing stopping enterprises and internet companies (the latter for their databases and transaction-related content) from creating dedicated, ethernet-based networks that have Network Appliance file servers on the back-end for file access. Such networks would be just as beneficial as SANs when it comes to offloading network traffic, not to mention cheaper, faster, and easier to set up.
2. Believe it or not, Netapp file servers are not only cheaper than full-fledged SAN solutions, but cheaper than individual Symmetrix boxes of equal storage capacity as well. This cost advantage can partly be attributed to the fact that high-end EMC boxes usually have multiple processor arrays, while all Netapp filers currently run on a single Digital Alpha processor.
3. Nonetheless, they're also faster than Symmetrix boxes in situations where a server has the option of using either a SAN that it's hooked up to, or a Netapp filer. Black magic? Engineering and programming expertise? Maybe a little of both.
Bob the Skeptic: Good points, but how about using a SAN in conjunction with Network Appliance's filers for file backup? Snapshot's great, but you still need a backup device in case the entire filer crashes. Maybe opting for a Network Appliance box is a better idea if an application server acts as the primary server attempting to access a given file, but what about situations in which the Netapp filer itself acts as the primary "server," as it does when it attempts to retrieve files from a backup box? In such a scenario, wouldn't it be better for the filer to have a SAN storage pool connected to its back-end for tape backup, since this would be similar to accessing the filer's own hard drives, rather than hooking it up to another filer? Sure, the SAN solution would be more expensive, but as any IT administrator will tell you, when it comes to retrieving a large number of files from backup, speed is everything, and the SAN solution would definitely be faster. This isn't a small market either. Companies need to create a backup of everything they store, so even if Netapp boxes were used for everything else (something that's far from the case right now), as long as they end up being used for file backup, SANs would still constitute 50% of all storage solutions being implemented.
Based on what I've written so far, this is a valid argument. However, what I haven't mentioned yet is an innovative solution Network Appliance has come up with to address this issue. Known as clustered failover, it involves using two Netapp filers which, on a day-to-day basis, act completely independently of each other. Each has its own processor, operating system, and hard disks to manage, and data requests made to one filer consume none of the resources of the other. However, the disks of both filers are linked via a fibre channel connection, much like a SAN. Using SnapMirror, the primary filer (the one used for data access by employees/web users) can use this high-speed connection to send files to the secondary filer for backup, and if the disks on the primary were to crash, this connection could be used to restore all lost data at speeds equal to those that would be attained were a SAN used by the filer for backup. Clustered failover, of course, is also much cheaper and easier for a company to implement than a regular SAN.
So are SANs dead on arrival? Not quite. Unfortunately, capitalism is far from being 100% efficient. Even though they might try to act in their best interests, people have a tendency to make faulty choices. They eat TV dinners, rent Ernest movies, hire people that bear a strong resemblance to Homer Simpson, and yes, they buy storage boxes from the likes of EMC and Sun rather than from Network Appliance. I'm at a loss to explain the first three occurrences, but the last of them can mostly be attributed to the fact that storage solutions managed by application servers are what could be classified as an "incumbent technology." Companies have been managing their storage for many years via such architectures, while NAS devices, although existing in one form or another for roughly a decade, are only beginning to be noticed by many major corporations. Combine this fact with the huge sales and marketing departments that companies such as EMC, Sun, and IBM have, and the deep inroads that these departments have made with dozens of Fortune 2000 companies, and it's easy to see that expecting Network Appliance to take over the storage industry overnight would be delusional, to say the least.
However, the tide does appear to be turning in the company's favor. Its annual sales growth recently accelerated into the triple digits, way ahead of the 30% growth the storage industry's experiencing; and for the first time, a large number of IT departments, both enterprise and internet-related, that previously would only consider storage boxes from EMC, IBM, etc. are now taking a close look at Netapp filers. Among these IT departments, those that make impartial purchasing decisions will opt for the solution based on a concept that at first comes across as a stroke of genius, but in the end appears very obvious to those that take the time to think about it.
The text of this article can be found at the author's web site, at tsrec.com