Technology Stocks : Son of SAN - Storage Networking Technologies


To: J Fieb who wrote (2323)10/8/2000 11:45:29 AM
From: J Fieb  Read Replies (1) | Respond to of 4808
 
VIA ++

October 02, 2000, Issue: 1119
Section: Feature -- Clustering
--------------------------------------------------------------------------------
Historic Advances: Scientific Clustering Paves the Way -- Researchers pioneer a new computing field to help their own studies, and fuel commercial interests along the way. Now Microsoft wants a piece of Unix's pie.
Steve J. Chapin

Scientific clusters are used for a broad range of disciplines, including biology (genome mapping, protein folding), engineering (turbo-fan design, automobile design), high-energy physics (nuclear-weapons simulation), astrophysics (galaxy simulation) and meteorology (climate simulation, earth/ocean modeling). For example, computer scientists at Syracuse University's High Performance Distributed Computing Lab in Syracuse, N.Y., are working with biochemists at the University of Washington on ab initio protein folding, which is a critical step in helping us interpret the results of Celera Genomics' recent sequencing of the human genome. To solve this problem, Syracuse's Orange Grove Cluster uses Condor from the University of Wisconsin to run thousands of independent jobs covering portions of a search space, using a process known as simulated annealing, which starts with a guess at a solution and iteratively improves that solution until it reaches convergence.
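For readers unfamiliar with the technique, a minimal sketch of simulated annealing in Python looks something like the following; the energy function, neighbor step and cooling schedule here are toy placeholders rather than anything from the Syracuse protein-folding code.

import math
import random

def simulated_annealing(initial, energy, neighbor,
                        t_start=1.0, t_min=1e-4, cooling=0.95, steps=100):
    """Generic simulated annealing: start from a guess and keep a candidate
    whenever it improves the energy, or occasionally even when it does not,
    with a probability that shrinks as the 'temperature' cools."""
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    t = t_start
    while t > t_min:
        for _ in range(steps):
            candidate = neighbor(current)
            candidate_e = energy(candidate)
            delta = candidate_e - current_e
            # Accept improvements outright; accept some worse moves so the
            # search can escape local minima.
            if delta < 0 or random.random() < math.exp(-delta / t):
                current, current_e = candidate, candidate_e
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= cooling  # lower the temperature before the next sweep
    return best, best_e

# Toy usage: minimize (x - 3)^2 starting from x = 0.
result, score = simulated_annealing(
    0.0,
    energy=lambda x: (x - 3.0) ** 2,
    neighbor=lambda x: x + random.uniform(-0.5, 0.5))
print(round(result, 2), round(score, 4))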

One of the first "modern" clusters was the Beowulf project at NASA's CESDIS, led by Thomas Sterling and Donald Becker (www.beowulf.org). Beowulf was inspired by physicists' need to analyze large data sets and the scarcity of computer time on their local supercomputers. The Beowulf team bought 16 off-the-shelf PCs and connected them with two Fast Ethernet networks. The emphasis of the Beowulf project was on using the cheapest available commodity components, so the team selected commodity Ethernet and Intel-compatible processors.

At the other end of the spectrum, the Computational Plant (C-Plant) project at Sandia National Labs in Albuquerque, N.M., is an attempt to build a true supercomputer from COTS (commercial, off-the-shelf) components. The Sandia scientists focused on distributed computing using message passing (that is, no shared memory between processors). They focused on performance and built their machine using Digital (now Compaq) Alpha processors with a gigabit-speed system-area network based on Myrinet. The scientists have developed low-latency message-passing software, called Portals, which lets them extract the maximum performance from the underlying network. The newest C-Plant cluster, Antarctica, is scheduled to be in place in mid-October and will have more than 1,800 Alpha computers connected by Myrinet.
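Portals is Sandia's own low-latency API, but the message-passing style it supports looks, in outline, like any MPI program. The toy exchange below, written with the mpi4py Python bindings rather than Portals itself, is only meant to show the pattern of explicit sends and receives with no shared memory between processes.

# A toy ping between two ranks; run with e.g.: mpiexec -n 2 python ping.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small Python object to rank 1 and waits for a reply.
    comm.send({"payload": list(range(10))}, dest=1, tag=11)
    reply = comm.recv(source=1, tag=22)
    print("rank 0 got reply:", reply)
elif rank == 1:
    # Rank 1 receives the message and acknowledges it.
    data = comm.recv(source=0, tag=11)
    comm.send({"ack": len(data["payload"])}, dest=0, tag=22)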

The choice of hardware for scientific computing depends on the applications you need to run. For high-performance applications, the critical factor is usually the granularity, or ratio of computation to communication. Coarse-grained applications tend to send larger messages with low frequency, while fine-grained computations more often send smaller messages. Beowulf-class clusters, because of their slower networks, are best for coarse-grained applications. C-Plant-class clusters preserve the computation-to-communication ratio seen on past supercomputers, such as the Intel Paragon, and can handle more communication-intensive applications than the Beowulf clusters can. Commercial clusters offer better support for coarse-grained applications, typically using Fast Ethernet as the interconnection network.
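As a rough, back-of-the-envelope model (the numbers below are invented for illustration), granularity can be thought of as compute time divided by communication time, where each message pays a fixed latency plus its size over the link bandwidth:

def run_time(compute_s, n_messages, msg_bytes, latency_s, bandwidth_bps):
    """Crude model: total time = computation + per-message latency
    + time to push the bytes through the link.  Returns (total, granularity)."""
    comm_s = n_messages * (latency_s + (msg_bytes * 8) / bandwidth_bps)
    return compute_s + comm_s, compute_s / comm_s

# Coarse-grained job: a few large messages -- Fast Ethernet is adequate.
print(run_time(60.0, 10, 1_000_000, 100e-6, 100e6))
# Fine-grained job: many small messages -- latency dominates on Fast Ethernet,
# and a gigabit system-area network with lower latency pays off.
print(run_time(60.0, 100_000, 1_000, 100e-6, 100e6))
print(run_time(60.0, 100_000, 1_000, 10e-6, 1e9))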

The Orange Grove cluster at Syracuse University represents a midpoint on the spectrum of hardware choices. The Orange Grove has 48 dual-processor Intel machines and 16 Alphas, all connected by 100-Mbps switched Fast Ethernet. In addition, the cluster has 16 nodes connected via Giganet's cLAN, a native hardware implementation of the VIA. (See "Building a Faster Network Via Software," at www.nwc.com/1107/1107ws2.html, for more information on VIA, a gigabit-speed system-area network.)

Most scientific clusters use some form of batch-processing or work-sharing software, such as the PBS (Portable Batch System) from MRJ Technology Solutions (now part of Veridian Information Solutions), LSF (Load Sharing Facility) from Platform Computing Corp. or Condor. These software packages control the placement and execution of programs on the cluster automatically, freeing the end user from worrying about administrative details.

Commercial Offerings

Commercial vendors are offering complete scientific cluster packages. Linux NetworX's solutions contain both the hardware and the management software in the box. Sun Microsystems sells the hardware solution as a package called Sun Technical Compute Farm, and its recent acquisition of Gridware, a batch-processing software provider, implies a turnkey solution is on the way.

Research clusters are almost exclusively Unix-based, with Linux being the dominant operating system. This dates back to Don Becker's choice of Linux for the original Beowulf cluster; the rapid propagation of Beowulf established Linux as the default cluster OS. The cluster realm is one in which Unix and its variants have a substantial lead on Windows NT. Clearly, Microsoft has taken steps to address this, but for the near term, Unix remains the operating system of choice for cluster applications. There have been research clusters built on Windows, including Andrew Chien's work at the University of Illinois and the University of California at San Diego, and the AC3 cluster at Cornell University. Many of the batch-processing and management packages have also been ported to Windows. Even though these researchers have established that it is possible, if not easy, to build Windows clusters for scientific research, there is a widely held bias in the research community toward Unix.

New Clustering Technologies Emerge

Two trends first used in research computing are now emerging and should soon find application at more businesses. First is the construction of clusters of multiprocessors, or clumps. These clusters use the same networking technology as plain clusters, but each node in the cluster is a parallel, shared-memory machine. While traditional clusters use message passing almost exclusively, clumps encourage a combination of shared-memory and message-passing programming known as mixed-mode programming. In the commercial world, this will allow the easy replication of multithreaded servers and speed up each individual server.
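In outline, a mixed-mode program combines threads sharing memory inside each node with message passing between nodes. The sketch below, again using the mpi4py bindings plus ordinary Python threads, is only a shape-of-the-approach illustration, not production mixed-mode code.

# Mixed-mode sketch: MPI between nodes, threads within a node.
# Run with e.g.: mpiexec -n 4 python mixed_mode.py
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each MPI rank owns a slice of the global problem...
local_data = [rank * 1000 + i for i in range(1000)]

def partial_sum(chunk):
    return sum(chunk)

# ...and hands its slice to a pool of threads, standing in for the
# shared-memory parallelism available within one multiprocessor node.
with ThreadPoolExecutor(max_workers=4) as pool:
    quarters = [local_data[i::4] for i in range(4)]
    local_total = sum(pool.map(partial_sum, quarters))

# Message passing then combines the per-node results into a global answer.
global_total = comm.reduce(local_total, op=MPI.SUM, root=0)
if rank == 0:
    print("global sum:", global_total)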

At the same time, we are moving toward a convergence of the system-area network (used to pass messages between processes) and the storage-area network (used to access devices). VIA (Virtual Interface Architecture) is an emerging standard in message passing. Another new standard for storage networks, called Infiniband, is also beginning to take hold (see "Infiniband for Christmas," page 280, for more on this standard). While Infiniband is the anointed standard, whether customers will adopt it in lieu of Fibre Channel remains to be seen (the contest will be akin to that between OSI networking standards and TCP/IP). If Infiniband pans out, the next logical step is to produce a "VIA++", which merges system-area and storage-area networks. This convergence will help applications fronting large databases, and will make access to remote servers and remote devices seamless, enabling a new wave of applications.

--

Sharing the Load

In the commercial sector, the term load-balancing has been co-opted to mean solutions aimed at applications (such as HTTP) or IP-level switching. Load-balancing solutions are a good fit for applications that have lots of short-lived transactions. For this reason, load-balancing is often used in front of a set of Web servers to distribute the load among them, which can increase site scalability and availability. The simplest solution is round-robin DNS, in which a single DNS name is shared among several IP addresses. When a query is made to the DNS server, it rotates the returned IP addresses. Software and hardware load-balancers provide more functionality than round-robin DNS does. For example, they can tell when a server is heavily loaded or not responding, and limit the number of Web requests sent to that machine.
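The round-robin idea itself is simple enough to sketch in a few lines of Python (the addresses below are invented); real load-balancers layer health checks and connection limits on top of this basic rotation.

import itertools

# Hypothetical pool of Web-server addresses behind one published name.
SERVER_POOL = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

_rotation = itertools.cycle(SERVER_POOL)

def next_server():
    """Plain round-robin: each lookup hands out the next address in turn.
    A smarter balancer would skip servers that are down or overloaded."""
    return next(_rotation)

for _ in range(6):
    print(next_server())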

Web servers are the most popular application for load-balancing, but any IP-based technology could benefit from the enhanced reliability and scalability afforded by this technology. The applications that need load-balancing are those your users or customers can't live without. Load-balancing isn't necessarily as expensive or complex as high availability, but it can provide the same kind of uptime to help you meet your SLAs (service-level agreements).

However, these solutions are not a godsend. If you're not careful, you may be adding scalability somewhere other than at your application's bottleneck. For example, an HTTP load-balancing switch with lots of Web servers won't help you if the real bottleneck is at the database they all rely upon. The queue to use the database will simply get longer, and you won't see the improvements you were expecting. If your load-balanced application contains a lot of dynamic data that can't easily be replicated between the load-balanced servers, this might not be the best solution. Just as important, adding a single load-balancing server is adding a single point of failure.
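A quick capacity check (with invented numbers) shows why: the sustainable request rate is capped by the slowest stage, so once the shared database is the limit, adding Web servers changes nothing.

def end_to_end_capacity(web_servers, reqs_per_server, db_queries_per_req, db_capacity_qps):
    """The sustainable rate is the minimum over the stages -- the bottleneck."""
    web_capacity = web_servers * reqs_per_server
    db_capacity = db_capacity_qps / db_queries_per_req
    return min(web_capacity, db_capacity)

# Growing the Web tier helps only until the database becomes the bottleneck.
print(end_to_end_capacity(2, 200, 1, 500))  # 400 req/s: Web tier is the limit
print(end_to_end_capacity(4, 200, 1, 500))  # 500 req/s: database is the limit
print(end_to_end_capacity(8, 200, 1, 500))  # still 500 req/s: more servers, same ceiling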

Microsoft provides 32-node software-based load-balancing in the box with Windows 2000 Advanced Server and Windows NT Enterprise Server. Microsoft's solution delivers each client request to every server in the "cluster," and a sophisticated algorithm determines which server will respond. Load and availability are taken into account.

We tested Microsoft's load-balancing solution in our Real-World Labs® at Syracuse University, with the help of some loaner equipment from Compaq, and found this solution works very well indeed. Setup is complex, and a single mistyped IP address can render the entire system unusable. Once we jumped those hurdles, however, the system provided load-balancing and reliability even amid intentional single and multiple system failures via the red switch.

The Linux Virtual Server Project (www.linuxvirtualserver.org) uses a single Linux server as an IP-based load-balancer and as a front end for many other servers. It provides a high-availability and scalability solution for the servers on the back end but introduces its own single point of failure. However, it can be used in conjunction with a high-availability solution to provide high availability in which a new load-balancer takes over if the first one fails.

More and more application server products, particularly those in the Web e-commerce market, offer some sort of application-level load-balancing and failover capabilities. Or they might use TP monitors/OTMs (Object Transaction Managers) to do the job. Solutions such as BEA's WebLogic or IBM's WebSphere help maintain the state of any transactions (e-commerce purchases, for example) in case of a failure and spread out requests among multiple servers if needed.

nwc.com

Copyright © 2000 CMP Media



To: J Fieb who wrote (2323)10/8/2000 2:05:52 PM
From: Gus  Respond to of 4808
 
Thanks J.

Many of the dozens of optical networking companies focused on the metro area are pitching storage as the primary justification for service providers to invest in DWDM gear for the access portion of their network. Take away SAN services, say many industry experts, and the case for DWDM in the metro network is considerably weakened.


Message 14535730

It seems to me that the louder the rhetoric from all quarters, the more apparent CMNT's strengths become in terms of technology (already shipping) and installed base.

IMO, where storage networking diverges profoundly from traditional data networking is in the fundamental approaches to failure management.

There are two basic approaches to dealing with network congestion:

1) Increase the amount of the critical resource -- network bandwidth.

2) Reduce the demand for the critical resource -- through degradation, failure or off-peak scheduling.

In each of these instances, the RAID vendor, as the central player in the SAN, has to provide the critical means to account for the data in transit, either by using automated third mirrors and resynchronization schemes or by prompting user intervention; otherwise, the level of data loss can quickly spiral out of control and leave multiple databases riddled with unacceptable errors.