Web-based 'farms' let design teams plow ahead. Electronic Engineering Times, June 21, 1999 p92
Author Lee, Dwayne
Network-centric computing farms, also known as server farms or ranches, have emerged as an effective way to boost the productivity of large design teams. For example, Sun Microsystems has created computing ranches for IC design that serve as scalable computational resources for true 24-hour, seven-day-a-week global engineering.
As design groups become more dispersed and more engineers work outside their offices, it is only natural that those server ranches should be accessible from the Web. Recognizing the benefits of such access, Sun has implemented a Web-based interface for its server ranches that lets both users and support personnel interact with the ranches through a Web browser.
To appreciate this marriage of Sun's server ranches with the Web, it is first important to understand what a server ranch is and how it operates. Simply put, Sun's ranches deliver the computing muscle necessary for creating the company's next-generation processor designs. They pool a staggering amount of computer resources, making them available to hundreds of integrated-circuit designers within the company.
For example, the computing ranch used by the microprocessor design group includes 750 multiprocessor UltraSparc systems, approximately 2,500 UltraSparc CPUs, more than 1 Tbyte of physical RAM and around 26 Tbytes of disk space. That mind-boggling array of resources is linked by 100-Mbit/s switched Fast Ethernet throughout the ranch and out to the desktops on Sun campuses.
The ranch provides the advanced computational power needed by Sun's microprocessor and server design teams. It performs more than a billion cycles of simulation in a given week. To maximize its usage, it is shared simultaneously by 15 to 20 projects at different points in the design cycle, supporting roughly 600 to 800 designers and up to 200 different design and simulation applications, including approximately 50 mainstream EDA applications. Thanks to network computing, centralized administration and significant automation, the support staff for that ranch is small, averaging just one administrator for every 100 systems.
Keeping data home
Besides providing formidable computing resources, the ranch also houses the most valuable asset Sun has: the design data itself. Keeping the design database within the confines of the ranch considerably simplifies version control and the tracking of changes. It is also much easier to back up critical design data, protect it and provide effective security from one central location than when the information is dispersed across computing resources in different locations.
To ensure the best resources are available to the design team, the latest, fastest computing hardware is reserved for the ranch. Older equipment is rotated out onto the designers' desktops. That arrangement works well, since the designers rely on the ranch for all their computationally intensive tasks and therefore don't require as much computing muscle on the desktop.
To be effective, a ranch must be considerably more than the sum of the individual servers. Sophisticated network computing techniques and significant amounts of automation are necessary to ensure that the hundreds of users have easy access to the resources when they need them.
Designers should not be bothered with the details of what server to use. Nor should they have to chase failed jobs and track system-administration matters. Such tasks should be executed in the background so that designers can focus on their primary responsibility: designing.
The server ranches at Sun rely on custom resource-sharing software, called Dream (Distributed Resource Allocation Manager), that creates a seamless interface between the ranch and the designers. The software enables optimal use of the computing capabilities in the ranch without burdening the designer with the task of managing the resources. It also ensures that if a job starts, it can finish: the job won't run out of memory, disk space, swap space or licenses.
The software also tries to meet the specific needs of the individual designers. Users may provide specific criteria with their jobs, including priority and the required EDA software. The resource-sharing software then matches submitted jobs with appropriate servers in the ranch and schedules the user's job for execution.
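The matching step described above can be illustrated with a minimal Python sketch. This is not Dream's actual code or interface, which the article does not describe; the `Server` and `Job` records, the `fits` and `schedule` functions, the first-fit placement and the rule that a higher priority number runs first are all assumptions made for illustration. What the sketch does take from the description is the idea that a job is started only on a server with enough memory, disk and swap, and only when a license for its EDA tool is free.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record of the capacity a server has left.
    name: str
    free_ram_mb: int
    free_disk_mb: int
    free_swap_mb: int


@dataclass
class Job:
    # Hypothetical record of what a user states when submitting a job.
    name: str
    priority: int   # assumption: a larger number is scheduled first
    tool: str       # EDA application the job needs a license for
    ram_mb: int
    disk_mb: int
    swap_mb: int


def fits(job: Job, server: Server) -> bool:
    """True if the server can hold the job's memory, disk and swap needs."""
    return (server.free_ram_mb >= job.ram_mb
            and server.free_disk_mb >= job.disk_mb
            and server.free_swap_mb >= job.swap_mb)


def schedule(jobs, servers, licenses):
    """Place each job on the first server that fits (first-fit, an assumption).

    Returns {job name: server name} for the jobs that can start now.
    Jobs with no free license or no fitting server are left out, that is,
    they stay queued. Server capacities and license counts are reduced
    in place for every job that is placed.
    """
    placed = {}
    for job in sorted(jobs, key=lambda j: -j.priority):
        if licenses.get(job.tool, 0) < 1:
            continue  # no license free: starting now could not finish
        for server in servers:
            if fits(job, server):
                # Reserve the resources so later jobs cannot overcommit them.
                server.free_ram_mb -= job.ram_mb
                server.free_disk_mb -= job.disk_mb
                server.free_swap_mb -= job.swap_mb
                licenses[job.tool] -= 1
                placed[job.name] = server.name
                break
    return placed
```

Reserving resources at placement time is what keeps a started job from running out of memory or licenses later; a production scheduler would also release them when the job ends, which this sketch omits.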
The software continually tracks all jobs under its control and can even restart them automatically in the event of failure. Users can check the status of their jobs as they run and are notified upon completion.
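The track-and-restart behavior can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions: the function name `run_with_restart`, the cap of three attempts and the format of the status history are hypothetical, and the article does not say how Dream detects failures or how many times it retries.

```python
def run_with_restart(run_job, max_attempts=3):
    """Run a job, restarting it automatically if it fails.

    `run_job` is any callable that returns a result or raises on failure.
    Returns (result, history); result is None if every attempt failed.
    The history list is the kind of status record a user could inspect.
    """
    history = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = run_job()
        except Exception as exc:
            # Record the failure, then loop around to restart the job.
            history.append(f"attempt {attempt}: failed ({exc})")
            continue
        history.append(f"attempt {attempt}: completed")
        return result, history
    return None, history
```

A job that fails twice because its host went down and then succeeds on a third attempt would return its result along with a three-entry history, so the user sees both the outcome and the restarts that led to it.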
Thanks to that resource-sharing software, the server ranches at Sun approach 100 percent utilization. That high utilization within the server ranch is achieved by a steady stream of medium-to-large batch jobs 24 hours a day, organized and tracked by the resource-sharing software. As a result, in the past year the Sun server ranch for microprocessor design has averaged close to a million jobs a month, consuming approximately 800,000 CPU hours. At the same time, Sun has managed to keep average queuing time for short jobs to around three minutes.
That efficient use of hardware and EDA software saves Sun millions of dollars a year, enabling the company to continually expand and upgrade the ranch to meet next-generation design needs. Just as important, it has resulted in better designs. Being able to easily submit jobs means that designers don't think twice about running extra tests, which help them to more easily visualize trends and determine design parameters.
Access is key to success
One of the critical factors in the success of Sun's server ranches is the high degree of access designers have. That was no easy accomplishment. Like most high-technology companies, Sun has geographically dispersed design groups that work at different times around the clock. Unlike most, those design groups don't have the usual localized computing resources; all share the same server ranch.
To justify that centralized resource and fully exploit its power 24 hours a day, seven days a week, it must be available to any and all Sun designers at any time from anywhere. The best way to do that is with a Web browser, which is exactly what Sun did.
A Web browser interface is a fast, intuitive approach that is immediately understood by most users. With a browser interface via the Web, a designer can be anywhere in the corporation or halfway around the world and use the ranch's resources as if via a dedicated line. What could be easier than simply opening a Web browser, typing in the URL and being instantly connected to the server ranch?
That browser-based Web interface to the ranch was created several years ago for use within Sun Microsystems' intranet. Just last year, in part due to the maturing state of Web security technology, it was opened up to Internet use for one of Sun's next-generation microprocessor design groups. The major impetus behind creating the Web interface was to give designers one place where they could go for the latest information on their design.
Let's examine how a typical microprocessor designer would use the server ranch via the Web browser. Say this designer is working from home late one evening and is worried about a troublesome race condition in the clock distribution. After entering the Web site with a valid password, she pulls the latest release of the microprocessor design from the ranch. Because this information is served from the ranch via the Web, the designer is assured that she is accessing the most current design data.
Now, let's say our designer examines the clock circuitry and suddenly realizes that her verification test had some inappropriate timing variables that could have inadvertently injected the race condition. She changes the variables and resubmits the verification of the clocking circuit to the ranch via the Web. The site informs her that the job is accepted and that it will keep her posted on the status of the verification task. The next morning, the Web page tells her that the verification job has passed. The results are displayed and, to the designer's relief, the race condition has disappeared. But now there seems to be a problem with the delay circuitry in one of the branches of the clock tree. With a sigh, she starts creating a new suite of verification jobs that should help get to the source of that new dilemma.
As you can see, the Web interface to the server ranch greatly facilitates the use of this powerful resource. Without direct access, our designer would have had to wait until she got to the office to explore the problem.
Support made easy
The ease of access offered by the Web also greatly benefits the system administrators' support team in maintaining, troubleshooting and monitoring the ranch. The ability, day or night, to observe all the relevant details of server-ranch functioning enables support personnel to keep the ranch operating at an optimal level all the time. And the Web-based interface is an important tool for keeping support engineering informed, in real time, of vital details about the server-ranch operation.
Most of the administrative tasks can be conducted over the Web. For example, system administrators can install operating systems and administer patches through the Web site. They also can monitor and manage activities and system status. They can even access database servers that assist in managing the server ranch.
The administrators can monitor the ranch at a glance, seeing immediately when critical systems are compromised or down. The graphic Web site gives multiple views into the network management database so that various aspects of the ranch can be monitored simultaneously.
Besides keeping things running smoothly, the system administrators are responsible for the long-term viability of that important resource. Automated tracking and reporting let them continually evaluate the performance and configuration of the server ranch and justify current and future expenditures.
In Sun's ranches, important statistics and usage information are compiled and summarized. Much of this data is presented both dynamically on the Web site, for administrators and designers alike, and in monthly reports that summarize the data graphically.
Internal and external security were of the utmost importance when this portal was devised. It is vital to keep the details of these next-generation microprocessor designs completely protected. Sun took aggressive steps to ensure that only authorized personnel could gain access.
For internal security, Sun built an access database that incorporates information both from human resources on the division level and from the corporate-wide network access database. That way, people from the microelectronics and other divisions can access information and the server ranch. This access database is updated daily to ensure that any and all employee changes are promptly reflected.
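A daily rebuild of such an access database might look like the Python sketch below. The article does not say how the two sources are combined, so the rule used here is an assumption: a user is admitted only if the person both holds an account in the corporate network access database and is listed by human resources in an authorized division. The function name `build_access_db` and the shape of its inputs are likewise hypothetical.

```python
def build_access_db(hr_records, network_accounts, allowed_divisions):
    """Rebuild the access database from its two sources.

    hr_records:        {user: division}, from human resources (assumed shape)
    network_accounts:  set of users in the corporate network access database
    allowed_divisions: set of divisions authorized to use the ranch

    Returns {user: division} for everyone who passes both checks.
    Rebuilding from scratch means departures and transfers drop out
    at the next daily run without any manual cleanup.
    """
    return {
        user: division
        for user, division in hr_records.items()
        if user in network_accounts and division in allowed_divisions
    }
```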
To provide access to the server ranch from outside the corporate intranet, Sun issues token cards to employees granted external access. The user activates the token card by typing in a password. If all is correct, the user is connected to the server ranch via Sun.net, which provides a secure private connection over the Internet. Sun.net builds a secure tunnel that encrypts all the data moving across the connection, in essence creating a protected causeway from the browser to Sun's intranet.
Already, almost all Sun users rely exclusively on the internal Web interface to access the server ranch when working on campus. External access will be expanded to accommodate all follow-on processor designs.