Compaq, Sandia and Celera team up on supercomputing for genomics research
By George A. Chidi Jr., IDG News Service, Boston Bureau
January 19, 2001, 14:39
Compaq Computer Corp. has signed on to provide supercomputing technology to Sandia National Laboratories in New Mexico and Celera Genomics Corp. under a four-year cooperative research and development agreement announced Friday by the U.S. Department of Energy. Scientists hope to develop software and computer hardware specifically designed for the demands of computational biology and applied life sciences research.
Researchers are aiming for a supercomputer capable by 2004 of processing 100 trillion operations per second -- 80 times faster than the computers Celera used to sequence the genome and about eight times faster than the fastest supercomputer currently working. Ultimately, they hope to develop a petaflop supercomputer -- that is, 1,000 trillion operations per second.
Compaq executives see supercomputing for biological research as one of the fastest-growing parts of the company's business. "All of the supercomputing technology we've developed in the last ten years is beginning to be the very tool that our biologists need," said Bill Blake, Compaq's vice president of technical computing.
Blake said the supercomputer deal could mean hundreds of millions of dollars in revenue for Compaq, though he declined to be more specific. In the long term, the research could be worth billions to the company. Blake estimated that the technical computing market will nearly double over the next 10 years, and said genomics supercomputing is the fastest-growing part of that increase.
Compaq could take the technologies developed in the project -- the software, optimized hardware and perhaps consulting services -- and sell them in scaled-down packages to biology research startups.
The supercomputer developed with Celera and Sandia will be based on Compaq's Alpha microprocessor and will contain between 10,000 and 20,000 processors, Blake said. A key challenge for Compaq will be developing a genomics software architecture that works as effectively on a supercomputer with 100 processors as on one with 10,000.
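That scaling requirement is easiest to see in code. The sketch below is not from the project; mpi4py, the synthetic genome data and the "GTTG" motif are illustrative assumptions. It simply shows a program that divides a sequence among however many processes are launched, so the same code runs unchanged whether started on 100 or 10,000 processors.

```python
# Illustrative sketch: the program divides work by process rank, so it runs
# unchanged whether launched on 100 or 10,000 processes.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # index of this process
size = comm.Get_size()          # total number of processes

GENOME_LENGTH = 1_000_000       # made-up size standing in for real sequence data

def load_bases(start, end):
    """Stand-in for reading bases [start, end) from shared storage."""
    pattern = "ACGTTGCA"
    return "".join(pattern[i % len(pattern)] for i in range(start, end))

# Each rank claims one nearly equal slice of the genome.
chunk = GENOME_LENGTH // size
start = rank * chunk
end = GENOME_LENGTH if rank == size - 1 else start + chunk

local_hits = load_bases(start, end).count("GTTG")   # hypothetical motif search
# (Motifs spanning a slice boundary are ignored here for brevity.)

total = comm.reduce(local_hits, op=MPI.SUM, root=0)
if rank == 0:
    print(f"{size} processes, {total} motif hits")
```

Only the process count passed to the launcher (for example, mpiexec -n 100 versus -n 10000) changes as the machine grows.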
Until biologists applied supercomputing to the human genome, nuclear physics posed the toughest problems in high-performance computing. With the human genome sequence complete, biologists are looking to the computational expertise of nuclear weapons researchers to build algorithms for genetics research.
"Now that the genome is sequenced, we're entering the era of holistic biology," said J. Craig Venter, Celera's president and chief scientific officer. Holistic biology, as he describes it, refers to a blurring and blending of scientific disciplines in pursuit of biological research.
Venter and other scientists cite the small number of researchers skilled in bioinformatics -- the science of applying information technology to biological research -- as a bottleneck potentially slowing genomic research. Bioinformatics is complex and multidisciplinary, requiring facility with physics and arcane mathematics as well as biology and computer science.
"What attracts people to biology is not what attracts people to physics," said Blake. "There are too few people that live at the intersection of biology and computer science and engineering."
One of the researchers' primary goals is to develop visualization technologies for analyzing the massive quantities of experimental data -- to let scientists see the information as something more than streaming lines of code or seemingly random strings of C's, G's, A's and T's, the letters denoting DNA's four bases.
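As a toy illustration of that gap -- not anything from the project -- the Python snippet below turns a made-up base string into windowed GC-content bars, one crude step up from reading raw letters.

```python
# Toy illustration: summarize a raw base string as windowed GC-content bars.
import random

random.seed(1)
sequence = "".join(random.choice("ACGT") for _ in range(2_000))  # made-up data

WINDOW = 200
for start in range(0, len(sequence), WINDOW):
    window = sequence[start:start + WINDOW]
    gc_fraction = (window.count("G") + window.count("C")) / len(window)
    bar = "#" * round(gc_fraction * 40)
    print(f"{start:6d}-{start + len(window) - 1:6d}  {gc_fraction:5.1%}  {bar}")
```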
"We in the nuclear weapons community believed nothing could be more complex than the development of nuclear weapons," said Bill Camp, Sandia's director of computation, computers and mathematics. He conceded that genomics is more complex, but called that complexity itself a problem. "The science must be simplified," he said.
Writing programs for supercomputers requires an understanding of parallel system architecture, which differs from that of single-processor computers. Data entering a supercomputer must be broken into bite-sized chunks and distributed evenly among the machine's many processors. Working with Celera, Sandia's scientists will apply the same skills they use to model nuclear explosions to develop algorithms for partitioning genetic data across parallel processors, along with visualization tools for genomics.
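A minimal sketch of that decomposition step appears below; the overlap and chunking logic are assumptions for illustration, not Sandia's algorithms. It splits a base string into nearly equal pieces, overlapping neighbors slightly so a pattern that straddles a boundary is still seen whole by one chunk.

```python
# Illustrative decomposition: split a base string into n nearly equal chunks,
# overlapping neighbors by `overlap` bases so boundary-spanning patterns survive.
def split_evenly(sequence, n_chunks, overlap=0):
    step = len(sequence) // n_chunks
    chunks = []
    for i in range(n_chunks):
        start = i * step
        end = len(sequence) if i == n_chunks - 1 else start + step + overlap
        chunks.append(sequence[start:end])
    return chunks

if __name__ == "__main__":
    seq = "ACGT" * 25                      # made-up 100-base sequence
    parts = split_evenly(seq, 8, overlap=3)
    for i, part in enumerate(parts):
        print(f"chunk {i}: {len(part)} bases")
```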
U.S. government researchers have been instrumental in developing genomic algorithms, in no small part because the fastest supercomputers have been in government hands. And both genetics and IT companies have partnered extensively with government agencies to make use of the information.
For example, last month IBM Corp. and NuTec Sciences Inc. announced plans to build a 7.5 trillion calculations-per-second supercomputer to investigate genetic expression and disease, using the "multivariant analysis of gene expression" algorithm developed by the National Human Genome Research Institute.
While the Los Alamos and Lawrence Livermore national laboratories are involved with genome research, Venter said he sought to partner with Sandia because its researchers have particular expertise in massively parallel supercomputing. "Biology can't proceed without high-end computing," he said. "We need the expertise in both communities to proceed."
Camp said genomics and proteomics -- the study of the physical structure and function of proteins in cells -- will require more powerful computers manipulating more data than was necessary to sequence the genome itself.
Compaq can be contacted at compaq.com. Celera Genomics can be contacted at celera.com.