Future Boy
The New Biology
By Erick Schonfeld, ecompany.com
With the first phase of the Human Genome Project recently completed, the real work of figuring out how genes influence disease can finally begin in earnest. But one truth is already apparent: biological research will become impossible without advanced computer technology. There is simply too much data to sift through -- trillions and trillions of possible gene and protein combinations -- for meaningful work in this field to get done without computing.
An announcement on May 30 by IBM and others that they would create a joint venture called Blueprint Worldwide highlights the growing link between the two disciplines. Blueprint aims to bring together all publicly available information about proteins (their structure, function, and location, and how they interact) from all available biomolecular databases, as well as to cull data from 200,000 published papers. During a conference call announcing the venture, Caroline Kovac, the vice president of IBM's life sciences division, explained the point of the not-for-profit project: "Blueprint is a part of what we see as a convergence between biology and computers. The new biology is a science that no longer can be practiced without computers." Genomics has already produced exponential growth in biological data, and proteomics (the study of proteins) will generate a vaster amount still.
For many computer scientists today, there are few challenges as exciting as attempting to delve into -- or even replicate in silico, as they say -- the events that occur in a biological environment. Kovac mentioned BlueGene, an IBM project to create the fastest computer in the world by clustering together more than 1 million microprocessors; when BlueGene is completed in 2004, its first task will be to try to figure out how proteins fold. "Supercomputing today is being driven by biology," she pointed out, "whereas 10 years ago it was driven by physics."
But Big Blue isn't helping to create this free repository of data purely out of the goodness of its heart. It needs to learn how to better apply computer technology to the new biology. Bioinformatics represents a potentially huge market for IBM's hardware and software. Kovac estimates that life sciences will be a $40 billion industry segment by 2004, up from $22 billion in 2000. As scientists move from mapping out the human genome to mapping out the protein interactions that take place within human cells, "we see a tremendous opportunity to bring information systems to these scientists," Kovac said. IBM will learn from Blueprint by using it to test novel ways to, say, cluster computers or come up with software to better annotate data about gene splices.
But IBM is not the only company preparing for the onslaught of data that the proteomics revolution is about to unleash. Housed in a seven-story, green-marble, art-deco department store building in downtown Oakland, Calif., is a startup called DoubleTwist that is tackling the same challenge. "What is bioinformatics?" asks CEO John Couch. "It is capturing information coming out of the genome." But capturing that information is easier said than done. "Every biotech company is struggling with its information systems," he says. A veteran software executive from the early days of Apple, Couch sees a great need to simplify database computing for scientists, just as Apple simplified personal computing for non-programmers.
Large drug companies today employ bioinformaticians with roles similar to those of management information systems employees three decades ago. Back then, an MIS manager would take queries from executives about things such as inventory levels or sales patterns, run them through mainframes or minicomputers, and then report the answers to the people who actually needed the information to make business decisions. The same thing is happening today with bioinformaticians and the biological data that scientists need to do their research. Couch is addressing this problem by making DoubleTwist an information resource for bioinformaticians and scientists alike, and even going so far as to take away the headache of managing the databases and server farms needed to do cutting-edge biology.
At first glance, it may appear that DoubleTwist is trying to charge for the same thing that the Blueprint joint venture proposes to give away for free: a one-stop digital portal for gene-related data. Indeed, there will be some overlap, but Blueprint is focused only on publicly available sources of such data. While that is a rich vein, it is far from complete. In addition to offering access to public databases (such as those that contain a map of the human genome, as well as information about proteins and nucleotides), DoubleTwist pulls together proprietary databases from other companies. These may focus on specific sets of genes, such as those related to the human brain, or allow for comprehensive patent or literature searches. DoubleTwist also has its own gene indexes and an annotated version of the human genome (which companies such as Merck and Hitachi pay good money for). "I want to be the Bloomberg of this space," Couch says. All in all, the company already integrates data from about three dozen sources, and Couch sees Blueprint as simply one more to draw from.
"Data is data," says DoubleTwist president Rob Williamson, "but information is what people want." It takes computers and computer know-how to find those valuable nuggets of data that all of a sudden become "information" -- the elusive philosopher's stone of our age. DoubleTwist, for one, is making this transformation easier by adding an intuitive graphical interface that allows scientists themselves (with a little training) to interact with the slew of databases available to them. The company runs multiple software algorithms against multiple databases, and provides scientists with one comprehensive view of their search results. If anything related to a gene sequence that a scientist is studying changes in the public or private databases, the literature, or the patents, the software informs that scientist automatically. DoubleTwist also makes it easy to integrate all of this outside information with a drug company's own closely held internal data (DoubleTwist's customers can either host all of this software on their own servers or get access to the external databases through an ASP service on DoubleTwist's site.)
What both the Blueprint and DoubleTwist efforts point to is an era in which scientific discoveries are just as likely to come from parsing trillions of bits of data that already exist somewhere in a computer as from finely tuned experiments in a wet lab. The efforts also signal a much more collaborative type of science than has ever existed before. As Kovac put it, "Biology is no longer about researchers in their laboratories, but a more dynamic community living and breathing in an age of interconnectedness." If Kovac, Couch, and others can pull off their grand bioinformatics projects of connecting the most brilliant minds in biology, they might turn out to be right that the impact of this new technology during the next decade will be greater than what we have seen so far from the Internet.
For more information and related links, see the online version of this story at ecompany.com.