Data glut
As gene research yields information counted in terabytes, researchers struggle to visualize and process it while technology businesses scramble to profit from it.
By John Dodge, Globe Correspondent, 2/24/2003
The sequencing of the human genome over the past decade was supposed to help revive the flagging fortunes of the information technology industry.
To some extent, the life sciences market, which relies heavily on computational biology, has lived up to the promise. Research centers in both the private and public sectors placed orders last year for thousands of servers and storage systems capable of handling terabytes of the new genomic, proteomic, drug, and health care data generated hourly.
That's the good news. The bad news is that, during the past year, companies that develop software tools for managing and exploiting all of the new data struggled mightily. Red ink, consolidation, and layoffs were the norm. Welcome to the tumultuous world of ''bioinformatics,'' the underachieving wonder child of a genomics revolution-in-waiting.
Bioinformatics is where computing intersects with biotechnology. The field is made up of software, database, visualization, and other companies, often funded by pharmaceutical giants, whose technology handles and analyzes genomic bits and bytes along with other biological and clinical data. (The field is sometimes known simply as informatics.) But selling life sciences software, services, and information has proven tougher than anticipated, so much so that many purveyors have sought to transform themselves into drug companies, betting that tangible products will be their salvation.
Incyte Genomics Inc. closed operations and pared its work force by 37 percent in November as it lessened its reliance on technology and moved deeper into drug discovery. Celera Genomics Group went through a similar reorganization in June. Others like InforMax and NetGenics were lucky enough to find merger partners in Invitrogen and LION Biosciences AG, respectively. The late Doubletwist Inc., a once hot start-up, closed its doors last spring.
''Pharmaceutical companies are realizing they are not making as much progress as they thought investing in genomics, proteomics, and informatics,'' says Phillips Kuhl, president of the Cambridge Healthtech Institute research firm. ''They're not getting the returns on investment.''
Investment has shifted toward obtaining promising drug candidates, from which pharmaceutical companies, desperate to replenish depleted pipelines, see a quicker payoff. Johnson & Johnson just agreed to pay $2.4 billion in cash for the biopharmaceutical company Scios Inc. Pharmacia Corp.'s powerful lineup of cancer drugs lured Pfizer Inc. into a merger. Bristol-Myers Squibb Co. has paid dearly for its alliance with ImClone Systems Inc.
''You see very large sums of money being paid for later-stage assets [drugs in development],'' says David Block, chief operating officer at Celera Genomics. ''Big pharmas are more cautious in spending on informatics.''
Few dispute the long-term potential of mining genomic data to attack the root causes of disease, but the commercial glow from sequencing the human genome has dimmed. Now, the intense pressure to come up with revenue-producing drugs has spread from big pharmaceutical companies to the smaller companies and the once-promising start-ups whose fortunes were to be made selling technology to bring genome-derived drugs to market quickly and inexpensively.
''We've got to figure out ways to reduce time and cost of drug development,'' says Eric Neumann, vice president of bioinformatics at Beyond Genomics Inc. in Waltham. ''Informatics is the key. The question is: How do you shrink-wrap great insight into commercial systems?'' The company is advancing the notion of systems biology, which investigates entire biological systems instead of just individual molecules and cells.
Another barrier, paradoxically, is the progress of genomic and proteomic research itself. Researchers have found that the more they know, the more they realize they don't know.
''We are in a data-rich environment, but the fact is we are information poor,'' says Peter Sorger, an associate professor of biology at MIT and co-chairman of the school's budding systems biology initiative. ''You look at biological systems with much more complexity than before. Bioinformatics has concentrated entirely on sequence information, but it's only a tiny piece of the puzzle.''
The myth that drug discovery is on track to becoming primarily computational hasn't helped either. ''It sounds wonderful if you can do everything on computers,'' Neumann says. ''You don't have to pay as much [as when conducting real lab experiments]. We really have a poor understanding of what a gene actually does and where and when it should do it. You can understand the entire genome and [still] understand less than 1 percent about what is going on in a cell.''
Celera Genomics, which used its now-legendary ''shotgun approach'' of sequencing the human genome in millions of random fragments and reassembling them by computer, changed gears last year to focus on drug discovery as it became clear that sales of sequencing information alone would not sustain the company. Marketing for the Celera Discovery System, a Web-based source of public and private biological data, was transferred to sister company Applied Biosystems in June. Celera trimmed 132 jobs as a result, while Applied Biosystems later trimmed 400.
Companies today have to offer a broad spectrum of integrated biological, chemical, and clinical information, not just novel sequencing data, says Tony Kerlavage, senior director of bioinformatics applications at Celera. ''Research scientists and bioinformatics groups want complete, unencumbered access to every bit of information.''
As a customer for such products, Beyond Genomics's Neumann concurs. ''There's been a shift from just data to knowledge extraction,'' he says.
Compounding Celera's problems, its exclusivity for information about the human genome lasted only a short time as sequencing information moved into the public domain.
''There was very little information that could not be extracted from the public databases. The six- to nine-month jump on it Celera had was not enough to make a big difference,'' says Sorger, whose bioengineering lab produces a terabyte of data in a typical month. In the future, Sorger says, personalized medicine could mean each individual's electronic medical records will amount to several terabytes of data.
The economy, of course, has depressed technology buying. But more than that, off-the-shelf bioinformatics software is often not what the doctor ordered.
''Everybody says they have a total information package and none of them actually do,'' says William S. Hayes, a bioinformatics scientist at AstraZeneca Group's research and development site in Waltham. ''Applications are cobbled together from incompatible systems. There's so many mergers taking place [that] it takes quite a while for products from smaller companies to get integrated, if they ever do.''
The irony is that AstraZeneca would rather buy off-the-shelf software instead of developing it internally. ''Anything we can buy costs a lot less than anything we can develop,'' Hayes says. ''Some say if we don't develop it in-house, we don't have an advantage. That's ridiculous. The only times we do internal [development] is when we can't find something externally.''
The problem is twofold: The bioinformatics industry often fails to give researchers what they want and, when it does, incompatibilities between databases, tools, and whatever else is in the picture can be overwhelming.
Sorger and his MIT colleagues have taken many informatics applications off the shelf, only to put them right back. His biggest complaint is that they tend to be scientifically lacking.
''We bought a lot of software and noticed the underlying science was simplistic,'' he says. ''So [researchers] would choose to use buggy public software instead.''
When the software doesn't work, scientists conducting experiments in both commercial and academic labs simply squirrel away their findings in a paper notebook or Excel spreadsheet, says Sorger.
''Think about experimental science,'' he says. ''You're trying to adapt your experiment to Mother Nature and it's extremely difficult. Once it's working and your database is not set up, you ignore the database. It's the most demoralizing part of all of this. After two to three years of installing large database systems, people ignore them because they don't really help solve the problems they have with their experiments.''
So it's not uncommon for valuable data to be sitting in Excel spreadsheets, accessible only by a handful of people if not a single researcher. ''You end up with 5,000 Excel spreadsheets instead of one consistent data source,'' adds Sorger.
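To illustrate what ''one consistent data source'' might look like in practice, here is a minimal sketch in Python. It assumes, purely for illustration, that the scattered spreadsheets have been exported as CSV files sharing three hypothetical columns (sample, assay, value), and it sweeps a directory of such exports into a single SQLite table that anyone in the lab can query; it is not a description of any product mentioned in this article.

# Minimal sketch: sweep scattered CSV exports (hypothetical columns:
# sample, assay, value) into one SQLite table, so results live in a
# single queryable place instead of thousands of spreadsheets.
import csv
import pathlib
import sqlite3

def consolidate(csv_dir, db_path="lab_results.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "source_file TEXT, sample TEXT, assay TEXT, value REAL)"
    )
    for path in sorted(pathlib.Path(csv_dir).glob("*.csv")):
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                conn.execute(
                    "INSERT INTO measurements VALUES (?, ?, ?, ?)",
                    (path.name, row["sample"], row["assay"], float(row["value"])),
                )
    conn.commit()
    return conn

if __name__ == "__main__":
    db = consolidate("exports")  # directory of CSV exports, assumed to exist
    for sample, mean in db.execute(
        "SELECT sample, AVG(value) FROM measurements GROUP BY sample"
    ):
        print(sample, mean)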
These problems don't necessarily preclude a healthy informatics industry at some point, according to Hayes and the others. He sees XML, a standard format for marking up and exchanging data, as a positive step toward reducing incompatibilities and making data easier to disseminate.
''XML is a start,'' he says. ''If we can transfer the data, we could deploy new [biological] algorithms very quickly.'' A standards body called ''I3C'' was formed in 2001 to facilitate universal data exchange in the life sciences, though it's not always easy to reach consensus. Progress comes slowly.
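As a sketch of the kind of exchange Hayes has in mind, consider the short Python example below. It parses a small, hypothetical XML record of gene annotations using only the standard library; the tag and attribute names are invented for illustration and come from no real standard, but once data arrives in a shared format like this, a new algorithm has to read only one structure rather than one per vendor.

# Minimal sketch: read a hypothetical XML gene-annotation record with
# Python's standard library. Tag names ("gene", "symbol", "expression")
# are invented for illustration, not taken from any real standard.
import xml.etree.ElementTree as ET

SAMPLE = """
<gene_set>
  <gene id="G0001">
    <symbol>TP53</symbol>
    <organism>Homo sapiens</organism>
    <expression tissue="liver" level="2.4"/>
    <expression tissue="lung" level="0.7"/>
  </gene>
</gene_set>
"""

def load_genes(xml_text):
    # Return plain dicts that downstream algorithms can consume without
    # caring which vendor's system produced the file.
    root = ET.fromstring(xml_text)
    return [
        {
            "id": gene.get("id"),
            "symbol": gene.findtext("symbol"),
            "organism": gene.findtext("organism"),
            "expression": {
                e.get("tissue"): float(e.get("level"))
                for e in gene.findall("expression")
            },
        }
        for gene in root.findall("gene")
    ]

if __name__ == "__main__":
    for gene in load_genes(SAMPLE):
        print(gene["symbol"], gene["expression"])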
Despite optimism that bioinformatics will prosper some day, the numbers describe the current reality: significant bioinformatics deals, from mergers to alliances, dropped 41 percent last year, to 194 from 331 in 2001, according to research from Cambridge Healthtech Institute.
''The role of informatics is going to grow astronomically, but the kind of things that have to be produced are not clear,'' says Neumann. ''Nobody has a clear understanding of the tools that will be needed. We have hunches.''
John Dodge is executive editor of Bio-IT World, a monthly IDG publication about technology in the life sciences. He can be reached at johndodge@bio-itworld.com.
This story ran on page C1 of the Boston Globe on 2/24/2003.