To: Jim Willie CB who wrote (19957) 6/4/2003 2:22:05 PM
From: stockman_scott

The ‘bioinformatics’ gold rush
___________________________________

The public GenBank holds sequence data on more than seven billion units of DNA, while Celera Genomics claims to have 50 terabytes of data in store, equivalent to 80 000 compact discs. The raw sequence data consist of monotonous strings of four letters (A, T, C and G) that make up the 3 billion or so bases in the human genome. It is impossible to access the data or to make any sense of the sequences without special software. Some software is developed and made freely available in the public domain, but the databases of private companies are available only to paying subscribers. Incyte launched an e-commerce genomics program in March that allows researchers to order sequence data or physical copies of more than 100 000 genes online. Subscribers to the company’s genomics database include drug giants such as Pfizer, Bayer and Eli Lilly. Celera's gene notes will similarly cost commercial subscribers an estimated $5 million to $15 million, and academics $2 000 to $15 000, a year.

This first wave of the human genome gold rush, BIOINFORMATICS, is a fusion of information technology and biology that promises to turn raw genomic sequence data into knowledge for making even more lucrative new drugs. Bioinformatics is already a $300 million industry and is expected to grow to $2 billion within five years. One of the most basic operations in bioinformatics is searching for similarity, or homology, between a new sequence and sequences already in the database; a close match allows researchers to predict the type of protein the new sequence encodes and its function, and thus to patent the sequence. However, as we have seen, sequence homology is no guarantee of homology in function.
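The similarity search described above can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the method used by GenBank, Celera or any company mentioned here: it scores a query against every entry of a toy in-memory database using Smith-Waterman local alignment (match +2, mismatch -1, linear gap penalty -2, all assumed values) and ranks the entries by score. Production search tools such as BLAST rely on faster heuristics. The function names, sequence names and sequences are hypothetical.

```python
# Minimal sketch: rank database sequences by local-alignment similarity to a query.
# Assumptions (not from the article): Smith-Waterman scoring with match=+2,
# mismatch=-1 and a linear gap penalty of -2; all sequences are made-up toys.


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query: str, database: dict[str, str]) -> list[tuple[str, int]]:
    """Score the query against every database entry, best match first."""
    scores = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database standing in for a real sequence collection.
    db = {
        "seq_A": "GGATCGATTACAGG",  # contains the query exactly
        "seq_B": "CCCCCCCCCCCC",    # unrelated
        "seq_C": "TTGATTTCATT",     # contains the query with one mismatch
    }
    for name, score in rank_hits("GATTACA", db):
        print(name, score)
```

A high score only flags a candidate homolog; as the article cautions, it does not by itself establish that the two proteins share a function.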
With an understanding of protein structure, it is possible to search for specific inhibitors and activators before carrying out actual biochemical experiments in the laboratory. Yet only 1% of proteins have so far had their structures determined (by X-ray crystallography). Some bioinformatics companies cater to large users, aiming their products and services at genomics, biotechnology and pharmaceutical companies by creating custom software and offering consulting services. Lion Bioscience, in Heidelberg, Germany, has a $100 million contract with Bayer to build and manage a bioinformatics capability across all of Bayer’s divisions. Other firms target small or academic users. Web businesses such as Double Twist, based in Oakland, California, and e-Bioinformatics, in Pleasanton, California, offer one-stop internet shopping: these online companies allow users to access various types of databases and use software to manipulate the data. Large pharmaceutical companies have established entire departments to integrate and support computer software and to facilitate database access across departments. Close on the heels of bioinformatics, and possibly part of it, is PROTEOMICS. Its focus is on when and where genes are active and on the properties of the proteins the genes encode. It attempts to make sense of the complex relationships between genes and proteins and between different proteins (10), and it too has so far attracted hundreds of millions of dollars in venture capital. According to Mark J. Levin, CEO of Millennium Pharmaceuticals in Cambridge, Massachusetts, large pharmaceutical companies need to identify between 3 and 5 new drug candidates a year in order to grow 10 to 20 percent, the minimum increase shareholders will tolerate. Right now, they are delivering only between half a candidate and one and a half candidates a year. Millennium has a relationship with Bayer to deliver 225 pretested "druggable" targets within a few years.
Celera is in negotiations with GeneBio, a commercial adjunct of the Swiss Institute of Bioinformatics in Geneva, to launch a company dedicated to deducing the entire human proteome. As the number of human genes could be as high as 100 000, it is estimated that the number of proteins could well be in the region of 1 million. Up to the mid-1970s, scientists had assumed, wrongly, that one gene codes for one protein. Instead, the relationship between genes and proteins is complicated by many layers of processing and editing that begin even before the genes are transcribed. Proteomics has spawned a number of technical innovations, among them the Gene Chip, developed by Affymetrix in Santa Clara, California. It consists of glass microarrays coated with cDNAs (complementary DNA) that identify which mRNA species are made (and hence which genes are expressed). One microarray allows researchers to identify more than 60 000 different human mRNAs. The US National Cancer Institute has been examining the mRNAs produced by various types of cancer cells in a Human Tumor Gene Index project involving government and academic laboratories as well as a group of drug companies including Bristol-Myers Squibb, Genentech, Glaxo Wellcome and Merck. So far, more than 50 000 genes have been identified that are active in one or more cancers.

i-sis.org.uk
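The principle behind reading a microarray, in which a strong hybridization signal for a probe indicates that the matching mRNA was made and hence that the gene is expressed, can be sketched in a few lines. This is a simplified illustration, not Affymetrix's or the National Cancer Institute's actual analysis pipeline: it assumes one probe signal per gene, a single background level per array, and an arbitrary cut-off of twice background for calling a gene expressed. It then lists genes that appear active in a tumor sample but not in a matched normal sample. The function names, gene names and intensities are hypothetical toy values.

```python
# Minimal sketch of calling genes "expressed" from microarray probe signals.
# Assumptions (not from the article): one probe signal per gene, a single
# background estimate per array, and an arbitrary cut-off of `fold` x background.


def expressed_genes(signals: dict[str, float], background: float, fold: float = 2.0) -> set[str]:
    """Return the genes whose probe signal exceeds `fold` times the background."""
    if background <= 0:
        raise ValueError("background must be positive")
    threshold = fold * background
    return {gene for gene, signal in signals.items() if signal > threshold}


def active_only_in(tumor: dict[str, float], normal: dict[str, float],
                   background: float, fold: float = 2.0) -> list[str]:
    """List genes called expressed in the tumor sample but not in the normal one."""
    tumor_on = expressed_genes(tumor, background, fold)
    normal_on = expressed_genes(normal, background, fold)
    return sorted(tumor_on - normal_on)


if __name__ == "__main__":
    # Hypothetical hybridization intensities; not real measurements.
    tumor = {"GENE_1": 1450.0, "GENE_2": 95.0, "GENE_3": 610.0}
    normal = {"GENE_1": 1300.0, "GENE_2": 80.0, "GENE_3": 150.0}
    print(active_only_in(tumor, normal, background=120.0))  # ['GENE_3']
```

Real arrays use many probes per gene and statistical normalization rather than a single fixed threshold; the fixed cut-off here only keeps the idea visible.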