Thanks again, Seng. R U a genomic scientist, and/or just a CRA investor? Here is DoubleTwist Chrm 22 gene news, quite recent (marked "new" on their site) w/o any date: doubletwist.com
And a nice web demo: doubletwist.com
High Throughput Analysis of Chromosome 22
Computational Genomics Group, DoubleTwist, Inc.
Summary
DoubleTwist, Inc. has re-analyzed all finished genomic sequences from chromosome 22 using our proprietary genomic processing pipeline. This analysis has generated a comprehensive list of the genes present on this chromosome. Using a multi-dimensional combination of DNA and protein homology-derived methods and gene prediction algorithms, we have predicted with high confidence approximately 1,485 genes and 2,700 alternatively spliced forms derived from these genes. In addition, using less stringent criteria, we estimate that the total number of genes on chromosome 22 may reach 1,900. These values are higher than those reported by previous analyses, but remain consistent with the 1,600-1,900 genes predicted to be on chromosome 22. We have also developed a sophisticated data mining and visualization tool that will empower scientists worldwide in their efforts to rapidly and efficiently identify all genes, not only on chromosome 22, but also in the 3.3 billion base pairs of the human genome.
Introduction
Chromosome 22 is the second smallest and one of the most gene-rich human chromosomes. Several diseases are associated with chromosome 22 mutations. Approximately 97% of the long arm of chromosome 22 (22q), which is believed to contain the protein-coding regions, has been declared sequenced. This sequence consists of 12 contiguous fragments covering about 33.5 megabases, separated by 11 gaps of unknown size. DNA for the small gaps (<150 kilobases) was not sequenced because it is unstable in common cloning vectors. The sequence was obtained by a consortium of sequencing centers including The Sanger Center, Cambridge, UK; Department of Molecular Biology, Keio University School of Medicine, Tokyo, Japan; Department of Chemistry and Biochemistry, The University of Oklahoma; and the Genome Sequencing Center, Washington University School of Medicine, St Louis. Dunham et al. reported identifying 545 genes on chromosome 22 in their initial large-scale analysis. The Computational Genomics Group at DoubleTwist, Inc. used a high-throughput, multi-step pipeline that includes multiple gene prediction algorithms and multiple homology-based database analyses to annotate chromosome 22.
We chose to re-analyze chromosome 22 for two main reasons: first, to compare and validate our genomic pipeline approach, and second, to test and prepare our infrastructure for annotating the whole human genome, which is predicted to be finished and publicly available in June 2000. Our initial goal is to generate the most comprehensive and complete list of human genes by mid-summer. In addition, we are dedicated to providing the most sophisticated data mining tools to help scientists rapidly, efficiently, and accurately decipher all genes in the human genome.
Results and Discussion
It has been our observation that combining multiple gene prediction algorithms with rigorous alignments of cDNA and protein sequences is the most effective way to identify genes in the human genome. Accordingly, alignments between the DoubleTwist Human Gene Index Database and the 12 contig DNA fragments of chromosome 22 revealed 1,235 unique clusters that represent distinct loci. The total number of transcripts that mapped to chromosome 22 was 2,701; the additional 1,389 sequences represent alternative splice forms of the 1,235 transcriptional units identified by the DoubleTwist Human Gene Index. By combining the results of several gene prediction algorithms, we identified an additional 400 putative genes that were not contained in the DoubleTwist Human Gene Index. These results suggest the presence of additional genes for which no EST information is available. At least 250 of these potential genes were designated high-confidence loci, meaning that they have substantial supporting evidence. Furthermore, each of these putative genes has homologs in human or other species. We therefore conclude that we have identified 1,485 distinct gene loci with a high degree of confidence.
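The release does not say how DoubleTwist grouped transcript alignments into loci or how it scored gene predictions. Below is a minimal sketch of one common approach, assuming hypothetical alignment records (transcript ID, contig, strand, start, end): transcripts whose alignments overlap on the same contig and strand are merged into one locus, so extra transcripts at a locus count as alternative forms. The `high_confidence` filter and its two-predictor threshold are likewise illustrative assumptions, not DoubleTwist's actual criteria.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one transcript aligned to a genomic contig."""
    transcript_id: str
    contig: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # exclusive


def cluster_loci(alignments):
    """Merge transcripts whose alignments overlap on the same contig and strand.

    Returns a list of sets of transcript IDs, one set per locus.
    """
    by_strand = defaultdict(list)
    for aln in alignments:
        by_strand[(aln.contig, aln.strand)].append(aln)

    loci = []
    for key in sorted(by_strand):
        current, current_end = None, -1
        for aln in sorted(by_strand[key], key=lambda a: (a.start, a.end)):
            if current is None or aln.start >= current_end:
                # No overlap with the open locus: start a new one.
                current = {aln.transcript_id}
                loci.append(current)
                current_end = aln.end
            else:
                # Overlap: same locus, e.g. an alternative splice form.
                current.add(aln.transcript_id)
                current_end = max(current_end, aln.end)
    return loci


def high_confidence(predictions, min_predictors=2):
    """Keep predicted genes backed by >= min_predictors algorithms and a homolog.

    `predictions` holds (gene_id, n_supporting_predictors, has_homolog) tuples;
    the threshold is an assumption for illustration.
    """
    return [gid for gid, n, homolog in predictions if n >= min_predictors and homolog]


if __name__ == "__main__":
    alns = [
        Alignment("tA1", "ctg1", "+", 100, 500),
        Alignment("tA2", "ctg1", "+", 300, 800),  # overlaps tA1
        Alignment("tB1", "ctg1", "+", 1000, 1400),
        Alignment("tC1", "ctg2", "-", 50, 90),
    ]
    loci = cluster_loci(alns)
    print(len(loci), "loci from", len(alns), "transcripts")  # 3 loci from 4 transcripts
```

In this scheme the number of loci plays the role of the 1,235 Gene Index clusters, and transcripts beyond one per locus play the role of alternative forms. The reported totals are also consistent with 1,235 index loci plus the 250 high-confidence predictions giving 1,485 loci.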
Similar alignments using the Unigene database produced 1,408 high-quality alignments with chromosome 22 sequences. The difference in the number of unique alignments between the two databases may result from how the databases were constructed. DoubleTwist's Human Gene Index database consists of consensus sequences derived from at least two sequences (ESTs or mRNAs). Unigene currently includes a large number of singleton ESTs. Based on our current analysis and past experience, singleton ESTs frequently represent partially processed mRNAs and/or genomic contamination introduced while generating EST libraries. However, at this point we cannot exclude the possibility that some of these singleton ESTs represent very lowly expressed genes that are yet to be identified.
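The construction difference described above amounts to a filter on cluster size. A minimal sketch, assuming clusters are given as a hypothetical mapping from cluster ID to member sequence IDs (ESTs or mRNAs): requiring at least two distinct members mirrors the stated Human Gene Index rule, while a Unigene-style set would keep the singletons.

```python
def drop_singletons(clusters, min_members=2):
    """Return only clusters supported by at least `min_members` distinct sequences.

    `clusters` maps a cluster ID to a list of member sequence IDs (ESTs or mRNAs).
    """
    return {
        cid: members
        for cid, members in clusters.items()
        if len(set(members)) >= min_members
    }


if __name__ == "__main__":
    clusters = {
        "c1": ["est1", "est2"],
        "c2": ["est3"],                    # singleton EST
        "c3": ["mrna1", "est4", "est5"],
        "c4": ["est6", "est6"],            # same EST listed twice: still one sequence
    }
    print(sorted(drop_singletons(clusters)))  # ['c1', 'c3']
```

The trade-off is the one the paragraph raises: the filter removes likely artifacts such as genomic contamination, but it also discards any singleton that actually comes from a very lowly expressed gene.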
Finding genes and analyzing genomic sequences in human DNA is an elaborate and tedious process that generates large amounts of data, and analyzing such complex data is extremely difficult. We have therefore developed a highly sophisticated data mining and visualization tool that facilitates analysis and gene discovery. You can view a demo of the Genomic Viewer used to validate and accurately annotate all the genes on chromosome 22. Such an application is clearly crucial to the annotation and curation of the whole human genome.
To find out more about obtaining access to the DoubleTwist Human Genome Database, email Info-Sales@DoubleTwist.com or call 1-877-DBLTWIST.