Thanks Jim for the insight.
It seems to me your view, if I understand it, does not necessarily contradict the Lander/Venter (surrogates for HGP and Celera) view.
This also seems to be the position taken by INCY's Whitfield:
fool.com
Whitfield: You know, I've gone on record as saying that the No. 1 issue that's been misunderstood by everyone, the rest of my family, my neighbor across the fence, throughout this whole year is the difference between a gene and gene transcript.... There's this great quote I was just looking at in the Nature collaboration. Let me just read this to you: "There appear to be about 30,000 to 40,000 protein-coding genes in the genome." Notice how they say "protein coding." In other words, each gene can actually code for more than one protein.
TMF: Sometimes 5 to 10, right?
Whitfield: Yes, exactly. So on average, there are about three, maybe a bit more, gene transcripts per gene.
This number has been almost grotesquely misquoted by everyone today because everybody is saying, "Oh, there are all these predictions that there's going to be 100,000 genes, and now it's only 40,000." Whereas, certainly to my knowledge in our case, everybody -- and we -- have been saying there are over 100,000 gene transcripts and that's totally consistent with a number of 35,000 to 40,000 [protein-coding] genes.
NB: Drosophila's DSCAM is a dramatic example of this one-to-many "gene"-to-protein situation:
ncbi.nlm.nih.gov
(one "gene" that can encode 38,000 theoretical isoforms)
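For anyone who wants to check the arithmetic behind the two figures above, here is a rough Python sketch. The gene and transcript numbers come straight from the Whitfield quote. The per-cluster exon counts for DSCAM (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17) are not in anything quoted here; they are my assumption, taken from the published description of the gene, and the function names are made up for illustration.

```python
from math import prod

# Figures from the Whitfield interview quoted above.
GENES_LOW, GENES_HIGH = 35_000, 40_000
TRANSCRIPTS_PER_GENE = 3  # "about three, maybe a bit more"

# Assumption (not in the post): number of mutually exclusive alternatives
# in each of the four variable exon clusters of Drosophila DSCAM.
DSCAM_EXON_ALTERNATIVES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def transcript_range(genes_low, genes_high, per_gene):
    """Return the (low, high) transcript count implied by a gene-count range."""
    return genes_low * per_gene, genes_high * per_gene


def isoform_count(alternatives):
    """One alternative is chosen per cluster, so the counts multiply."""
    return prod(alternatives.values())


low, high = transcript_range(GENES_LOW, GENES_HIGH, TRANSCRIPTS_PER_GENE)
print(f"{low:,} to {high:,} transcripts")  # 105,000 to 120,000 transcripts
print(f"{isoform_count(DSCAM_EXON_ALTERNATIVES):,} theoretical DSCAM isoforms")  # 38,016
```

So three transcripts per gene on 35,000 to 40,000 genes already lands above 100,000 transcripts, and a single locus with a few clusters of alternative exons gets to the 38,000 figure by simple multiplication.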
Now Haseltine, I believe, is going a bit further than this, saying that there are a lot more "genes" (loosely defined as sections of the genome with protein-encoding capacity) in the genome proper. Mark Bong's quote from the CC suggests there may be something to this, when he refers to transcripts present in HGS's database that have no counterpart in any region defined as a "gene" in the Lander/Venter genome. The more of those HGS or anyone else can produce as evidence, the more we can start to believe that the 30-40K "official" number may indeed be a gross underestimate.
Hope this makes at least some sense, and thanks again for sharing your insights here.
PB