A nice Motley Fool post about the genome sizes of different species. Humans are not "smart" if you judge by genome size :)
===========================================================
This is what I like about the CRA board and TMF in general. Everyone thinks they are an expert, and we all are on some subject, but we can still exchange ideas in a civil manner and try to learn something from each other!
Well, as my not-so-innocent comment on "junk DNA" (boards.fool.com) spawned some 25 posts, I guess there's enough interest in the subject for me to put a little effort into a response. Actually, in the process I learned a few things myself!
Here's my inflammatory statement again: 'Incidentally, for those of you convinced that the so-called "junk DNA" is actually a gold mine waiting to be discovered, he [referring to Sydney Brenner, a famous scientist] also has a pretty definitive argument as to why it really is "junk". But I'm saving that for the next time that discussion comes up.'
My basic point is that I think a lot of CRA investors feel (and Venter encourages this) that the 97% of the genome classified as "junk DNA" is as valuable to biomedical research as the 3% that is present in INCY's and HGSI's databases. Well, it isn't. Sorry to let you down. ;) Some fraction of it is clearly valuable, some of it may someday turn out to have value, but the overwhelming bulk really is JUNK. This is not my pet theory, but is pretty well accepted in the field of genomics, especially by bright guys like Venter. It's just not in CRA's interest to say this too loudly. [Besides, HGSI's Haseltine says it often enough!]
What I'm going to say here shouldn't by itself cause you to question the value of CRA as an investment. But the more you understand about your investment the better position you will be in to judge its future prospects.
Anyway, to start off, let's define "junk DNA." Despite sounding like a catchphrase dreamed up by a writer for Time magazine, this is actually a term in wide use in the genomics field. It is based on the current belief that approximately 3% of the human genome represents genes that "code" for proteins. Remember the old idea that DNA makes RNA, which makes (codes for) proteins? The rest of the genome, 97%, doesn't code for proteins. [Actually, this 3% number is a guesstimate, and you hear other numbers quoted sometimes. But give or take a few percent, it's probably fairly accurate.]
What is the remaining 97% thought to be good for?
Well, some of it, as JefOfool pointed out (boards.fool.com), consists of the "regulatory elements" that turn the protein-coding genes on and off. These bits of DNA, usually next to the coding genes, are what determine that your eyes produce eye proteins and not liver proteins, and vice versa. They are what turn insulin production on and off in response to your eating a meal. Obviously VERY important. Actually, I've spent two-thirds of my career as a scientist working on these bits of DNA, so I'll be the last to argue that they are not important.
To the best of our current knowledge, the amount of DNA involved in regulating a particular gene is, on average, roughly the same as the amount coding for the protein. This is a gross generalization, but a gene of 5000 basepairs (remember, the human genome is 3 billion basepairs) might have another 1000, 5000 or maybe more basepairs of DNA that turn it on and off. Although some genes are 1 million basepairs long, their regulatory elements might still be only several thousand basepairs in size. Some tiny genes have regulatory elements bigger than the genes themselves. The bottom line is that, as a crude approximation, these regulatory elements might make up another 3% of the genome. Perhaps it is only 0.5% or as high as 6%, but 3% is close enough for this discussion.
So now we're up to 6% of the genome that is "important." What's left? Well, there are other types of regulatory elements that don't control genes but do perform important functions (things like telomeres and centromeres and other stuff I won't bore you with). However, these elements make up a very tiny fraction of the genome (I can't put a number on it, but it is small).
While we are talking about regulatory elements, let's discuss what they might be useful for from a biomedical standpoint. After all, CRA and HGP are providing us with tens of thousands of new regulatory elements to look at (actually, they are not easy to recognize in the sequence, but they are there somewhere). There are clearly some applications for this type of information today, but they are dwarfed by the value of the coding genes themselves. This may not be true in 10 or 25 years, but it is today. One example of work being done in this area is Tularik, a biotech company that recently IPOed. They focus on developing small-molecule drugs that specifically affect the control of genes via these regulatory elements. So far the results of this approach haven't reached the clinic, but it is still early days for them. Along the same lines, my own company recently announced a drug discovered by this same strategy. Gene therapy is another possible application for this information, but as we all know, that field still has a LOT of growing pains to go through.
Anyway, that 3% of control elements has some potential value yet to be realized, and obviously we wouldn't have access to it without the sequencing efforts of CRA and HGP.
What's left now? Try this on for size: an amazing 30% of our genome is squandered on "retrotransposed" sequences. This is what happens when a gene makes an RNA, which is then accidentally converted back into DNA and stuck back into the genome in a non-functional way. These sequences can also be remnants of ancient retroviruses that infected our ancestors. One class of these, called Alu elements, is represented by some 300,000 copies in our genome (compared with 50-100,000 real genes). One of these "retrotransposition" events is estimated to occur once in the lifetime of one in every 100 people. It doesn't sound like much, but over millions of years that results in 30% of our genome being this sort of JUNK. As JD said, "...our junk DNA does indeed contain many remnants of our evolution..."
Anything else? Well there are odds and ends like the non-functional "pseudogenes" that JD alluded to (post 15191), but I can't put a number on these (possibly several percent). So let's say that we've accounted for 6% functional and 30+% useless. That leaves a lot for CRA investors to get excited over! However, working in this field I've seen the sequence of most of this remaining "junk DNA" and I can't say I'm that excited. Sounds egotistical, doesn't it?
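The running tally in the last few paragraphs is simple arithmetic, and it may help to see it in basepairs rather than percentages. Here is a minimal Python sketch, assuming a 3-billion-basepair genome and the rough percentages quoted above (3% coding, 3% regulatory, 30% retrotransposed); pseudogenes, telomeres and centromeres are left out because no number was put on them. The function name `genome_budget` is just a hypothetical label for illustration.

```python
GENOME_BP = 3_000_000_000  # approximate human genome size used in this post

# Rough guesstimates quoted in this post; pseudogenes, telomeres and
# centromeres are deliberately omitted because no figure was given for them.
FRACTIONS = {
    "protein-coding genes": 0.03,
    "regulatory elements": 0.03,
    "retrotransposed sequences": 0.30,
}


def genome_budget(genome_bp, fractions):
    """Turn fractional estimates into basepair counts, plus the leftover."""
    budget = {name: frac * genome_bp for name, frac in fractions.items()}
    leftover = 1.0 - sum(fractions.values())
    budget["not yet accounted for"] = leftover * genome_bp
    return budget


if __name__ == "__main__":
    for name, bp in genome_budget(GENOME_BP, FRACTIONS).items():
        print(f"{name:26s} {bp:>15,.0f} bp  ({bp / GENOME_BP:.0%})")
```

On these numbers the coding genes and regulatory elements come to about 90 million basepairs each, and roughly 64% of the genome (close to 2 billion basepairs) is still unassigned before pseudogenes are subtracted.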
Here we arrive at an argument that several of you put forth. 1946dodge put it best:
'Just because we don't know what the "junk" is for, doesn't mean it has no purpose. I cannot conceive of anything as complex as life having random or unnecessary information coded into it. That would be absurd. God doesn't put things there that have no purpose. I have never heard Venter say that this stuff is useless random information. A real scientist would say: We have a large part of the genome that we do not understand YET. One of the critical mistakes scientists make is to assume something about which they know nothing about. It is an arrogant and fruitless attitude for any scientist to have. Many discoveries go on languishing unexplored because of this behavior in "experts". NO one is an expert until all the stuff makes sense. That may take a very long time, but mankind will hopefully be here longer.'
Actually, 1946dodge, this is a common trap for both scientists and non-scientists. Einstein himself refused to believe quantum mechanics basically because it didn't make sense at the "gut instinct" level. As current research is demonstrating, it may not make sense, but it is true. I hate to think about all the Science and Nature papers I missed out on writing (but someone else didn't) because I was so sure I "knew" how things worked. However, you are also falling into the same trap. Just because you can't imagine that functionless things exist doesn't mean there aren't lots of functionless things around. There are. As Spinality said, there are vestigial structures all over the place. Your appendix and tailbone are two trivial examples that are commonly mentioned. Sure, an appendix might have a function we don't know about. But a lot of people die from appendicitis, whereas I've never heard of someone dying from lack of an appendix. That doesn't sound like something evolution should select for, to me.
However, more to the point of our current discussion, let's talk about Nature's own evidence that most of the genome has little or no function. This is where Sydney Brenner (SB) comes in.
karljon wrote: "Here's another way of looking at it. Simple systems tend to have more stream-lined genomes. I know this is a gross generalization but, for the sake of argument, it works. The genome of an amoeba is not as complex as that of the human."
So, in other words (I'm good at putting words in people's mouths!), let's hypothesize that there is a direct correlation between the complexity (or more amusingly, the intelligence) of an organism and its genome size. Obviously this makes intuitive sense. We're bigger, smarter, and more complex than a single-celled amoeba swimming around in its little pond. This is how science works: we hypothesize and then we test our hypothesis.
Here are some actual numbers on the size of various genomes for our little thought experiment (note these are approximations). You'll have to take my word for it that this is not an exercise in data mining!
Species Genome size (DNA basepairs)
Human 3,000,000,000 DNA basepairs (We all knew that number!)
Cow 3,651,500,000 (Maybe your dinner was smarter than you thought? But at least it's a mammal like us.)
Chicken 1,200,000,000 (Pretty good for such a DUMB animal!)
Carp 1,700,000,000 (That makes sense. Carp are pretty dumb too.)
Zebrafish 1,900,000,000 (JD's favorite model, which CRA will sequence)
C. elegans 100,000,000 (The first fully sequenced animal, an almost microscopic worm)
Fruit fly 180,000,000 (Well, obviously a fly is more complex than a worm with only 1000 cells. But wait, the big surprise is that CRA says the fly has fewer genes than the worm, despite its greater complexity and larger genome. That's weird!)
House fly 900,000,000 (I guess that makes sense. At least it's bigger; probably smarter too. I still don't get that worm thing.)
Rice 400,000,000 (Insects and plants. Yeah, they're both pretty dumb.)
Tomato 655,000,000 (Yawn... Ever seen "Attack of the Killer Tomatoes"?)
Soybean 1,115,000,000 (Hmm... pretty smart plant. Almost a chicken's IQ. Must be all that inbreeding.)
E. coli 4,639,221 (Lives in our guts. Fully sequenced 2 years ago.)
HIV-1 9,750 (Given how nasty it is, I'd have thought it was bigger.)
OK, so what's the big deal? Most of that makes sense, doesn't it? Well, here's a little more:
Warty newt 20,600,000,000 (I hate to tell you guys, but a Warty newt has a lot more DNA than you!)
Corn 5,000,000,000 (Even worse, so does corn!)
Paramecium A 8,600,000,000 (The one-celled denizen of pond scum from high school biology.)
Paramecium B 190,000,000 (Smaller? That's more like it! But they look the same to me.)
Onion 18,000,000,000 (The next time someone calls you a vegetable, take it as a compliment!)
Salamander 81,300,000,000 (Damn!)
Lungfish 139,000,000,000 (What the hell's a lungfish?)
Fern 160,000,000,000 (I'm starting to get pissed!)
And the winner is:
Amoeba 670,000,000,000 (A mere 200X more DNA than you or I, all in one cell.)
Well, surprising as it sounds, the largest known genome is actually an amoeba!!! Great call karljon!!!
What does this all mean? It means that an organism's size and complexity (and its number of genes, as I discuss below) have nothing to do with the size of its genome.
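Sorting the table makes the point hard to miss. The Python sketch below types in the table's approximate numbers by hand (the helper names `ratio_to_human` and `bigger_than_human` are hypothetical, just for illustration), lists the genomes from smallest to largest, and expresses each one as a multiple of the human genome.

```python
# Approximate genome sizes in basepairs, as listed in the table above.
GENOME_SIZES = {
    "Human": 3_000_000_000,
    "Cow": 3_651_500_000,
    "Chicken": 1_200_000_000,
    "Carp": 1_700_000_000,
    "Zebrafish": 1_900_000_000,
    "C. elegans": 100_000_000,
    "Fruit fly": 180_000_000,
    "House fly": 900_000_000,
    "Rice": 400_000_000,
    "Tomato": 655_000_000,
    "Soybean": 1_115_000_000,
    "E. coli": 4_639_221,
    "HIV-1": 9_750,
    "Warty newt": 20_600_000_000,
    "Corn": 5_000_000_000,
    "Paramecium A": 8_600_000_000,
    "Paramecium B": 190_000_000,
    "Onion": 18_000_000_000,
    "Salamander": 81_300_000_000,
    "Lungfish": 139_000_000_000,
    "Fern": 160_000_000_000,
    "Amoeba": 670_000_000_000,
}


def ratio_to_human(species):
    """How many times larger (or smaller) a genome is than the human one."""
    return GENOME_SIZES[species] / GENOME_SIZES["Human"]


def bigger_than_human():
    """Species from the table whose genome exceeds ours, largest first."""
    human = GENOME_SIZES["Human"]
    return sorted(
        (name for name, size in GENOME_SIZES.items() if size > human),
        key=GENOME_SIZES.get,
        reverse=True,
    )


if __name__ == "__main__":
    # Smallest genome first, with each size as a multiple of the human genome.
    for name in sorted(GENOME_SIZES, key=GENOME_SIZES.get):
        print(f"{name:13s} {GENOME_SIZES[name]:>17,} bp  {ratio_to_human(name):9.4f}x human")
```

By this arithmetic the amoeba comes out at roughly 223 times the human genome (the "200X" above is a round-down), nine of the organisms listed, from the cow up to the amoeba, carry more DNA than we do, and the two paramecia differ from each other about 45-fold.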
Now Sydney Brenner and the Pufferfish (that famous Japanese delicacy).
As I mentioned, Sydney is a living scientific god. JD will be interested to know that the C. elegans worm, on which biotechs like Exelixis and all those academics perform functional genomics, was developed as a model organism single-handedly by SB. He was already a Nobel candidate before that. He must be about 80 years old now, but for the last several years he has been championing the Pufferfish genome project. Why? Well, not because he likes sushi, but because the Pufferfish has two interesting attributes. One, it's a vertebrate (has a spinal cord) like humans, and two, its genome is only 400 million basepairs long, which, as we now know, is really TINY.
Well, besides being a really quick sequence for CRA, what's so interesting about that? For one thing, based on the sequencing SB has done already, Pufferfish have the same number of genes as humans, give or take a few. Not just the same number, the same genes, period. After all, they're vertebrates. Mice and frogs and all the rest have them too. They have most of the same organs as we do, and thanks to all the work done on Zebrafish (a popular genetic model in academia), we know that most of their fundamental processes are very similar to ours. Yet they get by with only about 1/8th the amount of DNA we have! In fact, when SB sequences their DNA, what he finds is that pretty much all that remains are the genes and their regulatory elements: that 6% or so, plus some of the other stuff we talked about above. Most of the "junk DNA" is gone. But it's still a pretty happy little fish. [This is why sequencing the Pufferfish is so important in SB's mind. Because its sequence is more diverged from the human sequence than the mouse's is, and because all that's left is the important stuff, a comparison between human and Pufferfish can be used to rapidly identify all those "regulatory elements" we were talking about. The mouse, fly, worm, and Zebrafish can't do that.]
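The Pufferfish arithmetic is worth spelling out. A minimal Python sketch, assuming the post's round numbers (3 billion basepairs for human, 400 million for Pufferfish, roughly 6% of the human genome being genes plus regulatory elements) and, as a simplifying assumption rather than a measured fact, that the fish needs about the same absolute amount of functional DNA because it has about the same genes:

```python
HUMAN_BP = 3_000_000_000       # approximate human genome size
PUFFERFISH_BP = 400_000_000    # approximate Pufferfish genome size
FUNCTIONAL_FRACTION = 0.06     # rough "coding + regulatory" share of the human genome

# Pufferfish genome as a fraction of the human one: 2/15, about 0.133,
# i.e. somewhere between 1/8 and 1/7.
size_ratio = PUFFERFISH_BP / HUMAN_BP

# Absolute amount of "important" human DNA under the 6% estimate.
functional_bp = FUNCTIONAL_FRACTION * HUMAN_BP

# Simplifying assumption (not a measurement): the fish carries about the same
# absolute amount of functional DNA, because it has roughly the same genes.
functional_share_of_fish = functional_bp / PUFFERFISH_BP

print(f"pufferfish / human genome size: {size_ratio:.3f} (about 1/{1 / size_ratio:.1f})")
print(f"functional DNA (human estimate): {functional_bp:,.0f} bp")
print(f"share of the pufferfish genome that would be: {functional_share_of_fish:.0%}")
```

Under those assumptions about 180 million basepairs of genes and regulatory elements would fill roughly 45% of the Pufferfish genome, versus about 6% of ours, which is one way to picture how much of the "junk" the fish does without.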
Well, draw your own conclusions. But I'm here to tell you that the genomics field thinks this means that most (not all, but most) of our "junk DNA" really is JUNK. We even have theories about why this is so. [Of course, that junk is a source of SNPs, though SNPs within genes are still more valuable than SNPs within junk DNA.]
Again, this is not an argument that sequencing the genome is a waste of time. It's not. It's our generation's Kitty Hawk or Tranquility Base. But I do think it's a good idea for an investor to be able to sift the strands of truth out of the tales Collins, Venter, Scott and Haseltine spin for us. And these are my 2 cents' worth toward that end.
Fushi
P.S. Ken, you can make your witty rejoinder now.
boards.fool.com