Human Genome Research
- >> David Shalloway
I'm David Shalloway from the Section of Biochemistry and Molecular and Cell Biology and I want to welcome you to the Second Annual Ef Racker Lecture in Biology and Medicine.
This lectureship was established last year to commemorate the contributions made by Ef Racker to biology at Cornell during his long stay here up to his death two years ago.
Ef was the Albert Einstein Professor of Biochemistry and played an important role in the development of the Section of Biochemistry and Molecular and Cell Biology as Chairman in the late '60s and his research following that time.
His studies of oxidative phosphorylation are probably the best known, but he studied a wide range of catalytic proteins, always following his maxim, "Don't waste clean thoughts on dirty enzymes." For many of us, Ef was best remembered -- is best remembered for his incredible energy and the excitement he brought to discussions of science.
The spirit he brought in some ways is summarized by this quotation from him that I'd like to read.
"I'm not a white knight riding on a high horse.
I shall not even know whether I'm riding a horse or an ass until, as the Chinese say, the dust settles.
I also do not know why we search for truth and I like to probe into the biological origins of this drive.
And I believe that it is among the best gifts nature has given us." After the love for his wife Frances, daughter Anne, children and science, Ef's love was painting.
A number of his paintings adorn the Biotechnology Building where we work, and some have been exhibited in the lobby.
Although the paintings in the lobby are not for sale, other paintings by Ef are, and all proceeds from the sale of his paintings are used to support the Racker Lecture Fund. Anyone who might be interested in purchasing a painting or otherwise making a donation to the Fund can find information regarding this on the front of one of the easels out front.
We're very pleased to have as our speaker tonight, Dr. Sydney Brenner, a pioneer in the study of molecular biology and development.
Dr. Brenner will be giving a second talk of a more technical nature tomorrow at 4:00 p.m. Different literature states different locations for that talk.
That talk tomorrow will be in the Biotechnology Building, Main Auditorium on the ground floor.
So that will be in the Biotechnology Building tomorrow at 4:00 p.m. Tonight, Dr. Brenner will be introduced by Professor Jeffrey Roberts from the Section of Biochemistry and Molecular and Cell Biology.
[ Footsteps ] >> Thank you, David.
Sydney Brenner is one of the founders of molecular biology.
He's best known, and has been recognized formally in other ways for the work he did in the '60s in understanding how DNA works actually; how proteins are encoded in DNA; the nature of the code; and the signals that determine proteins; and also for the discovery of messenger RNA.
He's also known now largely as the person who initiated the modern molecular study of development using the small worm Caenorhabditis, and it's really due to his choice of this organism that hundreds of people are now studying the worm as a model for development.
A good number of these people, of course, are doing work on the genome of Caenorhabditis which is one of the major models for the Genome Project which I think Sydney's going to talk about.
Besides his formal positions, and he was Director of the MRC Laboratory for many years, where I was a visitor for a couple of years during a postdoctoral period, his major characteristic is that he was the sort of leader and center of any conversation that went on.
So that generally there was a crowd of people and Sydney was in the center and Sydney was doing the talking and everybody else was listening, which is what we ought to do right now.
So, Sydney Brenner.
[ Audience Applause ] [ Background Sounds ] >> Well, may I begin by thanking you for your invitation to return to Cornell after an absence of more than 20 years to deliver a lecture in honor of Efraim Racker, which I'm most pleased to do.
I was told that this lecture is for the intelligent layman and I can see that there are a lot of university professors here, so I will address myself to them.
[Background sounds] I hope that you will be able to follow what is, I think, one of the most exciting periods in biology.
Now my talk is on the Human Genome Project and, of course, many people come to me and say, "Are you working on the Human Genome Project?" And I always say, "No, I am working on the human genome.
See, the human genome is science and the Human Genome Project is a mixture of politics and psychiatry." [Audience laughter] So, as you know there is a project to do this and what I'm going to tell you tonight is a very personal view of this and also what contribution I'm trying to make on my own account.
So, I will try then to put this into a rather different perspective from the ones that you've, no doubt, heard many times over; namely, what great benefit we will bring by discovering the genetic basis of human disease.
I've always felt that we're always looking for bad genes, and what we should do is do what I call "The Uncle Fred Genome Project." Of course, everybody's family has an Uncle Fred.
Who is Uncle Fred?
Uncle Fred smoked 60 cigarettes a day from the time he was five years old, [audience laughter] he drank two bottles of whiskey every day, he had four wives and innumerable girlfriends, and he was killed on his way to ski jumping at the age of 92.
He got run over by a motor car.
[Audience laughter] So, Uncle Fred had good genes, and I think we should go and find out what are the Uncle Fred genes.
[Audience laughter] So -- but I think the perspective you should view this from is the following.
There is only one biological science and that science is genetics.
Everything else is subsidiary.
[Audience laughter] Why?
Why do I say this?
Because genetics tackles the central biologic, the central question of biology.
Biological systems are the only complex natural systems that carry within them an internal description of themselves.
That is, a biological system is described, is specified, is programmed as we sometimes say, by the genes that are within it, and what is passed on from organism to organism is, in fact, the genetic description.
And the job of the biologist, the most fascinating job of the biologist, is, as everybody says, to understand how this works.
But let's put it in a more definite thing.
It is to try and learn to compute organisms from their genome descriptions.
They can do it, so why shouldn't we be able to do it?
That is the nature of the understanding of biology.
And, of course, it's a very deep subject because we can only ask three kinds of questions in biology.
How does it work?
That's physiology, and, of course, the genes set up the physiology.
How does it get built?
That's development or embryology and, of course, that's one of the most important things that the genes do.
And then, the deepest question of all, which is, how did it get that way?
How did systems of such complexity actually evolve?
So I think that these are the most profound scientific questions we can answer.
And the good news is it's going to take a long, long time to find out.
So, might as well start now.
Now, when you look back at genetics, what you will find is that a geneticist in the old days -- see, I like to say that this whole period is divided into two periods.
One is called B.C. That stands for Before Cloning.
And the other's called A.D., and that stands for After DNA, you see.
[Audience laughter] So, in about 25 B.C., when all of this work was burgeoning -- the '50s and so on -- then many geneticists, and for all time, geneticists could only assert the existence of a normal gene in an organism if, only if, they could find a mutant allele.
First Mendel could only say there was a gene for tallness in his plants when he found dwarf mutants.
That is, organisms showing a lack of tallness.
So it was only the deficiency or the modification of some property that you observed about an organism that allowed you to deduce there had to be a factor or a gene for the character that was no longer present.
And that is why all classical experimental genetics began and still begins with a mutant hunt, because there was only one way of working, which is: find as many mutants as you can and study them as deeply as you can.
And, of course, geneticists did one other thing to classify this genetic material of any organism.
They crossed these organisms and they saw whether the genes complemented or not.
Of course, geneticists also made maps.
But, I want to say that from the strictly purist point of view, the maps are irrelevant.
They just happen to be a useful way of indexing the mutants.
They don't tell you anything about how the genetic program is and, of course, I may well be wrong.
We may find that there are important things in the arrangement of genes which has to be -- but to a first approximation.
Maps are only useful if you wish to operate with organisms in the real world.
That is, they're only useful if you want to breed animals or plants, or if you want to find out about genetic diseases in man.
Otherwise, it just happens to be a means whereby scientists can communicate with each other about their index of mutants or genes in a given organism.
Now, of course, geneticists in classical experimental genetics were bound by experiments that depended on breeding animals and, of course, this is not something you can do with everything.
For some organisms like rhinoceroses, it's rather inconvenient to try and keep -- ten to the seven rhinoceroses in the laboratory.
For others, it will take too long because the life cycle is too long, and you will have forgotten by the time you got the result what experiment you actually set up.
And, of course, for organisms like man, it's actually illegal to do [audience laughter] this.
And, of course, man is the organism we are most interested in.
Now, in the mid-1970s, two techniques were developed, and these techniques can now revolutionize what we can do in this field because they liberate us from the tyranny of breeding cycles.
And the two techniques are the cloning of DNA; that is the fragmentation of genomes -- like getting all the pieces, putting these pieces into other pieces of DNA, and then propagating these in micro-organisms usually, so that one can have a library of DNA.
Well, it's not a library, it's a heap.
It's more like my desk than that of many other colleagues.
And, of course, so what you really want is an ordered library of DNA.
And the other most important is the characterization of these pieces of DNA directly by determining their chemical structure; that is, their sequence.
So, cloning and sequencing now give you access to the genomes of everything.
In fact, you can make a good argument that you could even do it for an organism that's extinct if you can get enough DNA.
And that, of course, is the theme of a recent movie, which is actually wrong because the best place to get dinosaur genes I believe is in living alligators.
They're easier to get.
You don't have to buy expensive amber and so on and so you don't have to patch it with frog DNA either which was a mistake.
Anyway. [Audience laughter] Ah.
So, in principle, we could take the entire genome, sequence it from one end to the other, and in principle, we could do this.
But in practice, there's an argument -- of course, there's the obvious argument that it's actually -- is impractical to do now.
And I won't say -- I make an argument that, in fact, it's not worthwhile doing.
And that's the argument I'm going to make tonight.
Now, why is it so difficult?
The human genome is enormous.
It is three times 10 to the nine base pairs -- three billion base pairs.
That is, if we had a machine, or machines, that could give us a million base pairs of sequence every day, it would take us 20 years to do the human genome.
Of course, the intelligent laymen amongst you have calculated that it should only take you 10 years.
But, of course, I'll point out that we would like to publish the result [audience laughter] and the referees won't let us, unless we sequence both chains of the DNA.
So we'll have to do six times 10 to the nine bases.
So it's impractical to do.
The technology is not really there.
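The arithmetic behind these figures can be checked with a short sketch. The genome size and the million-bases-per-day throughput are the lecture's own numbers; the function name and the 365-day year are assumptions made here for illustration.

```python
# Back-of-envelope check of the sequencing-time figures quoted in the lecture.
GENOME_BP = 3 * 10**9           # haploid human genome in base pairs (lecture figure)
THROUGHPUT_BP_PER_DAY = 10**6   # hypothetical machine output per day (lecture figure)
DAYS_PER_YEAR = 365             # assumption: machines run every day of the year


def years_to_sequence(bases: int, per_day: int = THROUGHPUT_BP_PER_DAY) -> float:
    """Years needed to read `bases` bases at `per_day` bases per day."""
    return bases / per_day / DAYS_PER_YEAR


one_strand = years_to_sequence(GENOME_BP)
both_strands = years_to_sequence(2 * GENOME_BP)  # referees want both chains

print(f"one strand:   {one_strand:.1f} years")
print(f"both strands: {both_strands:.1f} years")
```

With these inputs the single-strand figure comes to a little over 8 years and the double-strand figure to a little over 16, which the lecture rounds up to 10 and 20.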
And in order to explain why I think it's not worthwhile doing I'm going to give you a little course in the new genetics.
It's very simple.
It's all on one piece of paper.
You don't have to -- I notice now that there's an ever increasing supply of extremely thick textbooks.
A thing, you know, the molecular biology of the cell, the gene, you know, and so on.
And these usually either break your arm or if you sleep with them in bed they damage your face, [audience laughter] so I strongly recommend that -- here is the new textbook called Genetics.
[Audience laughter] Okay.
[Noise] I'll throw it in.
It's the whole of biology there.
What you do in this thing is we classify organisms into three classes.
They are viruses, bacteria, and this organism called ratman.
[Audience laughter] I'll explain it [inaudible].
Rat and -- we can measure the amount of DNA in each of these, so you'll see I've chosen these numbers just to make it easier.
Viruses are the order of 4,000 base pairs.
That's four to the six.
Bacteria, the order of four million base pairs.
That's four to the eleven.
And ratman is of the order of four billion base pairs.
That's four to the 16.
Okay? So our scale covers one million fold.
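The round numbers above are chosen to sit near powers of four. A small sketch makes the correspondence explicit; the dictionary and variable names are illustrative only and do not come from the lecture.

```python
# Genome classes from the lecture: the exponent k such that 4**k
# approximates the quoted genome size in base pairs.
GENOME_CLASSES = {
    "virus": (6, 4_000),
    "bacterium": (11, 4_000_000),
    "ratman": (16, 4_000_000_000),
}

for name, (k, quoted_bp) in GENOME_CLASSES.items():
    exact = 4 ** k
    error = abs(exact - quoted_bp) / quoted_bp
    print(f"{name:10s} 4^{k:<2d} = {exact:>13,d}  (quoted {quoted_bp:,}; off by {error:.1%})")

# Range covered from the smallest class to the largest: 4**10.
scale = 4 ** 16 // 4 ** 6
print(f"scale: {scale:,}-fold")
```

The ratio between the largest and smallest class is 4 to the 10, or 1,048,576, which is the "one million fold" of the lecture.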
We go from these simple viruses to these complex metazoans covering this range.
Now this column is quite interesting because this column invents a new word called a "quit," and a "quit" is a unit of sequence information.
So it makes sense, although you may not agree, to say that viruses have six quits of sequence information.
By this I mean, clearly, that because they have four to the six nucleotides, on the average each and every hexanucleotide will be represented once in this genome, right?
And that so-called unique sequence is 11 in bacteria and 16 in ratman.
Now, if you are going to try to line up pieces of DNA, that is the minimum information you must get to prove that two pieces of DNA overlap each other.
Or you must get the equivalent of that.
And so, underneath here I've put the conversion into bits which you multiply by two.
So it says you cannot overlap DNA from a human genome unless you have at least 32 bits of information.
And where you get those pieces of information, for example, if you run a gel, all you have to know is how many lines can I resolve in the gel?
If it's 32, you get five bits of information.
If it's 256, you get eight bits of information.
And if it's my hotel room number 512, you get nine bits of information from that.
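The conversion described here can be sketched as follows. One base has four possible values, so it carries two bits; a gel with n distinguishable bands carries log2(n) bits. The function names are hypothetical, and the uniqueness argument assumes a random sequence, as the lecture's averaging argument does.

```python
import math


def quits_to_bits(quits: float) -> float:
    """One quit (a base-4 digit) is worth two bits."""
    return 2 * quits


def min_quits_for_unique_match(genome_bp: int) -> float:
    """Sequence length k at which a k-mer is expected about once in a
    random genome of genome_bp bases, i.e. 4**k == genome_bp."""
    return math.log2(genome_bp) / 2


def bits_from_gel(resolvable_bands: int) -> float:
    """Information from a measurement with this many distinguishable outcomes."""
    return math.log2(resolvable_bands)


# Bits needed to establish an overlap in a human-scale genome of 4**16 bases.
print(quits_to_bits(min_quits_for_unique_match(4 ** 16)))

# Bits obtained from gels of different resolution.
for bands in (32, 256, 512):
    print(bands, bits_from_gel(bands))
```

Under these assumptions a human-scale genome needs 16 quits, or 32 bits, and a gel resolving 32, 256 or 512 bands yields 5, 8 or 9 bits, so several independent measurements have to be combined to reach the threshold.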
Okay? Now, this is a very interesting column because this column defines an element called an NMBG, and an NMBG is a Naïve Molecular Biologist's Gene.
[Audience laughter] See?
Naïve molecular biologists believe that every protein has exactly 333 amino acids.
[Background conversation] And of course some of them can actually multiply by three and so they come to the conclusion that an NMBG is a kilobase, 1,000 base pairs.
It's a very good unit.
Now, let me now show you what a remarkable set of things this is.
In the old days, were I to ask the question, "How many genes has this virus got?" I'd have had to make mutants, mm-hmm, and I'd have had to find there were four complementation groups or five complementation groups; then I could say, "It takes five or six or whatever it is just to make the virus."
These days I just do a chemical analysis, divide it by a thousand, the base pairs by a thousand, and I can tell you the number of genes, okay?
[Background voices] And I'll go over here -- I'll tell you how many genes E. coli's got.
It's got 4,000.
Of course, I go over here and I say, well, ratman has got four million.
And, of course, so does this thing here -- I'll show you how many genes we know these things have.
Of course, from viruses we know this because we sequence them entirely, and it's four to six.
And bacteria are likely to have about 4,000 because already in E. coli you can count more than 3,000 from the whole of biochemistry, genetics, etc. So up to here every organism is full of NMBGs, mm-hmm, and it's a fair approximation to say that if you measure the DNA content you can get a good estimate.
But now, how good is this estimate of the number of genes in man?
Ah, let me just say that it takes 40 years to characterize an NMBG.
You're saying you have one student that purifies the protein, publishes a paper.
Another student does the sedimentation constant of the protein, publishes a paper, and so on.
Genes and their product take a lifetime, each one.
If this is true, then just by this analysis we can predict there will be four million professorships of biochemistry [audience laughter] in the human genome.
That's good news.
But unfortunately, it's not true.
And it's not true because it can be shown that if there were four million indispensable functions in men -- in man, this lecture should not have taken place because we should have all been dead a long time ago.
[Background voice] And, in fact, Muller calculated the number of genes in man in the '30s, more than 50 years ago, and he worked out from this argument of genetic load that the number of genes in man was 30,000.
Now what I find is people don't like that because, if that is true, they're only eight times more complicated than E. coli and they consider that insulting.
[Audience laughter] So in the interests of amity, I'm prepared to raise that to 100,000.
10 to the five sounds better than 30,000.
So we will take 10 to the five.
I actually think it's much less than this.
10 to the five Gs, 100,000 genes in man.
Now, if that is true, then the average size of a gene is 40 kb, 40,000 base pairs.
You just get that by taking this number and dividing it by that number.
But the NMBG component of this is the same, whether you -- I mean, all proteins -- eukaryotic proteins are not 40 times bigger than prokaryotic proteins.
So we come to the conclusion that most of the DNA in the human genome is junk.
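The steps of this argument can be laid out as a short calculation. The genome sizes, the 1,000 base pair NMBG, and the figure of 100,000 genes are the lecture's numbers; the variable and function names are made up for this sketch.

```python
NMBG_BP = 1_000                # "naive molecular biologist's gene", in base pairs
RATMAN_GENOME_BP = 4 * 10**9   # lecture's round figure for rat or man
ECOLI_GENOME_BP = 4 * 10**6    # lecture's round figure for bacteria


def naive_gene_count(genome_bp: int) -> int:
    """Gene count if every stretch of NMBG_BP bases were a gene."""
    return genome_bp // NMBG_BP


assumed_human_genes = 10**5    # the lecture's deliberately generous estimate

naive_human = naive_gene_count(RATMAN_GENOME_BP)
naive_ecoli = naive_gene_count(ECOLI_GENOME_BP)

# Average stretch of genome available per gene, and the share of the genome
# that the coding parts would occupy if each gene still coded ~1 kb.
avg_gene_span_bp = RATMAN_GENOME_BP // assumed_human_genes
coding_fraction = assumed_human_genes * NMBG_BP / RATMAN_GENOME_BP

print(f"naive count, E. coli: {naive_ecoli:,}")
print(f"naive count, ratman:  {naive_human:,}")
print(f"average span per gene: {avg_gene_span_bp:,} bp")
print(f"coding fraction: {coding_fraction:.1%}")
```

On these figures each gene has 40,000 base pairs of genome to itself but needs only about 1,000 for coding, so the coding share is a few percent of the total.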
And I want to point out to you that there are two kinds of rubbish.
One is junk, and junk is kept.
And the other's garbage, and garbage is thrown away.
[Audience laughter] So it is not garbage DNA because if it was it wouldn't be there.
It's junk DNA.
Now, of course, you have to get used to the idea here that most of the DNA isn't doing anything.
It's completely useless.
[Noise] All right?
It isn't doing anything bad either.
But, -- and I don't -- we can go -- perhaps there will be questions about this at the end.
We can discuss it.
But what this teaches us here is the following.
That, by saying that the essential information -- that is, the information to specify the organism -- is contained in a small part of the genome of only a few percent of the total, means at the very least that we should not sequence the junk now -- I mean, we'll sequence the other stuff first, right?
And that means we have to take the following attitude in this game; that what we do now is what will give us the most information, rather than waste our time on junk.
So, I've always answered this question.
There's always a student in the audience who says, "Aren't you worried if you don't sequence the junk you will miss something?" And my answer is very clear.
Since I strongly believe in leaving a lot of work for our successors, I do not believe in trying to do everything in this generation, my argument is I'm not worried at all.
But you might well be.
[Audience laughter] So I propose to leave the junk to the next generation.
[Audience laughter] And I'm sure all the intelligent laymen amongst us will agree that it's a very sensible way if we have to do it.
The correct attitude to this, as indeed it is true to many problems in biology, is to treat them like income tax.
Namely, it is criminal to evade them but there are legal means of tax avoidance.
[Audience laughter] And there are legal means of sequence avoidance.
Right? In fact, I have often said that the only way we will ever systematically sequence the human genome is, in fact, to combine it with that other great issue in biology; namely, scientific fraud.
And what we should do is put a lab in a very unpleasant place like Chicago [audience laughter] and anybody found guilty of doing it should be sentenced to go and sequence 24 megabases of DNA.
[Audience laughter] By the way, another calculation that I've done shows that had all the committees in the world and all the scientists gathered there and if all the time spent in discussing the human genome, had they each run a few gels instead of going to this committee, we might well be about halfway through the human genome.
[Audience laughter] Okay.
So, now I want to turn to a subject -- I mean, how can we deal with this?
Well, of course, as you know, people are setting up factories to do the sequencing and, in fact, I think it's a total waste of time at this.
There are ways of doing this, one of which is to sequence cDNAs.
But, of course, that's the expressed part of the genome.
Still it is a massive amount of work and you will miss all the information about control regions.
So the question I ask myself -- and it was very easy to ask it because I knew the answer in advance -- is, isn't there a way, isn't there an organism, somewhere, that may not have accumulated all this junk?
And, of course, the answer is yes, there are organisms.
And also, I wanted a vertebrate; that is, a true model for man.
In 1968 -- that's a long time ago -- a man called Hinegardner measured and published the DNA contents of fish.
Over 200 species of fish were done.
He used very simple techniques.
He counted red blood cells in a hemocytometer.
Fish have nucleated red blood cells.
Counted the red blood cells.
Did chemical estimates of the amount of DNA, and came up with very accurate estimates of the amount per cell.
That's the diploid amount.
And, of course, many of these diploid estimates were checked by doing the same thing on sperm where he got half the amount.
And his numbers have stood the test of time.
And he noticed that fish DNA content covered a vast range, but that one particular group of fish, the Tetraodontidae, which includes the pufferfishes, had very small contents of DNA.
The haploid content of man is -- take it nominally as three picograms of DNA per haploid genome -- these fish were of the order of 0.4 picograms, an order of magnitude less.
Now, when you consider that, then -- and if we hadn't had all this early discussion, and I put it to you that they were fish that had eight times less DNA, the natural thing would be to say, "Well, that's just about what they deserve." [Audience laughter] Yeah?
And eight figures for the relation between me and a fish.
I must be at least eight times more complex than a fish.
So the question that we asked in my lab is, could we do a set of experiments in which we could estimate the total number of genes in the fish and compare them with the total number of genes in man?
Because if these two numbers were the same, what?
Then we would truly have a compact genome.
What -- and I want to now introduce you to a new technology which is very simple.
It's called "statistical genomics," right?
So what is done is this, and this work was initiated by Greg Elgar and Dick Sandford, who work in my lab in Cambridge, and we decided to do this.
This isn't, I'm pleased to say, funded by anybody.
It's funded from certain proceeds of certain activities which we needn't go into here, all right?
[Turning pages] And what you do is you take a large number of random fragments of a genome and you sequence them.
So we took 600 fragments, and we've got 130,000 base pairs of sequence, and proceeded to analyze these in a computer.
[Noise] All right?
So, I'll just go through this very quickly.
And so what we found, [noise] and I usually do this so people can enjoy my pointer, you see.
[Audience laughter] Very good.
[Audience laughter] So, what we did is just put these through a computer and we found that there were fragments that had repeating sequences on them.
In fact, this sequence here with 118 base pairs, we found 13 recurrences of it and basically, we could find that all of this accounted for about 2-1/2 kb of sequence.
You just add up the total amount of sequence within this.
And here are all kinds of repeats and over here are what are called "microsatellites," which go down to dinucleotide repeats.
Here are the ribosomal fractions.
And you can add up the total repetitive DNA.
It's 9.7 kb.
7.4% of this DNA is repetitive.
90% is unique.
That's good news already.
This, if you like, is a little bit of a cot-curve done in a computer.
We then took the rest of it and put that through -- translated the sequences [background sounds] into protein, which you can do according to the rules of the genetic code [noise] and [noise] searched a database.
And here what we found were 10 genes -- 10, nine genes, many of which are old friends, you see.
Well, they have to be because they're known.
Fibroblast growth factor receptor -- we found a fragment which showed us that we had a coding sequence and we had an intron.
There are other fragments here in which we had -- with this one, this was a three prime untranslated region because we happened to hit the end of this gene, okay?
So we can add up all of this.
It's about a kilobase of coding sequence, known coding sequence.
And we can go through the calculation and we found that we can account for .8% of our sequence as known coding sequence, all right?
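The bookkeeping of the random-fragment survey can be reproduced in a few lines. The totals are the rounded figures quoted in the lecture (130,000 base pairs sampled, 9.7 kb repetitive, about 1 kb matching known genes); the names are illustrative, and the real analysis would of course start from the sequences themselves rather than from these totals.

```python
SAMPLED_BP = 130_000      # total random sequence from about 600 fragments
REPETITIVE_BP = 9_700     # repeats, microsatellites, ribosomal sequence
KNOWN_CODING_BP = 1_000   # "about a kilobase" matching known genes


def percent(part_bp: int, whole_bp: int) -> float:
    """Share of the sample, as a percentage."""
    return 100 * part_bp / whole_bp


repetitive_pct = percent(REPETITIVE_BP, SAMPLED_BP)
unique_pct = 100 - repetitive_pct
coding_pct = percent(KNOWN_CODING_BP, SAMPLED_BP)

print(f"repetitive:   {repetitive_pct:.2f}%")
print(f"unique:       {unique_pct:.2f}%")
print(f"known coding: {coding_pct:.2f}%")
```

The rounded inputs give values close to the percentages quoted in the lecture: roughly 7.5% repetitive, more than 90% unique, and just under 0.8% known coding sequence.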
Now what we want to do is discover what that is in man, all right?
And this is known as the ratman database, right?
So these are genes, fully sequenced genes, that have either been found in man or in one of the other organisms that have been extensively studied and are, therefore, being mammals, likely to be found in man.
And the ratman database contains about 25,000 genes -- 26,000 genes, of which about 1800 -- 2700 genes -- about 1800 have been primarily found in man and the rest either in the cow, the rat, or the mouse.
I would have called it mouseman but that doesn't sound as nice.
And then I discovered that in the Chinese language a mouse is called -- is really a little rat.
And in Japan, in Japanese, they make no distinction between them: the same word.
So, that's why we called it "ratman." Okay? And the ratman database tells us that there are in this database three million base pairs of sequence.
There are a million codons, three million base pairs of sequence.
Therefore, if you did this experiment in man and you took a random sample of the three billion base pairs, okay, you would have only got .1% of that sample as known coding sequence.
If these have the same number of genes, then the fish with .8% would be one-eighth the size of the human genome which we can compute here as 319 megabases, 400 megabases.
That's computed from the information content.
Now what we will do is we will check out how does that correspond to the physical size?
And, of course, we know from Hinegardner's measurements that that corresponded to 390 megabases, or .39 picograms.
And we decide to do one more experiment just to check it.
[Noise] So we made a library [noise] here and we decided to take a whole lot of single gene probes and find out how much DNA do we have to look through in this library in order to find a single gene recurrence.
And what we discovered is -- I won't bother you with a picture.
It's somewhere here.
But, basically, we had to look on the average through something like 23,000 lambda clones of average size 16 kb, and that worked out to be 385 megabases.
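The three independent estimates of the fugu genome size can be compared side by side. This is a sketch using only the rounded figures quoted in the lecture; the function and variable names are invented here, and the conversion of one picogram to about 1,000 megabases is the lecture's nominal equivalence (three picograms for three billion base pairs), not a precise constant.

```python
HUMAN_GENOME_MB = 3_000    # three billion base pairs
MB_PER_PICOGRAM = 1_000    # nominal conversion implied by the lecture


def size_from_coding_fraction(reference_mb: float,
                              reference_pct: float,
                              sample_pct: float) -> float:
    """If two genomes hold the same genes, size scales inversely with the
    fraction of a random sample that hits known coding sequence."""
    return reference_mb * reference_pct / sample_pct


# 1. Information content: 0.1% known coding in man versus 0.8% in fugu.
info_estimate_mb = size_from_coding_fraction(HUMAN_GENOME_MB, 0.1, 0.8)

# 2. Physical measurement: Hinegardner's 0.39 picograms per haploid genome.
physical_estimate_mb = 0.39 * MB_PER_PICOGRAM

# 3. Library screen: clones searched per single-gene hit, times clone size.
clones_per_hit = 23_000
clone_size_kb = 16
library_estimate_mb = clones_per_hit * clone_size_kb / 1_000

for label, value in [("information content", info_estimate_mb),
                     ("physical (DNA mass)", physical_estimate_mb),
                     ("library screening", library_estimate_mb)]:
    print(f"{label:20s} ~{value:.0f} Mb")
```

The three rounded inputs give roughly 375, 390 and 368 megabases. The lecture quotes figures in the same range, presumably from unrounded data, and the point of the comparison is that all three land near 400 megabases.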
So here we have an organism with a genome that has the same number of genes as man -- that is, all the basic vertebrate genes, because fish have lymphocytes, they have immune systems, they have our body plan -- and in which the genome is now very compact, right.
If I could have the lights dimmed [noise] I will just show you what are the consequences of this.
Oh, dear. What have I done with that?
[ Silence ] The slide changer.
Did I leave it some [noise] -- Perhaps you can do it upstairs.
Could you change -- put on the slides, please.
Well, there's our fish.
>> It's the fugu.
>> It's the fugu?
[Audience laughter] That's the puffer fish of Japan.
That's a great, a great delicacy, very expensive.
And I chose this one because there are reliable sources of it, and I can always go back and get more of these.
They can't be bred in the laboratories.
Can't be bred in nature.
They can be cultivated but not bred.
Okay. [Noise] Next slide.
So here I want to show you -- taking three organisms and show you what this all means.
Below we have C. elegans.
The size of the genome is a hundred megabases.
The black part is the estimated coding sequence content, and as it's estimated that C. elegans will have about 15,000 genes and the average gene size we know, and so the black there says what the [noise] total coding sequence content will be.
And it's of the order, as you can see, of about a fifth of the total amount of DNA.
Now when we go up to the fugu, we estimate from various considerations which I'll mention, that we have four times the number of genes of C. elegans in a genome four times the size.
That is, our gene density should be exactly the same.
That's effectively the argument on which this is based, all right?
And I'll show you what I mean by that in a moment.
And, of course, in man, the coding sequence remains the same, but you can see what you have to pay in tax of the things that are non-coding sequence.
Or put this in another way, fugu gives you a big discount on the human genome sequencing -- 87% discount.
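The comparison on the slide reduces to gene density and relative genome size. In the sketch below the C. elegans figures are from the lecture, the fugu gene count is the lecture's "four times C. elegans", and the human gene count is set equal to the fugu's on the lecture's assumption that the two share the same genes. The 400 and 3,000 megabase sizes are round figures, and all names are illustrative.

```python
ORGANISMS = {
    "C. elegans": {"genome_mb": 100, "genes": 15_000},
    "fugu": {"genome_mb": 400, "genes": 60_000},   # 4 x C. elegans (estimate)
    "man": {"genome_mb": 3_000, "genes": 60_000},  # assumed same genes as fugu
}


def genes_per_mb(organism: dict) -> float:
    """Average number of genes per megabase of genome."""
    return organism["genes"] / organism["genome_mb"]


for name, data in ORGANISMS.items():
    print(f"{name:10s} {genes_per_mb(data):6.0f} genes per Mb")

# Sequencing "discount": how much less DNA has to be read in fugu than in man
# to cover the same set of genes.
discount_pct = 100 * (1 - ORGANISMS["fugu"]["genome_mb"] / ORGANISMS["man"]["genome_mb"])
print(f"discount: {discount_pct:.0f}%")
```

With these round numbers the worm and the fish have the same gene density, the human density is several times lower, and the saving from sequencing fugu instead of man is about 87%.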
So, it pays then, we argue, to use this compact genome to tackle this question.
May I have the next slide, please?
Well, so what we have done is we -- Dick Sandford has fully sequenced, and others have done this, several genes for which we know the genome structure in man.
So here's a gene called phenylalanine hydroxylase -- we have cloned the same gene in fugu and we have looked at the sequence of the exons.
Of course they are the same, the same proteins, and the exons are the same then.
But look at the difference in the size of the introns.
Okay? These intervening sequences are enormous in human; and look how small they are in the fugu.
Not all are small but you have a large number...
[ Silence ]
- >> Dr. Brenner
The eightfold we would expect if we were uniformly to compress the human genome.
Next slide shows you a similar case, perhaps not so extreme, but where there is a considerable reduction in the size of the gene because very large introns are very small in this organism.
Okay? And, of course, we would want to predict that, as the sizes of the genes are reduced, the distance between the genes is reduced. You need a lot of work to get the gene density, so I have only anecdotal results, which come from sequencing a few lengths of DNA, and one of these is very striking.
We have found one lambda which has 3 full genes on it.
That's a small lambda as well, I should add, 15 kb, so the average size of those genes is 5 kb.
And someone has just done G6PD in the Fugu and the size of that gene is 3-1/2 kb, with a lot of little tiny introns.
Right? We are working on these enormous human genes, like dystrophin, we're sequencing them in the Fugu, and we have now sequenced -- Greg Elgar has sequenced -- the equivalent region: what corresponds to 350 kb in man is only 17 kb in the Fugu, a reduction of 20 times.
So we think then that we have this compression.
We have no dispersed repetitive DNA.
We have a very compact genome.
Could I have the lights please?
Now, all through this, as one is developing these newer approaches, one always has to fight hard to escape from all the traditional ideas one has in one's head.
And you can see what this means when people, when I give this talk, people say, can you do genetics on Fugu?
See? But I want to tell you that you don't have to do genetics in Fugu.
In fact, as far as we're concerned, Fugu could go extinct tomorrow morning.
We've got all the DNA we would ever want to use. Or maybe it should go extinct from the 24th of December, because I'm going to collect some more on the 22nd.
What does that mean?
So let me just try and explain what I think will now become the basis of the new genetics.
We can clone.
We can sequence.
We can get hold of all of these genes.
We now need to have a method of evaluating the sequence.
That is, finding the value of the sequence, and I don't mean in monetary terms.
Now, the correct experimental way to do this is to substitute genes from one organism into another.
Right? So let me just give you an example. If I took myself, and I took my gene for a protein like triose-phosphate isomerase, my coding sequence, and I substituted the coding sequence of E. coli into me, and I then compared the two Sydney Brenners:
the one with his triose-phosphate isomerase, and the other with the E. coli one.
And if I can see no difference between them, then I would say the two sequences have the same value.
And most people would agree.
That's very likely to be the case if I just use the coding sequence.
But, of course, if I put in the E. coli regulation, you'd certainly perceive the difference.
One would be lying flat on the ground and just wouldn't work at all.
So if we can do that, and geneticists will realize this is something they've been used to.
It's a recombinant.
That's a funny kind of recombinant.
It's a recombinant between a genome and one gene.
It's, in fact, a rescue recombinant, as we used to call it in phage.
It gives us now an objective manner of evaluating it.
Right. Now what does this allow us to do that you could never do in classical genetics?
What you can now do is to analyze evolution.
Suppose I were to assert that the only interesting mutations that have ever existed are the ones that turned the fish into a mouse.
Naturally, no one would ever give you a grant to do this because you'd need to have one for a few million years, and they're not doing that these days.
But, of course, it's all lying there for us to analyze.
So suppose we took out a gene from the Fugu?
Now when I say a gene, I mean that part of the DNA that lies between the end of the gene on the left-hand side, and the beginning of the gene on the right-hand side of the genome.
I don't mean coding sequence.
And suppose I were to substitute this into the mouse. Naturally we don't do that experiment, but we do an analogous one.
I can ask myself whether this fish gene expresses correctly in the mouse.
That is, like the mouse gene.
Or perhaps it only expresses in the fishy part of the mouse?
Okay? That is, as you know, there is quite a lot of you that's a little fish.
When you were very little, you had gills, all right?
And you were fishy.
And, as you developed, you can imagine that grafted onto that there's the whole product of half a billion years of evolution.
Okay? And that gave a lot of mousy stuff in addition to the fishy stuff.
And all this gene knows is how to be expressed in the fish.
Right? So that then gives you the knowledge of what had to happen during the course of evolution.
Right? Now let us suppose that it works just as well in a mouse as it does in a fish.
And that tells us that the regulation must have happened at a lot higher level.
So it gives us a means of decomposing, in an experimental way, right, the very elaborate changes that must have happened in the course of evolution.
Now, of course, you might ask have you ever done this.
And I would say, yes, we have done such experiments.
We work largely with homeobox genes, and they, in fact, work.
The fish gene works exactly in the mouse, but the wonderful thing is, you can sequence both.
And if that's the case, you can find where the common elements are, and those must be the control regions.
Okay? So we have a synthetic method.
We have a method which I call analysis by composition, rather than by decomposition.
Because now what do people do?
They have to take out a piece and see what happens.
We don't take out anything.
We simply look at the two sequences, if they have the same value, then the common parts must be exerting that value.
And that's the difference in the methodology.
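The idea of looking for the common parts of two sequences that have the same value can be illustrated with a toy comparison. The sketch below is an editorial illustration only, not the method used in the speaker's laboratory: it assumes exact matching of short words (k-mers), and the two sequences and the names `kmers` and `common_parts` are made up for the example.

```python
# Editorial sketch: find the short words shared by two sequences.
# The sequences are invented toy data, and exact k-mer matching is an
# assumption made for illustration; real comparisons tolerate mismatches.

def kmers(seq: str, k: int) -> set:
    """All distinct words of length k that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def common_parts(seq_a: str, seq_b: str, k: int = 6) -> set:
    """Words of length k present in both sequences."""
    return kmers(seq_a.upper(), k) & kmers(seq_b.upper(), k)

# Hypothetical regions that share one embedded element, TTGACCTAAT.
fish_region = "ACGTTTGACCTAATGGCA"
mouse_region = "GGGTTGACCTAATCCCAAAA"

shared = common_parts(fish_region, mouse_region, k=6)
print(sorted(shared))
```

Nothing is deleted from either sequence; the candidate control regions are simply whatever survives the intersection, which is the sense of "composition rather than decomposition" here.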
So let me just say that, of course, there's a little bit of a sleight here.
I've given you the impression that I'm moving genes upwards through 600 million years.
Of course, the modern fish is as far away in time from the common parent as indeed the mouse is from that same parent.
And, of course, we would not know in the absence of any other evidence what that common parent really was.
Could it have been a fish?
Could it have been a mouse?
Could it have been something in between?
A mesh if you like.
But we have independent evidence from the fossil record that it was much more like a fish than it was like anything else.
Therefore, we can so to speak get a measure on the complexity space.
So although both have come through the same time, the [inaudible] far more, far richer complexity space than indeed the fish.
Fish complexity space has been, we believe, quite simple.
It's remained almost invariant.
Now, of course, if we can do this, then we can begin to start to tackle some really deep biological questions.
The whole question of how did all of this complexity arise, and how can we get it to go continuously without having to cross impenetrable abysses and so on.
Well I think, then, that that tells you also that there's a third part of the new genetics.
It's not only cloning, and it's not only sequencing, but it's also transgenesis of one form or another moving genes up and down, between organisms, which is, of course, what the technology allows us to do, and which you could not do by ordinary breeding experiments.
That would have been impossible.
Well I actually think then, that we now have a totally different picture, at least I do.
I just see all of this DNA rambling around in existing organisms.
All the molecules connected with each other through time, over a very long period of time.
And I have the capability now of taking pieces of this DNA from one living thing and moving it to another, and then start to do a real rigorous analysis of this complexity space.
And that is what I consider to be the absolutely very exciting part of genome analysis.
But, of course, were I to write a grant for this, I would never get funded.
In fact I've come to the conclusion that even God wouldn't get a grant.
[ Laughing ] And one would say, well, you know... interesting experiments, but no one's ever repeated them.
And the second would say, he's done it, he got this all done a long time ago.
What has he done recently?
And the third would say, and to top it all, he went and published everything in an un-refereed journal.
[ Laughing ] Third, I think ...
[ Laughing ] So I think that what we have to do is therefore see this, that this will be the science of our future.
And so under the banner now of the human genome, this will be pursued, because man is the interesting animal that we want to find out about at the one end of the spectrum; Fugu will be used as a tool to analyze this.
And if at the same time we can do good for people, like discover the causes of genetic diseases and so on, so be it.
Who could ask for anything better?
Well if anybody in the audience would like to speak to me afterwards and sign up for doing a few small sequences, I think that this is the kind of challenge which we will meet over the next few years.
Thank you very much.
[ Applause ] >> Dr. Brenner will take your questions.
>> Did I understand you to say that you can replace a necessary homeobox gene, one whose loss is lethal to the mouse, with the fish homeobox gene?
We haven't done that experiment directly yet.
But we have proof that the expression of this is the same.
And in effect we have turned the mouse element on with the fish elements.
So those will be the experiments that will be done.
I don't know.
It's as interesting to know if you can and if you can't.
All I'm trying to say is, I'm most unwilling to predict any given answer.
But what I can say is we have a means of finding out, and so if the mouse has accumulated something, which consists of having furry paws rather than scaly fins, then we should find out what that something is.
[ Pause ] >> Human genes have accumulated so much junk, what is the [inaudible]?
>> Why haven't they?
Well let me tell you now.
You see, this is something you believe; you believe in perfection.
Hm? And see the odd genomes.
Genomes are not made in that way.
Right? They're sort of opportunistic structures.
Why? Because you can't go back to the drawing board.
Okay? And the [inaudible] can so, oh, you know.
We made a terrible mess on those [inaudible] got five finger error.
Can't stand it.
[ Laughing ] Let's go back and start again.
You can't do that.
You have to go by layering everything on what there is.
And junk should be looked at in the following way.
There are processes in genomes that lead to the accumulation of DNA.
And there are processes that lead to its elimination.
The most important process that we now know leads to expansion of DNA is transposition, either through reverse transcription or through extra DNA replication.
Okay. Now if you're a single celled organism, then the cost of replicating the DNA bears instantly, immediately on reproductive fitness because it's the one cell.
There's no separate germ line.
So you can be quite sure that if you had an E. coli that could synthesize its DNA slightly faster than another one, it will [inaudible].
Therefore, there are selective forces in single celled organisms to streamline the genome.
Such forces do not act with organisms like us where we have a germ line and a very complicated and long developmental pathway.
That is, the speed of replication of DNA, or the efficiency of it, would not affect our reproductive fitness as would, let us say, the speed at which we could run away when being chased by a big animal.
So if the DNA expands and it does no harm, namely genes don't get inactivated, the thing will expand.
Twenty-five percent of your DNA consists of one sequence repeated millions of times, the [inaudible] sequence, dispersed throughout your genome.
So you don't have to take it.
It could mean the death of the species in 10 million years from now.
We don't have to worry about that do we?
[ Laughing ] So that, I think, is the questions that you have to do.
You are also forbidden in this game to think about genomes planning their own future.
See, which is ludicrous.
I mean, you can imagine this bug in the primitive soup 2 billion years ago saying, gee, I better not make that amino acid change in cytochrome c because it'll prevent me from having ears when I grow up to be a chimpanzee.
There's no foresight in this.
On the other hand, if the junk does get a use, it's okay.
It's not going to be turned down.
So some junk might get a use.
Right? It's our mechanism.
Or even if the transposition mechanism is required to make one useful thing, but has, as a gratuitous outcome, the making of 10 million useless things, if the tax isn't high, that'll be good.
>> So what active level of fish to make it [inaudible]?
Ah, you see, it hasn't dumped any introns at all.
It's got all the same number.
They are small.
There's no dispersed repetitive DNA.
So this we believe is not a contracted genome, but an unexpanded one.
And that's clear.
It's better never to get all the stuff than to go and try and proofread it out of everything, right or wrong.
So we think probably the best interpretation of that, which I'll just throw out, is that reverse transcriptase in this or other organisms of a similar kind is probably not expressed at a time when the germ line is accessible.
Okay? Because genetics studies germ lines.
We don't give a damn what's going on in the soma.
We're only interested in what persists in the germ line DNA.
[ Pause ] >> [Inaudible] Well that's the beauty of having a compact genome.
Because if I have a case, right, where I don't have this repetition, and I can show for any given region of DNA that it works in the same way, I can afford to ignore the repetition.
At least I have grounds for ignoring it.
Because I can evaluate your question.
And the big thing you have to do is, you have to have three values.
Things are good.
Things are bad.
And for many, things that just don't care.
Good, bad, and indifferent.
[ Pause ] >> In the case where the introns [inaudible], they contain a regulatory element, have you looked in Fugu to see whether [inaudible]?
We've done this in one case of the homeobox gene.
And it contains the element.
In fact that's how we found the element.
People have been looking for it by making deletions.
We just found it by doing a diagonal.
And, of course, we would look for these things in all the same places.
And this applies to the 5' control elements as well.
So I have very high hopes that, in fact, by doing this, you can so to speak, you know, divide the mouse by Fugu, and throw the remainder away, basically.
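One way to read "dividing the mouse by Fugu" is as a dot-matrix comparison, in which a conserved element shows up as a run of matches along a diagonal. The sketch below is an editorial illustration under that assumption; the transcript does not say which comparison program was used, the two sequences are invented, and `diagonal_runs` is a hypothetical helper that only finds exact, ungapped matches.

```python
# Editorial sketch: exact-match runs along the diagonals of a dot matrix.
# Reading the "diagonal" in the talk as a dot-matrix comparison is an
# assumption, and the sequences below are toy data.

def diagonal_runs(seq_a: str, seq_b: str, min_len: int = 8) -> list:
    """Return (start_a, start_b, length) for every maximal run of identical
    bases along a diagonal that is at least min_len long."""
    runs = []
    n, m = len(seq_a), len(seq_b)
    # A diagonal is fixed by the offset d = j - i between the two positions.
    for d in range(-(n - 1), m):
        i = max(0, -d)
        j = i + d
        run = 0
        while i < n and j < m:
            if seq_a[i] == seq_b[j]:
                run += 1
            else:
                if run >= min_len:
                    runs.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run may reach the end of the diagonal without a closing mismatch.
        if run >= min_len:
            runs.append((i - run, j - run, run))
    return runs

# Hypothetical regions: the same 10-base element sits in different flanks.
fugu_region = "CATTGACCTAATGG"
mouse_region = "GGAGAGAGCCTTGACCTAATCACACA"

for start_a, start_b, length in diagonal_runs(fugu_region, mouse_region):
    print(start_a, start_b, fugu_region[start_a:start_a + length])
```

Everything off the reported diagonals is the "remainder" in this reading; a practical tool would also need to allow mismatches and small gaps, which this sketch does not attempt.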
>> [Inaudible] >> Dr. Brenner: Well, the bithorax region of Drosophila is exactly the analog of the regions we are looking at in the vertebrates.
- >> Dr. Brenner
Well, you know, it's different, but it's there.
All we're interested... see, you ask a much more difficult question, which is one I would love to tackle as well: whether these methods can be used to try to trace the invertebrate line that led to the vertebrates.
That's a much tougher question.
But I'd really like to know that.
[ Pause ] >> [Inaudible].
- >> Dr. Brenner
>> [Inaudible] Yes.
>> [Inaudible] >> Big.
>> [Inaudible] Why is it that, and the [inaudible] does not have more introns.
Is it lethal?
No. No. No.
Introns are very old.
And basically they don't change at all in the whole of the vertebrate lineage.
What do you mean why?
[ Laughing ] Why do I have to answer your question?
[ Laughing ] So it's a ridiculous question.
And these are not flexible things.
They're quite hard to get out.
In fact, there will be some changes.
Their change is much slower.
You can only remove an intron cleanly.
>> [Inaudible] Oh no.
Well I don't believe that that happens.
That's very common.
I wouldn't exclude it.
You have to have a transposition element.
Okay? That carries all the recognition to insert it.
And as far as I know, such things exist in yeast.
But as far as I know, in vertebrates, it doesn't exist yet.
>> Possibly you can take further questions in private.
I just remind you that Dr. Brenner will be speaking tomorrow at 4:00 p.m. in the Biotechnology Building [inaudible], and thank you once again.
[ Applause ] >> Well I hope it was not too ...
About the speaker
University of Cambridge and Scripps Research Institute
He was born in Germiston, South Africa, and received an MSc degree from the University of the Witwatersrand in 1947, an MB BCh from the same university in 1951, and a D.Phil. from Oxford University in 1954.
Dr. Brenner became a member of the Scientific Staff of the Medical Research Council in 1957, joining what later became the Laboratory of Molecular Biology in Cambridge, and was Director of the laboratory from 1979 to 1986. He left the laboratory in 1986 to start the MRC Unit of Molecular Genetics and was its Director until 1991 when it was closed prior to his retirement from the Medical Research Council in September 1992. His laboratory continues with private support in the Department of Medicine in the Cambridge Clinical School.
Dr. Brenner is known for his research in molecular genetics and particularly for his work on the genetic code and on the transfer of information from DNA to proteins. Particularly noteworthy were his use of genetic methods to demonstrate the triplet nature of the genetic code, the existence of nonsense codons and mechanisms of mutagenesis and the discovery of messenger RNA. Subsequently, in the 1960's he was one of the first to turn his interests to the molecular biology of multicellular organisms and his pioneering research established the nematode worm, Caenorhabditis elegans, as a prime model for the genetic and cellular basis of development and behaviour. That work is now being carried on by over 700 researchers around the world.
He presently works with a small group on the small genome of the Japanese Pufferfish.
Among his many honors, Dr. Brenner has been presented the Albert Lasker Medical Research Award, the Royal Medal of the Royal Society, and the Harvey Prize.