The Human Genome Project
>> Everyone has to try and find a seat because there's a fire marshal and there's a rule that everyone has to be seated.
No one can be standing.
So, as quickly as you can please try to find your seats.
I have one announcement before we begin and that has to do with the usual Meet the Speaker that the graduate students are invited to after the seminar.
Graduate students I guess in biochemistry, and cell and molecular biology.
And I just wanted to reiterate what already was said last night regarding what a pleasure it is to have Jim Watson as the first speaker for the Efraim Racker Lecture Series in Biology and Medicine.
I think Jim is an especially appropriate choice not only because of the obvious impact that he's had on biology and biomedical research, but also because I think that Ef really had a soft spot for him.
I can remember many times in his studio when he would point to a sketch that he made of Jim from a summer long past at Cold Spring Harbor with the legend under the sketch saying, "Jim Watson, a promising graduate student." [Laughter] And I think this simply reinforced what all of us knew who knew Ef well and that is his remarkable ability not only to predict what was likely to become important biomedical research, but also who the principal players would be.
And in this particular case he certainly was right on target.
Now while protocol requires that I introduce Jim, I feel a little bit self conscious about this since I'm sure virtually everyone knows something of his scientific contributions.
If anyone needs to be reminded these, of course, include the elucidation of the basic structure of DNA for which he, together with Francis Crick and Maurice Wilkins won the Nobel Prize in medicine in 1962 as well as his leadership in directing the Cold Spring Harbor Laboratory, which has certainly become a superb center for basic and biomedical research.
And more recently his strong influence to say the least on the human genome initiative.
And his lecture today will be on that topic, the title being The Human Genome Project.
I really don't know if there's any way that you can top last night's entertaining lecture, although we all are very much looking forward to your talk Jim.
And again on behalf of myself, David Shalloway, Ef's family and everyone who's been involved I'd just like to say what a pleasure it's been to have you as the first speaker for Ef Racker's lecture series.
[ Applause ]
>> Jim Watson
I think they've turned up the microphone as it sounds okay.
I tend to mumble.
I'll talk about the Human Genome Project, which the NIH component of I directed for about 3-1/2 years until I was fired in early April.
And so, but most of what's happened up to now was-- I had some influence on and so-- and a few of the people who are doing it lent me some slides of their recent work.
So, I'll try and give a progress report where it stands.
But I think I will-- should start with a little history.
There are really two origins of the Human Genome Project.
One was essentially let's get better and better technology for dealing with DNA and as we improve the technology we'll reduce the cost so that we can sequence the whole human genome, which contains three billion base pairs.
So it was perceived as sort of a project by DNA jocks who just wanted to do something bigger and bigger and have fun at the same time.
The second was that with the advent of DNA markers human genetics suddenly became possible in the 1980s.
And it was possible, starting from the map location of a disease gene, to go and clone the gene.
And case in point is the work of Francis Collins at the University of Michigan, where essentially they started with a map location, found DNA corresponding to markers and moved along the chromosome and ended up with the cystic fibrosis gene.
Now, this was a big project involving lots of people.
And it was pretty clear that there are thousands of disease genes and efforts of that same size wouldn't be possible, yet you wanted the disease genes.
And you wanted to get the genes as fast as possible for really two reasons.
One, genetic diagnosis, you would want to know whether you yourself possibly were going to come down with the disease, which if you could do one look at the DNA you would want the-- to be concerned with whether your children would have the disease and so on.
And so, genetic diagnosis is a very important matter to those families where-- which are known to carry genetic diseases.
And the second reason is that if you actually clone a disease gene you can often from the sequence guess the protein and know something about the protein function.
And so this will be of help if you want to try therapy [Inaudible] So, case in point, if we had cloned a gene responsible for familial breast cancer, then you might know which receptor or which hormone or which biochemical system is not behaving correctly.
And so you could think much more sensibly about how you might finally develop a drug to work.
So, you want the gene really for immediate practical ends and also for drug design.
So, and there are a lot of disease genes, which are proving hard to get.
So, that was the sort of NIH initiative.
Now, the first initiative which started was really the high technology and that really came out of the fact that when Sanger and Gilbert came up with their methods for DNA sequencing around 1976-77, the first DNAs which were sequenced were those of small DNA viruses say containing 5000 base pair.
Within three or four years a DNA molecule roughly ten times that size was sequenced, phage lambda, and about three years ago the genome of cytomegalovirus, a herpes type virus, almost a 1/4 of a million base pairs, was sequenced.
So, they were going in sequencing longer and longer DNAs.
Now, as you got toward longer ones you became aware of the costs.
So, ten years ago when we wrote the little book, "Recombinant DNA: A Short Course" I put in that I thought that DNA sequencing at that time was say 10 dollars a base pair.
So, at 10 dollars a base pair and if you have a 1/4 of a million of them, that's 2-1/2 million dollars to do the cytomegalovirus sequence.
So, once you have an objective that big, you have to figure where is the money going to come from to do the job?
And so, when people thought well we should go beyond a virus and go to a cell and do a bacteria, then you've got E. coli.
And if you say that's about 4-1/2 million base pairs that could be 45 million dollars if you were doing it.
And once you start talking in terms of millions, that sort of scares people, because if someone gets 45 million some other people aren't getting their R01 grant.
So, big project like that you have to ask, is it really necessary?
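The cost figures quoted above can be checked with a short sketch. It assumes only the numbers given in the talk (10 dollars a base pair, a quarter of a million base pairs for cytomegalovirus, about 4-1/2 million for E. coli); the function and variable names are illustrative and not from the talk.

```python
# Rough sequencing-cost arithmetic using the figures quoted in the talk.
# All names here are illustrative assumptions, not part of the original lecture.

COST_PER_BASE_PAIR_USD = 10  # the speaker's estimate from about ten years earlier

# Approximate genome sizes as quoted by the speaker.
GENOME_SIZES_BP = {
    "cytomegalovirus": 250_000,
    "E. coli": 4_500_000,
}


def sequencing_cost_usd(size_bp: int, cost_per_bp: int = COST_PER_BASE_PAIR_USD) -> int:
    """Return the total cost of sequencing size_bp base pairs at a flat per-base price."""
    return size_bp * cost_per_bp


for organism, size_bp in GENOME_SIZES_BP.items():
    print(f"{organism}: ${sequencing_cost_usd(size_bp):,}")
```

At these assumed prices the sketch gives 2.5 million dollars for the virus and 45 million dollars for E. coli, matching the numbers in the talk.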
So, when the proposal was made to sequence the human genome it was first seriously put forth by the Department of Energy, there were a lot of people opposed and I was opposed.
Now I wasn't opposed because I was against the objective, I just didn't think they were capable of doing the job.
And because essentially they were not representing the world of DNA, it was a proposal from a physics oriented body, which also makes weapons which wanted to do something with the national labs if they weren't building bombs.
So, the proposal was to sequence the human genome partially at Los Alamos and partially at Livermore, but both bomb building labs.
So, I was a bit suspicious.
[Laughter] And also really because if you said well really this is an expensive project, then it ought to be done by people who really you know are good.
These labs weren't really strong in DNA.
And if you really looked at-- all these people worked for physicists.
And physicists are very bright when they think about physics.
[Laughter] But my experience about them is when they think about biology they're slightly off base sometimes.
Not entirely, I mean we've had you know someone like Leo Szilard and there have been many really great physicists, but I'll never forget a conversation I had with Carlo Rubbia.
Now he's regarded as a good physicist.
He said he didn't believe in evolution.
You know we didn't have enough time for all these things to occur, so you know he-- well good physicist.
[Laughter] But I would hate for him and you know the other thing about physicists is that they think that you know even a third rate physicist is brighter than the best of biologists.
So you know, so I was-- I was very uncomfortable with the thought that it would be done in the laboratory.
So, I was among those who lobbied to see if we couldn't get the program taken up by NIH for the reason basically of getting-- being able to clone the disease gene.
And the chief obstacle for NIH doing it was essentially the NIH community, that is the people who receive NIH grants; except for the human geneticists they thought you know we don't have to do it and it's going to cost too much money and you know everyone lives with enough anxiety about getting a grant and they suddenly see this big project.
So, to get it going I worked in two directions: One, to make it respectable we got a-- several you know high level committees of so called you know wise people to look over the project and to say whether it should go forward.
One was set up by congress itself.
I had nothing-- the OTA, Paul Berg was on the report and the report came out to do it and it was just-- it was going to really help push forward biology and medicine to find out all the human genes.
The-- and then we also got a National Academy report, a National Academy Committee; I was a member of that but we were careful to put on the committee people who had been loud mouthing against it, like David Botstein, very intelligent guy.
But you know David was maybe just suspicious for a lot of reasons, but he thought it's you know big science is bad, little science is good.
Well, we all know a lot of little science is crap.
[Laughter] And so you know big or small isn't really the thing.
I mean is it worthwhile.
So-- but it had been caught in this thing you know, being too big is bound to be inefficient and you know is Lee Hood's lab with 100 people efficient?
Well, I'm not going to answer that one.
So, there was a lot of fear of you know megalomaniacs getting a hold of a big project and running out of control.
So, to sort of put it in perspective we got a committee and got Bruce Alberts, who had written a Cell article against big science, to head the committee.
And we put on loudmouth people and tried to make it a good committee and it came out there was no opposition to it at all.
Do it; it looked like the project could be done in maybe 15 years.
It would spend roughly 200 million dollars a year and at the end of it we would be able to identify most of the human genes.
So, that was the first way of-- the academy put its stamp of approval.
So, it wasn't you know just proposed by people who didn't know what to do with Los Alamos, which was the way [Inaudible] And the second was to get some money for it.
So, [Inaudible] you get your money from congress.
So, one afternoon in May, David Baltimore and I just toured some congressional offices and talked to congressmen and he said there should be more money for AIDS and I said there should be a little money for the human genome project.
And as a result of that was successful in part because one congressman didn't want DOE to do it, so he denied some money.
Since then he's sort of been against NIH, but anyways, a case he was in favor of.
And so we got 20 million dollars in the budget.
That was in the spring of 87.
The first talks were-- that was about a year and a half after there was all the publicity about the project.
But it sort of peaked in about 86.
A lot of the people and you know everyone under 40 was against it.
And only those who were you know fearing their own deaths were in favor of it.
[Laughter] I mean that's the sort of way of look-- you know, the old men were in favor and the young guys saw what a boring task it was, and people like Sydney Brenner confused the issue by saying you know it should be done by convict labor because it was [Laughter] I always get-- there's another way which you might think of besides convicts and that's student labor.
I mean they're equally [Laughter] cheap and, in fact, that's been tried which is really a failure.
And so the money got in the budget.
Jim Wyngaarden, who is a human geneticist with an interest at NIH when the money was put in, had to spend it.
And so he asked me to come to Washington and get a-- have a program which would have an increasing budget.
Okay, so there were a number of ways we could proceed, but it was pretty clear that the project was never going to be done by small groups of people, a lab with two or three people and hundreds of these throughout the country, but that you're going to have to set up some groups.
So, you'd get some economy of scale.
So from the start it was clear we'd have to have some centers, but we never thought of one center because we didn't know who should run it.
And so we set up effectively about 10 places where there could be about 30 people working on the project, with the thought that some would get bigger.
It's as those who really knew how to map and isolate DNA and ran good shows would get bigger and the others would probably drift down with time.
So, we set up centers and then the other thing, another decision, which was made was that it would be very silly to sequence the human genome without at the same time getting data sequence, E. coli genome sequence, [Inaudible] yeast genome sequence, [Inaudible] sequence, [Inaudible] so we tried to make it a program which was just not the human genome, but would-- and the reason for that was not really the [Inaudible comments] world to like us but that, in fact, the [Inaudible] genes, as you get to the human genome we-- the introns get bigger and bigger and you know really knowing the ends of the human genome would be hard and it would be a lot easier if you identified the first of the genes in a simpler organism so you can spot them if they appear later on in evolution.
So, that was the real reason.
It wasn't a matter of choice whether we should do the model organisms but that we had to do them.
And for reasons that were very clear DOE said we will not sequence any model organism.
So, that was a-- because they thought you know-- I don't know if they weren't thinking at all.
[Laughter] So, we-- the National Academy said model organisms and so I went there in the fall of 88, appointed-- got Norton Zinder from the Rockefeller to be head of the Advisory Committee and then with time had to get a study section, a genome study section to pass out the big money.
And one thing you learn when you see a study section in operation is that it's a pretty scary affair.
I mean in 20 minutes they decide whether you're going to get some money.
And the people who decide it are often dimwits.
[Laughter] Now not always, but I mean, so that's why we have so much scare and why doing science is slightly scary.
Because your fate can be decided in these 20 minutes, but two people read your thing and sometimes they just come to the wrong conclusion.
And then you can appeal, and you know eventually if you get a good mix you'll get your money.
But if you have a project that you want to get done you want to be sure the study section votes yes, not no.
So, you had this set up here.
Now this sounds like packing the Supreme Court with your own people, but a really key thing was to try and get a really intelligent group of people to pass on these big applications and we chose Eric Lander from MIT after he'd got a grant award for him, a big one then we put him in charge of the big study section.
So, he's really very sensible and very, very bright.
You know like our President-elect he was a Rhodes Scholar, you know he's an American mathematician.
So, you know, so he's bright.
So, I think you really can't say that the money being passed out is you know in bad hands.
So, we had a sort of sensible committee and good study section and the program has really been going now for about three years.
We officially said it started two years ago, because the money had to go and even when you give the money out some of it will go to renovate a lab.
So, you got to sort of expect a big sum of money, you're not going to accomplish anything in the first year.
You just build up a group and then if nothing comes up by the third year you-- something you know, you've made the wrong decision.
So, we can go through some slides just to get an idea of where we stand.
[ Inaudible comments ] [ Silence ] >> Can we have the lights soft?
There's just a computer graphic of DNA.
That's-- okay well this is you know the main point which you could say came out that you find a single DNA molecule for a chromosome, whether it's viral or go all the way up to a human chromosome.
The difference being that a human chromosome on the average is about 120 million base pair.
So, they are, indeed, long, long [Inaudible] And so-- okay and then this what we expect to find.
You know, oh sorry.
[Laughter] If we can, if we can turn that, it got in the wrong way, apologize.
[ Silence ] >> Okay, so you can you know see the split genes.
The conventional guess, conventional wisdom is there'll be 75,000 to 100,000 or such things that we'll find in humans and they will be much more intron than exon, because there are going to be big gaps and a very large number of exons that are put together in the genes, and then regulatory elements all over the place.
So, my guess is the number of human genes will be about 250,000 because we always underestimate I think a couple.
Okay well that's-- a genome there is one from a Down's child, which has three 21s.
So, just for those who don't know it the biggest chromosomes were called-- the biggest one supposedly was called chromosome 21-- chromosome 1.
The smallest is-- was chromosome supposedly 22, but 21 now seems a little smaller than 22.
And the average size chromosome, the biggest are over 200 million base pairs.
The smallest may be somewhere around 50, a factor of 4 or 5 [Inaudible] So, so you've got to go from a chromosome finally down to a DNA sequence and you go through different stages in the project.
First you want to get a genetic map and then you want to get the pieces of DNA which correspond to all of your genetic markers.
So, you want a lot of genetic, genetic markers.
And it was toward getting a large number of genetic markers was the first aim of the human genome project because after David Botstein had his idea around 1980, the project-- David I think makes a mistake.
He just didn't get it himself but he got a company that he was associated with, Collaborative Research, to produce the markers, with the thought that these markers would be commercially useful in doing genetic diagnosis.
They spent about 10 million dollars to get their first genetic map.
And then basically ran out of money and many, many more markers were needed.
So, and in the process they applied to NIH to a study section to get some money to help the company and the study section turned it down.
At that time getting markers was too dull to warrant a research project and even though it was to a company.
So, and then the Howard Hughes Medical Institute was spending money getting markers at Utah with Ray White.
But, they didn't want to spend 200 million dollars a year for the project and Hughes did not want to take on the human genome project.
So, it was really left for the Federal purse to take it up.
So, okay this just shows you the way money went up.
DOE has their program.
NIH has their program.
There has been very close cooperation so they haven't been fighting and the money which has gone to NIH seemingly is twice that which goes to DOE, but DOE a lot of their salaries aren't included.
In fiscal 93, the one which congress has just voted, the budget didn't go up.
It went up by-- it didn't go up at all.
In fact, it went up just slightly less than inflation.
So, the genome money seems to have plateaued unless there's a new support for the public.
Okay, there is a mistake on that slide.
Under yeast that should be 14 million base pairs but that gives you that so called comparative size.
This also may not be 165.
A lot of that may be for the just repetitive DNA.
So, the [Inaudible] of genome may be a little bigger than the C. elegans one, because we're talking about 100 million base pairs.
Okay, the first objective was the linkage map, the genetic linkage map and when we started the program, maps were like on top and sometimes your gene was next to a marker, but that gene could have been in the middle of a region where there are no nearby markers.
So, we wanted to get a lot of markers which were highly polymorphic, which meant that if you looked at the two parents, they would have a different set of markers and there's the so called CA repeats, which fit this criteria.
So, it's now possible to fairly routinely get a lot of markers and the goal is to get a marker every, at worst, every million base pairs.
And we started out by saying we want to get a marker for every 10 million base pairs.
And if you put it in perspective, if you're trying to find your gene, and you want to get-- clone it, you ought to sort of be down to about a million or two base pairs.
If you've got 10 million there are just so many genes that it's not very good odds.
So, you want a marker that brings you to about within a million base pairs of DNA.
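To put the two marker-density goals side by side, here is a minimal sketch. It assumes the three billion base pair genome size quoted earlier in the talk and evenly spaced markers, which real maps only approximate; the function name is hypothetical.

```python
# How many evenly spaced markers a given spacing implies for the whole genome.
# Assumes the 3 billion base pair figure from the talk and idealized even spacing.

HUMAN_GENOME_BP = 3_000_000_000


def markers_needed(genome_bp: int, spacing_bp: int) -> int:
    """Return the number of markers needed to place one every spacing_bp base pairs."""
    return -(-genome_bp // spacing_bp)  # ceiling division with integers only


for spacing_bp in (10_000_000, 1_000_000):
    count = markers_needed(HUMAN_GENOME_BP, spacing_bp)
    print(f"one marker per {spacing_bp:,} bp -> about {count:,} markers")
```

Under these assumptions the starting goal of one marker per 10 million base pairs needs about 300 markers, and the tighter goal of one per million needs about 3,000.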
So, that's-- and in say the breast cancer situation, they're getting markers about that often.
So, the bottom is the goal.
And it's been speeded up tremendously.
We had about 10 places in the United States during-- the French decided to create a super lab where they have 200 people doing mapping and not surprisingly they have succeeded.
And using private money they never got the French government to do it.
They somehow subverted muscular dystrophy money for it.
So, there was a telethon with Jerry Lewis and it all went to the genome.
So, it's wonderful.
So, they-- there are a lot of French markers and there are a lot of American markers.
So, the markers are almost here and there's an example of getting the markers is now a good set of markers for chromosome 14.
And about two weeks ago in "Science" magazine there was an announcement that-- well it's actually several groups now have located one gene, which is probably responsible for about 70 percent of familial Alzheimer's, to a spot on chromosome 14.
About three years ago, with an incomplete set of markers, they had said it was on chromosome 21.
So, the same families which they have not very convincingly used to prove a 21 location are now used to absolutely prove a 14 location.
So, now it really is a marker so that if you've got good families and the disease is due to a change at a single spot, you ought to be able to map it.
Okay, this is just a case.
You can see this is chromosome 1.
The markers are really appearing.
And in "Science" magazine about a month ago the results of the first American efforts and you can begin to see we're beginning to get markers for all of the chromosomes.
The French published their paper about two weeks ago in [Inaudible] So markers are out there and the most important thing about the markers is one they are highly polymorphic and second they will be available to everyone.
They don't belong to a company.
They don't belong to someone who doesn't want to share its results.
So, that's a key thing.
Okay, now the second is to get the DNA, which means a physical map.
The way that has been basically defined is you want to get a series of DNA fragments whose ends overlap with each other so that you can go from one end of a chromosome to another and this was done for say C. elegans and they use fragments of DNA which could be cloned in the so called cosmid vector of say 40,000 base pairs of DNA and searches into.
And-- but for the human chromosomes, which are so much bigger, really major advance-- what made possible getting the thing done was the development of yeast artificial chromosome vectors in St. Louis by Maynard Olson at Washington University.
Maynard Olson and his student David Burke and these are called YACs; that's sort of a pretty picture taken.
And what they've been able to do fairly routinely is to get pieces of DNA say around 250,000 to 500,000 base pairs.
Now some of these turn out to be [Inaudible] so it's not as pretty as it seems.
The French claim now they can get pieces of DNA, about a million base pairs in some lengths.
So, if you-- say that if you could actually clone pieces of a million base pairs and your chromosome was only 50 million base pairs in length, you might only need, say 150 such YACs to overlap to get the map you need.
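The clone count above can be read as a coverage calculation. The sketch below uses only the figures from the talk (million base pair clones, a 50 million base pair chromosome, about 150 clones); the helper names and the idea of reporting average coverage depth are illustrative assumptions, not something the speaker states.

```python
# Clone-tiling arithmetic for a physical map, using the figures from the talk.
# Helper names are hypothetical; "coverage" here is simple average depth.

CHROMOSOME_BP = 50_000_000
CLONE_BP = 1_000_000
CLONES_QUOTED = 150


def min_clones_without_overlap(chromosome_bp: int, clone_bp: int) -> int:
    """Fewest clones that could span the chromosome if they abutted with no overlap."""
    return -(-chromosome_bp // clone_bp)  # ceiling division


def average_coverage(n_clones: int, clone_bp: int, chromosome_bp: int) -> float:
    """Average number of clones covering each base pair."""
    return n_clones * clone_bp / chromosome_bp


print(min_clones_without_overlap(CHROMOSOME_BP, CLONE_BP))
print(average_coverage(CLONES_QUOTED, CLONE_BP, CHROMOSOME_BP))
```

Fifty clones would be the bare minimum with no overlap at all, so 150 clones corresponds to each stretch of the chromosome being covered about three times on average, which is what leaves room for the overlapping ends that link one clone to the next.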
And, in fact, now that it's a paper published about a month ago, which much of the data came from the French, but those are really throughout the world.
And so they've got overlapping DNA pieces for virtually all chromosome 21.
And at MIT, David Page, actually more efficiently and without as many people, has got a good map of the Y chromosome.
This just shows you can do it.
And so, one's now really being able to-- if a disease is chromosome 21 you've not only got a large number of markers, but you've got your DNA corresponding to all those markers to search for your disease gene, which may be in that region.
Now, a very favorite chromosome for the human geneticist has been the X chromosome, because of the fact that only one copy is present in males, so that many disease genes express themselves in males preferentially and therefore you have known for a long while they're on the X chromosome.
So, many more disease genes have been identified on the X chromosome than any other and probably by now about 50 percent of the X chromosome is present as overlapping pieces of DNA.
This is work done largely in St. Louis by David Schlessinger and his group.
And this is just one of the sort of results, a sort of bizarre fact that as you go along a chromosome there seems to be a rough jump in the amount of-- the ratio of GC to AT, that is GC rich DNA.
For some reason GC rich, seem to be rich in genes particularly at the end of the X chromosome there's just a large number of genes in there.
So, if you're going to-- if you wanted to randomly sequence DNA just from one end to the other, you'd certainly start from the end of the X.
And you're going to have a high density of genes, particularly where you've got that high, GC rich region.
No one understands why this works.
Okay, well-- so I think our goal was to get all of the mapping done in five years.
And a year ago I would have said it may take seven years, but now it will be done in five years because of the French building this super lab.
So, really size is best.
Now, if we go to sequencing, what we did was three years ago we said we'd give out grants of between 1 and 2 million dollars a year to people who would sequence either model organisms or a piece of human DNA, but the objective of the work was not to get the answer but to show that you-- the kind of technique which could reduce the cost of sequencing DNA to about 50 cents a base pair.
Because if cost could be at 50 cents a base pair then you could do the whole human genome for 1.5 billion once you-- just in new sequencing component and so the human genome project could come in under cost or at least wouldn't cost anymore than three billion dollars.
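The budget reasoning here can be sketched in a few lines. The inputs are the figures quoted in the talk (three billion base pairs, 50 cents a base pair, and the earlier estimate of roughly 200 million dollars a year for about 15 years); the names and the comparison against the older 10 dollars a base pair price are illustrative assumptions.

```python
# Whole-genome sequencing cost versus the overall project budget quoted in the talk.
# Prices are kept in integer cents to avoid floating-point rounding.

HUMAN_GENOME_BP = 3_000_000_000
PROJECT_YEARS = 15
PROJECT_USD_PER_YEAR = 200_000_000


def genome_cost_usd(genome_bp: int, cents_per_bp: int) -> int:
    """Total sequencing cost in dollars at a flat price of cents_per_bp per base pair."""
    return genome_bp * cents_per_bp // 100


project_budget_usd = PROJECT_YEARS * PROJECT_USD_PER_YEAR
cost_at_target = genome_cost_usd(HUMAN_GENOME_BP, cents_per_bp=50)
cost_at_old_price = genome_cost_usd(HUMAN_GENOME_BP, cents_per_bp=1_000)

print(f"project budget:        ${project_budget_usd:,}")
print(f"sequencing at $0.50:   ${cost_at_target:,}")
print(f"sequencing at $10.00:  ${cost_at_old_price:,}")
```

At 50 cents a base pair the sequencing alone comes to 1.5 billion dollars, half of the roughly 3 billion dollar total, whereas at the older 10 dollar price it would be 30 billion dollars, which is why the cost target mattered so much.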
So, the-- we went out and said can you do it.
Is there a way in which you can do it cheaply and I want to talk about the E. coli.
So, the E. coli was an obvious one to do because actually half the E. coli sequence is known already through the work of people who've just been studying all its genes.
So, the E. coli genome would slowly converge without a human genome project at all.
But Fred Blattner in Wisconsin, you can see there are two Japanese groups, [Inaudible] Harbor Group, but Blattner said that he would do the E. coli one and his secret weapon was to be students at the University of Wisconsin that would work for almost nothing.
And here you can see a group of Wisconsin students and they're busy at work running gels or you can see trying to work out the sequence.
Well that's been a complete flop.
You know I cannot really tell you but among other things the students lose interest.
You know they're not really good dependable workers.
I mean they would be enthusiastic for three months but they're being paid almost nothing and you know their girlfriend leaves town or you know anything.
[Laughter] So, we do have some sequence but really the thought that you can motivate this type of work onto a student audience doesn't work.
You know, one group can run the [Inaudible] You bring in someone else and they don't work and you know the efficiency really goes down about 10 percent.
So, Fred claims he can get it to work, but I think it's a losing battle, I mean doing it this sort of way.
Now, the alternative way-- I'll show you some of the results.
He's got about a 1/4 of a million base pairs of contiguous sequence and you can see that he's identified about 200 [Inaudible] about 100, more than half of these they know what the genes are.
And you can-- in E. coli it's different.
I mean you get [Inaudible] so you get messenger RNA molecules, which code for a number of proteins and so you can see lots of promoters.
You can see it says too many.
And so this is sort of [Inaudible] summary has and there's what it sort of looks like.
I really liked it.
I mean I just wish it would be more efficient, but you can really go along there and you can actually see that all genes don't start with AUG.
You actually see what they start with you know and see the other things.
And so I think even with all this inefficiency probably the project will be finished within 4 years from now.
I mean if it doesn't get any good flack and there's money, I don't have any control over it anymore, but you know we went to Wisconsin a year ago and said, "Get efficient or no money." And I don't know what they're telling him now, but he's got this amount done.
And so, at the same time we gave money to Wally Gilbert to sequence the mycoplasma, of a million base pairs, much smaller.
And Wally wanted to be really clever and use direct sequencing and that has taken a while to get started and I don't know whether that will finally work.
>> Mycoplasma's nice.
It's a parasitic cell, but probably it has only one-fourth the genome size of E. coli, and yet it's a real cell.
So if you picked the mycoplasma and did it with machine sequencing, in which you just readout like this, it could have been done by now.
That is this mycoplasma is only a million base pairs.
Now, this ABI machine sequencing is being done by a consortium of two groups, one at Washington University, Bob Waterston, and another group in Cambridge, England led by John Sulston.
And we gave them money with the thought that the first year they would get 200,000 base pairs each, the second year 400,000 base pairs each, and the third year they could do a million base pairs each.
And by the end of the third year we wanted the cost down somewhere between 1 and 2,000 -- that is heading downward as they got the bigger labs.
And here you can see for the first four cosmids, they were sequencing some cosmids and each one of these like purple pieces is a different gene.
You can see the genes as expected are coded by both of the strands of the DNA, and there are large apparent gaps where there are no genes, and some of the genes can be identified either because you knew them genetically from C. elegans, or because they were homologous to a known sequence in another organism.
They're almost on schedule, and it seems to be going well, and they both want to move from doing 1 million base pairs to say 5 million base pairs a year, and if they could move up to that level you would be getting 10 million base pairs a year, and so the whole job could be done almost by the turn of the century, if there's the money to spend for it.
[ Background noise ] These were the results of about a month ago of the combined efforts, and each of those dots represents a reading frame identified.
So it looks like there's probably about one gene on average for something like 7,000 base pairs.
So if you take the results up to now, C. elegans will have about 15,000 different genes.
It may be biased, because they started to sequence a region of DNA where they thought there would be genes, so there just may be some sections of DNA which are gene poor and the number will come down.
On the other hand, some of the gaps between genes may turn out to contain genes that they just have failed to see, because of the exon-intron business.
So it wouldn't be surprising they've missed a lot of genes in the first time around.
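The gene-count estimate above is a simple density extrapolation, and can be sketched like this. The 100 million base pair genome size is the round figure quoted elsewhere in the talk; the function name `estimated_gene_count` is hypothetical, and the sketch assumes gene density is uniform across the genome, which is exactly the assumption the caveats above call into question.

```python
# Extrapolate a whole-genome gene count from an observed gene density.
GENOME_BP = 100_000_000   # approximate C. elegans genome size (assumed round figure)
BP_PER_GENE = 7_000       # about one gene per 7,000 base pairs, as observed so far


def estimated_gene_count(genome_bp, bp_per_gene):
    """Assume uniform gene density and scale up to the whole genome."""
    return round(genome_bp / bp_per_gene)


print(estimated_gene_count(GENOME_BP, BP_PER_GENE))  # 14286, i.e. about 15,000
```

If the sequenced region is gene rich, the true density is lower and the count falls; if genes were missed between the identified reading frames, it rises.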
So this project is going well enough that they're now going to create in Cambridge, England a big sequencing lab called the Sanger Lab.
This will be supported by the Wellcome Trust, the British foundation, which is twice as wealthy as the Howard Hughes Medical Institute.
So probably the first super sequencing lab in the world will be created in England, where they could do maybe 5 to 10 million base pairs, and they're going to start in C. elegans, start in some human DNA, they'll do some yeast, and they'll probably get involved with the plant Arabidopsis, with a genome roughly equivalent in size to C. elegans.
That is roughly 100 million base pairs.
So I'd say the project is going very well, and what isn't clear now is whether there's money now in the United States to set up a sequencing lab big enough to sort of be equivalent in size to the English one.
The French are also talking about setting up a super sequencing lab.
Now, the technology they use is no big advance, it is more a matter of scale and intelligence, and in fact the limiting factor is not the machines but just sort of the time to annotate the data and prepare it for publication.
That's the slowest aspect of it now.
So you could say it's the sort of informatics aspect of it, which is probably limiting in cost right now.
The drosophila world, which was indifferent to the human genome project, if I could say many of them being totally antagonistic -- at least among themselves, not to me generally -- now have got very worried that the C. elegans world will have all its genes identified and drosophila will be left behind.
So Gerry Rubin at Berkeley is now working with people at the Lawrence Berkeley Laboratory, and hopefully there will be a drosophila equivalent to this project, to the worm project, going in about two years.
And we can hope they will be very competitive and see which can finish first.
Now, what you'll have I think will be of immense value to the developmental biologists, because you'll just have the whole sequence, you'll have the genes there, and so when you think your gene, which makes you stupid or something like this in drosophila, you'll have all the candidate genes in front of you and so it'll be a lot easier for the developmental biologist to find those genes.
I should say that identifying the genes where you have all these introns will clearly be speeded up if you have good cDNAs.
So there are cDNA projects going but no one thinks that that can replace ultimately doing the whole thing.
Once you have a good physical map, then you can take any cDNA and locate it, and if you get cDNAs from the 5 prime end or 3 prime end, it'll begin to make the identification of the genes more certain.
So I'm sure there has to be cDNA.
The cDNA is a relatively cheap project compared to the final sequencing.
Now, there have been a couple of bits of human DNA sequenced, and in human DNA you just expect many, many more introns.
And this was worked on in Salt Lake City, not by the conventional machine sequencing but, and I won't go into it, by [inaudible] Bob Weiss's group, using multiplexing.
But they took part of the neurofibromatosis gene, and where the gene had been found -- as you can see, it has 50 exons.
And initially they had failed to find the gene, because in fact within the gene were several other smaller genes whose cDNAs had been isolated.
And so the cDNAs just identified what were actually introns within the big gene.
So about 100,000 base pairs of this have been sequenced, in a relatively short time.
So you can see -- forgetting what the genes look like.
My own view is that it's a bit scary to be sure you've got them if you don't have the background from identifying the probably similar genes first in C. elegans or drosophila.
So I think most of those sequencing efforts, unless you're going for a specific gene, should be focused still on the model organisms.
You just will get a lot of frustration.
They sequenced a little bit in a region where the Huntington's disease might be, but they were too uncertain as to -- they don't have enough genetic markers to really know where to sequence.
And then a little bit of chromosome 19 has been sequenced.
So I personally think one shouldn't get too much into human DNA sequencing for a couple of years, except where you may have to sequence it -- say the breast cancer gene, which was first mapped to chromosome 17 by Mary-Claire King about 2 years ago, a year ago, they had mapped it to within about 20 million base pairs.
Now it's down to 2 million base pairs, that is, the gene must lie within [inaudible].
And they're trying to isolate lots and lots of cDNAs, and to go across those and see if they can find any differences in the gene.
Given the importance of the gene, if there were a laboratory which could sequence 2 million base pairs right away, and if I were running things, I would just put out a contract to get it done as fast as possible, because once you've got the gene, it would be infinitely easier to do a human diagnosis.
So say there's a real rush.
So right now we're behind in sequencing.
Mapping is going very well -- behind in the sense we don't have enough people doing it.
I want to sort of conclude -- it was obvious at NIH when we set up the program, we spent 3% of the money on ethical, legal and social issues, which would come out from this new genetic knowledge, that is, from the ability to do genetic diagnosis.
That we had to do something was clear from the past misuses of genetics, which went into eugenics, when a large number of women were sterilized in the 1920s and '30s as being genetically unfit to have children, on the basis that they were either poor or prostitutes or uneducated.
Whether the defects of these women were due to genetics or environment was totally unclear, but nonetheless there was a vast sterilization program.
So everyone knows that once you get genetic knowledge, you first have to separate real facts from prejudices, and then when you get real facts, it's not obvious in many cases how we're going to proceed.
So we set up a committee headed by Nancy Wexler, and Nancy was really the person who drove the effort toward finding the map location of the gene for Huntington's disease, that awful neurological disease.
And her incentive for doing it is that her mother died from it, so it was in her family, and she's at risk for it.
And so by putting her as head of the committee, you sense the problems of someone who fears having the disease: do you tell your employer, do you ever get married, do you dare have children, lots of very sticky and awful problems when you have a disease like this in your family.
So she has been head of the committee.
We appointed, I think, a very good person, Eric Juengst, to run it.
We tried to get the committee broadly based, and so we got John Beckwith, a well-known and very good bacterial geneticist, who has been prominent, you could say, from the viewpoint of Science for the People, a sort of group on the left, I think you might say; he's a member of the committee.
So we tried to -- I even suggested we put Jeremy Rifkin on the committee, but that would have probably been irresponsible, because I think Beckwith actually is a very useful member of such a committee, because I think when you have a disease in your family, ideology vanishes.
You just want to solve your family problem.
And I've been trying to get there -- I was trying to get the program to focus on specific issues, not on broad philosophical principles, for which you can argue forever.
Now, I found that in my job I was spending -- certainly every time I met the press or I was in the public, the only thing people really cared about were the ethical issues, what are you going to do about genetic privacy.
One way of illustrating this is there are lots of cartoons.
This is just a magazine cover.
Here you can see DNA just illustrating the point of gene libraries, which you can see that's a big library.
The term libraries has been in existence for some time, and we will end up with a library which contains -- in fact, it can be quite a small library -- all the human genetic information. It would take about 300 New York telephone books if you wanted to store it in conventional book form, but some day it'll all be on computers and online.
And this, if you can see there is DNA, is a time bomb -- you can see one of the base pairs, and you just need 1 base pair wrong to essentially have a bomb go off later in your life if it leads to a cancer or Huntington's disease or an episode of manic depressive disease.
So this is the way some people are going to look at it.
This is sort of seeing that for every child about to be born, one will think about screening his DNA to see if that child will be born with a disease that his parents can't really cope with or society can't cope with because it's too -- say Tay-Sachs disease or Duchenne's, which right now you can't really do anything about.
So the real issue in the -- and I said at [inaudible], the human genome program will give us the possibility of diagnoses of an increasing number of diseases, and what are you going to do with this information, that is, do you want to just randomly do it and tell people what their fate is going to be?
Do you ask their permission?
I certainly think you should ask their permission, but if they don't know what DNA is, do they know what they're saying when they say they want to know?
And certainly in the case of Huntington's disease, a lot of people, including Nancy Wexler, really don't want to know the answer.
She's sort of lived now more than 45 years not knowing, whether she should, and she's certainly hoping for the best, and it's 1 chance out of 2.
So until people know what DNA is, and do parents have a right to screen their children, or should a person have the right to wait till he's 21 or she's 21 and make the choice themselves, say, as to whether they would be at risk for breast cancer.
So these are issues.
I think it's quite clear that everyone should be free to decide whether they want genetic diagnosis.
Certainly it shouldn't be done without permission, so when you give your blood to an insurance company for a cholesterol analysis, I think it should be against the law for them to look to see whether you might carry an HLA allele, which may make you susceptible to diabetes, because you might put down that your father had diabetes, and they'll say, well, does that really mean that this person is at risk?
So we right now don't have laws to protect us, and one of the aims of spending this 5% of our money on ethical issues, I think, was to really talk the matter through, trying to get it discussed enough before we go and pass laws -- genetic diagnosis is not yet on us in a big way, so we still have a couple of years.
Congress is getting a little less [inaudible], so saying they want to appoint their own committee, which I think means in the next 2 to 5 years we will have laws.
We may see privacy.
And of course, once a test has been done and you know what the answer is, does that mean that you can keep it private; that is, if you know you have a strong probability of dying at a young age, does that mean you have the right to probably get an insurance policy for $5 million without your insurance company knowing that the odds are against the insurance company?
So this really will lead us to I think, real questions as to how insurance should be handled.
And again, the answer isn't very simple, and for this case, I don't know the answer.
What do you do if you're an identical twin?
Because if you find out about yourself, you've found the answer of your twin.
So do you need your twin's permission, do you need them both before you can do this sort of thing.
And if you know you're at risk for premature Alzheimer's, should you reveal this to the person you're going to marry?
It's hard to say there should be a law, but there are these issues, and I think many people, rather than face the issue, would rather not be tested.
And of course, you probably only want to test yourself in many cases if you can do something about it.
Such a case might be breast cancer, because now that they've mapped a gene, they can do some genetic diagnosis, and in a few of the families they've prevented women from having prophylactic mastectomies.
That is, they did not carry the gene, and yet these women were going to have their breasts removed and possibly their ovaries.
So in that case, you can do something about it, and therefore, it would be a reason to go ahead and have the test, or they may be.
I think the main thing is there should not be coercion of any form.
But of course, many people not knowing what to do are just going to ask their doctor, and the doctor then is supposed to say something.
Then there's going to be all the insurance consequences.
The doctor tells you you don't need genetic diagnosis, and then your child has fragile X; can you sue your doctor?
And then if you have a genetic diagnosis, when will the insurance companies pay for it?
So it's a big issue, which we're going to have to face.
This just is a cartoon which shows a doctor on top of the pile of the lawyers.
So the lawyers can be in there, and how do you keep us from drowning in even further legal costs?
And this is this question.
Here's a nice cartoon, I think.
Will we be branded by our DNA?
As you walk around, be known as someone who's going to get Alzheimer's, at risk to cancer, or at risk to a mental disease.
And clearly, the answer is we've got to fight very hard to prevent branding.
And we've got to go out of our way, because if we don't, and those who are sort of the victims of what I like to call genetic injustice not only get the bad genes but then get their life being totally wrecked by this knowledge being publicly available, we just can't -- so real privacy is an important thing.
And up here, I think this one is pretty obvious.
[ Laughter ] So this gets down to genetics and behavior and a lot of people would like to avoid the problem by saying that genes don't affect their behavior.
Well, obviously it's not true.
If your brain isn't wired right, you might get violent, but that doesn't mean that you know, when you see violence, that there's necessarily any genetic -- it just could be -- you have a right to be violent.
If your environment is such that people have been hitting you, and you hit them back.
So -- or you hit society back or something.
This is really where, I think, we have to be very careful not to go around talking about genetic causation of behavioral things without evidence.
This isn't to say that -- we know that in many cases, say Fragile X, people are severely mentally retarded.
Genes can affect our ability to learn.
It doesn't mean that everyone who fails chemistry is stupid, because you just may not like chemistry.
So it's very complicated of trying to assign these things, and to decide between whether racial group, whether the Japanese produced better cars because they're more intelligent or because they're more docile.
You can give lots of -- it's useless, I think to use these arguments, and we've got to be responsible.
On the other hand, you can't say that there's no connection between say alcoholism and genes.
We know there is from twin studies.
So some people are more susceptible to really getting alcoholic if they drink beer while in college.
And so it would be wonderful if you could identify it.
So anything which says that we can't do research, where there's a valid reason if you get an answer for positively affecting human life.
But there are people who would like to stop all, any research.
On the other hand, I wouldn't be too enthusiastic about -- and not enthusiastic at all, trying to relate IQs to genes, because there are many, many reasons, which affect your ability to perform an intelligent act.
So we just need a little wisdom, and just pushing those politics would really make sense.
And I think staying away from politics which would just make people feel they're victimized by the wrong set of arguments.
It's very easy to believe that you're rich because you're bright and you're poor because you're stupid, because then you can feel happy or smart, whatever it is, and probably to equate genes with success and things like that, it's easy to do but it smells of the bigotry of the '20s or the bigotry of what we'll find in the '90s.
So you've got to decide why you're doing this sort of thing.
But I guess my final message is we're going to have to live with genetics, as painful as it is to face up to some of the consequences to us individually.
If we've got a gene that we don't want, there are good and bad throws of the genetic dice, and as a society we probably have to always favor the underdog.
If a person essentially hasn't got a good throw, if we don't go out of a way to help them, this information will just make us a meaner society, not a better one.
[ Applause, inaudible voices ] >> Dr. Watson would be happy to answer some questions.
- >> AUDIENCE MEMBER
- >> DR. WATSON
[ Inaudible voice ] Well, getting the gene, you know, the knowledge that the gene responsible for cystic fibrosis is involved in ion transport.
There are people now really trying to come up with drugs which might affect that channel, thinking in terms of it.
So knowledge of the protein, which has gone wrong, really might -- I can't say whether the odds are good or bad, but it certainly makes much more rational, the way you're going to try and treat the disease, because you can't just go around saying, you should have been aborted.
You've got to do something about the person with the disease.
And of course, that's why lots of people are interested in gene therapy.
I didn't talk about it because right now I don't think it's a major answer right now to the majority of genetic diseases, which we now have afflicting society.
But I think we know we will win by diagnosis.
It doesn't help those people who have the disease, and of course, to the extent we can do anything, we've got to spend our money wisely in trying to get drugs that will help.
But we haven't been able to do much about sickle cell anemia, even though we know what causes it, so I'm not saying that knowing the cause will cure the disease.
- >> AUDIENCE MEMBER
- >> DR. WATSON
[ Inaudible voice ] Well, we're not trying to do that at all.
We just want to have an overview of what the structure is, how many genes there are, try and identify, give function to as many as possible.
Then those interested in human diversity can have a field day because then they can focus in on their thing, but it's going to be difficult enough getting one genome.
We're not sequencing the DNA of one person.
It's really a mixture of many, depending on where the DNA came from.
- >> AUDIENCE MEMBER
But then is there much overlap in terms of the different centers and so forth?
- >> DR. WATSON
Well, right now, pretty quickly, the task is so big there won't be much overlap; on the other hand, there's a real race to find the gene responsible for human breast cancer, the [inaudible], so there are always going to be scientific prizes, medical prizes, that we want more than one group, because if more than one group is doing it, that's inefficient.
They may just do it more than one way, and society wants the answer fast.
Right now there's so much work that overlap isn't -- there was a race to see who will get the first physical map, and groups were trying to get chromosome 21.
The French really put themselves in a powerful position, but even they realized, well, they can't do it all themselves.
So getting really the maps you will need for the sequencing project will be a refined task -- will take a lot of our centers.
Most of the groups we set up, I think have every reason to continue doing.
Yes, back there?
[ Inaudible voice from audience ] I suspect that's not a very common occurrence, but it's a nice horror story, and there are a lot of lawyers around, who I'm sure, if they saw that sort of thing would put such person out of business.
So I think any society, you've got to on the whole, assume that you're not going to be filled with that sort of person.
But without the genome program, there are a lot of shits around.
[ Applause, inaudible voices ] What I meant is, there are some things that aren't recessive.
Alzheimer's is a dominant, so it doesn't just depend on who you marry; there's one chance in two that your children will carry the disease.
I think it's a real problem, because you've got this insurance business, you know, and every time you go to insurance, who sees it?
It's some person in the office, and they might blabber it.
So how do you -- so I think it's got to be the force of law, and a pretty strong law, that if you go around revealing it, you could end up in jail.
[ Inaudible voices, applause ]
About the speaker
James D. Watson
Cold Spring Harbor Laboratory
Born in Chicago in 1928, James Watson's introduction to biology was through watching birds with his father. At the age of fifteen, he received a scholarship to the University of Chicago, where he majored in zoology and received his B.S. in 1947.
Dr. Watson then enrolled at Indiana University, because his interests had turned to genetics and Indiana at that time had three of the leaders in the field: Hermann Muller, Tracy Sonneborn, and Salvador Luria. His dissertation was on the effects of x-rays on replication of bacteriophage, a project that echoed the watershed research taking place at Cold Spring Harbor. He did his dissertation under Salvador Luria, and received his Ph.D. in 1950.
During a postdoctoral fellowship to continue his phage research in Copenhagen, Watson began to be interested in the structure of DNA. In 1951 he went to the Cavendish Laboratory at Cambridge University to learn the techniques of studying the three-dimensional structure of proteins. There he met British physicist Francis Crick, whose interests had gravitated toward genetics. With x-ray crystallography data from colleagues Rosalind Franklin and Maurice Wilkins, Watson and Crick put together a model of the now-familiar DNA molecule, with its spiral staircase double helix shape and its A, C, G, T base pairs. This work was published in 1953, and the three men were awarded the Nobel Prize in Physiology or Medicine in 1962 (Franklin died in 1958).
Dr. Watson then went to the California Institute of Technology, where he worked as a research fellow and then, following another year at the Cavendish, he moved to Harvard University as an assistant professor in 1956; he was appointed associate professor in 1958 and full professor in 1961. He became director of Cold Spring Harbor Laboratory in 1968, though he retained his appointment at Harvard until 1976. From 1988 to 1992 he directed the National Center for Human Genome Research at the National Institutes of Health in addition to Cold Spring Harbor Laboratory.
In addition to the Nobel, Dr. Watson has been honored with the Lasker Award of the American Public Health Association (1960), the John J. Carty Gold Medal of the National Academy of Sciences (1971), and the Presidential Medal of Freedom (1977). He is a member of the Russian National Academy of Sciences, the American Philosophical Society, the Danish Academy of Arts and Sciences, and was a senior fellow in the Society of Fellows of Harvard University. He holds several honorary doctorates, including degrees from Harvard, Notre Dame, Rockefeller University, The Albert Einstein College of Medicine, Indiana University, and The University of Chicago.