
Winding Your Way Through DNA Symposium

San Francisco, California
Friday Evening, September 25, 1992

Paul Berg, Ph. D.

Introduction by Harold Varmus:

On June 19, 1975, a headline in Rolling Stone magazine read:

"140 scientists ask, 'Now That We Can Rewrite the Genetic Code, What Are We Going to Say?'"

Beneath that headline, the reporter, Michael Rogers, described one of the most contentious meetings in the history of science: the Asilomar Conference to debate the risks of newly discovered recombinant DNA techniques and to consider placing restrictions on them. Now although that meeting and its consequences remain controversial, its intrepid organizer, Paul Berg, our next speaker, won wide respect for his courage and articulate leadership.

We are again calling upon both his bravery and his clarity to review the central tenets of molecular genetics, in a sense the intellectual fallout of what you've just heard from Jim Watson. This is a field in which Paul has played several prominent roles, as documented by many honors, including the Nobel Prize in Chemistry. Like many of our speakers, Paul is a professor at the Stanford University School of Medicine, where he directs the newly constructed Beckman Center for Molecular Medicine. A fabled lecturer with a deep commitment to scientific literacy, Paul organized the Stanford Centennial Symposium on the Human Genome Project last year, a hugely successful public meeting that inspired our own symposium. We are happy to have him here tonight to explain molecular genetics in 30 minutes.


In his own inimitable way, Jim Watson has just given you a glimpse into a historic discovery. I hope he has also impressed on you some sense of the exhilaration that comes from a breakthrough insight such as the elucidation of the structure of DNA. Coming on the heels of the discovery that genes are made of DNA, that insight transformed every aspect of biology in the most profound ways, many of which you will hear about this evening and tomorrow. Two immediate insights emerged from these breakthroughs.

First, the central role in heredity assigned to chromosomes could be attributed to the DNA they contained. It was then only a small step to accept that genes were discrete stretches of DNA arrayed in linear fashion along each chromosome's DNA.

Second, it was apparent that genetic information, that is, the presumptive instructions used to define, construct and maintain each living organism, is encoded in the chemical and molecular structure of its DNA.

  • But how? How can an organism's traits be specified and stored in a DNA double helix?

  • What is the form of the information? What does the message say?

  • How is it read out?

These questions formed the agenda of molecular biology for the 40 years since the structure of DNA was solved. Today the answers to the questions I posed are largely known, but we are only beginning to learn how the interplay of the many thousands of messages in an organism's genes accounts for the complexities of that organism's developmental, metabolic and behavioral characteristics.

Let's return to the structure of the DNA double helix to consider what feature might serve as the source of the genetic information. You'll notice from this diagram that, as Jim described, DNA consists of two chains wound around each other in a helix. If we look at the backbones of those chains, they are composed of sugar and phosphate, repeated monotonously along the entire length of the molecule.

However, if one examines the DNA further, one sees that it also consists of pairs of bases: A on one strand paired with T on the other, and G on one strand paired with C on the other. Since the backbone is uniform, it seems evident that the only feature that could carry information is the order of the bases along the DNA.

Now can a sequence of base pairs represent information? Why not?

Most of you are familiar with the Morse code by which various combinations of two symbols, dots and dashes, can represent all the letters of our and other alphabets, and therefore can be used to encode information as words and sentences.

In a more contemporary context, consider the means we now use to record and play back sounds and images. You're all familiar with the fact that sound can be encoded as magnetic variations laid down along a very thin plastic tape enclosed in this little box, and that this information can be retrieved by passing the tape through a transducer, which reads these magnetic symbols and transforms them back into sound. Similarly, compact discs store information in digital form, as 0s and 1s. This is a message that can be read by the proper transducer, and it represents a meaningful series of sentences. Indeed, stretches containing variable arrangements of about fifteen 0s and 1s are sufficient to specify all the words and symbols of all the world's languages.

Contrast this format with the genetic language in the next slide. This is the sequence of a viral gene. As you can see, it consists of nothing more than A's, T's, G's and C's. I show only one of the two strands of the DNA because the other can be deduced from it by the base-pairing rule. Mutations in a gene are simply changes in one or another of these A's, T's, G's or C's. For example, the replacement of an A by any of the other bases, or the loss of one or more bases, is a mutation, and such changes may destroy the gene's function.
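The base-pairing rule is simple enough to write down as code. This minimal Python sketch assumes each strand is written as a string of A, T, G and C in its own 5'->3' direction; because the two strands run in opposite directions, the partner strand is built in reverse. The example strand is arbitrary.

```python
# Sketch: deduce the partner strand of a DNA double helix from the base-pairing rule.
# Assumes strands are plain strings of A, T, G, C, each written 5'->3'.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the complementary strand, pairing A with T and G with C.

    The strands are antiparallel, so the partner is built in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand))


strand = "ATGGTGCATCTG"
print(partner_strand(strand))  # CAGATGCACCAT
```

Applying the rule twice gives back the original strand, which is why one strand is enough to describe the whole molecule.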

Now the first clue to what genetic information specifies actually emerged at the turn of the century from the astute observations of a British physician, Sir Archibald Garrod. Garrod noted that certain metabolic diseases were inherited. Because proteins were known to provide the metabolic machinery, he speculated that genes might be involved in creating the proteins that carry out the chemical reactions of metabolism. Thus, Garrod surmised that genes define an organism's traits through their involvement in making proteins.

Now I show this colorful slide only to emphasize the fundamental role of genes: to make proteins. Etch this statement indelibly in your mind. By the early 1940s George Beadle and Edward Tatum, studying the genetics of a fungus called Neurospora, had confirmed that genes were involved in the production of that organism's proteins. Moreover, they showed that mutations in certain genes prevented cells from growing because the cells could not make particular proteins. Each protein studied seemed to be governed by a different gene. Beadle and Tatum's discovery suggested that every protein in a cell results from the expression of a gene. This notion, which became known as the one gene, one protein theory, supported Garrod's suggestion about how genes create traits.

Today we know that nothing happens in living systems without proteins. When a protein is not made, a function is lost. There is also no doubt that our physical, metabolic and behavioral characteristics are dependent on the array of proteins our cells and tissues make. So, too, our susceptibility to diseases such as heart disease, diabetes, rheumatoid arthritis, among others, depends on the variety and amounts of proteins we make. It is therefore no exaggeration to say that we are what we are because of the kind of proteins that we make.

To understand how genes control the formation of proteins, we need to digress briefly to describe proteins. Proteins consist of amino acids strung end to end to produce chains whose length varies with each protein. [This is a single protein. From this angle it is hard for me to trace its beginning; I think it is there. The chain goes around, and the amino acids are numbered so you can follow them along.]

Now some proteins are composed of as few as 50 amino acids and others of hundreds, thousands, and even more. Proteins are built from about 20 different kinds of amino acids, but the order of the amino acids along the chain is different in different proteins. So there is a virtually infinite variety of possible proteins, depending on the order of amino acids in the chain.
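A quick count shows why the variety is "virtually infinite". This minimal sketch assumes any of the 20 amino acids can occupy any position, ignoring whether the resulting chain would fold or function.

```python
# Counting argument only: each position in a chain can hold any of 20 amino acids.
AMINO_ACID_KINDS = 20


def possible_chains(length: int) -> int:
    """Number of distinct amino acid sequences of a given length."""
    return AMINO_ACID_KINDS ** length


for n in (1, 2, 3):
    print(n, possible_chains(n))  # 20, 400, 8000

print(f"{possible_chains(50):.3e}")  # about 1.126e+65 for a 50-amino-acid chain
```

Even the shortest proteins mentioned above, about 50 amino acids long, could in principle come in roughly 10^65 different sequences.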

Now proteins normally do not exist as stretched-out, one-dimensional chains. Instead, the chains fold up into specific three-dimensional structures that depend on the order of amino acids in the chain. This is a diagram of the ras protein, a protein known to be associated with human cancer. As you can see, parts of the chain form a particular kind of structure called the alpha helix, and the chain winds around to form a three-dimensional cleft in which a molecule essential to the protein's function must bind. That cleft will not be formed if the order of the amino acids is changed so that the protein cannot fold up into its functional form. That means that errors in assembling the correct order of amino acids in a protein chain can cause the protein to fold improperly and fail to function.

Later, I will mention examples of mistakes that change the amino acid sequence so that the protein folds improperly and consequently fails to carry out the function it is designed for. The consequences of such mistakes are very often serious disease. Now the solution of the DNA structure, and the realization that discrete stretches of DNA constitute the genes, changed the focus from how genes make proteins to how base pair sequences in DNA specify the enormous collection of proteins made by cells. Indeed, this became the central problem in molecular biology during the 1950s.

There were two problems to solve. One was to discover the informational relationship or code that relates the linear sequence of base pairs in a gene to the amino acid sequence of the proteins specified by that gene. The second was to learn how the decoding process actually occurs in living cells.

The readout of DNA information into protein is referred to as translation. However, DNA itself is not involved directly in making proteins; only its information is used. That information is first transferred to a related type of molecule called RNA, by a process called transcription.

Now in transcription the two strands of the DNA are transiently separated, and a new copy of one of the two strands is made using the same base-pairing rules that hold the two strands of DNA together. The only exception is that instead of T, which occurs in DNA, RNA contains a closely related base called U. So one strand serves as a template to direct the assembly of a complementary RNA, the so-called messenger RNA, whose sequence is in fact a copy of the strand that has been displaced. RNA is therefore a copy of the DNA sequence: it contains the same order of units along its chain, except for U replacing T. It carries all the information of the gene, which is why these RNA copies are referred to as messenger RNAs. You might want to think of RNA as a dialect of DNA.
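The copying step can be sketched in a few lines of Python. This is a simplified model that assumes strands are strings written 5'->3' and adds one detail the talk leaves out: the RNA is built antiparallel to its template, so the template is read in reverse. The coding strand below is a hypothetical example.

```python
# Minimal sketch of transcription: pair each template base, with U in place of T.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template strand given 5'->3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in reversed(template))


coding = "ATGCATTAA"  # hypothetical coding (non-template) strand
template = "".join(DNA_PAIR[base] for base in reversed(coding))  # the strand that is copied
mrna = transcribe(template)

print(template, mrna)  # TTAATGCAT AUGCAUUAA
```

The result matches the displaced coding strand letter for letter, with U in place of T, which is the sense in which RNA is a "dialect" of DNA.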

Throughout the 1950s there were many attempts to solve the genetic code. That is, to discover what the correspondence is between the order of bases in DNA and its RNA copy and the order of amino acids in protein. Recall that the DNA and RNA languages consist of only four different letters, while the protein language contains 20 different letters. Thus, there must be some arrangement of A's, G's, U's, and C's in RNA to signify each one of the 20 amino acids in a protein. Although the task was considered exceedingly formidable, and most people believed that the genetic code would not be solved in our lifetime, a chance discovery in 1961 opened the way to cracking the code. And only three years later, in 1964, the entire genetic dictionary was known.
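Why must the code use at least three bases per amino acid? It is a counting argument, sketched here in Python under the simple assumption that every amino acid needs its own distinct word of fixed length. The sketch says nothing about how the code was actually worked out experimentally.

```python
# Counting argument: how long must a "word" of bases be to name 20 amino acids?
BASES = 4          # A, G, C and U (T in DNA)
AMINO_ACIDS = 20


def min_word_length(alphabet_size: int, symbols_needed: int) -> int:
    """Smallest word length L such that alphabet_size ** L >= symbols_needed."""
    length = 1
    while alphabet_size ** length < symbols_needed:
        length += 1
    return length


for length in range(1, 4):
    print(length, BASES ** length)  # 1 4, then 2 16, then 3 64

print(min_word_length(BASES, AMINO_ACIDS))  # 3
```

Pairs of bases give only 16 combinations, too few for 20 amino acids; triplets give 64, more than enough. The same function applied to a two-symbol code like Morse or binary, `min_word_length(2, 26)`, gives 5 for the 26 letters of the English alphabet.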

We can write the dictionary as a series of triplets, each group of three bases specifying a discrete amino acid. There are 64 possible triplets to specify the 20 amino acids, so most amino acids are specified by more than one triplet. For example, the amino acid tyrosine is specified by the two triplets TAT and TAC, and valine is specified by four. There are also four signals that serve as punctuation. One of them, ATG, serves as the start signal for each and every coding unit: every protein begins with the codon ATG, which marks the beginning of the protein coding sequence. And every protein coding sequence is terminated by one of the three triplets, or "stop codons," shown here, which mark the position at which the decoding process terminates.

Now to show you how the correspondence actually works, I show you here a very short RNA, which begins with an AUG specifying methionine; the next triplet specifies histidine, and so on, until the RNA reaches one of the termination signals, where the protein chain ends. So this is a manifestation of the coding relationship: a sequence of bases specifying a corresponding sequence of amino acids.
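The decoding rule just described can be sketched in Python using a small subset of the standard genetic code, written in its RNA form. Two simplifying assumptions: only the codons needed for the example are included, and translation simply begins at the first AUG, whereas real cells use additional signals to choose the start.

```python
# Sketch: reading an mRNA with a small subset of the standard genetic code.
# Only the codons used below are included; a full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",  # also the start signal
    "CAU": "His", "CAC": "His",
    "UAU": "Tyr", "UAC": "Tyr",
    "GUU": "Val", "GUC": "Val", "GUA": "Val", "GUG": "Val",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Start at the first AUG, read one triplet at a time, stop at a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons outside this subset
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


print(translate("GGAUGCACUAUGUGUAAGC"))  # ['Met', 'His', 'Tyr', 'Val']
```

The bases before the AUG and after the stop codon are simply not read, just as the talk describes the decoding process starting and terminating at the punctuation signals.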

One of the remarkable features of the genetic dictionary is that it is, with minor exceptions, universal. That is, the same codon triplet specifies the same amino acid in all known organisms on our planet. You will see that this is an essential feature of genetic engineering. Thus, the coding sequence of a human gene will be translated into the same protein in humans, in yeast or in a bacterium, and vice versa.

Given the coding relationships between DNA and proteins, it is quite straightforward to understand what mutations are. A change in a gene's DNA sequence may lead to the production of an altered protein. Often the change affects only one base pair in DNA. But this may be sufficient to cause an incorrect amino acid to be put into the protein.

Here is the sequence of bases specifying the beginning of one of the proteins needed to make human hemoglobin. Sickle cell disease is a consequence of a change in the triplet GAG to GTG. This changes the coding so that instead of glutamic acid, the amino acid valine is inserted at this particular position. Here is a three-dimensional view of the subunit of the hemoglobin molecule that is affected, and here is the amino acid that has been altered. Instead of the normal glutamic acid, valine has been inserted here, and that causes this particular protein to change its structure when it delivers oxygen to the tissues. That change in structure causes the red cell to assume a sickle shape, and the sickled cells clog the small blood vessels. Normal hemoglobin does not do this. This one change, out of approximately 150 amino acids in this protein, has produced this devastating disease.
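The sickle cell change can be replayed on the opening codons of the gene. In this sketch the nine DNA codons follow the start of the beta-globin coding sequence (treat the exact letters as illustrative), and the codon table holds only the entries the example needs.

```python
# Sketch: a single base substitution (point mutation) and its effect on the protein.
CODE = {  # DNA codons from the standard genetic code; only those needed here
    "ATG": "Met", "GTG": "Val", "CAT": "His", "CTG": "Leu", "ACT": "Thr",
    "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}


def translate(dna: str) -> list[str]:
    """Translate consecutive triplets from the first base onward."""
    return [CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


def substitute(dna: str, position: int, new_base: str) -> str:
    """Replace the base at a 0-based position."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = substitute(normal, 19, "T")  # middle base of the first GAG: GAG -> GTG

before, after = translate(normal), translate(sickle)
# Met (index 0) is removed from the finished chain, so index 6 is the sixth amino acid.
changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(changed, before[6], "->", after[6])  # [6] Glu -> Val
```

One letter out of the whole gene changes, and exactly one amino acid in the chain changes with it.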

This [picture] is the change that occurs in approximately 70 percent of cystic fibrosis mutations. It results from the loss of three bases, TCT, from the coding sequence, which leaves the new codon ATT and eliminates one amino acid from a protein of approximately 1500 amino acids. This change prevents the protein from functioning properly as a channel that allows ions to pass in and out of cells. Here, too, a small change in the coding sequence of the gene has profound consequences for the organism.
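The same approach shows why losing three bases costs exactly one amino acid. The five codons below are modeled on the stretch of the cystic fibrosis gene around this common mutation and should be treated as illustrative; the codon table again holds only what the example needs.

```python
# Sketch: an in-frame deletion of three bases removes exactly one amino acid.
CODE = {"ATC": "Ile", "ATT": "Ile", "TTT": "Phe", "GGT": "Gly", "GTT": "Val"}


def translate(dna: str) -> list[str]:
    """Translate consecutive triplets from the first base onward."""
    return [CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3)]


def delete(dna: str, start: int, count: int) -> str:
    """Remove `count` bases beginning at a 0-based position."""
    return dna[:start] + dna[start + count:]


normal = "ATCATCTTTGGTGTT"
mutant = delete(normal, 4, 3)  # drop T-C-T, which straddles the second and third codons

print(translate(normal))  # ['Ile', 'Ile', 'Phe', 'Gly', 'Val']
print(translate(mutant))  # ['Ile', 'Ile', 'Gly', 'Val']
```

Because three bases were lost, the reading frame downstream is preserved: ATC TTT becomes ATT, which still encodes isoleucine, and only the phenylalanine disappears. Deleting one or two bases instead would shift every downstream triplet.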

Many, if not most, human diseases are the result of inherited mutations affecting proteins with essential physiological or structural functions.

Now, knowing the genetic code does not by itself tell us how the messenger RNA is actually translated into a protein. This process is complex and involves a very large number of repetitive steps. We now know that proteins are assembled by small particles, present in all cells, called ribosomes, which themselves consist of another kind of RNA associated with a large assortment of proteins. Here is an electron micrograph showing these particles, obtained from a microbe, and here is a schematic representation that I'll use in my succeeding slides for these particles: the machines that assemble protein chains using messenger RNA as the source of the information.

How does each amino acid pair with its appropriate codon? There is no known chemical basis for a direct match between an amino acid and three bases. Instead, the match is achieved by special adapter RNAs called transfer RNAs, to which amino acids become attached. Each different adapter, or transfer RNA, carries a specific amino acid and bears a sequence of three nucleotides that is complementary to the codon; we refer to it as the anticodon. The anticodon forms base pairs with the triplet codon in the message. Base pairing between the codon in the messenger RNA and the anticodon in the transfer RNA positions each amino acid at its proper location and facilitates the joining of amino acids into a protein chain.

The next slide shows how a transfer RNA bearing its amino acid is actually used during protein assembly. Here we see one amino acid attached to its transfer RNA, held in apposition to the messenger RNA by base pairing between these complementary triplets. Here is a second transfer RNA arriving at the ribosome, carrying its specific amino acid and pairing with the second codon. The process continues one codon at a time, starting at the beginning of the coding sequence and adding one amino acid to the growing chain at each position, until the ribosome reaches the stop codon. At that point the completed polypeptide, or protein chain, is released and begins to fold up into its appropriate structure.
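The adapter idea can be sketched as a lookup: each codon is matched to the transfer RNA whose anticodon pairs with it. The assumptions: only strict A-U and G-C pairing is modeled (real cells also allow looser "wobble" pairing at the third codon position), the tRNA pool is a tiny hypothetical set, and the message is assumed to begin at its AUG.

```python
# Sketch: tRNA adapters matching codons by complementary base pairing.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}
STOP_CODONS = {"UAA", "UAG", "UGA"}  # recognized by protein release factors, not by tRNAs

# Hypothetical pool of charged tRNAs, keyed by anticodon written 5'->3'.
TRNA_POOL = {"CAU": "Met", "GUG": "His", "GUA": "Tyr"}


def anticodon_for(codon: str) -> str:
    """Anticodon (5'->3') that pairs with a codon (5'->3'); pairing is antiparallel."""
    return "".join(RNA_PAIR[base] for base in reversed(codon))


def assemble(mrna: str) -> list[str]:
    """Read an mRNA starting at AUG one codon at a time, adding one amino acid per codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # release the finished chain
        chain.append(TRNA_POOL[anticodon_for(codon)])  # KeyError if no matching tRNA
    return chain


print(assemble("AUGCACUACUAA"))  # ['Met', 'His', 'Tyr']
```

Note that nothing in `assemble` looks at the amino acids themselves: the choice is made entirely by base pairing between codon and anticodon, which is the point of the adapter.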

Much of what I've discussed about the molecular features of genes and how they are translated into proteins was discovered in simple organisms like bacteria and the viruses that infect them. But the technological breakthroughs of genetic engineering, pioneered by the work of Stanley Cohen and Herb Boyer, made the DNA sequences of every organism's genes available for in-depth analysis.

One of the surprises that emerged when DNA cloning became possible, and the structures of human and other mammalian genes were examined, was that they have a different design from those in simple organisms. Whereas most bacterial genes have an uninterrupted coding sequence, that is, a continuous sequence of base pairs specifying a protein's amino acid sequence, mammalian genes are interrupted by noncoding sequences throughout.

Here is a diagram that shows the DNA strands. In this particular case the red sequences represent the protein coding information and the blue sequences the noncoding interruptions, which we refer to as introns. During transcription, that is, the copying of the DNA into RNA, the entire sequence is copied; subsequently the blue sequences, the interruptions, are removed by a process we refer to as RNA splicing. Splicing brings together all of the coding sequences to form a continuous stretch that resembles a bacterial messenger RNA and, of course, encodes a corresponding sequence of amino acids in the protein. In many cases the introns are 5 to 10 times longer than the coding stretches. So although mammalian genes may be very long, their protein coding sequences may make up only ten to twenty percent of the gene's length.
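Splicing can be pictured as cutting segments out of the primary transcript and joining what remains. The transcript below is hypothetical: exons are in upper case and introns in lower case, and each intron begins with GU and ends with AG, as most real introns do. The coordinates of the segments to remove are supplied by hand rather than recognized from the sequence, which is the hard part real cells solve.

```python
# Sketch: splicing as cutting intron segments out of a primary transcript.
# Hypothetical transcript: exons in upper case, introns in lower case for readability.
exon1, intron1, exon2, intron2, exon3 = "AUGGCC", "guaagcuuag", "GAAGAG", "guccuaacag", "CACUAA"
pre_mrna = exon1 + intron1 + exon2 + intron2 + exon3


def splice(transcript: str, removed: list[tuple[int, int]]) -> str:
    """Cut out segments given as (start, end) pairs, 0-based with end exclusive, and join the rest."""
    pieces, cursor = [], 0
    for start, end in sorted(removed):
        pieces.append(transcript[cursor:start])
        cursor = end
    pieces.append(transcript[cursor:])
    return "".join(pieces)


mrna = splice(pre_mrna, [(6, 16), (22, 32)])  # remove both introns
skipped = splice(pre_mrna, [(6, 32)])         # a different choice: drop exon 2 with both introns
print(mrna, skipped)  # AUGGCCGAAGAGCACUAA AUGGCCCACUAA
```

Removing a different set of segments from the same transcript yields a different messenger RNA, a simple picture of how one gene can give rise to more than one protein.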

This new wrinkle has enormous implications for gene function, that is, for how genes are expressed in higher organisms. We now know that, because introns can be spliced out in alternative ways, it is possible to produce more than one protein from a single gene, each with a different biological function. Indeed, whether a fly will be male or female depends on how a particular gene is spliced. There are still many intriguing questions about how genes function. Based on past history, however, a quote from Lawrence Durrell seems appropriate: "With every advance from the known to the unknown, the mystery increases."

Questions from the Panel for Paul Berg

Marcia Barinaga: From the sequence of DNA you can tell the sequence of the protein it codes for just by knowing the code for the amino acids. What does the sequence of the protein tell you? Can it tell you anything about what the protein does?

Paul Berg: Well, that is a goal: interpreting the sequence of a protein to understand what function it may have. Ten years ago it would have been a fruitless effort. Today so many genes have been cloned that when we examine a new sequence we can often see features it shares with another protein whose function we do know: the same kinds of sequence and structure. It therefore becomes possible to make intelligent deductions about what a protein might do just by looking at its sequence. For example, we know certain features that allow a protein to be embedded in a membrane, and those kinds of sequence are characteristic. So to an educated eye, one can almost look at a protein sequence and deduce whether it is a membrane-associated protein rather than a cytoplasmic one. We can also recognize features of proteins that serve as regulatory elements in controlling gene expression; some of those have very characteristic features. In the literature today it is astonishing to see how people who clone a gene, and deduce only its protein sequence from the DNA sequence, can make very educated guesses about what those proteins might do, and very frequently those guesses turn out to be correct.
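One classic way to make the membrane deduction described above is a hydropathy scan, which averages a hydrophobicity score over a sliding window of residues. The sketch below uses the published Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6, values often quoted for spotting membrane-spanning segments; both sequences are hypothetical, and modern prediction tools are considerably more sophisticated.

```python
# Sketch: flag possible membrane-spanning stretches with a sliding-window hydropathy average.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein: str, window: int = 19) -> list[float]:
    """Average hydropathy of every run of `window` consecutive residues (one-letter codes)."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def has_membrane_like_segment(protein: str, window: int = 19, threshold: float = 1.6) -> bool:
    """True if any window is hydrophobic enough to suggest a membrane-spanning segment."""
    return any(score > threshold for score in window_scores(protein, window))


membrane_like = "MKKDE" + "LIVAFLLVAGILLAVLIVF" + "RKDEK"  # hypothetical sequences
soluble_like = "MSDEKRTNQGSEDKKRTEDNQS"
print(has_membrane_like_segment(membrane_like), has_membrane_like_segment(soluble_like))  # True False
```

A long run of oily residues such as leucine, isoleucine and valine is the kind of "characteristic" sequence that lets an educated eye suspect a membrane protein.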

Natalie Angier: I have a question that goes more to the historic and political. Dr. Varmus alluded to the historic Asilomar conference that took place about 17 years ago (1975) in his opening remarks, and at that conference much of the early anxiety as well as excitement was aired about this new field. I wonder if you'd reflect a little bit upon the changes in the climate of opinion since Asilomar both among your scientific peers as well as the public at large.

Paul Berg: Well, the power of hindsight is extraordinarily impressive. In 1974 and 1975 we were confronted with a new development: a new power, if you will, to do things experimentally that nobody had been able to do before. There was a question about whether some of the things people might do would be harmful to themselves, to the people who worked around them, or to the public at large. I think many people have misread the history of the Asilomar period, assuming that all of us who signed that original letter were persuaded or convinced that we were dealing with a very dangerous technology. That is very far from the truth. In our own hearts most of us felt that there was probably very little risk, but perhaps some, and it was because we could not eliminate the possibility of any risk with 100 percent certainty that we acted in what I believe was a prudent way: to slow down the pace at which we could do things and to prescribe what we thought were reasonable ways to proceed. Not everybody agreed that they were sensible, but they were reasonable. And I think in the long run it has helped the field. At the time there were many people who would have blown the whistle, so to speak, had we tried to cover up and simply proceeded with those kinds of experiments; there were people willing to call this to the attention of the public and to accuse us of hiding, or of proceeding with experiments that were highly dangerous, as many people said. So I think the fact that we acted first, and seemed to be acting responsibly, helped the field.

No doubt people were inconvenienced, and there were expenses associated with following the guidelines in their initial form, but I think all of us felt those were not unreasonable as a preliminary step. We expected that as we learned more, the risks would very likely appear smaller and smaller, and therefore the bureaucracy and the regulations would probably be relaxed, which is exactly what has happened. So I don't think the public still suspects that we are doing highly dangerous experiments. The focus has shifted more to the ethical and moral considerations associated with genetic modification or manipulation.

Natalie Angier: If I can just follow up on that, do you see any future applications of the technology that give you pause or unease on an ethical level, the way some experiments gave pause or unease 20 years ago in terms of possible risk?

Paul Berg: Well, I'll speak for what I think is a general feeling: in the area of genetically modifying human cells, there is a sort of boundary in terms of where we would aim our experiments. On one side is somatic cell modification, which seems to be ethically and morally acceptable. Germline modification, where we modify the genes in those cells that give rise to future generations, is probably not a wise thing to do, because we know so little about the impact of disturbing the genome and what its consequences might be for future generations. Once done, there is no way to recall it. Somatic modification has a much more limited context and in many ways might be seen as no more than a traditional form of therapy. So that's one area that people have talked about. But as for the kinds of experiments we are doing in gene cloning, isolating genes and modifying cells, I personally do not see any problems, either in terms of endangering public health or in ethical terms.

Natalie Angier: Just for clarification can you tell us what a somatic cell is?

Paul Berg: A somatic cell is every cell of the body other than those used for procreation. The sperm and the egg we refer to as germ cells, and the genes they carry we refer to as the germline, because they are transmitted to future generations. But the cells in our liver, our bone marrow or our brain are somatic cells. Their genetic content is never transferred to offspring, and so modifying the cells of those tissues will not produce any change in future generations.
