Winding Your Way Through DNA Symposium
San Francisco, California
Friday Evening, September 25, 1992
Paul Berg, Ph. D.
Introduction by Harold Varmus:
On June 19, 1975, a headline in Rolling Stone magazine said,
"140 scientists ask, 'Now That We Can Rewrite the Genetic Code, What
Are We Going to Say?'"
Beneath that headline, the reporter, Michael Rogers, described one of
the most contentious meetings in the history of science: the
Asilomar Conference to debate the risks of newly discovered
recombinant DNA techniques and to consider placing restrictions on them.
Now although that meeting and its consequences remain controversial, its
intrepid organizer, Paul Berg, our next speaker, won wide respect for
his courage and articulate leadership.
We are again calling upon both his bravery and his clarity to review
the central tenets of molecular genetics, in a sense the intellectual
fallout of what you've just heard from Jim Watson. Now this is a field in
which Paul has played several prominent roles as documented by many
honors including the Nobel Prize in chemistry. Like many of our
speakers Paul is a professor at the Stanford University School of
Medicine where he directs the newly constructed Beckman Center for
Molecular Medicine. A fabled lecturer with a deep commitment to
scientific literacy, Paul organized the Stanford Centennial Symposium
on the Human Genome Project last year, a hugely successful public
meeting that inspired our own symposium. We are happy to have him here
tonight to explain molecular genetics in 30 minutes.
In his own inimitable way, Jim Watson has just given you a glimpse into
a historic discovery. I hope he has also impressed on you some sense of
the exhilaration that comes from some breakthrough insight such as the
elucidation of the structure of DNA. Coming on the heels of the discovery
that genes consist exclusively of DNA, that structure transformed every aspect of
biology in the most profound ways, many of which you will hear about
this evening and tomorrow. Two immediate insights emerged from it.
First, the central role in heredity assigned to chromosomes
could be attributed to the DNA that they contained. It was then
only a small step to accept that genes were discrete stretches of DNA
arrayed in a linear fashion along the chromosome's DNA.
Second, it was apparent that genetic information, that is the
presumptive instructions that are used to define, construct and maintain
each living organism, is encoded in the chemical and molecular
structure of its DNA.
But how? How can an organism's traits be specified and stored in DNA?
What is the form of the information? What does the message say?
How is it read out?
These questions formed the agenda of molecular biology for the 40 years
since the DNA structure was solved. Today the answers to the questions
I posed are largely known, but we are only beginning to learn how the
interplay of the many thousands of messages in an organism's genes
accounts for the complexities of that organism's developmental,
metabolic and behavioral characteristics.
Let's return to the structure of the DNA double helix to consider what
feature might serve as the source of the genetic information. You'll
notice from this diagram that, as Jim described, the DNA consists of two
helical chains, that is, two chains wound around each other in a helix,
and if we look at the backbones of those chains they are composed of
sugar and phosphate in a monotonous structure over the entire DNA molecule.
However, if one examines the DNA more closely, you see that it consists
also of pairs of bases, A on one strand, T on the other, G on one
strand, C on the other. And therefore it seems evident that the only
feature that can carry information is the order of the bases along the DNA.
Now can a sequence of base pairs represent information? Why not?
Most of you are familiar with the Morse code by which various
combinations of two symbols, dots and dashes, can represent all
the letters of our and other alphabets, and therefore can be used to
encode information as words and sentences.
In a more contemporary context, consider the means we now use to
record and play back sounds and images.
Now you're all familiar with the fact that sound can be encoded by
magnetic variations, laid down along a very thin plastic tape enclosed
in this little box, and that this information can be retrieved by
passing the tape through a transducer which reads these magnetic symbols
and transforms them back into sound. Similarly, compact discs
encode information in the form of digital information, 0's and 1's.
Such a message can be read by the proper transducer, and it represents
a meaningful series of sentences. Indeed, stretches containing variable
arrangements of about 15 0's and 1's are sufficient to specify all the
words and symbols of all the world's languages.
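The counting behind this claim can be checked directly: a string of fixed length over an alphabet of n symbols has n raised to that length possible arrangements. The helper below is a hypothetical illustration, not something from the talk; it compares 15 binary digits with 15 bases drawn from DNA's four-letter alphabet.

```python
def count_messages(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of a fixed length over an alphabet of the given size."""
    return alphabet_size ** length

# Hypothetical comparison: 15 binary digits (0/1) versus 15 DNA bases (A/T/G/C).
binary_patterns = count_messages(2, 15)
dna_patterns = count_messages(4, 15)

print(binary_patterns)  # 32768
print(dna_patterns)     # 1073741824
```

Each extra symbol in the alphabet multiplies the capacity at every position, which is why a four-letter code needs far shorter strings than a two-letter one to say the same amount.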
Contrast this format with the genetic language in the next slide.
This is the sequence of a viral gene. As you can see, it consists of
nothing more than A's, T's, G's and C's. I show only one of
the two strands of the DNA because the other can be deduced from it
by the base-pairing rule. Mutations in a gene are simply
changes in one or another of these A's, T's, G's or C's. For example,
the replacement of an A by any of the other bases or the loss of one or
more bases are mutations. And these may destroy the gene's function.
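Because the strands pair A with T and G with C, either strand fully determines the other, and the mutations just described amount to simple edits of a string of letters. A minimal sketch, assuming a sequence is held as a Python string; the helper names and the sample sequence are hypothetical. It also follows the usual convention of writing the partner strand 5' to 3', which means reversing it, since the two strands run in opposite directions.

```python
# Base-pairing rule: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def partner_strand(strand: str) -> str:
    """Return the complementary strand, read 5'->3' (the strands are antiparallel)."""
    return "".join(PAIR[base] for base in reversed(strand))

def substitute(strand: str, position: int, new_base: str) -> str:
    """Point mutation: replace the base at a 0-based position."""
    return strand[:position] + new_base + strand[position + 1:]

def delete(strand: str, position: int, count: int = 1) -> str:
    """Deletion: remove `count` bases starting at a 0-based position."""
    return strand[:position] + strand[position + count:]

gene = "ATGGCAGTA"               # hypothetical stretch of one strand
print(partner_strand(gene))      # TACTGCCAT
print(substitute(gene, 4, "T"))  # ATGGTAGTA
print(delete(gene, 3))           # ATGCAGTA
```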
Now the first clue as to what genetic information specifies actually
emerged at the turn of the century from the astute observations of a
British physician named Sir Archibald Garrod. Garrod noted that
certain forms of metabolic diseases were inherited. And because
proteins were known to provide the metabolic machinery, he speculated
that genes might be involved in creating the proteins that carry out the
chemical reactions of metabolism. Thus, Garrod surmised that genes
would define an organism's traits through their involvement in making
proteins. Now I show this colorful slide only to emphasize the fundamental role
of genes: that is, to make proteins. Fix this statement
indelibly in your mind. Now by the early 1940s George Beadle and Edward
Tatum had studied the genetics of a fungus called Neurospora. Their
evidence confirmed that genes were involved in the production of that
organism's proteins. Moreover, they showed that mutations in certain
genes prevented cells from growing because of their inability to make
particular proteins. Each protein studied seemed to be governed by a
different gene. Beadle and Tatum's discovery suggested that every
protein in a cell results from the expression of a gene. This notion,
which became known as the "one gene, one protein" theory, supported
Garrod's suggestion about how genes create traits.
Today we know that nothing happens in living systems without proteins.
When a protein is not made, a function is lost. There is also no doubt
that our physical, metabolic and behavioral characteristics are
dependent on the array of proteins our cells and tissues make. So,
too, our susceptibility to diseases such as heart disease, diabetes,
rheumatoid arthritis, among others, depends on the variety and amounts
of proteins we make. It is therefore no exaggeration to say that we are
what we are because of the kind of proteins that we make.
To understand how genes control the formation of proteins, we need to
digress briefly to describe proteins. Proteins consist of amino
acids strung end to end to produce chains whose length varies
with each protein. [This is a single protein, and from this angle it is
hard for me to trace its beginning; I think it is there. The chain goes
around, and the amino acids are numbered so you can follow them along.]
Now some proteins are composed of as few as 50 amino acids and others
of hundreds, thousands, and even more. Proteins are built from
20 different kinds of amino acids, but the order of the amino
acids along the chain is different in different proteins. So there
is virtually an infinite variety of proteins depending on the order of
amino acids in the chain.
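To put a number on "virtually infinite": if each position in a chain can hold any of 20 amino acids, a chain of n positions has 20 to the power n possible orders. A quick check for the shortest proteins mentioned above, about 50 amino acids long (the function name is hypothetical):

```python
def possible_chains(length: int, kinds: int = 20) -> int:
    """Distinct amino acid orders for a chain of `length` positions, each any of `kinds`."""
    return kinds ** length

n = possible_chains(50)
print(len(str(n)))  # 66 digits, i.e. about 1.1 x 10**65 possible chains
```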
Now proteins normally do not exist as just stretched out
unidimensional chains. Instead, the chains fold up into specific
three-dimensional structures depending on the order of amino acids in
the chain. This is a diagram of a protein which is known to be
associated with human cancer, the ras protein. As you can see, parts
of the chain form a particular kind of structure called the alpha
helix, and the chain goes around and forms a three-dimensional cleft
in which a molecule important to that protein's function must bind.
That cleft will not be formed if the order of the amino acids is
changed so that the protein cannot fold up into its functional form.
That means that errors in assembling the correct order of the amino
acids in a protein chain can cause the protein to fold improperly and
fail to function.
Later, I will mention examples of mistakes that produce a change in the
amino acid sequence that causes the protein to fold improperly and
consequently to fail to carry out the function it is designed for. The
consequences of such mistakes are very often serious disease. Now the
solution of the DNA structure, and the realization that discrete
stretches of DNA constitute the genes, changed the focus from how genes
make proteins to questions of how base pair sequences in DNA specify the
enormous collection of proteins made by cells. Indeed this became the
central problem in molecular biology during the 1950s.
There were two problems to solve. One was to discover the
informational relationship or code that relates the linear sequence of
base pairs in a gene to the amino acid sequence of the proteins
specified by that gene. The second was to learn how the decoding
process actually occurs in living cells.
The readout of DNA information into protein is referred to as
translation. However, DNA itself is not involved directly in
making proteins. Only its information is used. And that information is
first transferred to a related type of molecule called RNA. This occurs
by a process called transcription.
Now in transcription the two strands of the DNA are transiently
separated and a new copy of one of the two strands is made by using the
same base-pairing rules that hold the two strands of DNA together. The
only exception is that instead of T, which occurs in DNA, RNA contains a
closely related base called U. So we form a molecule of RNA that is in fact
a copy of the sequence of the displaced strand, using the other strand to
direct the assembly of the so-called complementary, or messenger, RNA.
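The copying step can be sketched as pairing against the template strand, with U standing in for T. This is a minimal illustration under stated assumptions: both strands are written 5' to 3', the sequences are hypothetical, and the helper name is ours. The check at the end shows the point made above, that the messenger RNA reproduces the displaced strand letter for letter except for U in place of T.

```python
# RNA pairing partners for each DNA base; RNA uses U where DNA would use T.
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Build the messenger RNA (5'->3') complementary to a DNA template strand given 5'->3'."""
    return "".join(RNA_PARTNER[base] for base in reversed(template))

coding = "ATGGCC"    # hypothetical displaced (coding) strand, 5'->3'
template = "GGCCAT"  # its partner, the template strand, 5'->3'
mrna = transcribe(template)

print(mrna)                              # AUGGCC
print(mrna == coding.replace("T", "U"))  # True
```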
Now RNA therefore is a copy of the DNA sequence because it contains the
same order of units along its chain except for the U replacing T.
Therefore RNA contains all the information of the gene which is why the
RNA copies are referred to as messenger RNAs. You might want to
think of RNA as a dialect of DNA.
Throughout the 1950s there were many attempts to solve the genetic code.
That is, to discover what the correspondence is between the order of
bases in DNA and its RNA copy and the order of amino acids in protein.
Recall that the DNA and RNA languages consist of only four different
letters, while the protein language contains 20 different letters.
Thus, there must be some arrangement of A's, G's, U's, and C's in RNA to
signify each one of the 20 amino acids in a protein. Although the task
was considered exceedingly formidable, and most people believed that the
genetic code would not be solved in our lifetime, a chance discovery in
1961 opened the way to cracking the code. And only three years later,
in 1964, the entire genetic dictionary was known.
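A short counting argument shows why words of three bases are the minimum: one-letter words give only 4 signals and two-letter words 16, fewer than the 20 amino acids, while three-letter words give 64. The hypothetical helper below just does that arithmetic. It shows that triplets are the shortest words that could work, not that the code actually uses them; that had to be established experimentally.

```python
def shortest_word_length(letters: int, symbols_needed: int) -> int:
    """Smallest k such that words of k letters give at least `symbols_needed` combinations."""
    k = 1
    while letters ** k < symbols_needed:
        k += 1
    return k

for k in (1, 2, 3):
    print(k, 4 ** k)                # 1 4 / 2 16 / 3 64
print(shortest_word_length(4, 20))  # 3
```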
We can write the dictionary as a series of triplets, three bases each
specifying a discrete amino acid. There are 64 possible triplets to
specify the 20 amino acids, so most amino acids have more than one. For example, the amino acid
tyrosine is specified by the two triplets TAT and TAC, valine is
specified by four, and there are four signals which serve as
punctuation. One of them, ATG, serves as the start signal for each
and every coding unit. That is, every protein begins with the codon ATG,
which specifies the beginning of the protein coding sequence. And every
protein coding sequence is terminated by one of the three triplets, or
"stop codons", shown here, which mark the position at which the
decoding process terminates.
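The dictionary can be written compactly as a lookup table. This sketch assumes the standard genetic code, written with DNA letters as on the slide and with "*" standing for a stop codon; it enumerates the 64 triplets in a fixed T, C, A, G order and confirms the counts just mentioned (two codons for tyrosine, four for valine, three stops, ATG for methionine).

```python
from collections import Counter
from itertools import product

# Standard genetic code in DNA letters; "*" marks a stop codon.
# Each codon position cycles through T, C, A, G, the first position changing slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

counts = Counter(CODE.values())
print(len(CODE))                                         # 64 triplets
print(sorted(c for c, aa in CODE.items() if aa == "Y"))  # ['TAC', 'TAT'] (tyrosine)
print(counts["V"], counts["*"], CODE["ATG"])             # 4 3 M
```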
Now to show you how the correspondence actually goes, I show you here a
very short RNA which begins with an AUG specifying methionine, the next
triplet specifying histidine, and so on until it reaches one of the
termination signals, and the process is terminated here in the protein
chain. So this is in fact a manifestation of this coding relationship,
a sequence of bases specifying a corresponding sequence of amino acids.
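The readout just described, starting at AUG and proceeding one triplet at a time until a stop codon, can be sketched as a loop over the message. This is a simplification: it assumes the standard genetic code in RNA letters, takes the first AUG as the start, and ignores everything a real ribosome does beyond matching codons. The test message is hypothetical.

```python
from itertools import product

# Standard genetic code in RNA letters (U for T); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Start at the first AUG, read one codon at a time, stop at a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""  # no start signal, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGCAUUAA"))  # MH: methionine, histidine, then stop
```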
One of the remarkable features of the genetic dictionary is that it is,
with minor exceptions, universal. That is, the same codon triplet
specifies the same amino acid in all known organisms on our planet. You
will see that this is an essential feature of genetic engineering.
Thus, the coding sequence of a human gene will be translated into the
same protein in humans or yeast or a bacterium and vice versa.
Given the coding relationships between DNA and proteins, it is quite
straightforward to understand what mutations are. A change in a gene's
DNA sequence may lead to the production of an altered protein. Often
the change affects only one base pair in DNA. But this may be
sufficient to cause an incorrect amino acid to be put into the protein.
Here is the sequence of bases specifying the beginning of one of the
proteins needed to make human hemoglobin. Sickle cell disease is a
consequence of a change in the triplet GAG to GTG. This changes the
coding so that instead of glutamic acid, the amino acid valine is
inserted at this particular position. Here is a three-dimensional view
of the subunit of the hemoglobin molecule which is affected. Here is
the amino acid which has been altered. Instead of the normal amino acid,
glutamic acid, valine has been inserted here, and that causes this
particular protein to change its structure when it delivers oxygen to
the tissues. That change in structure causes the red cell to assume a
sickle shape, which clogs the small blood vessels.
Normal hemoglobin does not do this. This one change, out of the
146 amino acids in this protein, has produced this disease.
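The single-base change can be traced through the code. The sketch below assumes the first nine codons of the normal beta-globin coding sequence as they are commonly shown in textbooks (treat the sequence as illustrative), and it uses only the codons it needs from the standard genetic code. Counting the initial methionine, the altered residue is the seventh; because that methionine is removed from the finished protein, the change is conventionally numbered as position 6.

```python
# Only the codons needed here, taken from the standard genetic code.
CODE = {"ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L",
        "ACT": "T", "CCT": "P", "GAG": "E", "AAG": "K"}

def translate_codons(dna: str) -> str:
    """Translate an in-frame coding sequence one triplet at a time."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

# Assumed start of the normal beta-globin coding sequence (textbook illustration).
normal = "ATGGTGCATCTGACTCCTGAGGAGAAG"
# Sickle mutation: the middle base of codon 7 changes from A to T (GAG -> GTG).
sickle = normal[:19] + "T" + normal[20:]

print(translate_codons(normal))  # MVHLTPEEK
print(translate_codons(sickle))  # MVHLTPVEK
```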
This [picture] is the change which occurs in approximately 70 percent of
the cystic fibrosis mutations. It results from the loss of 3 bases, TCT,
from the coding sequence, producing a new codon, ATT. The net effect is to
eliminate one amino acid, a phenylalanine, from a protein of approximately
1500 amino acids. This change prevents the protein from functioning properly
as a channel that allows ions to pass in and out of cells. Here, too, a
small change in the coding sequence of the gene has profound
consequences for the organism.
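The same kind of tracing works for the cystic fibrosis deletion. The stretch used here, ATC ATC TTT GGT GTT, is assumed to be the textbook excerpt of the gene around the affected position. Removing the three bases TCT leaves ATT, which still codes for isoleucine, and because exactly three bases are lost, the reading frame downstream is preserved, so the protein simply loses its phenylalanine.

```python
# Only the codons needed here, taken from the standard genetic code.
CODE = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}

def translate_codons(dna: str) -> str:
    """Translate an in-frame coding sequence one triplet at a time."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

normal = "ATCATCTTTGGTGTT"  # assumed textbook excerpt: Ile-Ile-Phe-Gly-Val
start = normal.find("TCT")  # first occurrence of the three lost bases
mutant = normal[:start] + normal[start + 3:]

print(mutant)                    # ATCATTGGTGTT
print(translate_codons(normal))  # IIFGV
print(translate_codons(mutant))  # IIGV
```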
Many if not most human diseases are the result of inherited mutations
affecting proteins with essential physiological or structural functions.
Now knowing the genetic code itself does not tell us how the messenger
RNA is actually translated into a protein. This process is complex and
involves a very large number of repetitive steps. We now know that the
assembly of proteins is carried out by small particles in all cells
called ribosomes which themselves consist of another kind of RNA
associated with a large assortment of proteins. Here is an electron
micrograph showing these particles, obtained from a microbe,
and here is a schematic representation which I'll use in my
succeeding slides to represent these particles, these machines which
assemble protein chains by using messenger RNA as the source of the information.
How does each amino acid pair with its appropriate codon? There
is no known chemical basis for a direct match up between an amino acid
and three bases. Instead, this is achieved by special adapter RNAs,
called transfer RNAs, to which amino acids become attached. Each
different adapter, or transfer RNA, carries a specific amino acid and
has on it a sequence of three nucleotides that is complementary to the
codon. We refer to it as the anticodon. The anticodon forms
base pairs with the triplet codon in the message. Base pairing between
the codon in the messenger RNA and the anticodon in the transfer RNA
positions each amino acid at its proper location and facilitates the
joining of amino acids into a protein chain. The next slide shows how a
transfer RNA bearing its amino acid is actually used during protein
assembly. So here we see one amino acid attached to its transfer RNA
and it's held in apposition to the messenger RNA by base pairing between
these complementary triplets. Here is the second transfer RNA, arriving
at the ribosome carrying its specific amino acid and pairing with the
second codon. This process continues one codon at a time, starting at the
beginning of the coding sequence and adding
one amino acid to the growing chain at each position until it reaches
the stop codon, at which point the completed polypeptide, or protein,
chain is released and begins to fold up into its appropriate structure.
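The codon-anticodon match follows the same pairing rules, with U in place of T. Because paired RNA strands run in opposite directions, an anticodon written 5' to 3' is the reversed complement of its codon. This sketch assumes strict Watson-Crick pairing; real transfer RNAs often tolerate non-standard ("wobble") pairing at the third codon position, which it ignores. The function names are hypothetical.

```python
# RNA base pairs (strict Watson-Crick; wobble pairing is ignored in this sketch).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon: str) -> str:
    """Anticodon (written 5'->3') that pairs with a 5'->3' mRNA codon."""
    return "".join(PAIR[base] for base in reversed(codon))

def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the anticodon is fully complementary to the codon."""
    return anticodon_for(codon) == anticodon

print(anticodon_for("AUG"))      # CAU
print(pairs_with("CAU", "AUG"))  # True
print(pairs_with("CAU", "AUC"))  # False
```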
Much of what I've discussed about the molecular features of genes and how
they are translated into proteins was discovered in simple organisms
like bacteria and the viruses that infect them. But the technological
breakthroughs of genetic engineering pioneered by the work of Stanley
Cohen and Herb Boyer made the DNA sequences of every organism's genes
available for in-depth analysis.
One of the surprises that emerged when DNA cloning became possible and
the structures of human and other mammalian genes were examined was that
they have a different design from those in simple organisms. Whereas
most bacterial genes have an uninterrupted coding sequence, that is, a
continuous sequence of base pairs specifying a protein's amino acid
sequence, mammalian genes are interrupted by noncoding sequences.
Here is a diagram which shows the DNA strands. In this particular case
the red sequences represent the protein coding information and the blue
the interruptions of noncoding information which we refer to as
introns. During transcription, that is, the copying of the DNA
into RNA, all of that sequence is copied and subsequently the blue
sequences, the interruptions, are removed by a process we refer to as
RNA splicing. The end result of splicing is to bring together
all of the coding sequence to form a continuous stretch which now
resembles those of bacterial messenger RNAs, and of course these encode
a corresponding sequence of amino acids in the protein. In many cases
the lengths of these introns exceed the coding stretches by factors of 5
to 10. So although mammalian genes may be very long, their protein
coding sequences may be only ten to twenty percent of the gene's length.
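Splicing can be pictured as cutting a transcript at known boundaries and joining the coding pieces. The sketch assumes the exon boundaries are already known (in cells they are recognized by the splicing machinery, which this does not model) and uses a made-up transcript whose introns begin with GU and end with AG, as most introns do. It also checks the arithmetic above: if introns are k times as long as the coding sequence, coding makes up 1/(1 + k) of the gene, about 17 percent for k = 5 and 9 percent for k = 10.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in order, discarding the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def coding_fraction(intron_to_coding_ratio: float) -> float:
    """Share of a gene that is coding when introns are `ratio` times the coding length."""
    return 1 / (1 + intron_to_coding_ratio)

# Made-up transcript: uppercase = exons, lowercase = introns (30 bases each).
INTRON = "gu" + "a" * 26 + "ag"
pre_mrna = "AUGGCC" + INTRON + "AAAGGC" + INTRON + "UAA"
exons = [(0, 6), (36, 42), (72, 75)]

mrna = splice(pre_mrna, exons)
print(mrna)                                                         # AUGGCCAAAGGCUAA
print(round(coding_fraction(5), 3), round(coding_fraction(10), 3))  # 0.167 0.091
```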
This new wrinkle has enormous implications for gene function, for how
genes are expressed in higher organisms. We now know that because of
alternative ways of splicing out introns, it is possible to produce more
than one protein from a single gene, each with a different
biological function. Indeed, whether a fly will be male or female
depends on how a particular gene is spliced. There are still many
intriguing questions about how genes function. However, based on past
history, a quote from Lawrence Durrell seems appropriate: "With every
advance from the known to the unknown, the mystery increases."
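The idea that one gene can yield more than one protein can be made concrete with exon skipping, one form of alternative splicing. In this sketch the splicing choice is reduced to which exons get joined; the three-exon gene is hypothetical, only the codons it needs are taken from the standard genetic code, and the regulation that decides between the choices in real cells is not modeled.

```python
# Only the codons needed here, taken from the standard genetic code ("*" = stop).
CODE = {"AUG": "M", "GCU": "A", "GGA": "G", "AAA": "K", "UUC": "F", "UAA": "*"}

def translate(mrna: str) -> str:
    """Read codons from the start of an mRNA until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical three-exon gene.
exon1, exon2, exon3 = "AUGGCU", "GGAAAA", "UUCUAA"

full = exon1 + exon2 + exon3  # all exons kept
skipped = exon1 + exon3       # exon 2 spliced out along with the introns

print(translate(full))     # MAGKF
print(translate(skipped))  # MAF
```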
Questions from the Panel for Paul Berg
Marcia Barinaga: From the sequence of DNA you can tell the
sequence of the protein it codes for just by knowing the code for the
amino acids. What does the sequence of the protein tell you? Can it
tell you anything about what the protein does?
Paul Berg: Well, that is a goal, trying to interpret the sequence
of a protein, trying to understand what function it may have. Now, 10
years ago it would have been a fruitless effort. Today so many genes
have been cloned and when we examine their sequences we can often see
features, the same kind of sequence and structure in another protein
whose function we do know. And therefore it becomes possible to make
intelligent deductions about what a protein might do on the basis of
just looking at its sequence. For example, we know certain features of
a protein that allow it to be embedded in a membrane. And those kinds
of sequence are characteristic. And so to an educated eye, one can
almost look at a protein sequence and deduce whether it might be a
membrane associated protein rather than a cytoplasmic protein. We can
also recognize features of proteins that serve as regulatory elements in
regulating gene expression. Some of those have very characteristic
features. In the literature today it is astonishing to see the way in which
people cloning a gene and deducing only its protein sequence from the
DNA sequence can make very educated guesses about what those proteins
might do and very frequently they turn out to be correct.
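One such telltale feature is a long run of hydrophobic (water-avoiding) amino acids, which is what a stretch of chain buried in a membrane tends to look like. The sketch below swaps in a standard heuristic, the Kyte-Doolittle hydropathy scale averaged over a sliding window; the window of 19 residues and cutoff of 1.6 are commonly used values for flagging possible membrane-spanning segments. It is only a crude screen, not the method the speaker had in mind, and the test sequences are made up.

```python
# Kyte-Doolittle hydropathy scale: positive = water-avoiding (hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        if sum(KD[aa] for aa in segment) / window > threshold:
            hits.append(i)
    return hits

soluble = "DEKRSTNQ" * 5               # hypothetical, only polar or charged residues
membrane = "DEKR" + "L" * 20 + "DEKR"  # hypothetical, with a 20-residue hydrophobic stretch

print(hydrophobic_windows(soluble))   # []
print(hydrophobic_windows(membrane))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

A flagged window only suggests a membrane-associated region; confirming it takes the kind of comparison with proteins of known function described in the answer.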
Natalie Angier: I have a question that goes more to the historic
and political. Dr. Varmus alluded to the historic Asilomar conference
that took place about 17 years ago (1975) in his opening remarks, and at
that conference much of the early anxiety as well as excitement was
aired about this new field. I wonder if you'd reflect a little bit upon
the changes in the climate of opinion since Asilomar both among your
scientific peers as well as the public at large.
Paul Berg: Well, the power of hindsight is extraordinarily
impressive. I think in 1974 and 1975 we were confronted with a new
development. A new power if you will, to do things that nobody had been
able to do before, experimentally. There was a question about whether
some of the things people might do would be harmful to themselves, to
the people who work around them, or to the public at large. I think
many people have misread the history of the Asilomar period as assuming
that all of us who signed that original letter were persuaded or
convinced that we were dealing with a very dangerous technology. That
is very far from the truth. I think in our own hearts most of us felt
that there was probably very little risk but perhaps some, and it was
because we could not eliminate the possibility of any risk with 100
percent certainty that we acted in what I believe was a prudent way,
which was to just slow down the pace at which we could do things and to
prescribe what we thought were reasonable precautions. Not everybody agreed that
they were sensible, but they were reasonable ways to proceed. And I
think in the long run it has helped the field. I think at the time
there were many people who would have blown the whistle, so to speak,
had we tried to cover up the concerns and simply proceeded with such experiments;
there were people willing to call this to the attention
of the public, accusing us of having hidden the risks or of proceeding with experiments
that were highly dangerous, as many people said. So I think the fact
that we acted first and seemed to be acting responsibly helped the field.
Now no doubt people were inconvenienced and there were expenses
associated with the requirements needed to follow the guidelines, the
initial form of the guidelines, but I think all of us felt those were
not unreasonable as a preliminary step expecting that as we learned more
and more it was very likely that the risks would appear less and less
and therefore the bureaucracy and the regulations would probably be
relieved, which is exactly what has happened. So I don't think the
public still suspects that we are doing highly dangerous
experiments. The focus has changed more to some of the ethical and
moral considerations that are associated with genetic modification.
Natalie Angier: If I can just follow up on that, do you see any
applications of the technology in the future that give you either pause
or unease on an ethical level, the way the technology gave pause or unease
20 years ago in terms of possible risk?
Paul Berg: Well, I think I'll speak for what I think is a general
feeling, that in the area of genetically modifying human cells there is
a sort of boundary in terms of where we would aim our experiments. One
is somatic cell modification. That seems to be ethically and
morally acceptable. Germline modification, where we modify the
genes in those cells which give rise to future generations, is probably
not a wise thing to do because we know so little about the impact or the
effect of disturbing the genome and what its consequences might be for
future generations. Once done, there is no way to recall it. Somatic
modification has a much more limited context and in many ways might be
seen as not more than traditional forms of therapy. So that is one
area that people have talked about. But in terms of the
kinds of experiments we are doing in terms of gene cloning, isolation of
genes and modifying cells, I personally do not see any problems in terms
of public health or endangering public health or in ethical ways.
Natalie Angier: Just for clarification can you tell us what a
somatic cell is?
Paul Berg: A somatic cell is every cell of the body other than
those which are used for procreation. That is, the sperm and the egg we
refer to as germ cells and the genes they carry we refer to as the
germline because they are transmitted to future generations. But the
cells in our liver, or in our bone marrow, or in our brain are somatic
cells. Their genetic content is never transferred to the offspring.
And so modifying the cells of those tissues will not produce any change
in future generations.