D.C. (2/19/00) The Human Genome Project, ahead of schedule and under budget,
has almost completed its first objective: identifying the complete DNA sequence
of the human genome. Now the hard part begins, as researchers develop
strategies to identify more quickly those genes and gene products that may
have therapeutic potential.
Numerous research groups and a number of private companies are working feverishly
to screen genetic gold from the dross. At stake are not only potential cures
for cancer, heart disease and other ailments, but billions if not trillions
of dollars for those companies that identify important sequences. The combination
of hardware and software techniques from the molecular biology lab and the
computer lab to sort out useful proteins has spawned a new subspecialty, bioinformatics.
Several researchers were on hand at the annual meeting of the American Association
for the Advancement of Science to present their latest findings in this field.
The problem facing researchers in this field is the sheer volume of gene
sequence data produced by the Human Genome Project and related undertakings.
The challenge is to identify genes in the less than 10 percent of the human
genome that is thought to comprise protein-coding gene sequences. Once a
gene is identified, researchers then try to determine what its protein product
is and how it is regulated.
"The driving force behind bioinformatics is the availability of these large
databases and the need to come up with sophisticated computer models for extracting
useful information from them. Computer analysis will be an integral part of
identifying genes and understanding their functions," said David Haussler,
professor of computer science at the University of California, Santa Cruz.
Haussler also recently joined the Human Genome Project's bioinformatics team.
Dr. Haussler and colleagues have pioneered several important computational
techniques to aid in finding genes, among them a new statistical method
based on the theory of support vector machines (SVMs).
SVMs are able to handle high-dimensional datasets in which each data point
has many features or attributes. Using powerful computers, this system will
help scientists put the currently disorganized DNA sequence information
of the Human Genome Project in an order more suitable for gene hunting. Once
target genes are identified, bioinformatic techniques will also be essential
in the development of applications such as screening tests and medical treatments.
"Our vision for bioinformatics spans a broad spectrum, from basic molecular
biology all the way up to clinical diagnostics," Haussler said.
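To give a rough sense of what a support vector machine does with many-featured data, here is a minimal sketch in Python. It is not the method used by Haussler's group, which the article does not describe in detail. It is a generic linear SVM trained by subgradient descent on the hinge loss, and the short DNA strings, their labels, and all function names are invented purely for illustration.

```python
from itertools import product

# All 16 possible 2-letter DNA words; each becomes one feature.
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]

def featurize(seq):
    """Turn a DNA string into a vector of overlapping 2-mer frequencies."""
    counts = {k: 0 for k in KMERS}
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)
    return [counts[k] / total for k in KMERS]

def score(w, b, x):
    """Signed distance-like score of x relative to the hyperplane (w, b)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1:
                # Point is inside the margin: push the hyperplane away from it.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only apply weight decay.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, seq):
    """Return +1 or -1 for a DNA string."""
    return 1 if score(w, b, featurize(seq)) >= 0 else -1

# Hypothetical toy data: GC-rich strings labelled +1, AT-rich strings -1.
train_seqs = ["GCGCGGCCGCGC", "CCGGCGCGGCCG", "GGCCGCGCCGGC",
              "ATATTAATATTA", "TTAATATATAAT", "AATTATATTAAT"]
train_labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm([featurize(s) for s in train_seqs], train_labels)
```

Once trained, `predict(w, b, "GGCGCCGCGG")` classifies a sequence the model has not seen. Real gene-finding work would use far richer features and far more data than this toy set.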
NY Shows Initiative
A group of New York City scientists is developing a bold strategy to take
advantage of the data hidden within the Human Genome Project databases. The
'structural genomics initiative' aims to use bioinformatic technologies to
identify promising drug targets. They will focus on proteins that cause disease
in humans, as well as those that are used in treating disease.
"We are embarking on a program, which, if proven effective, will provide a way
for researchers to come to grips with the impending flood of genetic data
and speed its translation into therapeutic use," says Andrej Sali, assistant
professor, Alfred P. Sloan research fellow and Sinsheimer scholar at The Rockefeller
University. "The initiative is aimed at developing a comprehensive mechanistic
understanding of human and microbial physiology at the molecular level. This
strategy should lead us to medically relevant data more quickly."
Human genome research has become a vast international effort involving thousands
of research groups. Researchers can access free (and commercial) databases
of known sequences via the Internet, where they can also post their own findings.
The New York team hopes to accelerate its own work by putting its findings
on the web. It is their hope that any protein structure they discover will
be of immediate relevance to academic and/or industrial research teams studying
that biological system. By publicizing target lists on the Internet, the structural
genomics pilot studies they conduct could generate scientific interest and
expertise and attract suggestions for additions to their respective target
lists. The pilot studies will be able to serve as an important resource
for distribution of tools and reagents for research, Sali says, adding:
"One can imagine that some future NIH grant applications would include both
a request for funds and a request for a supply of a particular purified protein
deposited in a centralized cold-storage facility."
Ultimately, Sali believes research in the area of structural genomics could
provide the means to address one of the great unsolved problems in molecular
biology, that is, the relationship between one-dimensional sequence information
(the order of amino acids in a protein) and three-dimensional structure (the
folds of the complete protein).