Regulation of Gene Expression: Building Transcriptional Regulatory Complexes
INTRODUCTION. More than thirty years ago, Francois Jacob and Jacques Monod described the first paradigm for differential gene expression. They proposed that the switching on and off of genes involved in sugar (lactose) metabolism in bacterial cells is accomplished by the binding of regulatory proteins to regulatory DNA sequences. Looking back, it is amazing how prescient they were, because we now know that, in general, eukaryotic genes also contain regulatory sequences in their DNA, called enhancers and promoters. Gene expression in eukaryotic cells occurs when (i) genes encoded by the cell's DNA are copied into mRNAs by transcription (much of the regulation of differential gene expression takes place at the level of transcription) and (ii) specific mRNAs are used as templates to make specific proteins in translation. But not every gene is expressed in every cell type (differential gene regulation). Thus, regulation of gene expression can be described as follows: a human cell contains about 100,000 genes, but each specific cell type expresses only about 10,000 genes. The production of a specific collection of proteins by a specific cell type is the result of differential gene expression.
PROMOTERS AND ENHANCERS. Promoters constitute binding sites for RNA polymerase and the general transcription factors. In eukaryotic cells, RNA polymerase II (RNAPII; a large complex of proteins whose enzymatic activity does the actual copying of DNA into RNA during transcription) is responsible for the transcription of protein-encoding genes. RNAPII's helper proteins are the so-called general transcription factors. Together, RNAPII and the general transcription factors are involved in the transcription of nearly every protein-encoding gene. Enhancers are DNA binding sites for gene-specific transcriptional regulatory proteins (proteins that regulate the transcription of one gene or a subset of genes). In bacteria, generally one protein binds to a single regulatory sequence in the DNA and turns transcription on or off. In eukaryotes, gene transcription is regulated by enhancer DNA sequences that contain not one but multiple regulatory DNA sequence elements, which represent binding sites for multiple transcription factors. DNA-bound transcriptional regulatory proteins come together (through protein-protein interactions) to form gene-specific transcriptional regulatory complexes. So it is not a single factor and binding site but, rather, the specific combination of DNA sequence elements and regulatory proteins that determines which genes will be transcribed in which cell types, a concept known as combinatorial regulation of gene expression. These gene-specific transcription complexes relay transcriptional regulatory information to RNAPII and help to determine whether a gene is transcriptionally active or silent under specific conditions (for example, at a specific time in development or in a particular cell type).
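The idea of combinatorial regulation can be illustrated with a deliberately simplified sketch. It assumes an all-or-nothing rule (a gene is transcribed only when every factor its enhancer requires is present in the cell), which is far cruder than real enhancer behavior. The gene names, factor names, and cell types are invented placeholders and do not come from the studies described in this article.

```python
# Toy model of combinatorial regulation.
# ASSUMPTION: every name below is a hypothetical placeholder, and the
# "all required factors present" rule is a simplification for illustration.

# Each enhancer is modeled as the set of factors that must all be bound.
ENHANCERS = {
    "gene_A": {"factor_1", "factor_2"},
    "gene_B": {"factor_2", "factor_3"},
    "gene_C": {"factor_1", "factor_3", "factor_4"},
}

# Each cell type is modeled as the set of factors it contains.
CELL_TYPES = {
    "cell_type_X": {"factor_1", "factor_2"},
    "cell_type_Y": {"factor_2", "factor_3"},
    "cell_type_Z": {"factor_1", "factor_2", "factor_3", "factor_4"},
}


def active_genes(factors_present, enhancers=ENHANCERS):
    """Return the genes whose complete factor combination is present."""
    return sorted(
        gene
        for gene, required in enhancers.items()
        if required <= factors_present  # subset test: all required factors found
    )


if __name__ == "__main__":
    for cell, factors in CELL_TYPES.items():
        print(cell, "->", active_genes(factors))
```

Note that factor_2 is present in all three hypothetical cell types, yet each cell type activates a different set of genes: the combination, not any single factor, decides the outcome.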
ANIMAL VIRUSES AS MODEL SYSTEMS FOR STUDIES ON TRANSCRIPTIONAL REGULATORY COMPLEXES. What general principles have emerged from studies of transcriptional regulation? In many cases, viruses that infect mammalian cells have served as model systems for studies of eukaryotic gene regulation. Because a virus uses some of the host cell machinery to carry out its own processes, it provides us with a window through which to observe regulatory mechanisms operating in animal cells. A model system that has been intensively studied is the Herpes simplex virus (HSV) immediate early (IE) enhancer. Soon after HSV enters a host cell, a lytic infection is initiated by activation of transcription of the five IE genes in the HSV genome. Induction of IE gene transcription relies on cellular proteins that interact with the viral transcriptional activating protein VP16. These host cell proteins, along with VP16, form multicomponent regulatory complexes on the IE enhancer that activate IE transcription. The IE enhancer contains one or more copies of two distinct regulatory DNA elements, one with the sequence TAATGARAT and the other consisting of repeats of the sequence CGGAAR (A = adenine, T = thymine, C = cytosine, G = guanine, and R = a purine, i.e., A or G). Investigators were surprised to discover that VP16 does not itself bind to either of these DNA sequences. Rather, the TAATGARAT and CGGAAR elements in the HSV genome constitute binding sites for cellular proteins that bind to DNA in a sequence-specific manner.
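Degenerate elements such as TAATGARAT and CGGAAR can be located in a sequence by expanding each ambiguity code into the bases it stands for. The following is a minimal sketch, assuming the standard IUPAC convention in which R denotes a purine (A or G). The function names are illustrative, and the test sequence is made up for the example; it is not a real HSV enhancer sequence.

```python
import re

# Subset of the IUPAC nucleotide codes; R = purine (A or G), N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate a degenerate motif such as TAATGARAT into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of every match, overlapping ones included."""
    # A lookahead lets the scan report matches that overlap one another.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # ASSUMPTION: made-up toy sequence, used only to exercise the functions.
    toy = "GGTAATGAGATCCCGGAAGCGGAAACTAATGATATGG"
    print(find_motif("TAATGARAT", toy))  # [2]
    print(find_motif("CGGAAR", toy))     # [13, 19]
```

In the toy sequence, TAATGAGAT at position 2 matches, whereas TAATGATAT at position 26 does not, because T is not a purine. The two adjacent CGGAAR matches mimic the repeated arrangement described above.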
THE CGGAAR ELEMENT. To understand the mechanisms that regulate IE gene transcription, it was necessary to identify all of the proteins that interact with the IE enhancer and then decipher how they fit together to form transcriptional regulatory complexes. It was shown that one DNA binding activity, called GABP for GA binding protein, is composed of two distinct polypeptides, GABP[alpha] and GABP[beta]. These two proteins, which bind to the CGGAAR element, were first detected in rat liver and later shown to exist in human tissues. The next step was to unravel the molecular contacts between the proteins and DNA elements that establish the GABP transcriptional regulatory complex. The GABP[alpha] protein contains a domain similar to the DNA binding domain of the ETS family of transcription factors. The so-called ETS domain of GABP[alpha] allows weak binding to DNA, thus providing the first molecular interface of the regulatory complex. GABP[beta] cannot interact with DNA on its own, but in the presence of GABP[alpha], a stable protein-DNA binding complex is formed. What additional contacts are required to build a stable protein-DNA complex at the CGGAAR DNA element? GABP[beta] interacts with the ETS domain of GABP[alpha] via a series of four tandem 33-amino acid repeats (the ankyrin repeats) found in GABP[beta]. It is interesting that this type of protein-protein contact surface is also used by other proteins, such as the red blood cell membrane protein ankyrin, the transcriptional repressor protein IkB, and the developmental regulatory protein Notch from fruit flies. The GABP[beta] protein also contains a domain that facilitates the formation of a tetrameric (four-component) complex consisting of two copies of GABP[alpha] and two copies of GABP[beta]. Other experiments revealed that in the presence of GABP[alpha], GABP[beta], too, makes contact with DNA. Recently, the three-dimensional crystal structure of the GABP complex was determined (see references below).
THE TAATGARAT ELEMENT. Although the HSV transcriptional activator protein VP16, like GABP[beta], cannot interact with DNA on its own, VP16 can participate in the formation of a stable transcriptional regulatory complex at the TAATGARAT element in the presence of certain cellular proteins. The first step in identifying these cellular proteins came when researchers noticed a similarity between the DNA sequence of the TAATGARAT element and the so-called octamer element, an eight-nucleotide DNA sequence that is recognized by a family of cellular octamer binding proteins. Members of the octamer protein family have been shown to regulate transcription of the immunoglobulin genes in immune (B) cells and to participate in cell type-specific transcription in neurons (brain cells). Octamer binding (Oct) proteins interact with DNA through a specific type of protein domain called a homeodomain. Association of Oct-1 with VP16 also occurs via the homeodomain. Oct-1 is very similar to another family member called Oct-2; both proteins bind on their own to the octamer sequence (ATGCTAAT). However, despite the similarities, Oct-1 and Oct-2 have been implicated in the transcriptional regulation of distinct sets of genes. Two related proteins that recognize identical DNA sequences likely regulate different genes by interacting with different proteins to build unique transcriptional regulatory complexes. For example, although the homeodomains of Oct-1 and Oct-2 differ at just seven out of 60 amino acid positions, only Oct-1 is capable of interacting with VP16.
So far, then, we have the following interaction surfaces: Oct-1 to DNA and VP16 to Oct-1. Further experimentation (including DNA-protein crosslinking studies) revealed that in the presence of Oct-1 and the TAATGARAT DNA sequence, VP16, too, makes contact with DNA (specifically with the GARAT part of the sequence). This sounds very much like the story with GABP[beta]. Like GABP[beta], VP16 must interact with another protein to contact DNA. However, unlike the case of GABP[beta], entry of Oct-1 into a stable transcriptional regulatory complex with VP16 and TAATGARAT requires at least one additional cellular protein, HCF (host cell factor). HCF interacts with VP16 to aid in the building of a stable transcription complex at the TAATGARAT element. What general principles can we extract from these studies on the HSV IE enhancer? First, regulation of transcription involves the precise assembly of specific proteins and DNA elements into unique transcriptional regulatory complexes. Second, the stability of the complex is determined by a collection of specific protein-DNA and protein-protein contacts. To summarize, the molecular contacts necessary to build stable transcriptional regulatory complexes on the IE enhancer consist of the following: GABP[alpha] contacts DNA and GABP[beta]; one copy of GABP[beta] contacts DNA and another copy of GABP[beta]; Oct-1 contacts DNA and VP16; and VP16 contacts DNA and HCF. It is through these additional molecular contacts that proteins that bind DNA weakly on their own (GABP[alpha] and Oct-1) can assemble into a stable transcriptional regulatory complex. These protein-DNA complexes can then regulate transcription of the associated gene.
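The contacts summarized above can be written down as a small graph, which makes it easy to ask what each component touches. This is only a bookkeeping sketch: the complex labels, the data layout, and the function name are illustrative choices, and the contact list simply restates the summary in the text, with "DNA" standing for the relevant element.

```python
from collections import defaultdict

# Contacts as summarized in the text. The dictionary keys are labels chosen
# for this sketch; "DNA" stands for the CGGAAR or TAATGARAT element.
CONTACTS = {
    "CGGAAR complex": [
        ("GABP[alpha]", "DNA"),
        ("GABP[alpha]", "GABP[beta]"),
        ("GABP[beta]", "DNA"),
        ("GABP[beta]", "GABP[beta]"),  # one beta copy contacts another copy
    ],
    "TAATGARAT complex": [
        ("Oct-1", "DNA"),
        ("Oct-1", "VP16"),
        ("VP16", "DNA"),
        ("VP16", "HCF"),
    ],
}


def partners(contacts):
    """Map each component to the set of components it contacts."""
    graph = defaultdict(set)
    for a, b in contacts:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


if __name__ == "__main__":
    for name, contacts in CONTACTS.items():
        print(name)
        for component, touched in sorted(partners(contacts).items()):
            print("  ", component, "contacts", sorted(touched))
```

Listing the partners of VP16, for example, returns DNA, HCF, and Oct-1, which reflects the point that a protein unable to bind DNA on its own is held in the complex by several contacts at once.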
WHY ARE DIFFERENT GENES EXPRESSED IN DIFFERENT CELL TYPES? Now that we have an idea of how transcriptional regulatory complexes are built, let us return to the problem of cell type-specific gene expression. If every cell in the body contains the same DNA, why do only brain cells produce neurotransmitter proteins and only liver cells produce albumin? Through studies conducted over the past 10 to 15 years, transcriptional regulation has emerged as a major control point in cell type-specific gene expression. As an example of cell type-specific transcription, we will consider transcriptional regulatory processes that take place in the fat cell.
ORPHAN RECEPTORS. The so-called orphan receptors constitute an intriguing and ever-growing class of transcriptional regulatory proteins. Orphan receptor proteins are similar to the steroid hormone receptor class of transcription factors. Steroid hormone receptors are ligand-activated proteins that modulate the transcription of selected genes under specific developmental and metabolic conditions. Steroid hormones (the ligands) enter the cell and exert their effects via the steroid hormone receptor proteins, which, upon hormone binding, recognize specific DNA binding sequences (hormone response elements, HREs). The HRE-bound receptor modulates the rate of transcription of the adjacent gene, thereby effecting a change in the cellular phenotype. For the orphan receptors, the physiological regulatory ligands have not yet been identified. The discovery of physiologically relevant ligands is a priority in this field, as their identity may provide clues as to how aspects of physiology and development are regulated and integrated. Orphan receptors participate in a wide variety of biological processes, including liver development, neurogenesis, and modulation of pain. Here, our discussion is limited to a fat cell-specific orphan receptor protein that illustrates seminal concepts in transcriptional regulation.
ADIPOCYTE (FAT CELL) DIFFERENTIATION. In vertebrate organisms like humans, fat cells (adipocytes) serve as a nutritional energy repository. An increase in the number of fat cells in the body, through cellular differentiation, can occur at any point in the life of the organism in response to increased food intake. Cellular differentiation, the process by which a precursor cell becomes a specialized cell, usually involves changes in gene expression. Adipocyte differentiation is characterized by alterations in gene expression and cellular morphology. The metabolic changes that occur in cells during adipocyte development can be mimicked in cell culture. Because this model system exists, fat cell development is a widely studied process. Adipocyte P2 (aP2) is an adipocyte-specific intracellular lipid binding protein expressed exclusively in differentiated fat cells. Differentiation-dependent, tissue-specific transcription of the aP2 gene is directed by an adipocyte-specific enhancer, which is composed of several regulatory DNA elements. One DNA element, termed ARE6, is necessary and sufficient for adipocyte-specific transcription. ARE6 specifies a binding site for specific transcriptional regulatory proteins and exhibits sequence similarity with the HREs. The particular DNA sequence motif found in ARE6 is a preferred binding site for two classes of transcriptional regulatory proteins: the retinoid X receptors (RXRs) and the peroxisome proliferator-activated receptors (PPARs). The RXRs constitute a class of transcription factors that bind to and are activated by physiological compounds called retinoids. The PPARs belong to the steroid hormone receptor superfamily and were discovered as proteins that are activated by agents (hypolipidemic drugs, plasticizers, herbicides) that cause proliferation of peroxisomes in rat liver.
Scientists went on to discover a PPAR family member (mPPAR[gamma]2) that is present only in fat cells and showed that a distinct RXR protein and mPPAR[gamma]2 form a protein-protein-DNA complex on the ARE6 element of the aP2 enhancer. Also, introduction of mPPAR[gamma]2 and RXR into cultured cells results in transcriptional activation of the aP2 gene via the aP2 enhancer. This aP2 enhancer activity is stimulated by peroxisome proliferators, fatty acids, and certain retinoids. Although the physiological ligand for mPPAR[gamma]2 is unknown, transcriptional activation by the PPARs is stimulated by various complex fats or lipids, including unsaturated fatty acids and arachidonic acid. However, these naturally occurring lipids have not been shown to bind directly to mPPAR[gamma]2. Because fatty acids can induce transcription of the aP2 gene as well as activate the transcription factor mPPAR[gamma]2, it is likely that mPPAR[gamma]2 facilitates the activation of aP2 transcription by fatty acids in differentiating fat cells.
mPPAR[gamma]2 CAN INDUCE ADIPOCYTE DIFFERENTIATION. If one function of mPPAR[gamma]2 is to induce fat cell differentiation in response to physiological lipid activators like fatty acids, this orphan receptor might provide a molecular means of communication between fat cell differentiation and fat metabolism. In a direct test of this possibility, mPPAR[gamma]2 was introduced into cultured precursor cells and was shown to induce differentiation of these cells into fat cells. Differentiation of these cells into fat cells was potentiated by the addition of known PPAR activators to the cells. When mPPAR[gamma]2 and another protein, C/EBP (which belongs to a different class of transcription factors), were introduced into cultured cells at the same time, the cells differentiated into fat cells even more efficiently. Thus several levels of transcriptional regulation participate to achieve tissue-specific transcription and fat cell differentiation: stimulation by lipid and lipid-like compounds (and potentially by an appropriate physiological ligand), the tissue-specific expression of distinct transcription factors, and the establishment of appropriate molecular contacts among multiple types of transcription factors.
TRANSCRIPTIONAL REGULATION AND CHROMATIN. No article on transcriptional regulation would be complete without mentioning the concept of chromatin. I introduce the topic briefly here; however, this is an extremely active area of research, and one can find numerous articles on chromatin structure and function in the scientific literature. In eukaryotes, transcription occurs in the context of chromatin, where DNA is wound around a complex of eight histone proteins (this octamer complex contains two of each of four types of histone proteins) to form a structure called a nucleosome. One important consideration is how transcriptional regulatory proteins contend with the inherently repressive effects of chromatin. In general, when promoter and enhancer DNA sequences are reconstituted into nucleosomes in vitro, protein binding to DNA and initiation of transcription are inhibited. However, a number of transcription factors have the ability to recognize their cognate DNA elements in the context of nucleosomal DNA. Binding of these transcription factors causes a disruption of the nucleosomal structure, such that an adjacent regulatory site becomes available for binding to other gene-specific and general transcriptional regulatory proteins. Scientists are in the process of deciphering fully the mechanisms by which transcriptional regulatory proteins enhance transcription in the context of chromatin.
RNAPII - RNA polymerase II
HSV - Herpes simplex virus
IE genes - HSV immediate early genes
GABP - GA binding protein
Oct-1 - Octamer binding protein 1
HRE - Hormone response element
aP2 - adipocyte-specific lipid binding protein 2
RXR - Retinoid X receptor
PPAR - Peroxisome proliferator-activated receptor