* Laboratory of Comparative Genomics, University of Illinois, Urbana, Illinois 61801;
Department of Veterinary Pathobiology, University of Illinois, Urbana, Illinois 61801;
Laboratory of Molecular Genetics, Department of Animal Sciences, University of Illinois, Urbana, Illinois 61801; and
Department of Pharmacology and Oncology, Duke University Medical Center, Durham, North Carolina 27710
1To whom requests for reprints should be addressed at 329 Edward R. Madigan Laboratory, University of Illinois, 1201 West Gregory Drive, Urbana, IL 61801. E-mail: schook{at}uiuc.edu
| Abstract |
|---|
Key Words: genomics animal models transgenesis recombineering
| Introduction |
|---|
Resolving human complex diseases is difficult (e.g., the time required to develop disease symptoms, expenses associated with human clinical experiments, ethical issues) and, thus, appropriate biomedical models must be developed and validated. Biomedical models have been defined as surrogates for a human being, or a human biologic system, that can be used to understand normal and abnormal function from gene to phenotype and to provide a basis for preventive or therapeutic intervention in human diseases (3, 4). In the past, researchers have used two approaches to study human diseases. One strategy fully characterizes a human clinical disease and chooses the most appropriate animal model based on criteria such as anatomical and/or physiological characteristics (i.e., biological relevance), cost, and animal husbandry required. Another tactic has been to fully characterize a naturally occurring or induced (by chemical or radiation exposure) mutant animal (most commonly the rat or mouse) and identify which human disease it resembles. Although a great deal of progress has been achieved using these methods, they have intrinsic faults that limit their relevance to clinical medicine. The recent advent of techniques in molecular biology, genomics, transgenesis, and cloning furnishes investigators with a new ability to study vertebrates such as pigs, cows, chickens, and dogs with greater precision and utilize them as model organisms.
Comparative and functional genomics and proteomics provide effective approaches for identifying the genetic and environmental factors responsible for complex diseases and for developing prevention and treatment strategies and therapeutics. By identifying and studying homologous genes across species, researchers are able to accurately translate and apply experimental data from animal experiments to humans and vice versa. Although gene location and sequence are important, determining the functions and regulatory elements of genes is the critical step in understanding the biology of a disease and how it may be prevented or treated. This review supports the hypothesis that associated enabling technologies can be used to create, de novo, appropriate animal models that recapitulate the human clinical manifestation. Comparative and functional genomic and proteomic techniques can then be used to identify gene and protein functions and the interactions responsible for disease phenotypes, which aids in the development of prevention and treatment strategies.
| Traditional View: Choosing Appropriate Animal Models |
|---|
| Current View: Utilizing Genetic Information to Create Animal Models |
|---|
Until recently, most animals have not been amenable to both forward and reverse genetic manipulations and, thus, investigators were forced to choose between biomedical relevance and genetic power. Several invertebrate organisms possess many of the qualities of an ideal genetic model organism (e.g., short generation time, small size, sequenced genome, genetically amenable) and are useful in basic gene and protein functional analysis. However, invertebrates often lack a direct link to human medicine because of their simple anatomy and physiology and lack of gene homologs. For example, yeast strains are poor models for apoptosis because they lack the endogenous caspases and Bcl-2 genes that are the main apoptosis regulators in mammals (8). An invertebrate's simple physiology and genetic regulation also limit its relevance to studying complex diseases. The fact that the genome of Caenorhabditis elegans (a roundworm composed of fewer than 1000 somatic cells and 1 mm in length) contains just one-third fewer genes than humans underscores the importance of gene regulation and gene-gene/protein-protein interactions rather than gene number. Although an invertebrate has an important role in fundamental gene analysis, its simple phenotype limits its use in studying the physiology of mammalian species, whose phenotypes are determined by complex genetic interactions.
Mammalian species have greater biomedical relevance but, until recently, have not possessed genetic power. Because sufficient genetic information has only been available in rodents, most genetic techniques have been limited to these species. Although expensive in comparison to invertebrates, murine mutagenic screens have been successful in revealing novel human genotypes (9, 10). Several mutagenic strategies have been used in the past and are briefly reviewed by Stanford et al. (11). Because of the pitfalls associated with spontaneous mutagenesis (e.g., low frequency, ~5 × 10⁻⁶ per locus, only visible phenotypes detected), methods able to induce mutagenesis such as x-ray exposure are utilized. The use of x-ray mutagenesis began in the 1930s because it produced mutations more frequently than spontaneously occurring mutagenesis (20–100x greater) and caused chromosomal rearrangements that provided landmarks for cloning (11). Because the results of x-ray exposure are often difficult to interpret due to multiple genetic mutations, more powerful phenotype- or gene-driven approaches have been established for identifying gene function.
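The frequencies quoted above can be made concrete with a little arithmetic. The Python sketch below assumes the spontaneous frequency is read as roughly 5 × 10⁻⁶ per locus and that x-ray exposure raises it 20- to 100-fold, as attributed to (11); the function names and the "gametes screened per expected mutation" framing are illustrative assumptions, not part of the cited studies.

```python
# Illustrative arithmetic only; the values are those quoted in the text (11).
SPONTANEOUS_RATE = 5e-6       # mutations per locus (assumed to be per gamete)
XRAY_FOLD_RANGE = (20, 100)   # fold increase attributed to x-ray exposure


def induced_rate_range(base_rate, fold_range):
    """Return the (low, high) per-locus rates after a fold increase."""
    low_fold, high_fold = fold_range
    return base_rate * low_fold, base_rate * high_fold


def gametes_per_expected_mutation(rate):
    """Average number of gametes screened to expect one hit at a given locus."""
    return 1.0 / rate


xray_low, xray_high = induced_rate_range(SPONTANEOUS_RATE, XRAY_FOLD_RANGE)
print(f"Spontaneous:  1 in {gametes_per_expected_mutation(SPONTANEOUS_RATE):,.0f}")
print(f"X-ray (20x):  1 in {gametes_per_expected_mutation(xray_low):,.0f}")
print(f"X-ray (100x): 1 in {gametes_per_expected_mutation(xray_high):,.0f}")
```

Under these assumptions, the expected screening burden per locus falls from about 200,000 gametes to somewhere between 2,000 and 10,000, which is one way to see why induced mutagenesis displaced reliance on spontaneous mutants.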
Phenotype-driven approaches (forward genetics) for identifying gene function are based on chemical mutagenesis (9, 10), while gene-driven approaches (reverse genetics) are based on insertional mutagenesis (11). The most common phenotype-driven approach in mice uses the potent chemical mutagen ethylnitrosourea (ENU) to generate random mutations in the genome. By mating ENU-exposed male mice to normal females, numerous mutant F1 progeny are produced and identified using various screening methods (e.g., relevant clinical symptoms). Mutated genes from abnormal (desired) phenotypes are identified and animals are archived by collecting and freezing sperm or ovaries from the F1 mutant.
Comparative mapping allows researchers to predict the identity and location of the corresponding human gene. Although a chemical mutagen could introduce multiple hits in a genome, ENU exposure usually results in monogenic mutations (9, 10). Although ENU exposure does not produce molecular landmarks for cloning, it has demonstrated broad utility in several mutagenic screening programs because it produces single-gene mutations and is amenable to high-throughput screening and sequence validation (11).
The gene-driven approach to mutagenesis uses insertional mutagens to target or trap genes, which may result in more controlled and targeted mutations. These techniques also have a very high mutation rate, resulting in nearly 100% transgenic animals once prescreened in vitro. Access to sequence data, together with molecular biological techniques such as insertional mutagenesis and transgenesis, enables researchers to create biomedical models intended for specific diseases rather than waiting for a random mutant to occur from chemical induction. Bockamp et al. (12) reviewed techniques used to generate transgenic mouse models. Although the first transgenic techniques resulted in gene knockouts (constitutive transgenesis), recent techniques permit conditional control of gene expression (13). Thus, transgenesis may enable the creation of the ideal model by causing the over- or underexpression of a given gene or by inserting a novel DNA sequence into an animal's germline.
The use of embryonic stem cells in culture (14, 15) and knowledge gained in regard to homologous recombination (HR) in mammalian cells (16) allow the targeting of specific genes. By establishing a precise site of integration and, consequently, influencing specific genes, HR in embryonic stem cells avoids the unfavorable effects that occur when sequences are randomly inserted. Selection markers have been developed to select cells that have incorporated exogenous DNA into their genomes (positive selection cassette) or that have excised the cassettes (negative marker) (17). Although gene targeting is more controlled and efficient than chemical mutagenesis, it still often fails to produce an animal model that resembles the desired human disease (11).
Many of the problems associated with traditional knockout strategies that exhibit constitutive expression of the transgene may be avoided with the conditional control of gene expression. In other words, investigators have the ability to activate or suppress gene expression without resulting in secondary pleiotropic effects. An ideal conditional transgenic animal would have the following characteristics: (i) reversible genetic switch, (ii) zero or low basal gene expression when the gene is switched off, (iii) high and rapid induction of gene expression when the gene is switched on, (iv) tightly controlled induction of gene expression without pleiotropic side effects, and (v) induction of the gene by highly specific nontoxic compounds (12, 13).
Binary transgenic systems that control the expression of a gene by the interaction of two components have been successfully used. With these systems, an effector transgene acts on a target gene to either activate or silence its expression. Site-specific recombination (affecting DNA) and transcriptional transactivation (affecting RNA) are two methods used to perform conditional transgenesis. DNA recombinases may be used to rearrange a target gene, thereby silencing its expression. Cre from the bacteriophage P1 and Flp from Saccharomyces cerevisiae are members of the integrase family and are commonly used for site-specific DNA recombination (18, 19). These recombinases are suitable for use in mammalian cells because they do not require the presence of any accessory proteins or high-energy co-factors for their activity. In this system, the conditional allele in the F1 progeny will be inactivated only in the tissues that express Cre. Because DNA recombination permanently alters transgene activation, its major drawback is that it is irreversible. Transcriptional transactivation systems such as the tetracycline-dependent regulatory system (20) and the Gal4/UAS system (21) only influence RNA and have several advantages: reversibility of the expression of the target gene, sensitivity of transgene activation levels to inducer concentration, and ability to control the expression of more than one target transgene (13).
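The contrast between irreversible recombination and reversible transactivation can be sketched as a toy state model. The Python below is a deliberately simplified illustration, not a simulation of any published system: the class `FloxedAllele`, the function `transactivated_expression`, and the tissue names are hypothetical, and the inducible switch models only the case in which the inducer turns expression on.

```python
from dataclasses import dataclass, field


@dataclass
class FloxedAllele:
    """Toy model of a conditional allele silenced by Cre recombination."""
    recombined_tissues: set = field(default_factory=set)

    def expose_to_cre(self, cre_expressing_tissues):
        # Recombination rearranges the DNA itself, so it is never undone here.
        self.recombined_tissues.update(cre_expressing_tissues)

    def is_expressed(self, tissue):
        # The allele stays active wherever Cre was never expressed.
        return tissue not in self.recombined_tissues


def transactivated_expression(inducer_present):
    """Toy inducible switch: the target is on only while the inducer is present."""
    return inducer_present


allele = FloxedAllele()
allele.expose_to_cre({"liver"})
print(allele.is_expressed("liver"))   # silenced where Cre was expressed
print(allele.is_expressed("brain"))   # unaffected elsewhere
print([transactivated_expression(x) for x in (True, False, True)])
```

The recombined state has no method for reversal, whereas the transactivated target simply follows the inducer; that asymmetry is the design point the paragraph describes.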
Gene trapping, another insertional strategy, is more advantageous than gene targeting in many respects because it is not as labor intensive, it is able to report an endogenous gene-expression pattern, and it traps genes regardless of their transcriptional activity (11). Although three types of trap vectors exist (i.e., enhancer trapping, promoter trapping, gene trapping), gene traps seem to be the most effective. The gene-trap vectors contain a splice acceptor site immediately upstream from a promoterless reporter gene and are able to be inserted into a large collection of chromosomal sites (22). On the transcriptional activation of the endogenous cis-acting promoter and enhancer elements of the trapped gene, a fusion transcript is generated from the upstream coding sequence and the reporter gene, which simultaneously mutates the trapped gene and reports its expression pattern (11). The main disadvantage of gene trapping is that, because the vector is inserted in an intron, alternative splicing may lead to lower levels of wild-type transcripts and result in hypomorphic alleles (23).
Chromosomal engineering, or recombineering, uses phage-based homologous recombination and can manipulate large segments of DNA such as those carried by bacterial artificial chromosomes (BAC) or P1 artificial chromosomes, which is impossible with many cloning vectors (24). Several approaches involving the modification of the host bacterium have been developed to permit BAC manipulation, including inducible promoters permitting transient expression of bacterial recE and recT genes or other analogous bacteriophage lambda (λ) genes (exo and bet) (25). Recently, an Escherichia coli strain harboring a defective λ prophage was developed and promoted high BAC recombination frequencies. In this strain, the prophage provides the recombination genes exo, bet, and gam under the control of a temperature-sensitive λ cI repressor (26). The E. coli strain DY380 was generated by introducing the λ prophage into the BAC host strain DH10B, providing a rapid, single-step method to generate subtle changes in any gene in BAC clones using oligonucleotides as targeting vectors (27). Therefore, in contrast to most E. coli cloning methods that require the use of restriction endonucleases to cleave DNA and DNA ligases to join DNA fragments, recombineering can be accomplished without using these enzymes. Using a polymerase chain reaction (PCR)-based selective amplification screen to identify targeted clones, Swaminathan et al. (27) demonstrated the ability of this system to generate single-base changes, deletions (up to 1.93 kb), and insertions of unique sequences in different regions of a BAC. Advantages of these systems are that only short segments of homology are required to direct recombination, and the high efficiency allows recombinants to be screened rather than selected (24). Because recombinants can be screened, only one recombination step is required to create the desired modification directly, without the use of selective markers (e.g., drug-selectable markers, loxP sites, FLP recombination target (FRT) sites).
The use of double-stranded RNA (dsRNA) to cause RNA interference results in loss of function and is a powerful screening technique used to identify genes associated with specific biological processes. The use of dsRNA to inhibit the expression of specific genes was first reported in C. elegans by Fire et al. (28). These researchers reported that dsRNA was substantially more effective at producing interference than either strand individually and that the effect was evident in both the injected animals and their progeny (28). Several recent publications have provided potential mechanisms by which dsRNA causes the degradation of targeted messenger RNA (29–32). Although the use of long dsRNA enables effective silencing of gene expression in lower organisms, it is of limited use in mammals because the introduction of dsRNA (longer than 30 nucleotides) induces an interferon response that is sequence nonspecific (33). To avoid the interferon response in mammalian systems, small interfering RNA or short hairpin RNA (shRNA) are used. Until recently, genome-wide RNA interference surveys of gene function were limited to nematode worms and fruit flies (34, 35). However, two groups have recently developed resources for large-scale, RNA interference-based screens in mammals (36, 37). Berns et al. (36) reported the construction of a set of retroviral vectors encoding 23,742 distinct shRNA, which target 7914 human genes for suppression. Similarly, Paddison et al. (37) reported the construction and application of an shRNA expression library (comprising ~28,000 sequence-verified, shRNA-expression cassettes contained within multifunctional vectors) targeting 9610 human genes and 5563 mouse genes. Given the recent advances in this area, RNA interference screens will likely remain important in determining mammalian gene function.
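The library sizes reported above imply a level of redundancy per target gene, which matters because individual hairpins vary in effectiveness. The sketch below simply divides the quoted counts; the helper name `constructs_per_gene` is hypothetical, the Paddison et al. figure uses the approximate total of ~28,000 cassettes, and pooling human and mouse targets into one denominator is a simplifying assumption.

```python
def constructs_per_gene(n_constructs, n_genes):
    """Average number of shRNA constructs available per targeted gene."""
    return n_constructs / n_genes


# Counts as quoted in the text for refs. 36 and 37.
berns = constructs_per_gene(23_742, 7_914)
paddison = constructs_per_gene(28_000, 9_610 + 5_563)  # ~28,000 is approximate

print(f"Berns et al.:    {berns:.2f} shRNA per gene")
print(f"Paddison et al.: {paddison:.2f} shRNA per gene (human and mouse pooled)")
```

On these numbers the first library averages three hairpins per gene, and the second a little under two.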
Because sequence information is required for the aforementioned techniques, their use has been limited to species with annotated genome sequences (e.g., invertebrates, mice). Regardless of the techniques used to generate mutants or transgenics, this information can be used to develop a mutant map of that species. Although a murine mutant map will not provide a detailed, comprehensive picture of all the networks and interactions of genes that contribute to complex diseases, it will lay the groundwork for mammalian genetics by expanding our understanding of the role of specific genes (38). Remarkable progress has been made with regard to mutagenic and transgenic strategies in the past few decades. As these strategies continue to be used and refined, their utility in mouse genomics will continue to improve and their applicability to other mammalian species will be realized.
| Future View: The Marriage of Genetic Information and Biomedical Relevance |
|---|
Murine transgenesis methodology has been heavily studied and is highly successful in creating transgenic strains; however, it often fails to produce a model that resembles the desired human disease phenotype (11). Although many factors are likely responsible for the lack of success, differences in human and mouse life span and/or anatomy and physiology may play larger roles. Because of the vast differences in certain human and mouse organ systems, researchers must utilize the strengths offered by other animal models. For example, because divergence in lung and pancreatic anatomy between mice and humans is largely to blame for the lack of pathological lesions present in mouse cystic fibrosis models, an ovine model is now being used to study this disease (40). Prostate cancer is another area of research for which the mouse is not well suited, given the differences in gross anatomy and micro-anatomy of the human and mouse prostate. In contrast, the dog is the only species besides humans to frequently develop spontaneous prostate cancer and has several advantages over other models: (i) both human and canine prostate cancer are strongly associated with age; (ii) like the human version, canine prostate carcinoma has a high propensity for osseous metastases; and (iii) dogs provide a large animal model, which makes imaging and diagnostic studies possible (41). Finally, our laboratories are currently using the pig as a model for the devastating disease ataxia-telangiectasia (A-T) because it is poorly reproduced in transgenic mice. Loss of function in both alleles of the human ATM gene gives rise to A-T, resulting in a progressive loss of motor control (ataxia) and early death (42). The absence of ataxia and the presence of only mild neuropathological defects in transgenic mice (Atm−/−) demonstrate the need for other animal models (43, 44).
In addition to anatomical and physiological similarities with humans, pigs, like mice, can be genetically manipulated to lack the functional alleles of ATM, which results in a large animal model of A-T.
In addition to differences in anatomy and physiology, rodents may not be the most desirable in terms of gene homology. Although rodents are evolutionarily closer to humans than species from the orders Carnivora (e.g., dog, cat) and Artiodactyla (e.g., pig, cow), the higher rate of nucleotide substitution observed in rodents may diminish their relevance in regard to comparative genomic techniques (45, 46). In a recent experiment, the dog, cat, pig, and cow all had a higher percentage (~60%) of sequence homology to humans than did rats or mice (~40%) (46). Kirkness et al. (47) recently compared the canine genome (6.22 million sequence reads, 1.5x coverage) with drafts of the human and mouse genomes (National Center for Biotechnology Information Builds 31 and 3, respectively) using BLASTN. Despite much lower sequence coverage of the dog (1.5x), alignments covered a similar number of human transcripts and genes as the mouse (8x) (47). Other analyses discovered that although the level of nucleotide substitution was similar in dogs and humans, a 1.6-fold higher substitution rate was measured in the mouse (47). The recent reports of Thomas et al. (46) and Kirkness et al. (47) identified some of the limitations associated with rodent research and advocated the use of nonrodent species.
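Percentages such as those quoted above are typically derived from pairwise alignments. As a rough illustration of the idea, the Python sketch below computes percent identity over the ungapped columns of two pre-aligned sequences. This is a simplified stand-in, not the method used by Thomas et al. (46) or Kirkness et al. (47); the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns in which the two bases match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if gap in (base_a, base_b):
            continue  # skip columns where either sequence has a gap
        compared += 1
        matches += base_a == base_b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy, made-up sequences: one mismatch in eight aligned positions.
print(percent_identity("ACGTACGT", "ACGTTCGT"))
# A gap column is ignored rather than counted as a mismatch.
print(percent_identity("AC-GT", "ACTGT"))
```

Whether gaps are ignored or penalized is a design choice that changes the reported figure, which is one reason homology percentages from different studies are not always directly comparable.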
The Human Genome Project windfall has greatly benefited other mammalian species such as the dog, cow, chicken, and pig, which are now having their genomes sequenced. Genome sequencing will soon recategorize these species as gene rich and will allow the use of enabling technologies (e.g., recombineering, transgenesis) to create appropriate animal models that possess more biological relevance than invertebrates or rodents. Genome characteristics, sequencing status, and on-line resources of common model organisms are presented in Tables 1 and 2. By genetically manipulating animals that are more similar to humans in terms of anatomy and physiology, ideal animal models may be created.
| Functional Genomics and Proteomics: Providing Mechanisms and Intervention |
|---|
Successfully understanding clinical disease using genomic and proteomic technologies is a lofty goal that will be difficult to attain, especially for complex phenotypes or diseases. The process of determining gene function is a much more daunting task than once thought, as the one gene (mutant), one product (disease) theory most often does not apply. The fact that most mammalian genomes are estimated to contain only about 30,000 to 40,000 genes suggests that, in addition to the overall number of genes present in a genome, other factors such as temporal and spatial gene-expression patterns, alternative splicing, post-translational modification, and protein-protein interactions greatly influence phenotype. Bioinformatic programs that are able to accurately translate complex genotypes into phenotypes by predicting the occurrence and relevance of these factors will be required to fully understand the complex organ systems of the human body and detect the abnormalities responsible for complex diseases. Before complex gene interactions can be interpreted, information regarding each individual gene must be collected and understood. Several high-throughput methods of assessing gene products, including affinity precipitation (protein-protein interactions) (49), two-hybrid techniques (50), synthetic rescue (51), lethality experiments (52), and DNA microarray analysis (53, 54), have been developed in the last 25 years. The recent advent and acceptance of microarray technology, in particular, has made a major impact on biological research and has taken us another step closer to understanding complex biological systems.
The concept behind DNA microarrays is the precise positioning of DNA fragments (probes) at a high density on a solid support so that they can act as molecular detectors (55–58). DNA microarray analysis can be used to identify sequence variations (e.g., single nucleotide polymorphisms, gene mutations) or determine the gene-expression level (abundance) of a set of messenger RNA molecules. DNA microarrays have broad utility because they can measure the expression of thousands of genes simultaneously, providing a global view of gene expression rather than the few genes to which classical techniques are limited. To accurately interpret and compare microarray data, minimum information about a microarray experiment (MIAME) standards have been established (59). Although criticisms still exist (e.g., expression levels are only relative to a standard or reference, lack of standard methodology, quality control issues), microarray experiments that abide by the MIAME standards, are properly designed (60), and are validated using other molecular biological techniques (e.g., real-time PCR) (61) will play a crucial role in understanding complex biological systems.
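Because microarray expression levels are only meaningful relative to a standard or reference, they are commonly reported as log ratios. The Python sketch below shows that transformation for a few spots; the gene labels and intensity values are invented for illustration, and real analyses would add background correction and normalization steps that are omitted here.

```python
import math


def log2_ratio(sample_intensity, reference_intensity):
    """Log2 of sample signal over reference signal for one probe."""
    if sample_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(sample_intensity / reference_intensity)


# Hypothetical (sample, reference) intensities for three probes.
spots = {
    "gene_a": (800.0, 200.0),
    "gene_b": (100.0, 400.0),
    "gene_c": (300.0, 300.0),
}

ratios = {gene: log2_ratio(s, r) for gene, (s, r) in spots.items()}
for gene, value in ratios.items():
    print(f"{gene}: log2 ratio = {value:+.1f}")
```

The log scale makes a fourfold increase and a fourfold decrease symmetric around zero, which is why it is preferred over raw ratios when comparing up- and down-regulated genes.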
The assessment of proteins may be referred to as proteomics and includes the measurement of proteins produced, identification of protein functions, and identification of protein-protein interactions (62). Because proteomic techniques provide a measure of RNA translation and post-translational modifications, they can be used in combination with DNA microarrays (i.e., measures of transcription) to generate more informative data. In addition to validating gene-expression data, measuring protein profiles and protein-protein interactions may identify critical post-translational modifications that influence phenotypes. Because no single technology platform satisfies all of the desired proteomic measurements (i.e., identify proteins, determine function, identify and interpret protein interactions), numerous tools are used in the field (62).
To date, two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and mass spectrometry used in tandem have been the most widely used method for protein analysis (63). Proteins are separated by 2D-PAGE according to charge (i.e., isoelectric point) by isoelectric focusing in the first dimension and according to size (i.e., molecular mass) by sodium dodecyl sulfate (SDS)-PAGE in the second dimension (64). Proteins of interest can be isolated from the gel and identified via mass spectrometry. Because matrix-assisted laser desorption/ionization time-of-flight mass spectrometry can determine the mass of a protein or peptide with increasing sensitivity and ease of use, it has become the method of choice for protein identification by peptide-mass mapping (62). Because gel-based systems are technically complex, labor intensive, cost intensive, and fundamentally limited, liquid chromatography-mass spectrometry (LC-MS) systems and protein microarrays are also being developed (62). Protein microarrays provide excellent tools for proteomics research with a systems-oriented approach because only proteins of interest are measured. However, these chip-based proteomic systems have unique challenges, as prior knowledge of the proteins to be studied and appropriate affinity reagents are required (63). Shotgun proteomics, the combination of LC-MS and sequence database searching, has also been used for analyzing complex mixtures of proteins (65). Although effective LC-MS requires fractionation techniques to reduce complexity and isotope-coded affinity tags for quantitation, it is another reasonable alternative to gel-based techniques.
As scientists identify gene products and functions, proper gene nomenclature and classification must be applied to link them with specific phenotypes. A standard method of naming and classifying genes is crucial for comparative research. The field of comparative genomics and proteomics would be complete chaos without the use of standardized methodology because a single gene product can have several molecular functions (phenotype), many gene products can share a single molecular function, and a gene may play different roles in different organisms. The Gene Ontology (GO) consortium (http://www.geneontology.org) was established to produce a controlled vocabulary that is applicable to all organisms and to establish guidelines for classifying gene function. According to GO, the function of a gene product may be described by its role in a biological process, molecular function, or cellular component. Because a gene product may have one or more functions in regard to biological process, molecular function, or cellular component, it may be classified under one or more of these categories. GO molecular function terms represent activities that perform actions, but do not specify where or in what context those actions take place. GO biological process terms refer to one or more ordered assemblies of molecular functions, but are different from a pathway (e.g., signal transduction). Finally, GO cellular component terms describe the component of a cell, and the anatomical structure or gene product group of which it is a part.
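The three-category scheme described above maps naturally onto a small data structure in which one gene product can carry any number of terms under each aspect. The Python sketch below is an illustrative container only: the class and method names are hypothetical, it does not use any GO consortium software, and the ATM terms shown are plausible examples chosen for illustration rather than a curated annotation set.

```python
from collections import defaultdict

# The three GO aspects named in the text.
GO_ASPECTS = ("biological_process", "molecular_function", "cellular_component")


class GeneProductAnnotations:
    """Holds GO-style terms for one gene product, grouped by aspect."""

    def __init__(self, name):
        self.name = name
        self._terms = defaultdict(set)

    def annotate(self, aspect, term):
        if aspect not in GO_ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        self._terms[aspect].add(term)

    def terms(self, aspect):
        """Sorted terms recorded under one aspect (empty list if none)."""
        return sorted(self._terms.get(aspect, set()))

    def aspects(self):
        """Aspects under which this gene product has at least one term."""
        return [a for a in GO_ASPECTS if self._terms.get(a)]


# Illustrative annotations only, not taken from a curated database.
atm = GeneProductAnnotations("ATM")
atm.annotate("molecular_function", "protein kinase activity")
atm.annotate("biological_process", "DNA repair")
atm.annotate("biological_process", "response to ionizing radiation")
atm.annotate("cellular_component", "nucleus")

for aspect in atm.aspects():
    print(aspect, "->", atm.terms(aspect))
```

Storing terms as sets per aspect reflects the point made in the text that a single product may be classified under one or more categories, and under several terms within a category.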
The implementation of high-throughput techniques generates genomic data that scientists and clinicians are then challenged to transform into relevant biological resources. Effectively navigating the large number of sizable molecular databases currently on the Internet is daunting. The first challenge is finding the most appropriate database(s) for the task at hand. The second challenge is to keep up with the latest information uploaded onto that site, as many are regularly updated with considerable amounts of new data. In fact, the amount of genomic data in the public database at the National Center for Biotechnology Information doubles every 18 months (66). Although some databases have a broad scope and contain information that is useful for a wide range of biological scientists, others are designed to focus on metabolic pathways or genes of a specific disease and are useful to scientists working in that field of research. Baxevanis (67) assembled a list of approximately 400 high-quality molecular biology databases that may help scientists and clinicians choose the most appropriate database.
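A doubling time of 18 months implies exponential growth, which is easy to underestimate. The short Python sketch below applies the standard doubling formula, growth = 2^(t / 18) with t in months, to the figure cited from (66); the function name is hypothetical, and the assumption that the doubling time stays constant is a simplification.

```python
def growth_factor(months, doubling_time_months=18.0):
    """Multiplicative increase after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_time_months)


for years in (1.5, 3, 5, 10):
    factor = growth_factor(years * 12)
    print(f"After {years} years: about {factor:.1f}x the starting volume")
```

Under this assumption the data volume grows roughly tenfold in five years, which illustrates why keeping current with a regularly updated database is itself a substantial task.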
Although they have not yet reached the bedside, functional genomics and proteomics have numerous clinical applications that will someday aid in the development of early detection devices, diagnostics, and therapeutics for complex diseases. Currently, most complex diseases are detected in late stages and, consequently, have poor prognoses. Therefore, a great need exists for screening tools that can detect diseases at early stages, when more intervention strategies are available and the survival rate is greater. The importance of early detection may be exemplified using ovarian cancer survival rates and stage detected: patients whose ovarian cancer is detected in Stage I have a very high 5-year survival rate of 95%, compared with a 5-year survival rate of 35% to 40% for patients whose cancer is detected in late stages (68–70). Although an effective clinical biomarker should be measurable in an accessible bodily fluid such as serum, urine, or saliva (71), tissue biopsies are often required as a starting point. Using advanced computer algorithms, researchers are already beginning to identify serum or tissue gene expression and protein profiles or signatures of diseases for early detection and diagnostic purposes. Researchers focused on diseases such as inflammatory bowel disease (72); hepatic carcinoma and liver diseases (73); and breast (74), prostate (75), and ovarian (76) cancers are already using gene expression and protein profiles for this purpose. In addition to early detection, disease signatures may provide information regarding the event(s) that initiated the disease and support the development of effective prevention and intervention strategies.
| Perspectives |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
| References |
|---|
tk, for use in embryonic stem cells. Genesis 28:31–35, 2000.