* Department of Pediatrics, Section of Pulmonary Medicine, and
Department of Medicine, Division of Endocrinology, Diabetes and Metabolism, University of Colorado at Denver and Health Sciences Center, Aurora, Colorado 80045
1 To whom requests for reprints should be addressed at Department of Pediatrics, Section of Pulmonary Medicine, University of Colorado at Denver and Health Sciences Center, Mail Stop 8119, P.O. Box 6511, Aurora, CO 80045. E-mail: Mark.Duncan{at}uchsc.edu
| Abstract |
|---|
Key Words: DIGE; difference gel electrophoresis
| Introduction |
|---|
In proteomics there are several distinct analytical platforms for the task of discovering new biomarkers. These approaches differ substantially in design, and this influences the speed (throughput), cost, complexity, and, most important, the fundamental aspects of the data they return. In this paper we will outline common proteomics approaches and discuss how proteomics methods are applied in biomarker studies by way of examples from our own work.
| Major Paradigms in Biomarker Discovery |
|---|
Profiling continues to attract considerable interest, in large part because of its potential to deliver high-throughput data. In particular, attention has been directed at its application in a clinical laboratory setting. However, increasingly this approach is viewed with skepticism. In part this is because different groups undertaking essentially the same analyses have come up with different diagnostic features (10). Further, it is increasingly apparent that when compared with alternative biomarker discovery strategies, the spectra are not rich in information, and the complexity of the proteome is substantially underestimated. However, perhaps the most serious concern is that it is difficult to move away from this platform to the next step and to identify the distinguishing "peaks" in the profiles. This is necessary if the tests are to be validated by independent methods, if mechanistic data are to flow from these investigations, or if platform-independent assays are to be developed for these protein markers. Finally, and unfortunately, several studies based on a compromised experimental design and/or suboptimal data analysis have profoundly dampened enthusiasm for this approach (4, 10–15). These experiences have led to scrutiny of this approach and highlighted the need for due diligence in all phases of the biomarker discovery and validation process.
2. 2D Gels Combined with Mass Spectrometry.
Another commonly adopted approach to discovery proteomics involves 2D gel electrophoresis (2DGE). Here a complex mixture of proteins is separated in the first dimension based on isoelectric point (pI) and then, orthogonally, in the second dimension based on molecular weight (SDS-polyacrylamide gel electrophoresis [SDS-PAGE]). Thousands of proteins can be resolved on a single gel, the gel stained to visualize them, and their relative amounts determined. The protein spots can be excised from the gel, digested with a protease (usually trypsin), and analyzed by mass spectrometry to identify the parent proteins. Protein identities are determined by comparison of experimental data to primary sequence databases. Gel-based approaches are most frequently coupled with MALDI-ToFMS to generate a mass fingerprint (or map), but liquid chromatography combined with tandem mass spectrometry (LC-MS/MS) is also used, especially to resolve ambiguities. Changes in the abundance and/or position of spots on the gel can be used to make detailed qualitative and quantitative comparisons between two or more conditions. The success of the 2D gel strategy is illustrated by over 1500 papers reporting the application of this approach.
Gels are a powerful approach to separation, but proteins at the extremes of either molecular weight or pI are underrepresented by this approach, and, in particular, hydrophobic proteins are difficult to analyze without resorting to specialized methods.
Enthusiasm for performing comparative proteomics studies based on 2D gel electrophoresis was dampened by the irreproducibility of the technique, and consequently, the problems associated with making comparisons across gels. Difference gel electrophoresis (DIGE) is an emerging technology that provides an increase in analytical precision, dynamic range, and sensitivity. DIGE facilitates repetitive measurements and multivariable analyses in a single, coordinated experiment because it allows for the incorporation of an internal standard composed of equal amounts of every sample.
In DIGE, two distinct protein samples (e.g., representing normal and disease) are separately labeled with two cyanine dyes, mixed, and then separated by 2DGE on a single gel (16). Fluorescent laser scanning of the gel at two distinct excitation and emission wavelengths generates two distinct images that can be superimposed on a pixel-to-pixel basis. Relative increases and decreases in the levels of the proteins can therefore be quantified precisely. Furthermore, a third dye is available that can be used as an internal standard to increase quantitative precision and improve protein spot matching when comparing multiple gel images (17). Without the benefits of the two-dye (or three-dye) strategy, high analytical variability makes it difficult to determine biological variability, and this has stymied previous attempts to obtain statistically meaningful results. The capability to run two samples on the same gel eliminates this problem.
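The role of the internal standard can be illustrated with a minimal Python sketch. It assumes that each spot has already been quantified as a fluorescence volume in the sample channel and in the pooled-standard channel of the same gel; the function names and all numerical values are hypothetical and are not taken from the studies cited here.

```python
import math

def standardized_abundance(sample_volume, standard_volume):
    """Spot volume of a sample divided by the internal-standard volume from the same gel."""
    if sample_volume <= 0 or standard_volume <= 0:
        raise ValueError("spot volumes must be positive")
    return sample_volume / standard_volume

def log2_fold_change(disease, control):
    """Each argument is a (sample_volume, standard_volume) pair for the same spot on one gel."""
    return math.log2(standardized_abundance(*disease) / standardized_abundance(*control))

# Hypothetical volumes: gel 2 gives half the signal of gel 1 overall (e.g., lower loading),
# so the raw volumes of the two samples look identical even though the disease sample
# carries twice as much of the protein relative to the shared standard.
control_on_gel_1 = (1200.0, 1800.0)
disease_on_gel_2 = (1200.0, 900.0)

raw_ratio = disease_on_gel_2[0] / control_on_gel_1[0]
corrected = log2_fold_change(disease_on_gel_2, control_on_gel_1)
print(f"raw ratio: {raw_ratio:.2f}, standardized log2 fold change: {corrected:.2f}")
```

Because every gel carries the same pooled standard, dividing by its volume removes gel-to-gel differences in loading and staining before samples on different gels are compared.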
3. Shotgun Strategies: LC-MS/MS.
Increasingly, proteomic investigations use the shotgun strategy, an approach based on proteolysis of proteins followed by analysis of the complex peptide mixture by LC-MS/MS (18). Sequences are then assigned to the MS/MS spectra by automated database searching algorithms. The approach is well suited to automated analysis and gives good coverage of the diverse array of proteins present in biological samples, and with the current generation of LC-MS/MS systems, this approach offers unrivaled sensitivity. Shotgun strategies are a powerful approach to gaining specific information regarding the peptide constituents of a complex mixture, but it is important to recognize that the experimental process begins with enzymatic digestion. Consequently, unambiguous reassembly of the peptide sequences into their precursor proteins is rarely, if ever, possible, especially in higher eukaryotic organisms, because a single peptide sequence can be represented in several distinct and biologically active proteins (isoforms).
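The reassembly problem can be made concrete with a small in-silico digestion. The sketch below is illustrative only: the two "isoform" sequences are invented, the cleavage rule is the commonly used simplification for trypsin (cleave after K or R unless the next residue is P), and missed cleavages are ignored.

```python
from collections import defaultdict

def tryptic_digest(sequence, min_length=5):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]

# Hypothetical isoforms that differ only in their C-terminal region
proteins = {
    "isoform_1": "MASLTEKGGAVLDERPLLWQNSTYKAAGHIVDR",
    "isoform_2": "MASLTEKGGAVLDERPLLWQNSTYKTTCEFMPR",
}

peptide_to_proteins = defaultdict(set)
for name, sequence in proteins.items():
    for peptide in tryptic_digest(sequence):
        peptide_to_proteins[peptide].add(name)

shared = sorted(p for p, owners in peptide_to_proteins.items() if len(owners) > 1)
unique = {p: next(iter(o)) for p, o in peptide_to_proteins.items() if len(o) == 1}
print("shared peptides:", shared)
print("isoform-specific peptides:", unique)
```

Only the isoform-specific peptides can distinguish the two parent proteins; an MS/MS identification of a shared peptide is compatible with either isoform or with both.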
Shotgun strategies can also be combined with stable isotope labeling to allow for the quantification of changes in protein expression levels of hundreds to thousands of proteins in a single experiment. The most commonly adopted approaches include ICAT (19–21) and iTRAQ (22) methods. However, quantification is again based on relative changes in the levels of labeled peptides that may be common to a family of proteins, with differential regulation/abundance, and therefore quantification experiments can lead to ambiguous or conflicting results.
| Key Considerations in Biomarker Research |
|---|
The Early Detection Research Network, established by the Division of Cancer Prevention, National Cancer Institute, has identified five separate phases in the development and testing of disease biomarkers (24). These phases are summarized in Table 2. Notably, they clearly distinguish between the tasks of identifying promising directions (or phase 1, identification) and the subsequent careful characterization of these candidates (phases 2–5).
The three paradigms discussed here have been widely employed and over the past decade or more have delivered a continuous outpouring of data. However, of the thousands of putative biomarkers identified by these strategies, few have been validated, and even fewer have made their way into routine clinical use. In the following sections we highlight some important considerations in the area of new biomarker discovery and subsequent clinical implementation, and we illustrate these points with several examples from our own work.
The Discovery Phase.
Qualitative Considerations.
During the discovery phase of biomarker work, the most assiduous approach is one that allows comprehensive qualitative and precise quantitative analysis of biological samples with the objective of identifying the largest possible set of proteins that distinguish between the control and test populations. Currently, discovery is usually undertaken by employing either 2DGE or LC-MS/MS: two distinct strategies that return complementary data sets. For example, although 2D gels unveil some of the complexity inherent in each gene product, the technique usually fails to provide representative coverage of small, basic, and low-abundance proteins. Shotgun strategies deliver more comprehensive representation of the protein complement, but with this approach information on each distinct gene product, including truncations, alternative splicing, and other modifications is typically lost. Not surprisingly then, when both techniques have been applied to the analysis of the very same sample, the overlap between the data sets is modest (25, 26), and it therefore follows that the application of several tools is advisable if not essential. This applies whether the objective is to search for biomarkers or to unravel the complexities of a biological system.
We have employed multiple analytical strategies in most of our studies, and although some of these approaches are cumbersome, slow, costly, and quantitatively imprecise, the objective has always been to obtain the most comprehensive coverage of the proteome possible. Gel-based strategies are pivotal in our discovery platform because these allow separation of intact proteins. It is increasingly evident that this is important because post-translational and post-transcriptional variants are ubiquitous in eukaryotic systems and include phosphorylation, glycosylation, sulfation, ubiquitination, truncation, and alternative splicing (27). These modifications frequently transform a single gene product into multiple variants that are chemically and functionally distinct.
1. Analysis of Human Seminal Plasma.
The need to adopt alternative strategies is well illustrated in our studies of human seminal plasma (26). In this work the peptide and protein components of pooled human seminal fluid were identified by a combination of techniques including gel electrophoresis (1D and 2D), MALDI-ToFMS, and LC-MS/MS. Over 100 unique protein and peptide components of normal human seminal fluid were identified, including over 20 distinct forms of prostate-specific antigen (PSA). This, of course, raises questions about the suitability of existing tests aimed at only one epitope as a biomarker. The epitope that is targeted and the specificity of any PSA antibody may be critically important because the form of PSA that is biologically relevant in specific diseases, such as prostate cancer, is yet to be defined. Published PSA assays often involve the use of a cocktail of antibodies targeted at different forms of PSA, but in the absence of information about the diagnostic utility of each form, this strategy may be a serious compromise. Complete characterization of these different molecular forms is required so that investigators can establish the specific forms of PSA (or other proteins) that might serve as the best biomarkers of health and/or disease. This approach has the potential to significantly enhance both the specificity and sensitivity of current diagnostic tests.
2. Analysis of Human Urine.
In a study of the proteins in human urine, we identified a heparan sulfate proteoglycan fragment on 2D gel analysis at an apparent molecular weight of ~20 kDa (Hunsucker and Duncan, unpublished observations). Figure 2 shows the peptide mass map obtained by MALDI-ToFMS. This indicates almost complete sequence coverage of the C-terminal fragment. The identity of the spot on the gel was also confirmed by LC-MS/MS analysis of the same tryptic digest used for MALDI-ToFMS analysis. Although the intact protein has a molecular weight of 450 kDa, we could account for only AA 4224–4391 (4391 being the C-terminal amino acid). The predicted molecular weight of this region of the protein is 17.9 kDa, consistent with its mobility on the gel. By using multiple strategies (2D gels, MALDI-ToFMS, and LC-MS/MS), we were able to demonstrate that there is a heparan sulfate proteoglycan C-terminal fragment present as a component of normal human urine. LC-MS/MS alone (shotgun) would not have been able to determine the intact molecular weight of this species in the original urine sample. Several other groups have found this protein in normal urine, two by shotgun approaches (25, 28) and the other by 2DGE (29), but none of these reports made any mention of truncation.
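As a rough plausibility check on the size of this fragment, the number of residues spanning positions 4224 to 4391 can be converted to an approximate mass. The sketch below uses a rule-of-thumb average residue mass of about 110 Da, which is an assumption introduced here; the 17.9 kDa figure quoted above is the value predicted from the actual sequence and should be preferred.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption, not from the source
WATER_MASS_DA = 18.0             # one water per polypeptide chain (free termini)

def fragment_length(first_residue, last_residue):
    """Number of residues in an inclusive range of sequence positions."""
    return last_residue - first_residue + 1

def rough_mass_kda(n_residues):
    """Approximate polypeptide mass from residue count alone."""
    return (n_residues * AVERAGE_RESIDUE_MASS_DA + WATER_MASS_DA) / 1000.0

n = fragment_length(4224, 4391)
estimate = rough_mass_kda(n)
print(f"{n} residues, roughly {estimate:.1f} kDa")
```

The crude estimate falls within a few percent of the sequence-based prediction and is far below the 450 kDa of the intact protein, which is the point of the comparison with gel mobility.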
4. Analysis of Human Cell Lines.
A final example is taken from a recent study of human non-small-cell lung cancer (NSCLC) cell lines. In this study aimed at identifying novel markers of a patient's sensitivity or resistance to a specific line of therapeutic intervention, we compared the same cell line before and after exposure to the drug. We found several examples of differential regulation of protein isoforms after treatment. Figure 3 shows one such example where we picked and identified both of the circled spots as the same protein, triosephosphate isomerase. The more basic isoform was downregulated following treatment; the more acidic isoform was upregulated (Hunsucker, Solomon, and Duncan, unpublished observations).
Quantitative Considerations.
As a component of the discovery phase, it is necessary not only to identify proteins but also to be able to quantify the differences in levels between two or more populations (e.g., control vs. disease). In our own work we have placed special emphasis on precise quantification of intact proteins and their variants. The primary quantitative tool in our discovery platform is DIGE. This approach allows separation of intact proteins, and, consequently, protein isoforms can be independently identified and quantified. This proves to be a critical step because of the propensity for post-translational modifications to convert a single gene product into multiple functionally and chemically distinct entities. In addition, DIGE offers precise relative quantitative comparisons between two samples, and, when combined with mass spectrometry, the differentially expressed proteins can be targeted and identified. DIGE alone can only provide quantitative information on proteins; spots have to be excised from the gels, digested with trypsin, and then analyzed by MALDI-ToFMS and/or LC-MS/MS for identifications to be made.
Protein Identification.
MALDI Versus LC-MS/MS. Protein identification from simple mixtures or pure proteins (i.e., excised gel bands) based on MALDI-ToFMS (i.e., mass mapping or mass fingerprinting) is fast, reliable, and easily automated. We download raw mass spectrometry data, de-noise, baseline correct, calibrate, detect peaks based on signal-to-noise ratio, and match the resulting peak lists against theoretical mass maps, all in a batch processing format. This strategy allows hundreds of proteins to be identified in several hours without the need for operator intervention. Automated processing does not obviate the requirement for careful review of the data, including visual inspection of the spectra and the search results to ensure validity. The crucial elements required for data processing are a reliable primary sequence database, accurate calibration of the mass spectrum, accurate peak detection (i.e., defining peak signal-to-noise ratio), accurate monoisotopic mass assignment, and contaminant peak removal. Algorithms that set an intensity threshold for peak detection are not amenable to automation because spectra have variable background and noise levels. Although most protein identifications yield to MALDI-ToFMS analysis, in rare instances (<10% in our hands) there is insufficient information available to make a match, or confounding peaks are present in the peptide mass fingerprint. In these instances we adopt LC-MS/MS because it has the advantage that it employs chromatography to separate peptides before ionization, and, increasingly, LC-MS/MS offers sensitivity advantages over MALDI-ToFMS. Although more time consuming, determining the components of a mixture by LC-MS/MS returns more comprehensive information (i.e., peptide sequence coverage) on portions of the protein(s).
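The contrast between a fixed intensity threshold and detection based on signal-to-noise ratio can be sketched in a few lines of Python. This is not the software used in our laboratory: the noise estimate (median baseline and scaled median absolute deviation), the 50 ppm matching tolerance, the synthetic spectrum, and the "theoretical" masses are all assumptions chosen for illustration.

```python
import statistics

def detect_peaks(mz, intensity, snr_threshold=3.0):
    """Return (m/z, S/N) for local maxima whose signal-to-noise ratio exceeds the threshold."""
    baseline = statistics.median(intensity)
    # Median absolute deviation, scaled to approximate a standard deviation for Gaussian noise
    noise = 1.4826 * statistics.median([abs(y - baseline) for y in intensity])
    if noise == 0:
        raise ValueError("cannot estimate noise from a flat spectrum")
    peaks = []
    for i in range(1, len(intensity) - 1):
        if intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]:
            snr = (intensity[i] - baseline) / noise
            if snr >= snr_threshold:
                peaks.append((mz[i], snr))
    return peaks

def match_masses(observed, theoretical, tolerance_ppm=50.0):
    """Return the theoretical masses that have an observed peak within the ppm tolerance."""
    return [t for t in theoretical
            if any(abs(o - t) / t * 1e6 <= tolerance_ppm for o in observed)]

# Synthetic spectrum: low-level background with two genuine peaks (hypothetical data)
mz = [1000.0 + 0.5 * i for i in range(41)]
intensity = [10, 11, 9, 12, 10, 8, 11, 10, 9, 12] * 4 + [10]
intensity[10] = 200
intensity[25] = 150

peaks = detect_peaks(mz, intensity)
theoretical_map = [1005.02, 1012.5, 1100.0]  # hypothetical tryptic peptide masses
matched = match_masses([m for m, _ in peaks], theoretical_map)
print(peaks)
print(matched)
```

Because the threshold is expressed relative to the noise estimated from each spectrum, the same setting can be applied across a batch of spectra with different background levels, which is what makes this style of peak detection amenable to automation.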
Protein identification from complex mixtures (e.g., via shotgun methods) is a much more challenging task, and currently there is no standard method for the analysis and validation of mass spectrometric data. Some investigators have begun to think about guidelines and standards, but these are yet to be put into universal practice (31). Further, even when these guidelines are followed, the false-positive rate for protein identification can be high, particularly in the absence of expert inspection of the data (32). There are numerous protein identification algorithms and multiple approaches for assessing the statistical significance of protein identifications, but in the wrong hands these can generate misleading data (33–36). Strategies based on LC-MS/MS are unquestionably powerful, but the dual processes of data generation and interpretation are far from trivial.
There is a need for improved software that allows for reliable automated identification of proteins from mass spectrometric data. Automation will markedly reduce the need for operator intervention and, if properly implemented, will reduce user error and bias. Unfortunately, however, the analysis of data generated in a proteomics study remains the major cause of delay, frustration, and errors for many research groups.
The Role of Mass Spectrometry Post-Discovery.
Most biomarker discovery work stops short of validation because the processes and tools used for discovery are very different to those required subsequently. However, there is an urgent need for general strategies for the quantification of proteins that can be applied in the biomarker validation phase, and this area has received relatively little attention. The current mind-set is that mass spectrometry has no place past the discovery phase, but this may be shortsighted.
We and others have been active in applying mass spectrometry to the quantification of peptides and proteins in biological tissues and fluids. These methods can prove to be precise and sensitive and are capable of providing absolute protein levels. The most promising and powerful strategies incorporate sample cleanup steps, digestion with trypsin, thoughtful selection of internal standards (either structural analogs or stable isotope-labeled standards), and then mass analysis of the peptide mixture. The approach requires the careful selection of an internal standard and a proteolytic fragment that can be cleaved, isolated reproducibly in high yield, and measured over a broad dynamic range.
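The arithmetic behind quantification with a stable isotope-labeled internal standard is simple and can be sketched as follows. The sketch assumes that the labeled and unlabeled forms of the chosen proteolytic peptide give equal instrument response, and that digestion releases one mole of peptide per mole of protein; the function names and the numbers are hypothetical.

```python
def peptide_amount_fmol(analyte_area, standard_area, standard_spiked_fmol):
    """Amount of native peptide from the analyte/standard peak-area ratio."""
    if standard_area <= 0:
        raise ValueError("internal standard signal must be positive")
    return analyte_area / standard_area * standard_spiked_fmol

def protein_concentration_fmol_per_ul(peptide_fmol, sample_volume_ul):
    """Assumes complete digestion and one copy of the peptide per protein molecule."""
    return peptide_fmol / sample_volume_ul

# Hypothetical measurement: 100 fmol of labeled standard spiked into 10 uL of sample
amount = peptide_amount_fmol(analyte_area=45000.0, standard_area=30000.0,
                             standard_spiked_fmol=100.0)
concentration = protein_concentration_fmol_per_ul(amount, sample_volume_ul=10.0)
print(f"{amount:.1f} fmol peptide, {concentration:.1f} fmol/uL protein")
```

Because the result depends entirely on the ratio of two signals measured together, losses during cleanup affect analyte and standard alike, which is why the choice of internal standard and of proteolytic fragment is so important.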
Our own strategies have employed MALDI-ToFMS (37–39); others have focused on LC-MS/MS (40–43). Kuhn and colleagues adopted nanoflow chromatography-tandem MS on a triple quadrupole mass spectrometer (MRM mode), and they reported this to be a powerful approach to prescreening candidate protein biomarkers in human serum before antibody and immunoassay development (43). Muddiman and colleagues have adopted a similar strategy (protein cleavage-isotope dilution mass spectrometry [PC-IDMS]), and they also concluded that PC-IDMS is a promising technique for quantifying proteins. They specifically note the potential of this approach for standardizing immunoassays, monitoring post-translational modifications, and quantifying newly discovered biomarkers before the development and implementation of an immunoassay (40). Gygi and colleagues have also used isotopically labeled internal standards and a similar LC-MS/MS strategy for the precise determination of protein expression and post-translational modification levels in cell lysates (42). An LC-MS/MS approach that can independently confirm the diagnostic potential of a target protein identified during the discovery phase is particularly attractive because it is not based on antibodies of ill-defined specificity that can be both costly and time consuming to acquire. LC-MS/MS can be quickly implemented and offers specificity and sensitivity that are more than adequate for most circumstances.
The place for mass spectrometry further along the validation and implementation pipeline is less obvious. Although MALDI-ToFMS, 2DGE and LC-MS/MS are powerful tools to probe the complexity of a biological sample at the discovery phase, with few exceptions they are suboptimal for subsequent steps. During the discovery phase, generic conditions for protein isolation, separation, and mass analysis are employed in an attempt to gain comprehensive coverage of the proteome. However, under these conditions no single protein is measured cost effectively or with optimal speed, sensitivity, or precision. Once a target analyte (or set of analytes) is identified, higher-throughput, more cost-effective and precise analytical methods are required for routine testing. Some investigators have enthusiastically proposed the application of mass spectrometry throughout the whole process, but there are several fundamental reasons for why this is rarely, if ever, the best choice.
As the molecular weight of a compound increases, its physical properties begin to place limits on the utility of the mass spectrometer to serve as a sensitive and precise quantitative tool. As mass increases, the natural isotopic envelope distributes the total ion current for a given population of molecules across multiple species. (Or at low resolution, the outcome is exceedingly broad peaks.) Even at the mass of insulin (i.e., 5730 daltons), the natural isotopic distribution extends across 10 separate detectable species. At higher molecular weights (e.g., ~50,000 daltons), the natural isotope envelope is distributed across more than 25 distinct species. This is not a result of an instrumental limitation but is a fundamental property of biomolecules and is therefore insurmountable. The distribution of the signal across so many distinct species dramatically compromises the achievable limit of detection, confounds the potential to distinguish between closely related species of similar mass, and complicates the selection of an internal standard.
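The spreading of signal across the isotopic envelope can be reproduced with a short calculation. The sketch below convolves approximate natural isotope abundances at nominal-mass resolution; the elemental composition used (C254H377N65O75S6, that of bovine insulin) is an assumption, since only the mass is stated above, and the 1% relative-abundance cutoff used to count "detectable" species is likewise arbitrary.

```python
# Approximate natural abundances by nominal-mass offset (+0, +1, +2)
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],  # 36S (about 0.01%) is ignored
}

def convolve(a, b, max_len):
    """Discrete convolution of two abundance lists, truncated to max_len entries."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out

def isotope_envelope(formula, max_len=40):
    """Probability of each nominal-mass offset from the monoisotopic species."""
    dist = [1.0]
    for element, count in formula.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element], max_len)
    total = sum(dist)
    return [p / total for p in dist]

insulin = {"C": 254, "H": 377, "N": 65, "O": 75, "S": 6}  # assumed composition
envelope = isotope_envelope(insulin)
top = max(envelope)
n_species = sum(1 for p in envelope if p >= 0.01 * top)
print(f"most abundant species carries {top:.0%} of the signal")
print(f"{n_species} species at or above 1% of the base peak")
```

Even for a molecule of this modest size, the most abundant isotopic species accounts for only a minority of the total ion current and the monoisotopic species is not the base peak; the same calculation applied to larger compositions gives progressively broader envelopes.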
Contrary to common dogma, mass spectrometry is far from ideal for intact protein quantification, and techniques based on immunorecognition can offer superior "selectivity," excellent precision, and high sensitivity. There are very few instances where mass spectrometry is being applied to the quantification of proteins, especially in a routine setting, and, where it has been employed, it invariably involves proteolysis and subsequent analysis of specific peptide fragments (39–42). In stark contrast, assays for a wide array of proteins are routinely and cost effectively performed by ELISA methods with both accuracy and precision, and, further, the hardware necessary to perform these assays is easy to use, affordable, and readily available. In fact, immunoassays are the backbone of routine clinical pathology worldwide. The most significant advantage of mass spectrometric approaches is that they do not require the generation of specific antibodies and therefore can be developed expeditiously. Therefore, protein quantification by mass spectrometry might sometimes offer special advantages, such as enhanced specificity (39), but immunoassays deliver the most cost-effective, precise, accurate, sensitive, and transportable approach to protein determinations in almost all instances.
| Conclusions |
|---|
We have discussed our strategy for biomarker discovery and presented some of our own data to illustrate key practical concerns. Currently there are biomarker discovery studies under way at many centers worldwide, and almost all of these are employing different approaches. While any specific discovery approach can be rationalized, there is no standard analytical strategy, nor should there be. Different approaches to protein isolation select for a distinct subset of the proteome (e.g., acidic, basic, low molecular weight, high molecular weight), and different separation and detection methods offer unique advantages. Each combination of methods provides a different view of the proteome and has the potential to yield valuable insights and promising biomarkers. This diversity in experimental approaches during the discovery phase is highly desirable and will ultimately help to reveal the complexity of the proteome and provide the largest sample set of potential biomarkers for further investigation. Thereafter, carefully designed and rigorously controlled validation studies will establish the clinical utility of each candidate biomarker, either when employed alone or in combination with others. Despite a decade or more of active research and considerable hype along the way, we are only now beginning to see candidate markers make their way through the whole process. Current indications are that proteomics is beginning to deliver on its promise to provide sensitive and specific disease biomarkers that can significantly improve our ability to diagnose and treat a broad spectrum of diseases.
| Acknowledgments |
|---|
| Footnotes |
|---|
2 Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic response to a therapeutic intervention.
| References |
|---|