Editorial
Diabetes Unit and Departments of Medicine and Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02141; and Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
Address all correspondence and requests for reprints to: Jose C. Florez, M.D., Ph.D., Diabetes Unit and Department of Molecular Biology, Simches Research Center 6720, 185 Cambridge Street, Massachusetts General Hospital, Boston, Massachusetts 02114. E-mail: jcflorez{at}partners.org.
Our ability to detect genetic contributors to disease risk (or the root causes of any other biological process, for that matter) rests on three key parameters: the magnitude of the effect, the quality of the measure, and the quantity of observations. With regard to the last of these, the size of affected family pedigrees, the number of informative transmissions from parent to offspring, or the number of cases and controls in an association study bears directly on the likelihood that a genetic analysis will generate true positive results. Thus, the study of genetic traits that impair fertility, and therefore diminish the number of available observations, presents formidable challenges and illustrates the particularly arduous road undertaken by investigators of reproductive phenotypes (1).
The magnitude of the effect of a particular genetic variant on a phenotype is determined by the complex interplay between nature and nurture; the effect of nature is largely fixed, whereas environmental interactions with the gene variant itself may be difficult to control in human epidemiological studies. But when the magnitude of the genetic effect is large enough, as often occurs in Mendelian diseases, its contribution may be detected despite the variability inherent to human behavior. In these monogenic traits, a single mutation has a large effect on protein function and the resulting phenotype, such that the variant is deterministic with regard to the trait it causes: its presence almost universally heralds disease (with adjustments for penetrance), and its absence is protective. Thus, it is not surprising that the last decade has witnessed a burgeoning of reports yielding fundamental knowledge about the genes involved in monogenic diseases that affect fertility (2).
The path is much thornier in complex, polygenic diseases. These phenotypes are caused by a number of genes, each with a modest effect on the trait under study. Their impact on the individual is probabilistic, rather than deterministic: there is no longer a 1:1 relationship between polymorphism and phenotype. Given the small magnitude of the effect, investigators must do their best at optimizing the other two parameters available to experimental manipulation, namely the quality of the measure and the quantity of observations.
The quality of the measure improves with advances in genotyping technology and with exquisite refinement of the phenotype under study, which often requires a profound understanding of human physiology in health and disease. A comprehensive assessment of how to interpret such probabilistic estimates requires a firm grasp of biostatistics, in which the investigator rigorously accounts for the possibility that, out of many genetic variants examined, a spurious association may surface by chance. And it behooves geneticists to examine the largest possible number of samples, while preserving phenotypic quality and homogeneity. A careful and laborious illustration of how to do so in a common reproductive phenotype, the polycystic ovary syndrome (PCOS), is presented in an article by Urbanek et al. (3) in this issue of the journal.
The authors of the report have themselves made significant contributions to the clinical definition of PCOS and to our knowledge of genetic influences on its manifestation (4). In a previous analysis (5), a group of 150 families [44 affected sib pairs (ASPs) and 163 trios, named "set 1"] was screened for both linkage and association to 37 candidate genes. In that study, the most significant linkage occurred for a variant in the follistatin gene and withstood correction for multiple hypothesis testing (Pc = 0.01); the strongest nominal association was found for marker D19S884 in the insulin receptor (INSR) region but did not survive correction for the number of hypotheses examined. Unfortunately, no clear causal variant in the follistatin gene was identified upon sequencing its promoter and coding regions, and subsequent linkage and association studies yielded negative results (6, 7). Interestingly, a very small follow-up study of 85 cases and 87 controls by another group did replicate the association of marker D19S884 with PCOS (7).
For this manuscript, Urbanek et al. (3) assembled a replication group ("set 2") consisting of 217 families that included 63 ASPs and 227 trios. Probands were carefully ascertained according to accepted diagnostic criteria. The authors decided to label sisters as "affected" if they had isolated evidence of hyperandrogenemia even in the absence of menstrual irregularities, on the basis of its widespread presence in PCOS and its documented heritability (8). Unaffected female relatives were ascertained conservatively, and male relatives were labeled as "unknown" because of the lack of a convincing related male phenotype.
The authors concentrated on a 13-Mb segment spanning the INSR region, as a follow-up of their prior association result. They selected 19 short tandem repeat (STR) polymorphisms and tested their samples for linkage through standard identity by descent (IBD) methods, and for association through the transmission disequilibrium test (TDT) in families with a single affected offspring, or the analogous pedigree disequilibrium test (PDT) in families with more than one affected offspring.
The TDT, which was originally developed by this study's senior author in a landmark contribution to the genetic literature on type 1 diabetes (9), is a family-based test of association. Trios (parents and offspring) are ascertained by the affected status of the offspring. Under the null hypothesis where a given candidate variant has no impact on disease, a heterozygous parent has a 50% chance of transmitting that variant to his/her affected child. However, because the trios are ascertained by disease status of the offspring, if the variant influences risk of disease there should be a deviation from 50:50 transmission (upward if the variant is deleterious, and downward if it is protective). The statistical significance of this deviation can be evaluated by a simple χ² test of the observed number of transmissions vs. the expected 50% result under the null. Verifying that such deviation does not occur in unaffected individuals (a phenomenon known as transmission ratio distortion, for instance if a variant affects overall survival in the general population) is an important control to perform.
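The transmission test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis pipeline: it assumes a biallelic comparison (one allele of interest vs. all others), counts only heterozygous parents, and uses the standard one-degree-of-freedom statistic (b - c)² / (b + c), where b and c are the numbers of times the allele was transmitted and not transmitted. The function name `tdt` and the counts in the usage note are hypothetical.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return the TDT chi-square statistic (1 df) and its P value.

    transmitted   -- times heterozygous parents passed the allele of
                     interest to an affected child
    untransmitted -- times heterozygous parents passed the other allele
    """
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative (heterozygous) parents")
    # Under the null, each informative transmission is a fair coin flip,
    # so the expected count in each cell is n / 2.
    stat = (transmitted - untransmitted) ** 2 / n
    # Upper-tail probability of a chi-square variable with 1 df:
    # P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

With hypothetical counts of 60 transmissions and 40 non-transmissions, `tdt(60, 40)` gives a statistic of 4.0 and a P value of about 0.046; an even 50:50 split gives a statistic of 0 and a P value of 1. Running the same function on transmissions to unaffected offspring is one way to check for transmission ratio distortion.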
An attractive feature of the TDT is that its family-based design makes it quite robust to population stratification. In studies of unrelated cases and controls, inadvertent population substructure may give rise to differences in allele frequencies that are due not to the disease status of the sample but to unrelated confounders, most often a diverse ethnic ancestry (10); if the proportion of admixed individuals is much larger in one of the two groups, the allelic frequency differences (attributed to disease status but in fact caused by divergent population histories) may produce statistically significant but false positive associations. Family-based association tests, by and large, overcome this difficulty.
In the linkage portion of this study, Urbanek et al. (3) find nominal, uncorrected P values for IBD that range from 0.01 to 0.05 in both sets. Because they do not provide the full linkage data for all markers, it is not clear whether any single marker was replicated, or whether the evidence for linkage in each marker grew as the second set of samples was added to the analysis (as it should if the finding is real). They do state that they "consider this finding modest follow-up support for the set 1 results in regard to linkage." Significantly, IBD was 60% with nominal P values less than 0.05 for all markers in a 6.6-Mb region bounded by INSR and STR D19S840, with IBD decaying on either side of this interval. Given the relatively inferior power of linkage analyses when compared with association approaches in detecting modest genetic effects (11), it is not surprising that the linkage evidence presented for this polygenic trait is weak.
On the other hand, the TDT analysis showed nominal, uncorrected P values less than 0.05 for various markers in both sets 1 and 2, with one marker (D19S884) showing significant association with PCOS for allele 8 in both sets. Combined analysis reveals stronger evidence of association for D19S884, with a nominal P value (<0.0006) that survives correction for multiple hypothesis testing by permutation (Pc = 0.034); even when the punitively conservative Bonferroni correction is used, the P value almost reaches conventional empirical significance (Pc = 0.056).
The authors go on to clarify marker-marker interactions by testing a nearby STR (D19S922) that also showed nominal evidence for association. When the TDT is conditioned on individuals with the wild-type allele at D19S922, it still reveals highly significant transmission distortion for the D19S884 A8 allele, suggesting that the association signal arises from the latter. Their results are further validated by the use of the PDT in multiplex families and by ruling out transmission ratio distortion in unaffected samples.
This study tackles an important question in a very complex phenotype. Recognizing the caveats that must be kept in mind when conducting a thoughtful genetic association study, the authors meticulously carried out a number of key tasks: 1) they appropriately acknowledged their limitations in statistical power and increased their sample size; 2) they performed two separate genetic analyses, linkage and association, in a candidate region (although, because they were performed on the same population, the results from each cannot be considered independent); 3) they confirmed that nominal evidence for linkage was retained for all markers in the proposed interval; 4) they elected to use the TDT as an association test, thus controlling for population stratification; 5) they independently replicated previous findings of association; 6) they employed appropriate controls, by ruling out transmission ratio distortion in unaffected individuals and by performing the PDT in multiplex families; and 7) they corrected their P values for the multiple hypotheses examined, both by permutation testing and by the overly punitive Bonferroni method (which assumes that all tests are independent, which is not true when correlation exists between nearby genetic variants due to linkage disequilibrium). Taken together, these measures lend significant evidential weight to their main association result.
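The contrast between the two corrections mentioned in point 7 can be illustrated with a short Python sketch. It is not a reconstruction of the authors' procedure, whose details are not given here. It assumes a hypothetical data layout in which each row is one informative parent and each column one marker, coded +1 (allele of interest transmitted), -1 (not transmitted), or 0 (uninformative), and it assumes one common permutation scheme for family-based data: flipping transmitted and untransmitted status for a whole row at once, so that correlation between nearby markers is preserved. All function names are illustrative.

```python
import random


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected P value: treats all tests as independent."""
    return min(1.0, p_value * n_tests)


def tdt_stat(column: list[int]) -> float:
    """TDT chi-square statistic for one marker coded +1 / -1 / 0."""
    b = sum(1 for x in column if x == 1)
    c = sum(1 for x in column if x == -1)
    return (b - c) ** 2 / (b + c) if b + c else 0.0


def max_stat(rows: list[list[int]]) -> float:
    """Largest TDT statistic across all markers (columns)."""
    n_markers = len(rows[0])
    return max(tdt_stat([row[j] for row in rows]) for j in range(n_markers))


def permutation_corrected_p(rows: list[list[int]],
                            n_perm: int = 2000,
                            seed: int = 0) -> float:
    """Corrected P value for the best marker via sign-flip permutation.

    Each permutation flips all markers of a parent together, which keeps
    the between-marker correlation (linkage disequilibrium) intact.
    """
    rng = random.Random(seed)
    observed = max_stat(rows)
    hits = 0
    for _ in range(n_perm):
        flipped = []
        for row in rows:
            sign = rng.choice((1, -1))
            flipped.append([sign * x for x in row])
        if max_stat(flipped) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

When two markers are perfectly correlated, the permutation-corrected P value is the same as for a single marker, whereas `bonferroni` would double it; this is the sense in which Bonferroni is overly punitive in the presence of linkage disequilibrium.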
Nevertheless, this study is limited in two respects. First is the sparse density of polymorphisms in such a large region (one STR per 80 kb at the highest resolution), in part due to the choice of STRs rather than single nucleotide polymorphisms (SNPs) as markers of genetic variation. Resources such as the expanding inventory of publicly available SNPs in the human genome (12), the human haplotype map developed by the HapMap project (13, 14), and high-throughput SNP-based genotyping platforms make it possible to assay a much more comprehensive set of common variants in a genomic segment of interest. And second is the still modest sample size: experience from work on other complex metabolic traits has shown that, absent a significant genotypic risk, thousands of samples are usually needed to document valid and generally believed genetic associations (15, 16, 17, 18).
Be that as it may, the main question still remains: where is/are the causal variant(s)? Although the strongest evidence for association was found for the A8 allele of D19S884, this STR may merely signal its haplotypic correlation (linkage disequilibrium) with an as-yet-undetected functional polymorphism. The authors speculate on three nearby genes (ELAVL1 encoding an mRNA binding protein, CCL25 encoding a thymus-expressed chemokine, and FBN3 encoding a member of the fibrillin family of extracellular matrix proteins) and rightly point out that a creative scientist can elaborate a convincing enough biological story about any one of them.
They also downplay two other attractive candidate genes on chromosome 19, INSR (encoding the insulin receptor) and RETN (encoding the adipokine resistin), due to their relatively large physical distances from D19S884 (800 kb and 420 kb, respectively). Whether linkage disequilibrium is preserved over such distances in that particular region of the genome can be easily verified by downloading the available data from the HapMap web site, www.hapmap.org (13, 14): a cursory examination of the haplotype structure in Caucasians reveals that, indeed, linkage disequilibrium breaks down between D19S884, INSR, and RETN, illustrating the high probability of historical recombination between the loci. Thus, it is unlikely that the association signal at D19S884 is directly due to variants in INSR or RETN. On the other hand, as the authors suggest, regulatory elements acting over such long distances do exist in the human genome; whether D19S884 itself has an effect on INSR or RETN expression awaits functional studies.
In the meantime, replication of this finding in other existing PCOS cohorts will be an essential step toward establishing this result as a widely accepted, reproducible association: such are the demands of the scientific method. When conducted properly in adequately powered samples, true associations are often replicated (19, 20); only in such large, collaborative efforts, especially in the fertility field, can we hope to elucidate the genetic architecture of complex phenotypes.
Acknowledgments
I thank David Altshuler and William F. Crowley, Jr. for their guidance and mentorship and Corrine Welt for valuable comments on this manuscript.
Footnotes
This work was supported by National Institutes of Health Research Career Award 1 K23 DK65978-02.
Abbreviations: ASP, Affected sib pair; IBD, identity by descent; INSR, insulin receptor; PCOS, polycystic ovary syndrome; PDT, pedigree disequilibrium test; SNP, single nucleotide polymorphism; STR, short tandem repeat; TDT, transmission disequilibrium test.
Received October 5, 2005.
Accepted October 7, 2005.
References
Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76-80