The Journal of Clinical Endocrinology & Metabolism Vol. 90, No. 12 6732-6734
Copyright © 2005 by The Endocrine Society


Editorial

Editorial: Genetic Susceptibility for Polycystic Ovary Syndrome on Chromosome 19: Advances in the Genetic Dissection of Complex Reproductive Traits

Jose C. Florez

Diabetes Unit and Departments of Medicine and Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114; Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02141; and Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115

Address all correspondence and requests for reprints to: Jose C. Florez, M.D., Ph.D., Diabetes Unit and Department of Molecular Biology, Simches Research Center 6720, 185 Cambridge Street, Massachusetts General Hospital, Boston, Massachusetts 02114. E-mail: jcflorez{at}partners.org.

Our ability to detect genetic contributors to disease risk (or the root causes of any other biological process, for that matter) rests on three key parameters: the magnitude of the effect, the quality of the measure, and the quantity of observations. Regarding the last of these, the size of affected family pedigrees, the number of informative transmissions from parent to offspring, or the number of cases and controls in an association study bears directly on the likelihood that a genetic analysis will generate true positive results. Thus, the study of genetic traits that impair fertility, and therefore diminish the number of available observations, presents formidable challenges and illustrates the particularly arduous road undertaken by investigators of reproductive phenotypes (1).

The magnitude of the effect of a particular genetic variant on a phenotype is determined by the complex interplay between nature and nurture: the effect of nature is largely fixed, whereas environmental interactions with the gene variant may be difficult to control in human epidemiological studies. But when the magnitude of the genetic effect is large enough, as often occurs in Mendelian diseases, its contribution may be detected despite the variability inherent to human behavior. In these monogenic traits, a single mutation has a large effect on protein function and the resulting phenotype, such that the variant is deterministic with regard to the trait it causes: its presence almost universally heralds disease (with adjustments for penetrance), and its absence is protective. Thus, it is not surprising that the last decade has witnessed a burgeoning number of reports that have advanced our understanding of the genes involved in monogenic diseases that affect fertility (2).

The path is much thornier in complex, polygenic diseases. These phenotypes are caused by a number of genes, each with a modest effect on the trait under study. Their impact on the individual is probabilistic rather than deterministic: there is no longer a one-to-one relationship between polymorphism and phenotype. Given the small magnitude of the effect, investigators must optimize, as far as possible, the other two parameters amenable to experimental manipulation, namely the quality of the measure and the quantity of observations.

The quality of the measure improves with advances in genotyping technology and with exquisite refinement of the phenotype under study, which often requires a profound understanding of human physiology in health and disease. Interpreting such probabilistic estimates requires a firm grasp of biostatistics, so that the investigator rigorously accounts for the possibility that, out of many genetic variants examined, a spurious association may surface by chance. And it behooves geneticists to examine the largest possible number of samples while preserving phenotypic quality and homogeneity. A careful and painstaking illustration of how to do so in a common reproductive phenotype, the polycystic ovary syndrome (PCOS), is presented by Urbanek et al. (3) in this issue of the journal.
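To make the multiple-testing concern concrete: if m tests are each run at significance level α and no variant has a true effect, the chance that at least one test comes out nominally significant is 1 − (1 − α)^m. The minimal Python sketch below assumes the tests are independent, which correlated markers in linkage disequilibrium are not; the values of m are purely illustrative.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests
    when every null hypothesis is true (illustrative; assumes independence)."""
    return 1.0 - (1.0 - alpha) ** m


# How quickly spurious "hits" become likely as more variants are examined.
for m in (1, 10, 37, 100):
    print(m, round(family_wise_error_rate(0.05, m), 3))
```

With α = 0.05, screening 37 independent candidates already gives roughly an 85% chance of at least one nominal false positive, which is why corrected P values matter in candidate-gene screens.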

The authors of the report have themselves made significant contributions to the clinical definition of PCOS and to our knowledge of genetic influences on its manifestation (4). In a previous analysis (5), a group of 150 families [44 affected sib pairs (ASPs) and 163 trios, named "set 1"] was screened for both linkage and association to 37 candidate genes. In that study, the strongest evidence for linkage was found for a variant in the follistatin gene and withstood correction for multiple hypothesis testing (Pc = 0.01); the strongest nominal association was found for marker D19S884 in the insulin receptor (INSR) region but did not survive correction for the number of hypotheses examined. Unfortunately, no clear causal variant in the follistatin gene was identified upon sequencing its promoter and coding regions, and subsequent linkage and association studies yielded negative results (6, 7). Interestingly, a very small follow-up study of 85 cases and 87 controls by another group did replicate the association of marker D19S884 with PCOS (7).

For this manuscript, Urbanek et al. (3) assembled a replication group ("set 2") consisting of 217 families that included 63 ASPs and 227 trios. Probands were carefully ascertained according to accepted diagnostic criteria. The authors decided to label sisters as "affected" if they had isolated evidence of hyperandrogenemia even in the absence of menstrual irregularities, on the basis of its widespread presence in PCOS and its documented heritability (8). Unaffected female relatives were ascertained conservatively, and male relatives were labeled as "unknown" because of the lack of a convincing related male phenotype.

The authors concentrated on a 13-Mb segment spanning the INSR region, as a follow-up of their prior association result. They selected 19 short tandem repeat (STR) polymorphisms and tested their samples for linkage through standard identity by descent (IBD) methods, and for association through the transmission disequilibrium test (TDT) in families with a single affected offspring, or the analogous pedigree disequilibrium test (PDT) in families with more than one affected offspring.

The TDT, which was originally developed by this study's senior author in a landmark contribution to the genetic literature on type 1 diabetes (9), is a family-based test of association. Trios (both parents and an offspring) are ascertained by the affected status of the offspring. Under the null hypothesis that a given candidate variant has no impact on disease, a heterozygous parent has a 50% chance of transmitting that variant to his or her affected child. However, because the trios are ascertained by the disease status of the offspring, if the variant influences risk of disease, transmission should deviate from 50:50 (upward if the variant is deleterious, downward if it is protective). The statistical significance of this deviation can be evaluated by a simple χ² test of the observed number of transmissions vs. the 50% expected under the null. An important control is to verify that no such deviation occurs in unaffected individuals: distortion there (a phenomenon known as transmission ratio distortion, which can arise, for instance, if a variant affects overall survival in the general population) would point to a cause other than association with disease.
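The TDT calculation described above fits in a few lines. In this sketch, b counts heterozygous parents who transmitted the allele of interest to the affected child and c counts those who did not; the statistic (b − c)²/(b + c) is the χ² goodness-of-fit statistic against the 50:50 expectation, with E = (b + c)/2 in each cell. The counts are hypothetical, and a multiallelic STR is assumed to be coded as one allele versus all others.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Transmission disequilibrium test for one allele.

    b: heterozygous parents who transmitted the allele to the affected child
    c: heterozygous parents who did not transmit it
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = tdt(60, 40)  # hypothetical: 60 transmissions vs. 40 non-transmissions
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")  # chi2 = 4.00, P = 0.0455
```

Only heterozygous parents are informative, which is one reason the number of informative transmissions, rather than the raw number of families, governs the power of the test.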

An attractive feature of the TDT is that its family-based design makes it quite robust to population stratification. In studies of unrelated cases and controls, inadvertent population substructure may give rise to differences in allele frequencies that are due not to the disease status of the sample but to unrelated confounders, most often a diverse ethnic ancestry (10); if the proportion of admixed individuals is much larger in one of the two groups, the allelic frequency differences—attributed to disease status but in fact caused by divergent population histories—may produce statistically significant but false positive associations. Family-based association tests, by and large, overcome this difficulty.
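A small worked example of how stratification manufactures a case-control association. It uses expected counts rather than a random sample, and every number is hypothetical: the allele has no effect on disease in either subpopulation, but it is more common in population A, and cases happen to be drawn mostly from A while controls come mostly from B.

```python
def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square for a 2x2 table of allele counts (cases vs. controls)."""
    table = [[case_alt, case_total - case_alt],
             [ctrl_alt, ctrl_total - ctrl_alt]]
    row_totals = [case_total, ctrl_total]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = case_total + ctrl_total
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical allele frequencies in two subpopulations; no effect on disease in either.
freq = {"A": 0.5, "B": 0.2}
# Unequal ancestry mix between cases and controls (the confounder).
case_mix = {"A": 0.8, "B": 0.2}
ctrl_mix = {"A": 0.2, "B": 0.8}
n_alleles = 1000  # 500 cases and 500 controls, two alleles each

case_freq = sum(case_mix[k] * freq[k] for k in freq)  # about 0.44
ctrl_freq = sum(ctrl_mix[k] * freq[k] for k in freq)  # about 0.26
chi2 = allelic_chi2(round(case_freq * n_alleles), n_alleles,
                    round(ctrl_freq * n_alleles), n_alleles)
print(f"case freq {case_freq:.2f}, control freq {ctrl_freq:.2f}, chi2 {chi2:.1f}")
```

The pooled test gives χ² ≈ 71, a highly "significant" but entirely spurious result. A TDT compares transmitted with untransmitted alleles from the same parents, so the ancestry of each family cancels out of the comparison.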

In the linkage portion of this study, Urbanek et al. (3) find nominal, uncorrected P values for IBD that range from 0.01 to 0.05 in both sets. Because they do not provide the full linkage data for all markers, it is not clear whether any single marker was replicated, or whether the evidence for linkage at each marker grew as the second set of samples was added to the analysis (as would be expected if the finding were real). They do state that they "consider this finding modest follow-up support for the set 1 results in regard to linkage." Notably, IBD sharing was 60%, with nominal P values less than 0.05, for all markers in a 6.6-Mb region bounded by INSR and STR D19S840, with IBD sharing decaying on either side of this interval. Given the relatively inferior power of linkage analyses compared with association approaches in detecting modest genetic effects (11), it is not surprising that the linkage evidence presented for this polygenic trait is weak.
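One standard way to ask whether 60% IBD sharing exceeds the 50% expected without linkage is the affected-sib-pair "mean test", sketched below. Under no linkage a sib pair shares 0, 1, or 2 alleles IBD with probabilities 1/4, 1/2, and 1/4, so the proportion shared (0, 0.5, or 1) has mean 0.5 and variance 0.125. This is an illustration under strong assumptions (IBD known exactly for every pair, a hypothetical 50 pairs) and does not reproduce the authors' analysis or their reported P values.

```python
import math


def asp_mean_ibd_test(mean_sharing: float, n_pairs: int) -> tuple[float, float]:
    """One-sided mean test for excess IBD sharing in affected sib pairs.

    Assumes fully informative markers, so each pair's IBD proportion is known.
    Under the null, the proportion shared has mean 0.5 and variance 0.125.
    """
    se = math.sqrt(0.125 / n_pairs)
    z = (mean_sharing - 0.5) / se
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper normal tail
    return z, p_one_sided


z, p = asp_mean_ibd_test(0.60, 50)  # 60% sharing in 50 hypothetical pairs
print(f"z = {z:.2f}, one-sided P = {p:.4f}")  # z = 2.00, one-sided P = 0.0228
```

Real STR markers are only partially informative, so software estimates IBD probabilistically and the effective sample size is smaller. This is one reason linkage signals for modest effects tend to be weak.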

In contrast, the TDT analysis showed nominal, uncorrected P values less than 0.05 for various markers in both sets 1 and 2, with one marker (D19S884) showing significant association with PCOS for allele 8 (A8) in both sets. Combined analysis reveals stronger evidence of association for D19S884, with a nominal P value (<0.0006) that survives correction for multiple hypothesis testing by permutation (Pc = 0.034); even when the punitively conservative Bonferroni correction is used, the corrected P value nearly reaches the conventional 0.05 threshold (Pc = 0.056).
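Why the permutation-corrected P value is smaller than the Bonferroni one can be illustrated with a sketch. The permutation scheme here, a max-statistic permutation that randomly swaps each parent's transmitted and untransmitted haplotypes, is one common approach and is assumed for illustration; the source does not describe the authors' exact procedure. Because a parent's whole row of markers is flipped together, the correlation (linkage disequilibrium) between markers is preserved. Bonferroni ignores that correlation and treats every test as independent. The data are hypothetical.

```python
import math
import random


def tdt_chi2(column):
    """TDT chi-square for one marker. Entries: +1 = test allele transmitted,
    -1 = test allele not transmitted, 0 = parent not heterozygous (uninformative)."""
    b = sum(1 for x in column if x == 1)
    c = sum(1 for x in column if x == -1)
    return 0.0 if b + c == 0 else (b - c) ** 2 / (b + c)


def bonferroni(p, m):
    """Bonferroni-corrected P value for m tests (treats all tests as independent)."""
    return min(1.0, p * m)


def max_stat(rows):
    """Largest per-marker TDT statistic; rows = one list per parent, one entry per marker."""
    return max(tdt_chi2([row[j] for row in rows]) for j in range(len(rows[0])))


def permutation_corrected_p(rows, n_perm=2000, seed=0):
    """Max-statistic permutation P value, corrected across all markers."""
    rng = random.Random(seed)
    observed = max_stat(rows)
    hits = 0
    for _ in range(n_perm):
        # Under the null, a parent's transmitted and untransmitted haplotypes are
        # exchangeable; flipping the whole row keeps LD between markers intact.
        permuted = [[-x for x in row] if rng.random() < 0.5 else row for row in rows]
        if max_stat(permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 100 parents, 3 markers; only marker 0 shows distortion (70 vs. 30).
rows = [[1 if i < 70 else -1,
         1 if i % 2 == 0 else -1,
         0 if i % 3 == 0 else (1 if i % 2 == 0 else -1)] for i in range(100)]
nominal_p = math.erfc(math.sqrt(max_stat(rows) / 2))
print("nominal P:", nominal_p)
print("Bonferroni:", bonferroni(nominal_p, 3))
print("permutation:", permutation_corrected_p(rows))
```

With 2,000 permutations the smallest attainable permutation P value is 1/2,001 (about 0.0005). Very small P values therefore need more permutations before they can be compared fairly with Bonferroni.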

The authors go on to clarify marker-marker interactions by testing a nearby STR (D19S922) that also showed nominal evidence for association. When the TDT is conditioned on individuals with the wild-type allele at D19S922, it still reveals highly significant transmission distortion for the D19S884 A8 allele, suggesting that the association signal arises from the latter. Their results are further validated by the use of the PDT in multiplex families and by ruling out transmission ratio distortion in unaffected samples.
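The conditioning step can be sketched as follows. One reading of "conditioned on individuals with the wild-type allele at D19S922" is to keep only A8-heterozygous parents who are homozygous for the reference allele at the second marker. Because both of those parents' haplotypes carry the same allele at D19S922, that marker cannot itself produce transmission distortion, so any remaining distortion at D19S884 points to D19S884 (or a variant in LD with it). The record layout, the "wt" and "v" labels, and the counts are hypothetical, and the authors' implementation may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentRecord:
    """A parent heterozygous for A8 at the primary marker (hypothetical layout)."""
    second_marker: tuple[str, str]  # parent's genotype at the nearby marker
    a8_transmitted: bool            # whether the affected child received A8


def conditional_tdt(records, reference="wt"):
    """TDT for A8 using only parents homozygous for `reference` at the second marker."""
    kept = [r for r in records if r.second_marker == (reference, reference)]
    b = sum(1 for r in kept if r.a8_transmitted)
    c = len(kept) - b
    if b + c == 0:
        raise ValueError("no informative parents left after conditioning")
    chi2 = (b - c) ** 2 / (b + c)
    return b, c, chi2, math.erfc(math.sqrt(chi2 / 2))


records = ([ParentRecord(("wt", "wt"), True)] * 45
           + [ParentRecord(("wt", "wt"), False)] * 20
           + [ParentRecord(("wt", "v"), True)] * 10
           + [ParentRecord(("wt", "v"), False)] * 10)
print(conditional_tdt(records))  # (b, c, chi2, P) among "wt"/"wt" parents only
```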

This study tackles an important question in a very complex phenotype. Mindful of the caveats that attend any thoughtful genetic association study, the authors meticulously carried out a number of key tasks: 1) they acknowledged their limited statistical power and increased their sample size; 2) they performed two separate genetic analyses, linkage and association, in a candidate region (although, because both were performed on the same population, their results cannot be considered independent); 3) they confirmed that nominal evidence for linkage was retained for all markers in the proposed interval; 4) they elected to use the TDT as the association test, thus controlling for population stratification; 5) they independently replicated previous findings of association; 6) they employed appropriate controls, ruling out transmission ratio distortion in unaffected individuals and performing the PDT in multiplex families; and 7) they corrected their P values for the multiple hypotheses examined, both by permutation testing and by the overly punitive Bonferroni method (which assumes that all tests are independent, an assumption that does not hold when nearby genetic variants are correlated through linkage disequilibrium). Taken together, these measures lend significant evidential weight to their main association result.

Nevertheless, this study is limited in two respects. The first is the sparse density of polymorphisms across such a large region (one STR per 80 kb at the highest resolution), in part due to the choice of STRs rather than single nucleotide polymorphisms (SNPs) as markers of genetic variation. Resources such as the expanding inventory of publicly available SNPs in the human genome (12), the human haplotype map developed by the HapMap project (13, 14), and high-throughput SNP-based genotyping platforms make it possible to assay a much more comprehensive set of common variants in a genomic segment of interest. The second is the still modest sample size: experience with other complex metabolic traits has shown that, absent a large genotypic risk, thousands of samples are usually needed to document valid and widely accepted genetic associations (15, 16, 17, 18).
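A rough calculation shows why modest effects demand large samples. The sketch assumes a multiplicative model, under which a heterozygous parent transmits a risk allele with per-allele relative risk γ to an affected child with probability t = γ/(1 + γ). It then uses the normal approximation to the binomial to estimate how many informative (heterozygous) parents a two-sided TDT needs. The relative risks, α levels, and 80% power are illustrative choices, not values from the study.

```python
import math
from statistics import NormalDist


def transmission_probability(relative_risk: float) -> float:
    """Expected transmission rate from a heterozygous parent to an affected child
    under a multiplicative model with the given per-allele relative risk."""
    return relative_risk / (1.0 + relative_risk)


def informative_parents_needed(t: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Heterozygous parents needed for a two-sided TDT (normal approximation)."""
    if t == 0.5:
        raise ValueError("no effect: t = 0.5 cannot be detected")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n = ((z_alpha * 0.5 + z_beta * math.sqrt(t * (1 - t))) / (t - 0.5)) ** 2
    return math.ceil(n)


for rr in (2.0, 1.5, 1.2):
    t = transmission_probability(rr)
    print(rr, round(t, 3),
          informative_parents_needed(t),              # alpha = 0.05
          informative_parents_needed(t, alpha=1e-4))  # stricter threshold
```

Under these assumptions a relative risk of 2 needs fewer than 100 informative parents. A relative risk of 1.2 needs on the order of 950 at α = 0.05 and roughly 2,700 at α = 10⁻⁴. Each trio contributes at most two informative parents, so the number of families required is larger still.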

Be that as it may, the main question remains: where is/are the causal variant(s)? Although the strongest evidence for association was found for the A8 allele of D19S884, this STR may merely signal its haplotypic correlation (linkage disequilibrium) with an as-yet-undetected functional polymorphism. The authors speculate on three nearby genes (ELAVL1 encoding an mRNA binding protein, CCL25 encoding a thymus-expressed chemokine, and FBN3 encoding a member of the fibrillin family of extracellular matrix proteins) and rightly point out that a creative scientist can elaborate a convincing enough biological story about any one of them.

They also downplay two other attractive candidate genes on chromosome 19, INSR (encoding the insulin receptor) and RETN (encoding the adipokine resistin), due to their relatively large physical distances from D19S884 (800 kb and 420 kb, respectively). Whether linkage disequilibrium is preserved over such distances in that particular region of the genome can be easily verified by downloading the available data from the HapMap web site, www.hapmap.org (13, 14): a cursory examination of the haplotype structure in Caucasians reveals that, indeed, linkage disequilibrium breaks down between D19S884, INSR, and RETN, illustrating the high probability of historical recombination between the loci. Thus, it is unlikely that the association signal at D19S884 is directly due to variants in INSR or RETN. On the other hand, as the authors suggest, regulatory elements acting over such long distances do exist in the human genome; whether D19S884 itself has an effect on INSR or RETN expression awaits functional studies.
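The two standard pairwise measures of linkage disequilibrium, D′ and r², can be computed from haplotype and allele frequencies. The sketch below gives only the definitions and does not reproduce the HapMap analysis. It assumes the haplotype frequencies are already available (for example, estimated from phased genotype data) and that a multiallelic STR is coded as one allele (such as A8) versus all others.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D', r^2) between allele a at locus 1 and allele b at locus 2.

    p_ab: frequency of haplotypes carrying both a and b
    p_a, p_b: allele frequencies of a and b
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    # D is normalized by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


print(ld_measures(0.30, 0.30, 0.30))  # a and b always on the same haplotype: D' = r^2 = 1
print(ld_measures(0.06, 0.30, 0.20))  # haplotype frequency = product of allele frequencies
```

Values near zero across the interval separating two loci indicate that historical recombination has broken down their correlation. In that situation an association at one locus is unlikely to be a proxy for variants at the other.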

In the meantime, replication of this finding in other existing PCOS cohorts will be an essential step toward establishing this result as a widely accepted, reproducible association: such are the demands of the scientific method. When conducted properly in adequately powered samples, true associations are often replicated (19, 20); only in such large, collaborative efforts—especially in the fertility field—can we hope to elucidate the genetic architecture of complex phenotypes.

Acknowledgments

I thank David Altshuler and William F. Crowley, Jr. for their guidance and mentorship and Corrine Welt for valuable comments on this manuscript.

Footnotes

This work was supported by National Institutes of Health Research Career Award 1 K23 DK65978-02.

Abbreviations: ASP, Affected sib pair; IBD, identity by descent; INSR, insulin receptor; PCOS, polycystic ovary syndrome; PDT, pedigree disequilibrium test; SNP, single nucleotide polymorphism; STR, short tandem repeat; TDT, transmission disequilibrium test.

Received October 5, 2005.

Accepted October 7, 2005.

References

  1. Seminara SB, Crowley Jr WF 2002 Genetic approaches to unraveling reproductive disorders: examples of bedside to bench research in the genomic era. Endocr Rev 23:382–392
  2. Achermann JC, Ozisik G, Meeks JJ, Jameson JL 2002 Genetic causes of human reproductive disease. J Clin Endocrinol Metab 87:2447–2454
  3. Urbanek M, Woodroffe A, Ewens KG, Diamanti-Kandarakis E, Legro RS, Strauss III JF, Dunaif A, Spielman RS 2005 Candidate gene region for polycystic ovary syndrome on chromosome 19p13.2. J Clin Endocrinol Metab 90:6623–6629
  4. Urbanek M, Spielman RS 2002 Genetic analysis of candidate genes for the polycystic ovary syndrome. Curr Opin Endocrinol Diabetes 9:492–501
  5. Urbanek M, Legro RS, Driscoll DA, Azziz R, Ehrmann DA, Norman RJ, Strauss III JF, Spielman RS, Dunaif A 1999 Thirty-seven candidate genes for polycystic ovary syndrome: strongest evidence for linkage is with follistatin. Proc Natl Acad Sci USA 96:8573–8578
  6. Urbanek M, Wu X, Vickery KR, Kao L-C, Christenson LK, Schneyer A, Legro RS, Driscoll DA, Strauss III JF, Dunaif A, Spielman RS 2000 Allelic variants of the follistatin gene in polycystic ovary syndrome. J Clin Endocrinol Metab 85:4455–4461
  7. Tucci S, Futterweit W, Concepcion ES, Greenberg DA, Villanueva R, Davies TF, Tomer Y 2001 Evidence for association of polycystic ovary syndrome in Caucasian women with a marker at the insulin receptor gene locus. J Clin Endocrinol Metab 86:446–449
  8. Legro RS, Driscoll D, Strauss III JF, Fox J, Dunaif A 1998 Evidence for a genetic basis for hyperandrogenemia in polycystic ovary syndrome. Proc Natl Acad Sci USA 95:14956–14960
  9. Spielman RS, McGinnis RE, Ewens WJ 1993 Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516
  10. Freedman ML, Reich D, Penney KL, McDonald GJ, Mignault AA, Patterson N, Gabriel SB, Topol EJ, Smoller JW, Pato CN, Pato MT, Petryshen TL, Kolonel LN, Lander ES, Sklar P, Henderson B, Hirschhorn JN, Altshuler D 2004 Assessing the impact of population stratification on genetic association studies. Nat Genet 36:388–393
  11. Risch N, Merikangas K 1996 The future of genetic studies of complex human diseases. Science 273:1516–1517
  12. Reich DE, Gabriel SB, Altshuler D 2003 Quality and completeness of SNP databases. Nat Genet 33:457–458
  13. The International HapMap Consortium 2003 The International HapMap Project. Nature 426:789–796
  14. Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P; International HapMap Consortium 2005 A haplotype map of the human genome. Nature 1299–1320
  15. Altshuler D, Hirschhorn JN, Klannemark M, Lindgren CM, Vohl MC, Nemesh J, Lane CR, Schaffner SF, Bolk S, Brewer C, Tuomi T, Gaudet D, Hudson TJ, Daly M, Groop L, Lander ES 2000 The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76–80
  16. Gloyn AL, Weedon MN, Owen KR, Turner MJ, Knight BA, Hitman G, Walker M, Levy JC, Sampson M, Halford S, McCarthy MI, Hattersley AT, Frayling TM 2003 Large-scale association studies of variants in genes encoding the pancreatic β-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 52:568–572
  17. Florez JC, Burtt N, de Bakker PIW, Almgren P, Tuomi T, Holmkvist J, Gaudet D, Hudson TJ, Schaffner SF, Daly MJ, Hirschhorn JN, Groop L, Altshuler D 2004 Haplotype structure and genotype-phenotype correlations of the sulfonylurea receptor and the islet ATP-sensitive potassium channel gene region. Diabetes 53:1360–1368
  18. Smyth D, Cooper JD, Collins JE, Heward JM, Franklyn JA, Howson JMM, Vella A, Nutland S, Rance HE, Maier L, Barratt BJ, Guja C, Ionescu-Tirgoviste C, Savage DA, Dunger DB, Widmer B, Strachan DP, Ring SM, Walker N, Clayton DG, Twells RCJ, Gough SCL, Todd JA 2004 Replication of an association between the lymphoid tyrosine phosphatase locus (LYP/PTPN22) with type 1 diabetes, and evidence for its role as a general autoimmunity locus. Diabetes 53:3020–3023
  19. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG 2001 Replication validity of genetic association studies. Nat Genet 29:306–309
  20. Lohmueller K, Pearce CL, Pike M, Lander ES, Hirschhorn JN 2003 Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 33:177–182



