Journal of Clinical Endocrinology & Metabolism, doi:10.1210/jc.2005-0962
The Journal of Clinical Endocrinology & Metabolism Vol. 90, No. 11 5928-5936
Copyright © 2005 by The Endocrine Society

Reproductive Hormone Reference Intervals for Healthy Fertile Young Men: Evaluation of Automated Platform Assays

Ken Sikaris, Robert I. McLachlan, Rymantas Kazlauskas, David de Kretser, Carol A. Holden and David J. Handelsman

Royal College of Pathologists of Australasia and Australasian Association of Clinical Biochemists, Chemical Pathology Quality Assurance Programs Pty. Ltd., Flinders Medical Centre (K.S.), Bedford Park, South Australia 5042, Australia; Prince Henry’s Institute of Medical Research, Monash Medical Centre (R.I.M.), Clayton, Victoria 3168, Australia; Australian Sports Drug Testing Laboratory, National Measurement Institute (R.K.), Pymble, New South Wales 2073, Australia; Andrology Australia, Monash Institute of Medical Research, Monash University (D.d.K., C.A.H.), Clayton, Victoria 3168, Australia; and Department of Andrology, Concord Hospital, ANZAC Research Institute, University of Sydney (D.J.H.), Sydney, New South Wales 2139, Australia

Address all correspondence and requests for reprints to: Dr. David Handelsman, Department of Andrology, Concord Hospital, ANZAC Research Institute, University of Sydney, Sydney, New South Wales 2139, Australia. E-mail: djh{at}anzac.edu.au.


    Abstract
Context: Management of male infertility and/or androgen deficiency requires accurate hormonal measurements with valid reference intervals.

Objective: The objective of this study was to develop a valid reference panel of blood samples from healthy eugonadal young men with verified normal reproductive function and to use this panel to evaluate the performance of seven fully automated, commercial multiplex immunoassay platforms used to measure serum total testosterone (T), LH, and FSH.

Design: This was an observational study of consistency among seven different automated immunoassays for each of total T, LH, and FSH. Each method was implemented in two laboratories, with each repeating the analysis of the full reference panel samples twice. Serum T concentrations were also measured by gas chromatography/mass spectrometry (GC/MS), and serum inhibin B levels were determined by an ELISA.

Setting: The study was performed at commercial, high-volume, clinical pathology laboratories.

Participants: From 147 men screened, sera from 124 healthy, reproductively normal men (age, 21–35 yr) with normal sperm output were used as a reference panel. All laboratories selected for elite performance in the national immunoassay quality assurance program agreed to participate.

Main Outcome Measure(s): For each of the 868 assays, descriptive statistics were calculated in the natural and log-transformed scales and were analyzed by nested, repeated measures ANOVA after log transformation. Reference intervals, defined as 95% confidence limits, were calculated using arithmetic (natural scale), geometric (log scale) and nonparametric methods.

Results: Descriptive statistics and reference intervals for serum T, LH, and FSH differed widely and significantly between methods, but variation between laboratories for the same assay was negligible. All T methods showed significant differences in regression slope and intercept in deviance plots as well as in estimated reference ranges compared with the independent GC/MS reference method. Although similar between-method differences existed for gonadotropin assays, the smaller quantitative discrepancies allowed assignment of consensus reference intervals for serum FSH (1.3–8.4 IU/liter) and LH (1.6–8.0 IU/liter), although these differed from manufacturers’ currently quoted expected values.

Conclusions: Using a reference panel of sera from healthy eugonadal young men with verified normal reproductive function, major differences exist between commercial T immunoassays as well as divergence from the GC/MS standard. This impairs their clinical diagnostic utility and requires substantial improvements in automated T immunoassay technologies or a switch to GC/MS methods. Gonadotropin assays showed less variability, but current high-throughput immunoassays remain suboptimal to confirm accurate diagnosis of azoospermia or androgen deficiency.


    Introduction
THE DIAGNOSIS OF reproductive dysfunction requires hormone assays that show good internal and external validity. Internal validity implies high precision and low bias, whereas external validity refers to the calibration of assays against appropriate gold standard reference panels from which the relevant reproductive dysfunctions have been systematically excluded. Historically, reproductive hormone assays were established in research laboratories that developed and maintained assays including internal and external quality controls. Subsequently, as these assays found a central role in routine clinical practice and sample numbers increased, there has been a switch to large, high-throughput pathology laboratories that employ commercial multiplex automated platform immunoassays.

For steroid assays in particular, important methodological simplifications required to make this transition to automation and high throughput have resulted in a loss of specificity and sensitivity (1, 2). Widely differing reference intervals compared with those of in-house research laboratory immunoassay methods have been perplexing (3). Major inaccuracy and bias have been reported (4, 5), with commonly used assay platforms showing poor agreement with independent testosterone (T) assays based on liquid chromatography/mass spectrometry (5). At low blood T levels, such as in women, children, and early male puberty where blood T levels are comparable with those in castrate men, the unreliability and poor sensitivity of commercial T assays (1, 2, 4, 5, 6) have led to the measurements being considered no better than random number generation (6). Many of these problems stem from the elimination of solvent extraction, chromatography, and tritiated tracers from the automated platform assays (1, 2).

An additional problem for clinical application of automated, high-throughput commercial T assays is the failure of most laboratories to undertake proper external validation, calibrating the assay against a reference interval based on men with verified normal reproductive function. Instead, most use reference intervals provided by the manufacturer as expected values, usually without adequate details of the origin and validity of the reference population, as noted in other routine clinical chemistry assays (7). For example, in a preliminary survey undertaken through the Chemical Pathology Quality Assurance Program conducted by the Royal College of Pathologists of Australasia and the Australasian Association of Clinical Biochemists, it was observed that among 17 laboratories reporting T levels, the lower reference limit quoted for men varied between 2.5 and 11.0 nmol/liter, and the upper reference limit varied between 21.6 and 40.0 nmol/liter, with different reference intervals cited even for the same commercial assay (8). Reference intervals were attributed to manufacturers’ kit inserts, in-house studies, or historical values in about equal proportions (8).

Thus, the clinical diagnosis of androgenic disorders may be confounded by limitations of both internal (assay methodology) and external (reference interval) validations. This study sought to evaluate the performance of widely used, automated multiplex assay platforms in the measurement of reproductive hormones. We used a carefully selected reference group of healthy fertile young men to compare estimated reference intervals among different assay platforms for LH, FSH, and T as well as against the gold standard gas chromatography/mass spectrometry (GC/MS) reference method for T.


    Subjects and Methods
Subjects

The approach to developing reference intervals was that recommended by the U.S. National Committee for Clinical Laboratory Standards (Approved Guideline C28-A2). The reference panel recruited volunteer men, aged 21–35 yr, with normal general and reproductive health defined by physical examination (height, weight, blood pressure, virilization, and testis volumes by Prader orchidometry) and clinical history (including fertility status and reproductive pathology). Exclusion criteria were history or evidence of testicular or androgenic pathology (gynecomastia or hypospadias); obesity (body mass index, >30 kg/m²); testicular pathology or male infertility; undertaking extreme exercise; chronic use of drugs, alcohol (>15 standard drinks or 225 g/wk), opiates, or marijuana; use of any medication known to affect androgen levels 30 d before providing a serum sample; use of any hormone treatment within the last year; any severe febrile illness in the preceding 3 months; or any other serious medical illness. Informed written consent was obtained from all participants, and the study was approved by the relevant institutional human ethics committees (Southern Health in Melbourne and the Central Sydney Area Health Service in Sydney). Participants were reimbursed for time and travel costs to participate in a single visit to the study center.

Sample size was designed to allow the establishment of both parametric and nonparametric reference limits with 95% confidence intervals. Hence, it was determined that at least 120 complete samples would be required and, consequently, the study aimed to recruit approximately 150 men, allowing for up to 20% exclusions.
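The recruitment target follows from simple arithmetic on the figures quoted above. The sketch below reproduces that calculation; the function name `recruitment_target` is illustrative and does not come from the study.

```python
def recruitment_target(required_n: int, max_exclusion_pct: int) -> int:
    """Smallest number of recruits that leaves required_n after exclusions."""
    if not 0 <= max_exclusion_pct < 100:
        raise ValueError("max_exclusion_pct must be in [0, 100)")
    retained_pct = 100 - max_exclusion_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-required_n * 100 // retained_pct)


# Figures quoted in the text: 120 complete samples, up to 20% exclusions.
print(recruitment_target(120, 20))  # 150
```

With 120 complete samples required and up to 20% of recruits excluded, the target is 150 men, which matches the stated recruitment aim.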

Blood collection and preparation

Blood samples were taken between 0800 and 1000 h after a light breakfast to minimize the effects of diurnal rhythm, fasting, and dehydration. A total of 80 ml blood was collected by a single venipuncture into plain plastic Vacutainer (BD Biosciences, Franklin Lakes, NJ) tubes (without anticoagulants or separation gel). After standing at room temperature for 1 h, the blood sample was centrifuged for 5 min at 4000 rpm. The supernatant serum was carefully removed with a fine disposable plastic pipette, and all serum from a single subject was pooled into a 70-ml sterile specimen jar (Interpath Services Pty. Ltd., Heidelberg, Australia). For each study participant, the pooled serum sample was aliquoted into 2-ml screw-top microtubes (Sarstedt AG & Co., Numbrecht, Germany) for subsequent storage. At least 12 2-ml aliquots were collected from each study participant. Each microtube was labeled with a coded volunteer identification label and stored at –70 C.

Selection of routine assay platforms and laboratories

The panel of 124 sera was used to establish reference intervals for total T, LH, and FSH over seven widely available commercial automated analyzer systems as well as for T by GC/MS and inhibin B by an in-house manual commercial immunoassay. For each major assay system, laboratories were selected from those with best consistent performance (top 20% for precision and bias compared with median for the assay system as a group) according to the external Chemical Pathology Quality Assurance Program conducted by the Royal College of Pathologists of Australasia and the Australasian Association of Clinical Biochemists. Each laboratory approached agreed to participate in the study. All were routine clinical pathology laboratories providing large numbers of immunoassays for large public hospitals or doctors in the community. Assay manufacturers provided sufficient reagents to the participating laboratories for the study, but had no involvement in the origin, design, analysis, or reporting of the study data.

A total of 15 laboratories participated in the study using a range of technologies, which included Architect i2000 and AxSym (Abbott Laboratories, Chicago, IL), ADVIA ACS-180 and Centaur (Bayer Diagnostics, Tarrytown, NY), ACCESS (Beckman Coulter, Fullerton, CA), DPC Immulite 2000 (Diagnostic Products Corp., Los Angeles, CA), Vitros ECi (Ortho Clinical Diagnostics, Raritan, NJ), and Elecsys E170 and E2010 (Roche, Mannheim, Germany). Reporting of results was de-identified by assigning letter codes for each method.

GC/MS analysis of serum total T

Total T was measured by GC/MS under the auspices of Dr. R. Kazlauskas (Australian Sports Drug Testing Laboratory, National Measurement Institute, the Australian laboratory accredited by the World Anti-Doping Agency) (9). Using one set of 124 samples, serum (0.2–0.3 ml) was extracted with diethyl ether, and the extract was derivatized with N-methyl-N-trimethylsilyltrifluoroacetamide. The extract was separated by GC using an HP Ultra 1 column (0.11-µm film thickness) on an HP5980 (Agilent) gas chromatograph. MS was performed on a MAT95S high-resolution mass spectrometer (Thermo Finnigan, San Jose, CA) at a resolution of 3500. Samples were measured in three batches using a four-point (0, 5, 10, and 15 ng/ml) calibration curve with consistent linearity. Results were corrected for recovery (>96%) according to an internal standard of deuterated T added to samples before extraction. The detection limit was 1 ng/ml, and the coefficient of variation was 10.6% at a mean of 18.0 nmol/liter.

Serum inhibin B

Using the only widely available commercial assay, serum inhibin B was measured using a specific ELISA according to the manufacturer’s instructions (DSL-10–84100i ACTIVE Inhibin B ELISA, Diagnostic Systems Laboratories, Webster, TX), with standards (range, 10–1000 pg/ml) provided as part of the manufacturer’s kit. The average intraplate coefficient of variation was 6.6%, and the interplate coefficient of variation was 8.0% (n = 9 plates). The limit of detection was 10 pg/ml.

Semen samples

Semen samples were collected by masturbation after a suggested 3- to 5-d period since last ejaculation, with sperm concentration assessed using standard World Health Organization methods (10).

Analysis of samples

One set of 124 frozen samples was distributed on dry ice to each participating laboratory. Serum samples were stored at –20 C in the participating laboratory until analyzed. Before analysis, samples were thawed at room temperature and mixed thoroughly by inversion. Any samples with fibrin deposits were recentrifuged at 4000 rpm. The full serum panel was assayed for T, LH, and FSH by all participating laboratories within 5 d of thawing with samples maintained at 4 C, conditions under which these analytes are stable (11, 12). All 124 samples were assayed in singlicate on each of 2 d according to that laboratory’s standard operating protocol. Each participating laboratory used the same lot of reagents for each assay run. Results are reported in Système International units (T, nmol/liter = ng/dl × 0.0347; FSH, IU/liter = 1.0 × mIU/ml; LH, IU/liter = 1.0 × mIU/ml).
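Because the Methods quote T concentrations in ng/ml (GC/MS calibration), ng/dl (conversion factor), and nmol/liter (results), a small conversion helper may be useful. This is a minimal sketch using only the factor quoted in the text; the function names are illustrative assumptions.

```python
# Conversion factor quoted in the text: nmol/liter = ng/dl x 0.0347.
NG_DL_TO_NMOL_L = 0.0347


def t_ng_dl_to_nmol_l(ng_dl: float) -> float:
    """Convert serum testosterone from ng/dl to nmol/liter."""
    return ng_dl * NG_DL_TO_NMOL_L


def t_ng_ml_to_nmol_l(ng_ml: float) -> float:
    """Convert ng/ml to nmol/liter (1 ng/ml = 100 ng/dl)."""
    return t_ng_dl_to_nmol_l(ng_ml * 100.0)


def gonadotropin_miu_ml_to_iu_l(miu_ml: float) -> float:
    """LH and FSH: mIU/ml and IU/liter are numerically identical."""
    return miu_ml * 1.0


# GC/MS calibration points quoted in the text (ng/ml), expressed in nmol/liter.
for point in (0, 5, 10, 15):
    print(point, round(t_ng_ml_to_nmol_l(point), 2))
```

On this conversion the 5 ng/ml calibration point corresponds to about 17.35 nmol/liter, so the four-point curve spans roughly 0 to 52 nmol/liter.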

Inhibin B was measured in a separate study, involving duplicate assay runs over 2 d, at the Monash Institute of Medical Research.

Statistical analysis

Each analyte (T, LH, and FSH) was run in seven different commercial methods, with each method implemented in two laboratories, each running the set of 124 samples twice. The results for each run were analyzed according to each individual run as well as for each method by averaging the four replicates for each sample across two replicate runs and two laboratories. These data were analyzed by a nested ANOVA for repeated measures (seven methods as between factors and two laboratories nested within each method) using Number Cruncher Statistical Systems software (www.ncss.com). Descriptive statistics were run in both the arithmetic (natural) and log-transformed (geometric) scales, with the distribution analyzed for normality using the Shapiro-Wilk W statistic (13). Missing samples were ignored because they were considered missing at random, as defined by Little and Rubin (14). Samples with values below the detection limit of the assay were arbitrarily assigned a value of the detection limit for that assay. Deviance plots based on Bland-Altman methodology (15), but modified to compare candidate methods with a reference method (zero bias by definition), were used to compare commercial T assays with the GC/MS T assay. Reference intervals based on 95% confidence intervals were then developed for serum T, FSH, LH, and inhibin B for each method using the arithmetic (natural) and log-transformed (geometric) scales on the assumption of a normal or log-normal distribution, respectively, as well as using a nonparametric approach. Linear correlations to compare methods were calculated according to the nonparametric procedure of Passing and Bablok (16) or by the parametric Deming regression, assuming equal variance for the methods (17).
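The three reference interval estimators named above can be sketched as follows. This is an illustration under stated assumptions, not the NCSS procedure used in the study: the interval is taken to be the central 95% of the distribution, the parametric limits use mean ± 1.96 SD, and the nonparametric limits use the 2.5th and 97.5th percentiles with the (n + 1) rank convention provided by Python's `statistics.quantiles`. The sample data are synthetic.

```python
import math
import random
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def arithmetic_interval(values):
    """Mean +/- 1.96 SD in the natural scale (assumes normality)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - Z_95 * sd, mean + Z_95 * sd


def geometric_interval(values):
    """Mean +/- 1.96 SD in the log scale, back-transformed (assumes log-normality)."""
    low, high = arithmetic_interval([math.log(v) for v in values])
    return math.exp(low), math.exp(high)


def nonparametric_interval(values):
    """2.5th and 97.5th percentiles, with no distributional assumption."""
    cuts = statistics.quantiles(values, n=40)  # cut points at 2.5% steps
    return float(cuts[0]), float(cuts[-1])


# Synthetic, right-skewed data for illustration only (not study data).
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(17.0), 0.3) for _ in range(124)]
for name, fn in (("arithmetic", arithmetic_interval),
                 ("geometric", geometric_interval),
                 ("nonparametric", nonparametric_interval)):
    low, high = fn(sample)
    print(f"{name:>13}: {low:.1f} to {high:.1f}")
```

For right-skewed data the arithmetic interval tends to place the lower limit too low (it can even be negative), whereas the geometric limits are always positive. That is one reason to work in the log scale when distributions are skewed.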


    Results
Reference panel

A total of 147 young men, aged 21–35 yr, from the general male community volunteered for testing after responding to advertising in the local media. Eleven men were excluded before blood collection for testicular pathology (n = 5; including cryptorchidism, varicocele, and infertility), abnormal screening biochemistry (n = 2; anemia and abnormal liver function tests), chronic drug use (n = 1), underweight (body mass index, <20; n = 1), and voluntary withdrawal from participation (n = 1). The remaining 136 eligible subjects had a normal full blood examination and renal and liver function tests, and provided one semen sample. Twelve men (8.8%) were excluded from the final reference panel due to subnormal total sperm output. The final reference panel included sera from 124 healthy fertile young men. The characteristics of the serum reference study population are detailed in Table 1. No blood samples were excluded due to lipemia.
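The recruitment flow described above can be checked with a few lines of arithmetic. The sketch uses only the totals quoted in the text; the variable names are illustrative.

```python
screened = 147
excluded_before_blood = 11   # excluded before blood collection
excluded_low_sperm = 12      # excluded for subnormal total sperm output

eligible = screened - excluded_before_blood
final_panel = eligible - excluded_low_sperm

pct_low_sperm = round(100 * excluded_low_sperm / eligible, 1)
pct_excluded_overall = round(
    100 * (excluded_before_blood + excluded_low_sperm) / screened, 1
)

print(eligible, final_panel)                # 136 124
print(pct_low_sperm, pct_excluded_overall)  # 8.8 15.6
```

The final panel of 124 exceeds the minimum of 120 complete samples set in the sample size calculation.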


TABLE 1. Characteristics of the 124 study subjects

 
Total T reference interval

After exclusion of missing results (n = 139 individual samples), 3457 (96%) of 3596 individual samples were included in the determination of serum total T reference intervals. All total T reference distributions were significantly skewed in the natural scale, as determined by the Shapiro-Wilk W statistic (Table 2). Because the normality of the distributions all improved, and seven of eight (excepting method C) were rendered normal by log transformation, all subsequent analyses used log-transformed data. Repeated measures ANOVA indicated consistent differences between methods (all F6,27 > 26; P < 0.001), whereas laboratories (F1,27 < 1.0; P > 0.34) were a minimal source of variation for all descriptive statistics (Table 2). When referred to the GC/MS estimates, the descriptive statistics differed by 6–37%.


TABLE 2. Descriptive statistics for serum total T (nanomoles per liter) for each platform

 
The geometric reference interval based on log transformation conformed closely to the nonparametric reference interval, whereas the arithmetic reference interval based on the natural scale deviated significantly for all methods, and none was well approximated by the manufacturer’s recommended reference interval (Table 3). The lower reference limit ranged from 7.5–12.7 nmol/liter (geometric) or 7.3–12.6 nmol/liter (nonparametric), whereas the upper limit ranged from 25.8–34.4 nmol/liter (geometric) or 25.9–32.9 nmol/liter (nonparametric) compared with 10.4–29.8 (geometric) or 10.4–30.1 (nonparametric) for GC/MS. Using nonparametric (Passing-Bablok) or Deming regression, six of the seven methods differed significantly in slope (four significantly higher and two significantly lower slope than 1.0) and intercept (three significantly lower than 0.0) from the independent GC/MS method. None of the seven methods conformed closely to the GC/MS reference intervals or was significantly superior to others compared with the GC/MS reference method according to deviance plots (Fig. 1).
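For readers unfamiliar with Deming regression, the equal-variance case used for the method comparison has a closed-form solution. The sketch below is a generic textbook implementation, not the NCSS routine used in the study, and the example values are invented for illustration. A slope of 1.0 and an intercept of 0.0 would indicate agreement with the reference method.

```python
import math
import statistics


def deming_equal_variance(x, y):
    """Deming regression of y on x assuming equal error variances in x and y.

    Returns (slope, intercept).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, mean_y - slope * mean_x


# Invented example: reference-method (x) and immunoassay (y) results in nmol/liter.
reference = [10.2, 14.8, 17.5, 21.0, 26.3, 30.1]
immunoassay = [9.1, 14.0, 18.2, 23.5, 29.9, 35.0]
slope, intercept = deming_equal_variance(reference, immunoassay)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Unlike ordinary least squares, this estimator allows for measurement error in both methods, which is why it is commonly preferred for assay comparisons.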


TABLE 3. Serum total T (nanomoles per liter) reference intervals for each platform reported

 


FIG. 1. Deviance plots of the difference between each of the seven methods (A–G) from the reference GC/MS method. In each plot, the y-axis represents the difference between the serum T concentration for that method minus the GC/MS result plotted against the GC/MS reference method result on the x-axis. For each panel, 496 (4 × 124) points represent the results from all four assays for that method (two laboratories in two separate assays). The solid line represents the line of identity (zero difference between methods), and the dashed line represents the mean difference for all samples.
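The quantities plotted in such a deviance plot can be sketched as below: each point pairs the reference result (x) with the method-minus-reference difference (y), and the dashed line is the mean of those differences. The function names and example values are illustrative assumptions, not study data.

```python
import statistics


def deviance_points(method_values, reference_values):
    """Return (x, y) pairs: x = reference result, y = method minus reference."""
    if len(method_values) != len(reference_values):
        raise ValueError("inputs must have the same length")
    return [(ref, meth - ref) for meth, ref in zip(method_values, reference_values)]


def mean_difference(points):
    """Mean of the differences (the dashed line in each panel)."""
    return statistics.fmean(diff for _, diff in points)


# Invented values in nmol/liter, for illustration only.
gcms = [10.0, 20.0, 30.0]
method_a = [12.0, 21.0, 33.0]
points = deviance_points(method_a, gcms)
print(points)                   # [(10.0, 2.0), (20.0, 1.0), (30.0, 3.0)]
print(mean_difference(points))  # 2.0
```

Plotting against the reference result, rather than against the average of the two methods as in a standard Bland-Altman plot, reflects the treatment of GC/MS as unbiased by definition.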

 
LH reference interval

After exclusion of missing results (n = 112 individual samples), 3360 (97%) of 3472 individual samples were included in the determination of serum LH reference intervals. All LH reference distributions were significantly skewed in the natural scale as determined by the Shapiro-Wilk W statistic (Table 4). Because all were rendered normal by log transformation, subsequent analyses used log-transformed data. Repeated measures ANOVA indicated consistent differences between methods (all F6,27 > 7.8; P < 0.001), whereas laboratories (F1,27 < 1.5; P > 0.23) were a minimal source of variation for all descriptive statistics (Table 4). Method comparisons showed that all seven methods differed significantly in slope and intercept.


TABLE 4. Descriptive statistics for serum LH (IU/liter) for each platform

 
The geometric reference interval based on log transformation conformed closely to the nonparametric reference interval, whereas the arithmetic (natural scale) reference interval deviated significantly for each method, and the manufacturer’s recommended reference interval was reasonably close for only two methods (Table 5). The lower reference limit ranged from 1.3–1.9 IU/liter (geometric) or 1.2–1.7 IU/liter (nonparametric), whereas the upper limit ranged from 7.3–8.7 IU/liter (geometric) or 7.8–9.1 IU/liter (nonparametric). Pooled estimates of the reference intervals across all seven methods were 1.6–8.0 IU/liter (geometric) and 1.5–8.1 IU/liter (nonparametric).


TABLE 5. Serum LH (IU/liter) reference intervals for each platform

 
FSH reference interval

After exclusion of missing results (n = 123 individual samples), 3349 (97%) of 3472 individual samples were included in the determination of serum FSH reference intervals. All FSH reference distributions were significantly skewed in the natural scale as determined by the Shapiro-Wilk W statistic (Table 6). Because all were rendered normal by log transformation, subsequent analyses used log-transformed data.


TABLE 6. Descriptive statistics for serum FSH (IU/liter) for each platform

 
Repeated measures ANOVA for descriptive statistics (Table 6) indicated consistent differences between methods (all F6,27 > 2.9; P < 0.035), whereas laboratory differences were an additional source of variation for all quartiles and the mean, but not for the minimum, maximum, or SD (Table 6). Method comparisons showed that all seven methods differed significantly in slope and intercept.

The geometric (log-transformed) reference interval conformed closely to the nonparametric reference interval, especially for the lower limit. By contrast, the arithmetic (natural scale) reference intervals and the manufacturer’s recommended reference interval were inaccurate, with many wide deviations (Table 7). The lower reference limit ranged from 1.0–1.5 IU/liter (geometric) or 1.0–1.5 IU/liter (nonparametric), whereas the upper limit ranged from 6.6–10.0 IU/liter (geometric) or 7.9–10.5 IU/liter (nonparametric). Pooled estimates of the reference intervals across all seven methods were 1.3–8.4 IU/liter (geometric) and 1.2–9.5 IU/liter (nonparametric).


TABLE 7. Serum FSH (IU/liter) reference intervals for each platform

 
Serum inhibin B

Inhibin B results for all 124 specimens were included. The inhibin B distribution was Gaussian, with descriptive statistics [minimum, 10 pg/ml; quartile 1, 97 pg/ml; quartile 2 (median), 129 pg/ml; quartile 3, 171 pg/ml; maximum, 286 pg/ml; mean, 136 pg/ml; SD, 53 pg/ml]. The 95% confidence limits for the reference interval were 48–251 pg/ml in natural or geometric scale and by nonparametric estimation. There was a significant inverse relationship between serum inhibin B and FSH concentrations for each assay, with correlation coefficients varying between 0.211 and 0.322.


    Discussion
The present study has systematically evaluated the variation in apparent reference intervals of seven widely used commercial immunoassay platforms for the hormones (total T, LH, and FSH) used globally to evaluate male reproductive function. By establishing a reference panel from a well-defined group of healthy young men in whom normal reproductive function was explicitly verified, the present findings demonstrate the limitations, notably the inconsistency, of existing commercial multiplex immunoassay methodologies. Although previous studies focusing on blood T measurements have employed unselected samples referred to a diagnostic laboratory (4) or a mixture of samples from hypogonadal and eugonadal men across a wide age range recruited for other studies (5), the present study demonstrates wide variability in performance of the most widely used, commercial multiplex assays when calibrated against a purpose-constructed, well-defined, and suitably large reference population of healthy eugonadal young men. In addition, this study extends to include blood LH and FSH assay measurements, which are also an essential part of the full evaluation of male reproductive health. A striking finding is that the variability in T assays is significantly worse than that in gonadotropin assays despite the availability of a gravimetric standard for T compared with the LH and FSH assays, in which the extensive microheterogeneity inherently precludes such standardization.

Lack of consistency or agreement between T assay platforms and reference methods has recently been highlighted (3, 6) after reports that demonstrate the limitations of commercial T immunoassay platforms, especially in application to samples from women and adolescent boys (4). Although the performance of commercial T immunoassays was somewhat better in samples from men, there were significant deviations in both studies from an independent GC/MS reference method (4, 5). One limitation of previous studies is that they had no (4) or too few (5) eugonadal young men to thoroughly define a hormone reference interval. To remedy this, the present study recruited more than 120 eugonadal healthy young men to define the hormone reference intervals. A similar approach, using healthy fertile young men to define hormone reference intervals, has been reported (18), although those findings were based on blood samples obtained from a previous study, with less rigorous reproductive screening. The present approach using a well-defined panel of blood samples to construct reference intervals provides an advance on the present situation, where new methods or improvements in existing commercial immunoassay methods are introduced without sufficient verification of the consequential changes in reference intervals.

Ideally, valid T assay methods must demonstrate internal and external validity. Internal validity is gauged by the method providing essentially identical numerical values, that is, demonstrating high precision and minimal bias when calibrated against an independent reference method such as GC/MS. External validity should provide consistent reference intervals when calibrated against a valid reference population. Limitations to idealized performance may arise from assay design features and human performance factors. Although human error in assay performance is minimized by the development of automated assay formats, the technical simplifications used in the present generation of T immunoassays have included key features, such as removal of organic solvent extraction, preassay chromatographic purification, and replacement of tritiated isotopes (which do not alter the basic steroid structure) with tracers requiring bulky substituents on the steroid ring. Although the specific importance of each of these assay factors remains unresolved and contentious, there is considerable opinion and evidence (1, 2) (Jimenez, M., and D. J. Handelsman, unpublished observations) suggesting that these simplifications are significant detrimental design factors in overall T assay validity. The present study findings indicate that all current commercial T immunoassay platforms are technically deficient and that their overall performance is suboptimal to accurately confirm the diagnosis of androgen deficiency in men. These findings point to a need for methodological improvement, including the development of routine, sensitive, and high-throughput online GC/MS for steroid assays (19).

To secure external validity, an appropriate reference population is vital. In this study we recruited more than 120 healthy eugonadal young men as the most appropriate reference group for the assessment of fertility and androgen status. In completing this recruitment, 15.6% of men were excluded for criteria that could influence testicular function, virtually all without overt clinical symptoms, including more than half with defects restricted to unexplained low sperm production. This highlights the limitations of employing unselected reference populations without regard to reproductive function, when male subfertility is a relatively frequent disorder affecting approximately 5% of men (20), often without other clinical manifestations. Inadvertent inclusion of subfertile men with unrecognized spermatogenic defects and elevated FSH levels in male reference populations would inappropriately widen FSH reference intervals. Such inappropriately high upper reference limits may misclassify azoospermic men with spermatogenic failure as having obstructive azoospermia and lead to inappropriate fertility treatment. Similarly, the standardization in this study by age, body weight, and morning blood samples removed potential confounding by age (21), obesity (22, 23), and diurnal rhythms (24), which would otherwise broaden the apparent reference interval. Similar findings have been reported from Denmark (25).

Any shift in the reference interval for serum T based on healthy eugonadal young men, particularly in its lower limit, could, in theory, be considered to result in misclassification as hypoandrogenic of men whose blood T concentrations lie at the lower end of the reference interval. Such hypothetical over- or underdiagnosis is of concern only where blood T measurements assume a disproportionate clinical role. In younger men with organic underlying disorders of the pituitary or testes causing overt androgen deficiency, blood T measurements are useful to confirm, rather than make, the clinical diagnosis of androgen deficiency (26). By contrast, among older men without underlying organic pituitary or testicular disorders (so-called andropause), excessive reliance on blood T concentrations alone to make a putative diagnosis of age-related androgen deficiency has little reliable clinical or epidemiological basis (27), and such therapeutic interventions await substantiation by appropriate clinical trials (28). Although reference intervals for blood T concentrations from population-based studies of apparently healthy men have been reported from the Massachusetts Male Ageing Study (29), this involved men over the age of 40 yr who did not have specific adverse health conditions, but did not verify the normal male reproductive health of the reference population. The T assay used was a nonautomated assay, involving solvent extraction of samples before the assay. In the youngest age decile (40–49 yr) of that study, the derived reference interval of 8.7–31.7 nmol/liter was wider than the present estimate. Rather than differences due to the small increase in age, this discrepancy was most likely due to the inadvertent inclusion of men with unrecognized reproductive disorders in the apparently healthy Massachusetts Male Ageing Study reference population and/or variations in assay methodology, both of which were specifically standardized in the present study.

The suboptimal performance of commercial immunoassays in this study is not likely to be due to unfortunate choice of laboratories, because those chosen were selected from the top 20% of well-performing laboratories based on the ongoing national quality assurance program. This view is confirmed by the minimal contribution of laboratories to variations in assay performance compared with method for all descriptive variables. These findings do, however, raise the possibility that more general standards of assay performance in the community, including less elite laboratories, may be considerably worse than depicted by this study.

By contrast with the serum T measurements, the FSH and LH assays showed better alignment across multiple commercial immunoassay platforms, with statistically significant, but only quantitatively minor, variations between assays. Hence, for practical purposes, consensus reference intervals of 1.6–8.0 IU/liter for LH and 1.3–8.4 IU/liter for FSH based on log-normal distribution of results across all assays could be derived from this study. Serum FSH measurement has a key significance clinically in differentiating obstructive from nonobstructive azoospermia. The diagnosis of obstruction requires demonstrating normal spermatogenesis, which is confirmed by a normal blood FSH concentration in an azoospermic man with two testes of normal volume, whereas an elevated blood FSH level is an indication of damaged or dysfunctional spermatogenesis as the presumed cause of azoospermia. Interpretation is therefore critically dependent on an accurate upper limit of the reference interval based on men with proven normal spermatogenesis. In practice, however, many recommended reference intervals were far too high, with five of seven being greater than 12 IU, a discrepancy most likely arising from the unrecognized inclusion of older men or those with reproductive failure in the original reference panels. Consequently, pathological elevations of blood FSH concentrations would lead to mistaken diagnosis of obstructive azoospermia when spermatogenic failure is the cause of the azoospermia. One study examining testicular histology to determine the etiology of infertility in consecutive cases of azoospermia proposed a value of 7.6 IU/liter as providing the optimal differentiation of obstructive and nonobstructive azoospermia (30), a level that aligns reasonably well with the upper reference limit obtained in this study.
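
To make the idea of a reference interval based on a log-normal distribution concrete, the following minimal Python sketch shows one common parametric approach: take the mean and SD of the log-transformed values, form the central 95% interval on the log scale, and back-transform. This is an assumption about the general technique, not a reproduction of the study's exact procedure; the FSH values are invented placeholders and the function name `lognormal_reference_interval` is hypothetical.

```python
import math
from statistics import mean, stdev

def lognormal_reference_interval(values, z=1.96):
    """Central reference interval assuming log-normally distributed values.

    Computes mean +/- z * SD on the natural-log scale and back-transforms
    both limits with exp(). z=1.96 gives the central 95% interval.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log-normal model requires strictly positive values")
    logs = [math.log(v) for v in values]
    m, s = mean(logs), stdev(logs)
    return math.exp(m - z * s), math.exp(m + z * s)

# Hypothetical serum FSH values (IU/liter); illustration only.
fsh = [2.1, 3.4, 4.0, 2.8, 5.6, 3.1, 6.2, 1.9, 4.4, 3.7]

lower_limit, upper_limit = lognormal_reference_interval(fsh)
print(f"reference interval: {lower_limit:.1f}-{upper_limit:.1f} IU/liter")
```

Because the interval is symmetric only on the log scale, the upper limit lies further from the geometric mean than the lower limit does, which suits hormone distributions that are skewed to the right.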

Serum inhibin B measurement has been suggested to augment the utility of serum FSH concentrations to make the noninvasive diagnosis of spermatogenesis, thereby making the most efficient and lowest risk-benefit use of testicular biopsy (31, 32). The present study defines reference intervals for inhibin B using the Diagnostic Systems Laboratories assay method, the only commercially available method at this time. The validation by use of a reference panel of men with proven normal reproductive function in this study may aid future studies of inhibin physiology and the role of inhibin B assays in the diagnosis and management of spermatogenesis in infertile men.

In conclusion, using a purpose-recruited group of 124 healthy eugonadal young men as a reference panel, the present study has identified significant differences between commercial automated platforms for T assays, with none being in satisfactory agreement with a gravimetrically based, independent GC/MS reference method and substantial discrepancies existing between methods in apparent reference intervals. Similar, but less severe, problems were also identified in blood LH and FSH assays, although better alignment between assays was evident. Although an appropriate reference panel eliminates many of the defects in blood LH and FSH assays, the methodological limitations of the present commercial blood T assays suggest more substantial technical improvements, including development of high-throughput GC/MS-based methods, are required. Current commercial methods provide suboptimal assistance in confirming the clinical diagnosis of male reproductive disorders, including androgen and spermatogenic deficiencies.


    Acknowledgments
 
We thank the staff and directors of the Royal College of Pathologists of Australasia and the Australasian Association of Clinical Biochemists, Chemical Pathology Quality Assurance Program, participating laboratories, and immunoassay manufacturers for their contribution to and support of this study. The technical assistance of Mr. Nick Balasz (Southern Cross Pathology), Mr. Michael Daskalakis (Southern Cross Pathology), Ms. Anne O’Connor (Monash Institute of Medical Research), and Susan Soo (for performing) and Chris Howe (for collating the GC/MS data at the Australian Sports Drug Testing Laboratory) is also gratefully acknowledged.


    Footnotes
 
This work was supported by a grant from Andrology Australia (Australian Center of Excellence in Male Reproductive Health) and in collaboration among Prince Henry’s Institute of Medical Research, Monash Institute of Medical Research, the ANZAC Research Institute, the Royal College of Pathologists of Australasia, and the Australasian Association of Clinical Biochemists. Andrology Australia acknowledges the financial support of the Australian Government Department of Health and Ageing.

First Published Online August 23, 2005

Abbreviations: GC/MS, Gas chromatography/mass spectrometry; T, testosterone.

Received May 2, 2005.

Accepted August 12, 2005.


    References

  1. Stanczyk FZ 2004 Reliability of extraction/chromatography RIAs. Clin Chem 50:778–779
  2. Stanczyk FZ 2004 Extraction/chromatographic testosterone RIA can be used as the "gold standard" for determining the reliability of direct testosterone immunoassay measurements. Clin Chem 50:2219–2220
  3. Matsumoto AM, Bremner WJ 2004 Serum testosterone assays–accuracy matters. J Clin Endocrinol Metab 89:520–524
  4. Taieb J, Mathian B, Millot F, Patricot MC, Mathieu E, Queyrel N, Lacroix I, Somma-Delpero C, Boudou P 2003 Testosterone measured by 10 immunoassays and by isotope-dilution gas chromatography-mass spectrometry in sera from 116 men, women, and children. Clin Chem 49:1381–1395
  5. Wang C, Catlin DH, Demers LM, Starcevic B, Swerdloff RS 2004 Measurement of total serum testosterone in adult men: comparison of current laboratory methods versus liquid chromatography-tandem mass spectrometry. J Clin Endocrinol Metab 89:534–543
  6. Herold DA, Fitzgerald RL 2003 Immunoassays for testosterone in women: better than a guess? Clin Chem 49:1250–1251
  7. Koide N, Shinji T, Okada K, Mizushima J, Matsuda N, Sunami H 1998 Inter-laboratory difference among eleven clinical laboratories in the Okayama City area. Acta Med Okayama 52:261–270
  8. de Kretser DM, McLachlan RI, Handelsman DJ, Holden CA, Balazs ND, Daskalakis M, Scott S, Sikaris K 2003 Andrology Australia’s study of testosterone reference intervals. Clin Biochem Rev 24:S42 (Abstract S27)
  9. Trout GJ, Kazlauskas R 2004 Sports drug testing–an analyst’s perspective. Chem Soc Rev 33:1–13
  10. World Health Organization 1999 WHO laboratory manual for the examination of human semen and sperm-cervical mucus interaction, 4th Ed. Cambridge: Cambridge University Press
  11. Kley HK, Rick W 1984 The effect of storage and temperature on the analysis of steroids in plasma and blood. J Clin Chem Clin Biochem 22:371–378
  12. Reyna R, Traynor KD, Hines G, Boots LR, Azziz R 2001 Repeated freezing and thawing does not generally alter assay results for several commonly studied reproductive hormones. Fertil Steril 76:823–825
  13. Shapiro SS, Wilk MB 1965 An analysis of variance test for normality (complete samples). Biometrika 52:591–611
  14. Little R, Rubin D 2002 Statistical analysis with missing data, 2nd Ed. Hoboken: Wiley
  15. Bland JM, Altman DG 1986 Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1:307–310
  16. Passing H, Bablok W 1983 A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in clinical chemistry, part I. J Clin Chem Clin Biochem 21:709–720
  17. Linnet K 1993 Evaluation of regression procedures for methods comparison studies. Clin Chem 39:424–432
  18. Andersson AM, Jorgensen N, Frydelund-Larsen L, Rajpert-De Meyts E, Skakkebaek NE 2004 Impaired Leydig cell function in infertile men: a study of 357 idiopathic infertile men and 318 proven fertile controls. J Clin Endocrinol Metab 89:3161–3167
  19. Nelson RE, Grebe SK, O’Kane DJ, Singh RJ 2004 Liquid chromatography-tandem mass spectrometry assay for simultaneous measurement of estradiol and estrone in human plasma. Clin Chem 50:373–384
  20. De Kretser DM, Baker HW 1999 Infertility in men: recent advances and continuing controversies. J Clin Endocrinol Metab 84:3443–3450
  21. Gray A, Berlin JA, McKinlay JB, Longcope C 1991 An examination of research design effects on the association of testosterone and male aging: results of a meta-analysis. J Clin Epidemiol 44:671–684
  22. Feldman HA, Longcope C, Derby CA, Johannes CB, Araujo AB, Coviello AD, Bremner WJ, McKinlay JB 2002 Age trends in the level of serum testosterone and other hormones in middle-aged men: longitudinal results from the Massachusetts male aging study. J Clin Endocrinol Metab 87:589–598
  23. Jensen TK, Andersson AM, Jorgensen N, Andersen AG, Carlsen E, Petersen JH, Skakkebaek NE 2004 Body mass index in relation to semen quality and reproductive hormones among 1,558 Danish men. Fertil Steril 82:863–870
  24. Bremner WJ, Vitiello MV, Prinz PN 1983 Loss of circadian rhythmicity in blood testosterone levels with aging in normal men. J Clin Endocrinol Metab 56:1278–1281
  25. Andersson AM, Petersen JH, Jorgensen N, Jensen TK, Skakkebaek NE 2004 Serum inhibin B and follicle-stimulating hormone levels as tools in the evaluation of infertile men: significance of adequate reference values from proven fertile men. J Clin Endocrinol Metab 89:2873–2879
  26. Handelsman DJ, Zajac JD 2004 Androgen deficiency and replacement therapy in men. Med J Aust 180:529–535
  27. Handelsman DJ, Liu PY 2005 Andropause: invention, prevention, rejuvenation. Trends Endocrinol Metab 16:39–45
  28. Liverman CT, Blazer DG, eds 2004 Testosterone and aging: clinical research directions. Washington DC: National Academies Press
  29. Mohr BA, Guay AT, O’Donnell AB, McKinlay JB 2005 Normal, bound and nonbound testosterone levels in normally ageing men: results from the Massachusetts Male Ageing Study. Clin Endocrinol (Oxf) 62:64–73
  30. Schoor RA, Elhanbly S, Niederberger CS, Ross LS 2002 The role of testicular biopsy in the modern management of male infertility. J Urol 167:197–200
  31. Pierik FH, Vreeburg JT, Stijnen T, De Jong FH, Weber RF 1998 Serum inhibin B as a marker of spermatogenesis. J Clin Endocrinol Metab 83:3110–3114
  32. Bohring C, Schroeder-Printzen I, Weidner W, Krause W 2002 Serum levels of inhibin B and follicle-stimulating hormone may predict successful sperm retrieval in men with azoospermia who are undergoing testicular sperm extraction. Fertil Steril 78:1195–1198


