CLINICAL REVIEW |
Knowledge and Encounter Research Unit (M.B.E., M.H.M., R.M., P.J.E., V.M.M.), Division of Preventive Medicine (M.H.M.), Division of Endocrinology, Diabetes, Metabolism, Nutrition (D.E., S.N., V.M.M.), Department of Medicine (M.H.M., D.E., K.H., R.E., V.M.M.), Mayo Clinic, and Mayo Clinic Libraries (P.J.E.), Mayo Clinic, Rochester, Minnesota 55905
Address all correspondence and requests for reprints to: Victor M. Montori, M.D., M.Sc., Mayo Clinic, W18A, 200 First Street SW, Rochester, Minnesota 55905. E-mail: montori.victor{at}mayo.edu.
| Abstract |
|---|
Objective: Our objective was to summarize evidence on the accuracy of common tests for diagnosing CS.
Data Sources: We searched electronic databases (MEDLINE, EMBASE, Web of Science, Scopus, and citation search for key articles) from 1975 through September 2007 and sought additional references from experts.
Study Selection: Eligible studies reported on the accuracy of urinary free cortisol (UFC), dexamethasone suppression test (DST), and midnight cortisol assays vs. reference standard in patients suspected of CS.
Data Extraction: Reviewers working in duplicate and independently extracted study characteristics and quality and data to estimate the likelihood ratio (LR) and the 95% confidence interval (CI) for each result.
Data Synthesis: We found 27 eligible studies, with a high prevalence [794 (9.2%) of 8631 patients had CS] and severity of CS. The tests had similar accuracy: UFC (n = 14 studies; LR+ 10.6, CI 5.5–20.5; LR– 0.16, CI 0.08–0.33), salivary midnight cortisol (n = 4; LR+ 8.8, CI 3.5–21.8; LR– 0.07, CI 0–1.2), and the 1-mg overnight DST (n = 14; LR+ 16.4, CI 9.3–28.8; LR– 0.06, CI 0.03–0.14). Combined testing strategies (e.g. a positive result in both UFC and 1-mg overnight DST) had similar diagnostic accuracy (n = 3; LR+ 15.4, CI 0.7–358; LR– 0.11, CI 0.007–1.57).
Conclusions: Commonly used tests to diagnose CS appear highly accurate in referral practices with samples enriched with patients with CS. Their performance in usual clinical practice remains unclear.
| Introduction |
|---|
The aging population and the obesity epidemic are making some features of CS, such as central obesity, hypertension, hyperglycemia, and bone fragility, common. Therefore, detecting patients with CS, particularly those with milder forms, requires accurate tests that are able to discriminate patients with and without hypercortisolism (3, 4, 5).
To summarize the available evidence on the diagnostic accuracy of tests for abnormal cortisol overproduction, The Endocrine Society Cushing's Syndrome Task Force commissioned us to conduct a systematic review of diagnostic tests for CS.
| Materials and Methods |
|---|
Eligibility criteria
We included cross-sectional and longitudinal studies that enrolled participants with true diagnostic uncertainty. Therefore, the diagnosis of CS could not be a criterion for enrollment in these studies, so-called phase II and III diagnostic studies (7). These studies may have included individuals selected because they had physical findings or comorbid conditions suggestive of CS.
Tests of interest were urinary free cortisol (UFC), serum and salivary midnight/bedtime cortisol, the 1-mg overnight dexamethasone suppression test (DST), and the 2-d 2-mg DST. Eligible studies had a reference standard for diagnosing CS. Eligible reference standards included a pathological diagnosis, response to therapy targeting CS, or clinical follow-up (i.e. consensus among treating clinicians about a diagnosis of CS). Eligible studies measured the accuracy of test results with results expressed as 1) both sensitivity and specificity or 2) likelihood ratio. We included studies regardless of their publication status, language, or size.
Study identification
An expert reference librarian (P.J.E.) designed and conducted the electronic search strategy with input from study investigators with expertise in conducting systematic reviews. To identify eligible studies, we searched electronic databases (MEDLINE, EMBASE, Web of Science, Scopus, and citation search for key articles) from 1975 through September 2007. The detailed search strategy is available upon request. We also sought references from experts from The Endocrine Society Cushing's Syndrome Task Force.
Reviewers working independently and in duplicate reviewed all abstracts and titles and, upon retrieval of potentially eligible studies, the full text publications for eligibility with adequate chance-adjusted inter-reviewer agreement (κ statistic = 0.6; 95% confidence interval 0.4–0.7). Disagreements were resolved by consensus or arbitration.
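As an illustration of the chance-adjusted agreement statistic reported above, the following minimal sketch computes Cohen's kappa for two reviewers making include/exclude decisions. The function name and the counts are hypothetical; the review does not report the underlying agreement table.

```python
def cohen_kappa(both_include, only_a_includes, only_b_includes, both_exclude):
    """Cohen's kappa for two reviewers making a binary include/exclude call.

    All four arguments are counts of abstracts (hypothetical in this sketch).
    """
    n = both_include + only_a_includes + only_b_includes + both_exclude
    if n == 0:
        raise ValueError("at least one rated item is required")

    # Proportion of items on which the two reviewers agreed.
    observed = (both_include + both_exclude) / n

    # Agreement expected by chance, from each reviewer's marginal include rate.
    a_include = (both_include + only_a_includes) / n
    b_include = (both_include + only_b_includes) / n
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical counts, not data from the review.
    print(round(cohen_kappa(20, 5, 10, 15), 2))
```

Kappa rescales observed agreement so that 0 corresponds to chance-level agreement and 1 to perfect agreement.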
Quality assessment
Reviewers working independently and in duplicate analyzed the eligible articles to assess the reported quality of the methods. We followed the tool for quality assessment of studies of diagnostic accuracy included in systematic reviews (QUADAS) (8).
Data extraction
Reviewers working independently and in pairs used a standardized form to extract a full description of study participants, including judgments about the extent of diagnostic uncertainty, the presence of comorbid conditions as eligibility criteria (not as characteristics of the sample), the tests and the procedures followed to conduct them, the cutoff or range definitions of diagnostic tests, whether these cutoffs were derived from previous research or determined by study authors, and the nature and characteristics of the reference standard used. To extract data to estimate diagnostic accuracy measures, we used the cutoffs authors chose to use in the primary studies. If more than one cutoff was reported or if the results were reported at the individual patient level, then we chose to use cutoffs that offered the best test performance.
Author contact
We sent letters to the corresponding authors (or any other author with contact address listed on the main manuscript) of each of the eligible studies by electronic mail (regular mail if we could not obtain an active e-mail). We asked these authors to verify the data we extracted and to complete missing data we could not identify in the published record. In case of no response, we repeated the request 2 wk later.
Statistical analysis
We used Meta-DiSc Software for Meta-analysis for Screening and Diagnostic tests version 1.4 (9). Using random effects metaanalyses, we pooled the sensitivities, specificities, likelihood ratios, and diagnostic odds ratio and estimated the 95% confidence intervals for the outcomes. Because the pooled sensitivity and the pooled specificity are interrelated, we focused our analyses on estimating and pooling likelihood ratios and diagnostic odds ratios. The diagnostic odds ratio of a test describes the ratio of the odds of a positive test result in patients with disease compared with patients without disease (10) and can be calculated as the ratio of the likelihood ratios for a positive and a negative test. It has the advantage of being a single indicator of test performance that provides a global measure of agreement between a test and a reference standard and allows for pooling across studies when the main source of inconsistency is the threshold to consider a test positive [i.e. when there is a common receiver operator characteristic (ROC) curve across all studies].
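The relationship between these accuracy measures can be sketched for a single study's 2x2 table as follows. This is an illustrative calculation, not the Meta-DiSc implementation; the function names and the example counts are hypothetical, and the confidence interval uses the standard log method, which assumes no empty cells (this sketch omits any continuity correction).

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, likelihood ratios and diagnostic odds ratio.

    tp/fn: patients with disease who test positive/negative.
    fp/tn: patients without disease who test positive/negative.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        # The diagnostic odds ratio is the ratio of the two likelihood ratios.
        "dor": lr_positive / lr_negative,
    }


def lr_positive_ci(tp, fp, fn, tn, z=1.96):
    """Approximate 95% CI for LR+ using the log method (no zero cells)."""
    lr = (tp / (tp + fn)) / (fp / (fp + tn))
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    return math.exp(math.log(lr) - z * se_log), math.exp(math.log(lr) + z * se_log)


if __name__ == "__main__":
    # Hypothetical study: 100 patients with disease, 200 without.
    result = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)
    print(result)
    print(lr_positive_ci(tp=90, fp=20, fn=10, tn=180))
```

Dividing LR+ by LR– gives the same value as the cross-product (tp × tn) / (fp × fn), which is why the diagnostic odds ratio serves as a single summary of a 2x2 table.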
Summary ROC curves allow readers to visually inspect the consistency of results across studies (answering the question of whether there is a single ROC curve across all these studies) and the accuracy of the test, as judged by the area under the summary ROC curve, in discriminating between patients with and without CS. In contrast to ROC curves in which individual data points represent different test cutoffs, in summary ROC curves, each point represents a study (11). We assessed the inconsistency among studies using the I² statistic, which represents the proportion of variability across studies that is not due to chance. I² values of 25, 50, and 75% indicate low, moderate, and high heterogeneity, respectively (12).
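A minimal sketch of how I² and a random-effects pooled estimate can be obtained from study-level log estimates (for example, log diagnostic odds ratios) is shown below. It assumes the DerSimonian-Laird estimator of between-study variance, which the text does not specify; the function name and the input values are hypothetical.

```python
import math


def pool_random_effects(log_effects, variances, z=1.96):
    """Pool log-scale estimates with inverse-variance random-effects weights.

    Returns the pooled estimate and CI on the natural scale, plus tau^2 and I^2.
    Assumes the DerSimonian-Laird estimator (an assumption of this sketch).
    """
    if len(log_effects) != len(variances) or len(log_effects) < 2:
        raise ValueError("need matching inputs from at least two studies")

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1

    # I^2: share of total variability not attributable to chance, as a percent.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (DerSimonian-Laird), truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled_log = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "pooled": math.exp(pooled_log),
        "ci": (math.exp(pooled_log - z * se), math.exp(pooled_log + z * se)),
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical log diagnostic odds ratios and their variances.
    print(pool_random_effects([3.2, 4.1, 2.5, 5.0], [0.30, 0.45, 0.25, 0.60]))
```

When the studies agree closely, Q falls at or below its degrees of freedom, I² is reported as 0%, and the random-effects result coincides with the fixed-effect one.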
Subgroup analyses
A priori hypotheses to explain potential heterogeneity among studies included severity of CS, selection bias (i.e. samples of consecutive patients with high prevalence of CS), type of patients (referred because of clinicians' suspicion of CS vs. no CS suspicion), cutoff rationale (driven by outcomes in the same sample, e.g. chosen to maximize specificity, or by the upper limit of the assay), and test characteristics (sensitivity of the assay, use of liquid chromatography vs. RIA). We tested these hypotheses using a test for interaction considering P < 0.05 as significant (13), because we did not have enough studies to conduct meta-regression (14).
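One common form of a test for interaction compares two subgroup estimates on the log scale, recovering each standard error from its 95% confidence interval. The sketch below illustrates that approach; whether it matches the exact procedure of reference 13 is an assumption, and the function names and subgroup values are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of a log-scale estimate, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def interaction_test(est_a, ci_a, est_b, ci_b):
    """Two-sided z test for a difference between two ratio estimates.

    est_a, est_b: subgroup estimates (e.g. diagnostic odds ratios).
    ci_a, ci_b: (lower, upper) 95% confidence limits for each estimate.
    """
    diff = math.log(est_a) - math.log(est_b)
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z_score = diff / se_diff
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z_score) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z_score, p_value


if __name__ == "__main__":
    # Hypothetical subgroup estimates, not values from the review.
    z_score, p_value = interaction_test(20.0, (10.0, 40.0), 2.0, (1.0, 4.0))
    print(z_score, p_value, p_value < 0.05)
```

Because the comparison uses only each subgroup's estimate and confidence interval, it can be applied when there are too few studies for meta-regression.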
| Results |
|---|
Initial search of the literature yielded 1791 publications, of which 124 were potentially relevant to this review based on titles and abstracts (Fig. 1). After full text review, we found 27 eligible studies (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41). We excluded one study from analyses because there were no CS cases in the sample (26) and excluded another study because we could not obtain essential data from the author (15).
Study characteristics
Table 1 summarizes the baseline characteristics of eligible studies. Fourteen studies assessed the diagnostic accuracy of UFC, six midnight serum cortisol, four midnight salivary cortisol, 14 the 1-mg overnight DST, and eight the 2-d 2-mg DST. Of 8631 patients enrolled in these studies, 794 (9.2%) had CS.
Table 2 summarizes the methodological quality of the 27 included studies. Almost all studies enrolled patients with apparent diagnostic uncertainty and a spectrum similar to that of the population in whom clinicians would use the tests in clinical practice (42). However, there is a strikingly broad range in the prevalence of CS across these studies, suggesting some degree of selection or referral bias. Their selection criteria were clearly described, and all received a reference standard that either diagnosed or excluded CS.
The appendix (published as supplemental data on The Endocrine Society's Journals Online web site at http://jcem.endojournals.org) includes tables with the test accuracy data from each included study (supplemental Tables 1–6). Table 3 shows pooled likelihood ratios for test results considered positive and negative. Table 3 also reports the diagnostic odds ratio and its associated inconsistency statistic (I²). Where the subgroup analyses revealed a significant interaction, we report the effect in each of the subgroups in addition to the pooled estimates, because the latter may have less validity. [Supplemental Figs. 1–5 (published as supplemental data on The Endocrine Society's Journals Online web site at http://jcem.endojournals.org) show summary ROC curves for the tests of interest].
Except where noted in Table 3, none of the subgroup analyses explored was associated with a significant test accuracy-subgroup interaction (see supplemental Tables 7–11).
Sensitivity analyses
Most patients included in the metaanalysis were enrolled in a single study (33). A sensitivity analysis, in which we removed this study, revealed similar pooled accuracy results (data not shown).
Zwinderman and Bossuyt (44) have proposed the use of bivariate random-effects metaanalysis to analyze the sensitivities and specificities together from which one could derive pooled likelihood ratios, rather than pooling the likelihood ratios directly; in this data set, however, the bivariate approach yields results consistent with those presented here (data not shown).
| Discussion |
|---|
We conducted a systematic review and metaanalyses of studies that enrolled patients with diagnostic uncertainty and conducted a test for hypercortisolism and a satisfactory reference standard test. This review offers 1) a survey composed mostly of small studies with high prevalence of CS from referral centers, 2) pooled test characteristics that represent the best estimates of test accuracy for each of the tests assessed and their combinations, and 3) inconsistent results across studies that are not explained by the choice of test thresholds but likely represent differences in the spectrum of patients with and without CS, in the characteristics of the tests used, and in the definitions of CS. These inconsistencies remain unexplained given the limitations in our ability to explore these differences with few studies.
In all, we found that the UFC and the overnight DST have the most evidence supporting their use for the detection of CS, with limited evidence supporting the use of salivary and serum midnight cortisol tests. Limited evidence also supports the use of these tests in combination to both identify and exclude patients with CS. In two instances in which the inconsistency across studies was important, we were able to identify potential explanations. For the midnight serum cortisol test compared with assay-driven thresholds, outcome-driven thresholds overestimated test accuracy (i.e. test interpretation was fitted to the data in the studies). For the 1-mg overnight DST, studies in which the prevalence of CS was greater than 50% (the median across studies) reported more modest test characteristics, especially more false-positive test results. This paradoxical result may be due to chance, to a lower cortisol threshold for positivity, or to patients without CS who had other syndromes associated with impaired cortisol suppression.
Limitations and strengths
The key limitations of this review refer to the relative paucity of evidence of test accuracy for the evaluated tests and to the methodological quality of the included studies. In particular, the prevalence and severity of CS vary importantly across studies despite the authors' representation of their populations as consecutive samples of patients referred without clear diagnosis. It is also striking that these studies rarely report indeterminate cases, given how often there is residual diagnostic uncertainty even among patients evaluated in centers of excellence. Finally, the report of a single cutoff in many of these studies precludes the estimation of likelihood ratios for ranges of test results. The arbitrary choice of test threshold and the dichotomy of the test results into positive and negative may contribute to a dichotomous view of diagnosis in which patients either have or do not have CS rather than a Bayesian approach in which additional test results modify the probability that a given patient has CS.
Incomplete searching, arbitrary study selection, poor quality of the primary studies, misguided analyses, and results that cannot be applied in practice represent potential limitations of systematic reviews. The extent to which publication bias affects studies of test accuracy is unknown, and the performance of tests of publication bias in the context of heterogeneous results is problematic (45); the accuracy of the indexing of such studies in the electronic databases is also unclear (46). Yet, our overlapping search strategies and extensive input from clinical experts should have minimized the chances that we missed studies that could substantially change the inferences drawn from this study.
Our review has the strengths of systematic reviews that summarize the totality of the available evidence following a protocol-driven procedure with explicit eligibility criteria, reproducible judgments about study quality and selection, and focused analyses (47). We also provide in the appendix the data from each of the studies to facilitate readers' secondary analyses. Given our focus on samples of patients in whom there was diagnostic uncertainty (phase II and III diagnostic studies) (7), we may have successfully ameliorated the overestimation of test accuracy that results from so-called phase I diagnostic accuracy studies in which investigators evaluate the accuracy of the test in distinguishing patients with clear confirmed disease and individuals who are clearly free of disease. We were forced to use a single cutoff when many were reported from a given study with the subsequent loss of information and gain in simplicity and transparency. Yet, our analyses take into account inconsistencies associated with the choice of threshold (i.e. using the diagnostic odds ratio).
Because of our study selection criteria, this review's results do not apply to patients with adrenal incidentaloma or to patients with suspected intermittent or so-called cyclical CS. Because of the high prevalence of CS in the included studies, the applicability of this study to general practice settings or to general endocrine practices is unclear.
With these limitations and strengths, clinicians seeking to apply these results in their practice can use a Fagan nomogram to update their estimates of the probability that their patients have CS (Fig. 2). Given the close biological relationship between the tests assessed here, it may be unwise to use this procedure to estimate the posttest probability when several of these tests are performed in series.
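The arithmetic that a Fagan nomogram performs graphically can be sketched as follows: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. The function name is hypothetical. The example uses the pooled likelihood ratios reported in this review for the 1-mg overnight DST and, purely for illustration, a pretest probability equal to the 9.2% prevalence across the included studies; a clinician's own pretest estimate would normally differ. In keeping with the caution above, the sketch applies a single test result and does not chain several tests.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability with one test result's likelihood ratio."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")

    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


if __name__ == "__main__":
    pretest = 0.092  # illustrative assumption: prevalence in the included studies

    # Pooled estimates reported for the 1-mg overnight DST.
    after_positive = posttest_probability(pretest, 16.4)
    after_negative = posttest_probability(pretest, 0.06)

    print(f"after a positive DST: {after_positive:.1%}")
    print(f"after a negative DST: {after_negative:.1%}")
```

Under these illustrative inputs, a positive result raises the probability of CS to roughly 62%, and a negative result lowers it to below 1%.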
Implications for practice and research
The accompanying Endocrine Society practice guideline on the diagnosis of CS contains the practical implications of the results of this review. The Task Force recommends a particular algorithm that seeks to balance diagnostic accuracy with practical and logistical considerations.
Our systematic review has uncovered several research gaps in this area. From the laboratory perspective, laboratories and test manufacturers should seek and maintain standards for measuring cortisol in urine, serum, and saliva. Variability today introduces variability in the literature and in clinical practice and impairs clinicians' ability to apply published cutoffs and results to their practice.
From the diagnostic accuracy perspective, prospective studies of the proposed algorithm may uncover further advantages and disadvantages of the proposed approach, including the downstream consequences of patient misclassification. Further work to evaluate the accuracy of testing algorithms in consecutive patients in whom clinical features suggest CS should 1) yield more accurate estimates of the diagnostic power of test results, 2) report findings using likelihood ratios for test result ranges rather than forcing a single cutoff on the data, and 3) use diagnostic categories that include those who clearly have and do not have CS and those with indeterminate results (48). Given the low incidence of CS and the increasing incidence of conditions with similar features (truncal obesity, bone loss, hyperglycemia, and hypertension), rigorous research is likely to yield more conservative estimates of test performance than those summarized here.
For stronger recommendations in the future, guideline panels will require evidence that patients are better off in important ways when they receive a diagnosis when the disease is subtle and mild rather than when it is florid and severe. The paucity of both patients and resources mandates collaboration across centers of excellence (i.e. endocrinologists with an interest in CS working in academic medical centers) tightly integrated with their referral sources (i.e. primary care and internal medicine clinicians) to generate this much-needed research evidence.
Conclusions
Commonly used tests to diagnose CS appear highly accurate, particularly when used in combination, in referral practices with samples enriched with patients with CS. Their performance in usual clinical practice remains unclear.
| Acknowledgments |
|---|
| Footnotes |
|---|
Disclosure Statement: M.B.E., M.H.M., R.M., D.E., K.H., S.N., R.E., P.J.E., and V.M.M. have nothing to declare.
First Published Online March 11, 2008
Abbreviations: CS, Cushing's syndrome; DST, dexamethasone suppression test; ROC, receiver operator characteristic; UFC, urinary free cortisol.
Received January 22, 2008.
Accepted March 3, 2008.