| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
REVIEW |
Knowledge and Encounter Research Unit (B.A.S., M.H.M., V.M.M.), Divisions of Endocrinology, Preventive Medicine, and Internal Medicine, Mayo Clinic College of Medicine, Rochester, Minnesota 55905; Clinical Advances Through Research And Information Translation Research Group (H.J.S., G.H.G.), Department of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University, Hamilton, Ontario L8S4L8, Canada; Department of Epidemiology (H.J.S.), Italian National Cancer Institute "Regina Elena," 00161 Rome, Italy; Basel Institute for Clinical Epidemiology (R.K.), University Hospital Basel, CH-4031 Basel, Switzerland; and Diabetes Institute (R.A.V.), Walter Reed Health Care System, Washington, D.C. 20307
Address all correspondence and requests for reprints to: Victor M. Montori, M.D., M.Sc., Mayo Clinic, W18A, 200 First Street SW, Rochester, Minnesota 55905. E-mail: montori.victor{at}mayo.edu.
| Abstract |
|---|
Evidence Acquisition: The authors are involved in the development of the GRADE standard and its application to The Endocrine Society clinical practice guidelines. Examples were extracted from these guidelines to illustrate how this grading system enhances the quality of practice guidelines.
Evidence Synthesis: We summarized and described the components of the GRADE system, and discussed the features of GRADE that help bring clarity and consistency to guideline documents, making them more helpful to practicing clinicians and their patients with endocrine disorders.
Conclusions: GRADE describes the quality of the evidence using four levels: very low, low, moderate, and high quality. Recommendations can be either strong ("we recommend") or weak ("we suggest"), and this strength reflects the confidence that guideline panel members have that patients who receive recommended care will be better off. The separation of the quality of the evidence from the strength of the recommendation recognizes the role that values and preferences, as well as clinical and social circumstances, play in formulating practice recommendations.
| Introduction |
|---|
In this article we will discuss the processes involved in developing helpful and rigorous clinical practice guidelines in a manner congruent with the approach The Endocrine Society has adopted. We anticipate that this will assist endocrinologists and other parties who are interested in critically appraising, implementing, and enhancing The Endocrine Society's clinical practice guidelines.
| Developing Rigorous and Helpful Clinical Practice Guidelines |
|---|
Therefore, evidence-based guidelines are most helpful when they provide recommendations that are clear, based on the best available research evidence, and transparent in terms of reporting the quality of the evidence and the basis for determining the strength of the recommendations. Often this includes explicitly describing the pertinent values and preferences the guideline authors bring to bear in developing the recommendations.
For over a decade, most guideline groups have recognized that developing a summary categorization of the strength of the recommendations and the quality of the evidence supporting them, processes sometimes called grading (of the recommendation strength) and rating (of the evidence quality), helps clinicians understand a practice guideline's summary message. Multiple systems in use produce different grading and rating categories, and rely on different letters, numbers, symbols, and terms (2). This variety causes confusion where clarity is needed.
To address this concern, the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) working group, composed of expert methodologists and guideline developers from a variety of health care organizations, set out to: 1) evaluate these different systems, 2) develop one recommended grading system, and 3) disseminate this system throughout medical communities and their literature. The challenge was great because many systems were already in place, all systems have limitations, and many organizations have spent significant resources on developing their rating system (3). GRADE's design criteria included simplicity and applicability to a wide variety of clinical recommendations that encompass the full spectrum of patient management decisions. The GRADE working group first published its findings in 2004 (4).
Since that time, numerous organizations have adopted GRADE as their guideline grading system. These organizations include The Endocrine Society, World Health Organization, American College of Chest Physicians, UpToDate, American College of Physicians, American Thoracic Society, The Cochrane Collaboration, European Respiratory Society, Agency for Healthcare Research and Quality, and Society of Critical Care Medicine (a complete list is available on the GRADE working group web site) (5). An emerging consensus seems to be forming around the adoption of GRADE. This would be a welcome progression because such widespread adoption will help maintain clarity and consistency in guidelines across medical disciplines.
The Endocrine Society appraised the merits of the GRADE system and decided in late 2004 to adopt it as the basis for its clinical practice guidelines. The Endocrine Society was the first North American organization to adopt GRADE and use it in its Clinical Practice Guidelines program. Guidelines on the use of testosterone in men (6), on the treatment and prevention of pediatric obesity, and on the diagnosis and treatment of hirsutism are examples of the application of the GRADE system to The Endocrine Society guidelines. However, endocrinologists have not had access to a context-specific discussion of this system as it relates to guidelines in endocrinology. In the following sections, we will use endocrinology examples to illustrate how this grading system helps improve the rigor and usefulness of clinical practice guidelines.
| The GRADE System |
|---|
To enhance further the interpretation and clarity of the recommendations, guideline developers use the terms "we recommend" to denote strong recommendations, whereas weak recommendations use the less definitive wording "we suggest." Furthermore, a strong recommendation receives a grade 1 classification, and a weak recommendation receives a grade 2 classification. The symbols chosen for the four levels of quality of evidence are: ⊕○○○ (very low); ⊕⊕○○ (low); ⊕⊕⊕○ (moderate); and ⊕⊕⊕⊕ (high) quality. Table 1 provides an overview of the GRADE system and a closer look at the components of each of its recommendation categories.
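The labeling scheme described above can be illustrated with a minimal Python sketch. It pairs the grade number (1 for strong, 2 for weak) and the matching wording ("we recommend" or "we suggest") with a four-circle quality symbol. The function names, the `grade|symbol` layout of the label, and the use of ⊕ and ○ characters to draw the symbols are assumptions made for illustration, not part of any official GRADE tooling.

```python
# Illustrative sketch only: names and the "grade|symbol" label layout are assumptions.
QUALITY_LEVELS = ["very low", "low", "moderate", "high"]


def quality_symbol(level: str) -> str:
    """Return a four-circle symbol with one filled circle per quality level."""
    filled = QUALITY_LEVELS.index(level) + 1  # raises ValueError for unknown levels
    return "⊕" * filled + "○" * (len(QUALITY_LEVELS) - filled)


def format_recommendation(strong: bool, quality: str, statement: str) -> str:
    """Combine wording, grade number (1 = strong, 2 = weak), and quality symbol."""
    grade = 1 if strong else 2
    wording = "We recommend" if strong else "We suggest"
    return f"{wording} {statement} ({grade}|{quality_symbol(quality)})"


if __name__ == "__main__":
    # Hypothetical statement text, used only to show the output format.
    print(format_recommendation(False, "low", "a hypothetical intervention"))
```

Keeping the strength and the quality as two separate inputs mirrors the separation between the two judgments that the system is built around.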
| Strength of Recommendations |
|---|
If guideline developers are confident that the desirable effects of adherence to a recommendation outweigh the undesirable effects, they will make a strong recommendation within the context of a described intervention. Typically, this requires high or moderate quality evidence on patient-important outcomes. Exceptionally, panels can make strong recommendations based on low or very low quality evidence. This may occur when the values and preferences guideline developers bring to bear are such that, even when considering low quality evidence, they are confident that the benefits of an intervention outweigh the undesirable outcomes (or vice versa). In these cases the panel can make a strong recommendation for (or against) the intervention.
For example, consider the decision to administer aspirin or acetaminophen to children with chicken pox. Observational studies have noted an association between aspirin administration and Reye syndrome. Because aspirin and acetaminophen are, in this context, similar in their analgesic and antipyretic effects, guideline developers may make a strong recommendation for acetaminophen despite the low quality evidence suggesting harm from aspirin because they place a very high value on avoiding potential life-threatening adverse effects.
A weak recommendation is one for which a guideline panel concludes that the desirable effects of adherence to a recommendation probably outweigh the undesirable effects, but the panel is not confident. Thus, if guideline developers believe that benefits and downsides are finely balanced, or appreciable uncertainty exists about this balance, they offer a weak recommendation. Low or very low quality evidence usually leads to weak recommendations because of uncertainty about the balance between risks and benefits. Guideline panels may also offer weak recommendations when high quality evidence is available, if that evidence clearly demonstrates that the benefits and risks are closely balanced. For example, a guideline panel may weakly recommend bisphosphonates in relatively low-risk patients with osteopenia, in whom the burden and costs of monitoring and treatment may or may not be worth the potential reduction in the risk of fragility fractures documented in randomized trials.
Table 2 summarizes the factors that influence the strength of a recommendation, factors that broadly correspond to: 1) certainty about the balance between benefits vs. burdens and harms, 2) resource use, and 3) variation in values and preferences. Consideration of this latter issue is key. Guideline panels will typically, either explicitly or implicitly, use their own preferences as imperfect proxies of patient values. Alternatively, they could consider the range of patients to whom the recommendation applies, and their range of values and preferences. Ideally, they will find a way to ensure that the recommendation is consistent with the values and preferences of most patients. How to achieve this goal remains a challenge; one approach includes involving relevant patients as panel members or involving patient groups able to minimize influences that could bias their judgments in the assessment of values and preferences.
We do not know how individual clinicians can best achieve the goal of incorporating patient values and preferences in following a weak recommendation, but some promising approaches exist. For example, some clinicians are using decision aids. Decision aids are tools that help clinicians communicate to patients the relevant evidence about the available options and their relative merits in a quantitative form (for examples, see http://kerunit.e-bm.org). Randomized trials have shown that these tools can improve the quality of decision making in many clinical settings (7). Conversely, for strong recommendations, a decision aid could be an inefficient use of time and other resources, although it is plausible that having the patient participate in making treatment choices may enhance adherence to therapy (8).
| Quality of the Evidence |
|---|
Because of possible limitations that fall into five categories (Table 3), even RCTs may not provide high quality evidence. First, there may be serious limitations in the design and conduct of RCTs (including lack of concealment and blinding, and large loss to follow-up), and these limitations would lead to a reduction in the quality of the evidence base (weakening the inference decision makers can draw from these data) and in turn a reduction in the quality level. For example, to inform guideline developers about the efficacy of physical activity on pediatric obesity, the authors reviewed the results of a metaanalysis of 20 relevant RCTs (9). These trials had no reported allocation concealment or blinding and had significant loss to follow-up (29% of studies reported greater than a 20% loss). Therefore, the guideline panel downgraded the quality of the evidence.
For example, when considering the use of testosterone gel to prevent fragility fractures in elderly hypogonadal men, evidence from trials enrolling younger men shows that intramuscular testosterone can increase bone mineral density (12, 13). Here, the evidence informs the efficacy of a different testosterone formulation in a different patient group on a surrogate outcome of no importance, in and of itself, to patients (bone density rather than fracture risk); no high-quality trials have answered the question of direct relevance to the guideline developer. If a recommendation were made specific to the use of testosterone gel to prevent fractures in elderly men, the quality of the evidence would be downgraded based on indirectness with respect to the population, intervention, and outcome. Furthermore, the guideline panel interested in making recommendations about the use of testosterone for osteoporosis will have to rely only on indirect comparisons (i.e. trials of each agent against placebo but no head-to-head trials) when considering the relative merits of testosterone vs. bisphosphonates, for instance.
Fourth, guideline developers should downgrade evidence when few studies, involving few participants and, most importantly, documenting few outcomes, inform the tradeoffs of risks and benefits. As an example, a metaanalysis of the results of trials evaluating the effects of testosterone on cardiovascular outcomes suggests that testosterone does not have an effect on cardiovascular events. However, this result is based on only six trials, a total of 308 participants, and only 21 outcomes. Considering the confidence interval width, the pooled data are consistent with both a 1-fold decrease and a 4-fold increase in the odds of cardiac events in patients treated with testosterone (14). This evidence carries great uncertainty, lowering the confidence that the estimates are accurate.
Finally, guideline developers should have limited confidence when reporting bias might have affected the underlying evidence. Publication bias, one form of reporting bias, occurs because trials that show no significant effect are less likely to be published, and outcome reporting bias occurs when researchers selectively report their findings depending on their significance. Clinical trial registries may help reduce publication bias (15). Chan et al. (16) found that reporting of trial outcomes is frequently incomplete, biased, and inconsistent with the original trial protocols. Prospective public registration of trial protocols could help diminish this concern. Box 1 describes an example of reporting bias. Publication bias is more likely to take place in fields in which small trials are the norm (e.g. many endocrinopathies) because large trials are less likely to remain unpublished. Although difficult to ascertain, reporting bias is prevalent, particularly when key patient-important outcomes are only reported in a few studies.
In contrast to RCTs, observational studies start with a "low" (i.e. case-control and cohort studies) or "very low" (i.e. unsystematic clinical observations, case reports and series) quality level but may be upgraded in certain situations, e.g. when the magnitude of the treatment effect is very large (e.g. use of insulin to prevent morbidity and mortality in patients with type 1 diabetes presenting in diabetic ketoacidosis; use of glucocorticoids to prevent adrenal crisis in patients with Addison's disease). Thus, it is very important in guidelines to specify clearly the alternatives considered. Although high quality evidence, as we have seen, supports the use of glucocorticoids to prevent adrenal crises in patients with Addison's disease, low quality evidence supports the choice of a specific glucocorticoid replacement regimen out of several in common use.
In addition, the quality level can increase when all plausible confounders would reduce the magnitude of the treatment effect, yet the effect remains sizeable. For example, a systematic review showed higher mortality in for-profit hospitals when compared with not-for-profit hospitals (17). This result occurred despite the fact that for-profit hospitals usually have additional resources available and generally admit healthier patients, factors that should work in their favor. Considering these confounders would increase the magnitude of benefit of not-for-profit hospitals (3). Table 3 summarizes factors that influence the quality of evidence.
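The rating logic described in this section can be summarized in a short Python sketch, offered as a simplification rather than as the working group's procedure. It assumes that RCTs start at "high", that observational designs start at "low" or "very low" as described above, and that each applicable factor moves the rating by exactly one level. Actual panels exercise judgment and may move the rating by more or less than that. The factor labels are illustrative names; "inconsistency" in particular is taken from the GRADE literature generally, because this text as it stands does not name all five categories.

```python
# Simplified sketch: one level per factor is an assumption, not a GRADE rule.
LEVELS = ["very low", "low", "moderate", "high"]

# Starting level by study design (labels are illustrative).
START_LEVEL = {
    "rct": "high",
    "cohort": "low",
    "case-control": "low",
    "case series": "very low",
    "case report": "very low",
}

DOWNGRADE_FACTORS = {
    "design limitations",  # e.g. no concealment or blinding, large loss to follow-up
    "inconsistency",       # assumed label; not named in the text above
    "indirectness",        # different population, intervention, or outcome
    "imprecision",         # few studies, participants, or outcome events
    "reporting bias",      # publication or selective outcome reporting bias
}

UPGRADE_FACTORS = {
    "large effect",
    "confounders would reduce effect",
}


def rate_quality(design, downgrades=(), upgrades=()):
    """Return the quality level after applying each distinct factor once."""
    down, up = set(downgrades), set(upgrades)
    unknown = (down - DOWNGRADE_FACTORS) | (up - UPGRADE_FACTORS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    index = LEVELS.index(START_LEVEL[design]) - len(down) + len(up)
    # Clamp so the rating never leaves the four-level scale.
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


if __name__ == "__main__":
    print(rate_quality("rct", downgrades=["design limitations"]))  # moderate
    print(rate_quality("cohort", upgrades=["large effect"]))       # moderate
```

Clamping at the ends of the scale reflects the fact that there are only four levels. It is not a statement about how panels should handle evidence with many limitations.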
| Values and Preferences |
|---|
Consider the interpretation of guidelines in the case of an individual patient. A guideline may weakly recommend (a "suggestion," using the terminology of The Endocrine Society Clinical Practice Guidelines) that patients receive treatment with a medication based on low quality evidence because there is uncertainty about the tradeoffs between potential desirable and undesirable effects. An individual patient may place a high value on potential resolution of their symptoms and a low value on avoiding possible side effects, costs, and follow-up visits and tests while taking the medication. Such a patient may prefer to take this medication, in keeping with the suggestion. Another patient in similar circumstances may have different values, placing a higher value on avoiding potential adverse effects, costs, and burdens of medical treatment.
For example, when making a decision on treatment options for the prevention of osteoporotic fractures, some experts may formulate recommendations in favor of treatment with teriparatide for women at high fracture risk. One woman may share values and preferences in keeping with this recommendation, whereas another woman, in the same situation, may find the route of administration (injection) or the cost of teriparatide unacceptable and would thus prefer not to take the medication. The use of the GRADE system, with its transparency, offers patients and clinicians the opportunity to consider and make different clinical decisions, including decisions to not use an intervention that is weakly recommended (or to use one that the guideline weakly recommends against).
The appendix (published as supplemental data on The Endocrine Society's Journals Online web site at http://jcem.endojournals.org) offers illustrations from The Endocrine Society Clinical Practice Guidelines to highlight the issues presented here.
| Future Directions |
|---|
With regard to considering resource allocation in guidelines, there are challenges concerning the clarity, conflicts, validity, and applicability of the evidence (e.g. cost-effectiveness analyses), challenges in the interpretation and use of economic analyses to formulate guidelines (without the guidance of a health economist), and the impact of such analyses when guidelines are intended for broad, or even international, audiences. The American College of Chest Physicians has suggested an approach to this problem that is consistent with GRADE (18). The GRADE working group is preparing documents and a conference that will provide additional guidance on this topic.
There is also uncertainty as to the ideal composition of the guideline panel. Some favor broad representation, expanding from the usual set of clinical experts to include patients and health officials. However, how to select patients for participation in guidelines (e.g. highly educated patients are likely to participate actively, but they may not share values with many other patients), how to engage them in the process, and how to acknowledge their contribution are the subject of evolving science (19, 20, 21). The promise of being able to incorporate values and preferences in guideline development through direct patient consultation seems a fascinating prospect.
| Conclusions |
|---|
Box 1. An example of reporting bias
A systematic review of the effects of testosterone on erection satisfaction and function in patients with low testosterone offers an example of reporting bias. In this review the authors found one large trial that specifically addressed this issue in addition to three smaller trials (11). However, the large trial's results on the outcome of interest were reported only as "not significant" in the published paper; the actual data were not reported and, therefore, could not be used in a metaanalysis. Using the data from the three other trials, there was a large treatment effect noted with testosterone therapy (difference between arms of 1.3 SD values, 95% confidence interval 0.2 to 2.3). However, after obtaining the complete data on the larger trial, the new pooled treatment effect was smaller in magnitude, much less precise, and no longer significant (0.8 SD values, 95% confidence interval –0.05 to 1.63), an example of reporting bias (23).
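The change in statistical significance reported in Box 1 can be checked directly from the two confidence intervals. The short Python sketch below uses only the numbers quoted in the box. The function name is an illustrative assumption, and the test applied (whether the 95% confidence interval excludes zero) is the usual convention for a difference between arms.

```python
def excludes_zero(ci_low: float, ci_high: float) -> bool:
    """A difference is conventionally 'significant' when its CI does not contain 0."""
    return ci_low > 0 or ci_high < 0


# Values quoted in Box 1 (differences between arms, in SD units).
three_small_trials = {"effect": 1.3, "ci": (0.2, 2.3)}
all_four_trials = {"effect": 0.8, "ci": (-0.05, 1.63)}

if __name__ == "__main__":
    for label, pooled in [("three small trials", three_small_trials),
                          ("all four trials", all_four_trials)]:
        verdict = "significant" if excludes_zero(*pooled["ci"]) else "not significant"
        print(f"{label}: effect {pooled['effect']} SD, {verdict}")
```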
| Acknowledgments |
|---|
| Footnotes |
|---|
First Published Online January 2, 2008
Abbreviations: GRADE, Grading of Recommendations, Assessment, Development, and Evaluation; RCT, randomized controlled trial.
Received August 24, 2007.
Accepted December 21, 2007.
| References |
|---|