A new mini-series on rating evidence for clinical guideline panelists

You're on a clinical guideline panel and it's time to rate the evidence.

No worries. Your panel should have one or more methodologists to do the heavy lifting here. But you should have some understanding of the process, not only to know what's going on but to actively contribute.

Let's start with an overview of the process for rating a body of evidence.

Start with your outcomes. You should have specified all the outcomes of interest when you developed your clinical questions. The quality of the body of evidence will be rated separately for each of the outcomes you have selected.

Rating the body of evidence is different from rating the quality of the individual studies (though that is one of the components). There are several other criteria, but most rating systems agree on at least these factors:

  • An aggregate of all the assessments of the quality of the individual studies. The focus is on validity, or the minimization of bias in measurement within the studies
  • The reproducibility of the results from the individual studies. The focus is on consistency of effect, which requires multiple tests or studies
  • The power or precision of the measurements around the estimated effect, and the strength of the effect

There are a number of systems available to guideline developers for rating the evidence. But one system has become commonly used or adapted for use by most guideline developers. It is known as the GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach.

In a series of short blogs, TheEvidenceDoc will briefly describe some of the features and steps of the GRADE approach, particularly those that are often misunderstood by guideline panelists. 

TheEvidenceDoc November 7, 2017







Composite Endpoints - Canny or cunning use of #healthoutcomes data?

You are on a guideline panel and you're at the PICO (Population - Intervention - Comparison - Outcome) stage of development. It's time to choose important outcomes.

A reminder - important outcomes are those important to the patients affected by the disease or condition. So for studies of diabetes, patient-important outcomes are things like premature mortality or heart attack, but not blood sugar levels. Lowering blood sugar is an intermediate step on the path to better health for people with diabetes, so it would be considered a surrogate or intermediate outcome. It only indirectly measures what we are interested in, so the evidence would not be rated as strong as the evidence for outcomes of direct importance.

So how do composite endpoints fit into this? What are composite endpoints? Composite endpoints (CEP), or composite outcomes, combine two or more individual endpoints into a single outcome measure. They are used in some clinical trials and are particularly common in cardiology trials.

According to a systematic review by Ferreira-Gonzalez et al, the most common reasons cited for using CEP are a smaller required study size and the ability to evaluate the net effect of an intervention. Avoiding adjustments for multiple comparisons was also cited as a rationale for use. Cited disadvantages included misinterpretation when the components differed in patient importance or in the size and direction of the effect.

A systematic review by Cordoba et al of 114 RCTs published in 2008 that used CEP found that changes in the definition of the composite outcome during the trials were common. Selection of components was often not pre-specified, and definitions were inconsistently described throughout the study reports. A third of the publications also failed to report the treatment effect for the individual components. The less important components often had higher event rates and larger effects associated with treatment. Cordoba and colleagues recommended that "composite endpoints should generally be avoided, as their use leads to much confusion and bias. If composites are used, trialists should follow published guidance."

Fortunately, there is published guidance to direct decisions on how to create composite endpoints. We can use this guidance to help determine whether composite endpoints are valid and can be used in our guideline development.

Freemantle and colleagues use examples to demonstrate the problems with composite outcomes, including the presumption that the benefit described can be attributed to all the components when, in fact, it derives from only one component. The opposite also occurs: a positive treatment effect for a critical outcome can be diluted by a component with no effect. They also provide data showing that CEP including clinician-driven outcomes - where physicians order the intervention - were twice as likely to be associated with statistically significant results for the composite outcome. Examples include revascularization, hospitalization, and initiation of new therapy.

Montori and colleagues have produced an educational paper using examples to summarize three major considerations for evaluating the validity of composite endpoints. They are:

  1. Ensure that the component endpoints are of similar importance to patients. Most patients would not equate serious endpoints like death or heart attack with a need to change therapy.
  2. Ensure that the more and less important endpoints occur with similar frequency. If the more important events are uncommon (as is often the case for mortality) the composite measure is likely to be driven by the more common though less important events.
  3. Ensure that the component endpoints are likely to have similar risk reduction. Individual components should be similarly affected by the intervention.
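Montori's second consideration lends itself to simple arithmetic. The numbers below are entirely hypothetical, chosen only to illustrate how a common but less important component can drive a composite result:

```python
# Hypothetical two-arm trial, 1,000 patients per arm, illustrating point 2.
# The composite is "death or revascularization"; all numbers are invented.
deaths_control, deaths_treatment = 20, 20    # rare, and unaffected by treatment
revasc_control, revasc_treatment = 200, 140  # common, 30% relative reduction

# For simplicity, assume no patient has both events.
composite_control = deaths_control + revasc_control        # 220 events
composite_treatment = deaths_treatment + revasc_treatment  # 160 events

rrr = 1 - composite_treatment / composite_control
print(f"composite relative risk reduction: {rrr:.0%}")  # -> 27%
```

The composite shows a 27% relative risk reduction even though mortality is completely unchanged; the apparent benefit comes entirely from the more common, less important component.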

There's another challenge when systematically collecting and summarizing the evidence on a given topic. Since CEP definitions frequently change, even within studies, it is very difficult to find standard definitions used across studies. This limits your ability to collect and combine the data from multiple studies for your guideline.

The easy answer for many guideline panels will be to simply exclude CEP from your outcome selections. But if you decide to consider their importance for your topic, you now have some guidance for evaluating them.

And if you want to ponder the proposed benefit of using CEP to evaluate net effect by accounting for competing risks, I suggest you read this systematic review by Manja and colleagues.

And though this very brief summary is directed at guideline developers, it wouldn't hurt trialists to learn a bit more about CEP.

TheEvidenceDoc August 7, 2017


Is your #FallsPrevention Program Maximizing Inpatient Falls Risk Assessment?

Some patients will fall while they are in the hospital. How many? Bouldin and colleagues used National Database of Nursing Quality Indicators (NDNQI) data to find an overall fall rate in the U.S. of 3.53 falls per 1,000 patient days. The highest rates were in medical units, at 4.03 falls per 1,000 patient days. Overall, one fourth of the falls were associated with injury.(1) Collectively, the problem is large, and is estimated to impact 2% of all hospitalizations.(1)

Beginning in 2008, the Centers for Medicare and Medicaid Services (CMS) decided to stimulate U.S. hospital efforts to reduce inpatient falls by ending payments to hospitals for the additional costs associated with injury from inpatient falls.(2) As a result, fall prevention is one of the top priorities for most hospitals.

Nearly all fall prevention activities include the use of falls risk assessment tools. In the US, wristbands and bed signs are common warning labels for patients these tools identify as being at increased risk of falling. But if risk identification is the main focus of your falls prevention program, you are missing an opportunity to individualize fall prevention activities and reduce inpatient falls.


Fall risk assessments are not very good at risk prediction.

Does this surprise you? Several comprehensive systematic reviews have evaluated the performance of risk assessment tools for predicting an individual inpatient’s risk of falling.(3,4,5) None of the tools perform better than clinical judgment.

How can this be? You may remember from your epidemiology training that sensitivity and specificity respectively measure the proportion of fallers who tested positive with your tool and the proportion of non-fallers who tested negative with your tool. But did you remember other measures of the value of a screening or diagnostic tool? The positive predictive value is a better measure to evaluate the likelihood that a person who tests positive will have the condition. The negative predictive value measures the likelihood that a person who tests negative will not have the condition. Both measures are dependent on the validity of the tool and they are also dependent on the prevalence of the condition, which in this case is falls. How prevalent are falls in your organization?

Let’s look at some data presented in the NICE guideline to see how a tool with sensitivity and specificity of at least 70% (one of the NICE inclusion criteria for studies) can have much lower predictive value.

For the Hendrich Fall Risk Model (data from Hendrich et al 1995 as extracted and reported in the NICE guideline) (5)

There were 102 patients who fell out of 338 total. The tool correctly labeled 79 of the 102 fallers as at risk, and 169 of the 236 non-fallers as not at risk.

The sensitivity is 79/(79+23) = 77%. The specificity is 169/(169+67) = 72%.

But the positive predictive value is much lower. This measure looks at how many of the patients predicted to fall actually fell.

So the positive predictive value is 79/(79+67) = 54%.

Remember the predictive value is dependent on the condition prevalence, and even though falls are among the most common adverse events in the hospital, most inpatients do not fall.  

The negative predictive value, 169/(169+23) = 88%, is much better, but it still misses the opportunity to identify and prevent falls in the 12% of people labeled low risk who went on to fall.
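The arithmetic above can be checked directly from the 2x2 counts reported for the Hendrich data (79 true positives, 23 false negatives, 67 false positives, 169 true negatives). The 10%-prevalence scenario at the end is a hypothetical added here to show how predictive value drops as prevalence drops:

```python
# 2x2 counts for the Hendrich Fall Risk Model (338 patients, 102 fallers)
TP = 79   # fallers the tool flagged as at risk
FN = 23   # fallers the tool missed
FP = 67   # non-fallers the tool flagged
TN = 169  # non-fallers the tool cleared

sensitivity = TP / (TP + FN)   # fallers who tested positive
specificity = TN / (TN + FP)   # non-fallers who tested negative
ppv = TP / (TP + FP)           # flagged patients who actually fell
npv = TN / (TN + FN)           # cleared patients who did not fall
prevalence = (TP + FN) / (TP + FN + FP + TN)

print(f"sensitivity {sensitivity:.0%}, specificity {specificity:.0%}")  # -> 77%, 72%
print(f"PPV {ppv:.0%}, NPV {npv:.0%}, prevalence {prevalence:.0%}")     # -> 54%, 88%, 30%

# Same tool, hypothetical unit where only 10% of patients fall:
n, prev = 1000, 0.10
tp = sensitivity * n * prev            # expected true positives
fp = (1 - specificity) * n * (1 - prev)  # expected false positives
print(f"PPV at 10% prevalence: {tp / (tp + fp):.0%}")  # -> 23%
```

With the same sensitivity and specificity, roughly three out of four patients flagged at the lower prevalence would never have fallen.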


Using fall risk assessments simply to label patients as high or low fall risk misses the opportunity to individualize care. Should falls risk assessment be abandoned as a patient safety strategy? No, but using risk assessment as a screen merely to label patients should be abandoned. That use misses the opportunity to tailor interventions to each patient’s needs and reduce the risk of falling during the hospital stay.

It’s not just that the quality of the evidence for inpatient risk prediction with any risk assessment tool is low or very low. Risk prediction produces a simple label that, by itself, does not guide staff action. After a patient is identified as being at increased risk, what is the next step? Inpatient staff not regularly assigned to the patient - lab, radiology, and other technicians - don’t know how to assist based on a simple warning alert. Patients, their families, and their friends may not understand why the patient is labeled at risk or what each of them can do to change that risk. Risk predictions and the resulting simple labels don’t provide actionable interventions. There is also the potential for alert fatigue if too many patients are labeled with fall risk.

There is moderate evidence from several recent systematic reviews that multi-factorial interventions can reduce inpatient falls, when multi-factorial is defined as interventions that are individually tailored to each patient’s modifiable risk factors. (6-11) How are the factors determined? Risk factors are identified through use of a risk assessment, whether tool or clinical judgment.

If risk assessment is used to identify why the patient is at risk of falling, and is then coupled with interventions directed at those specific risks, it can be effective. These systematic reviews and overviews have concluded there is moderate evidence of the effectiveness of multi-factorial interventions. (6-11) Some of the review authors have lamented that the interventions vary so much from study to study that it is difficult to determine the essential elements. But that is just the point: the interventions must vary because falls are multi-factorial.

Inpatient falls can be due to intrinsic patient factors, like instability from reduced balance, strength, and agility, or from loss of vision. They can arise from factors associated with hospitalization, like unfamiliar surroundings, treatments, and activity restrictions that add to confusion and instability. And they can arise from combinations of the above plus additional factors, like reduced bowel and bladder control leading to fear of toileting needs that can’t be met in the hospital environment.

To reduce preventable falls, interventions must be tailored to individual needs. The needs and the plan of action must be communicated clearly to all staff who come in contact with the patient, and to the patient and their loved ones, so that everyone is empowered to reduce the opportunities for that patient to experience a fall.

Partners HealthCare System has tested this approach and successfully reduced falls from 4.18 to 3.15 per 1,000 patient days. Dykes and colleagues used health information technology to pair risk assessment results (they used the Morse scale) with targeted intervention strategies. They developed specific signage and communication tools to clearly communicate to patients and staff the specific risks and the resulting actions to take to reduce falls. (12)

We have sufficient evidence from systematic reviews and from field-tested studies to stop using risk assessments for simple prediction and to start using them as the foundation for individualized, multi-factorial patient care plans. Is your organization using that evidence to build its fall prevention program?

TheEvidenceDoc 2015


1. Bouldin ED, Andresen EM, Dunton NE et al. Falls among Adult Patients Hospitalized in the United States: Prevalence and Trends. J Patient Saf. 2013 March; 9(1): 13–17.

2. CMS Final Rule Federal Register August 19, 2008. http://www.gpo.gov/fdsys/pkg/FR-2008-08-19/html/E8-17914.htm accessed July 28, 2015.

3. Oliver D, Daly F, Martin FC, McMurdo MET. Risk factors and risk assessment tools for falls in hospital in-patients: a systematic review. Age and Ageing 2004;33:122-130.

4. Haines TP, Hill K, Walsh W, Osborne R. Design-Related Bias in Hospital Fall Risk Screening Tool Predictive Accuracy Evaluations: Systematic Review and Meta-Analysis. J Gerontol 2007;62A:664-672.

5. National Institute for Health and Care Excellence June 2013 Assessment and Prevention of Falls in Older People Developed by the Centre for Clinical Practice at NICE guidance.nice.org.uk/CG161 accessed July 28, 2015

6. Cameron ID, Gillespie LD, Robertson MC, et al. Interventions for preventing falls in older people in care facilities and hospitals. Cochrane Database of Systematic Reviews 2012, Issue 12. Art. No.: CD005465. DOI: 10.1002/14651858.CD005465.pub3.

7. DiBardino D, Cohen ER, Didwania A. Meta-analysis: multidisciplinary fall prevention strategies in the acute care inpatient population. J Hosp Med. 2012;7:497-503.

8. Coussement J, De Paepe L, Schwendimann R, et al. Interventions for preventing falls in acute- and chronic-care hospitals: a systematic review and meta-analysis. J Am Geriatr Soc. 2008;56:29-36.

9. Oliver D, Connelly JB, Victor CR, et al. Strategies to prevent falls and fractures in hospitals and care homes and effect of cognitive impairment: systematic review and meta-analyses. BMJ. 2007;334:82.

10. Shekelle PG, Wachter RM, Pronovost PJ, et al. Making Health Care Safer II: An Updated Critical Analysis of the Evidence for Patient Safety Practices. Comparative Effectiveness Review No. 211. (Prepared by the Southern California-RAND Evidence-based Practice Center under Contract No. 290-2007-10062-I.) AHRQ Publication No. 13-E001-EF. Rockville, MD: Agency for Healthcare Research and Quality. March 2013. www.ahrq.gov/research/findings/evidence-based-reports/ptsafetyuptp.html accessed July 28, 2015.

11. Miake-Lye IM, Hempel S, Ganz DA, Shekelle PG. Inpatient Fall Prevention Programs as a Patient Safety Strategy: A Systematic Review. Ann Intern Med. 2013;158:390-396.

12. Dykes PC, Carroll DL, Hurley A et al. Fall Prevention in Acute Care Hospitals: A Randomized Trial. JAMA 2010;304:1912-1918.