It's time to consider imprecision in evidence rating. 

First, let's consider the positive: improving precision.

In epidemiology, when we talk about improving study precision, we mean reducing random error. There are two main ways to reduce random error and improve precision. The most common is to increase study size. The other is to improve the efficiency of the study through its design, so that it extracts as much information as possible. Because increasing study size is the primary way to reduce random error, the usual assessment of precision asks whether the sample size is sufficient, and especially whether there are enough outcome events. The number of outcome events matters most when the outcome of interest is rare: rare outcomes require extremely large samples (and generally a long time) to accumulate enough events to study meaningfully.
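To see why rare outcomes are so demanding, here is a minimal Python sketch of a conventional sample size calculation for comparing two proportions. It assumes a two-sided test at alpha 0.05 with 80% power and uses the common normal-approximation formula without continuity correction. The function name `n_per_group` and the 25% relative risk reduction are illustrative choices, not part of any GRADE method.

```python
import math
from statistics import NormalDist

def n_per_group(p_control: float, rrr: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm to detect a relative risk reduction (RRR).

    Normal-approximation formula for two independent proportions,
    two-sided test, no continuity correction.
    """
    p_treat = p_control * (1 - rrr)              # assumed risk in the treated arm
    p_bar = (p_control + p_treat) / 2            # average risk under the null
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    root = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat)))
    return math.ceil(root ** 2 / (p_control - p_treat) ** 2)

# Same 25% RRR, increasingly rare outcome
for risk in (0.20, 0.05, 0.01):
    n = n_per_group(risk, rrr=0.25)
    print(f"control risk {risk:.0%}: about {n:,} per arm, "
          f"about {round(n * risk):,} expected control-arm events")
```

Under these assumptions the required size climbs from roughly 900 per arm at a 20% control risk to over 21,000 per arm at 1%, while the expected number of control-arm events stays at around two hundred. That is why event counts, not enrollment alone, are the better guide to precision.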

So assessing precision means asking whether a study is sufficiently powered, or sufficiently large, to answer the question posed.

Put negatively and at its simplest, study results are considered imprecise when the study is small and few outcome events occur. When this happens, the effect estimates will have wide confidence intervals around them.
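A short sketch makes this concrete. Below, the 95% confidence interval for a risk ratio is computed with the standard large-sample log method from hypothetical 2x2 counts. Both invented studies have the same 10% versus 20% risks and differ only in size; the function name `risk_ratio_ci` is illustrative.

```python
import math

def risk_ratio_ci(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int, z: float = 1.96):
    """Risk ratio with a large-sample 95% CI computed on the log scale."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of log(RR); dominated by 1/events when events are few
    se = math.sqrt(1 / events_tx - 1 / n_tx + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical trials with identical risks (10% vs 20%) but very different sizes
small = risk_ratio_ci(5, 50, 10, 50)
large = risk_ratio_ci(200, 2000, 400, 2000)
for label, (rr, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: RR {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Both studies estimate a risk ratio of 0.5, but the small one's interval runs from roughly 0.18 to 1.36 (strong benefit to harm), while the large one's runs from roughly 0.43 to 0.59.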

Here is how GRADE recommends reviewers evaluate imprecision. GRADE suggests starting with the 95% confidence interval around the effect estimate for the outcome. Is it so wide that it includes a range of effects from very strong benefit to very strong harm? If so, rate down for imprecision.

This is not the same as a confidence interval that is closely bounded around the value of no difference, in other words a risk or hazard ratio of one. Such an interval also ranges from benefit to harm, because it extends on both sides of one, but for a precise study showing no difference that range will be very tight around one. GRADE suggests that a 95% confidence interval falling between 0.75 and 1.25 not be rated down for imprecision, because that range better reflects an estimate of no difference between outcomes.
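The two cases GRADE describes can be written as a rough screening rule. This Python sketch assumes a ratio measure for an undesirable outcome (so values below 1 favour the intervention) and uses 0.75 and 1.25 as default thresholds; any interval outside the two cases is left to the reviewer's judgment. The function name `imprecision_screen` is hypothetical.

```python
def imprecision_screen(lower: float, upper: float,
                       benefit: float = 0.75, harm: float = 1.25) -> str:
    """Rough screen of a 95% CI for a risk or hazard ratio.

    Assumes an undesirable outcome, so ratios below 1 favour the intervention.
    `benefit` and `harm` default to the 0.75 / 1.25 thresholds GRADE suggests.
    """
    if not 0 < lower <= upper:
        raise ValueError("expected 0 < lower <= upper")
    if lower < benefit and upper > harm:
        # Interval runs from appreciable benefit to appreciable harm
        return "rate down for imprecision"
    if lower >= benefit and upper <= harm:
        # Interval sits tightly around 1: a precise estimate of little or no difference
        return "do not rate down for imprecision"
    # Interval crosses only one threshold or lies wholly beyond one: use judgment
    return "use judgment"

print(imprecision_screen(0.18, 1.36))  # → rate down for imprecision
print(imprecision_screen(0.90, 1.10))  # → do not rate down for imprecision
```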

In addition to using confidence interval width as a measure of imprecision, GRADE provides further guidance. Even when confidence intervals seem narrow enough, GRADE cautions reviewers to examine the total sample size. GRADE uses the term Optimal Information Size (OIS) for a threshold of minimal study size: the total number of study subjects should not be less than the number derived from a conventional sample size calculation for an adequately powered trial. If the OIS is not met, GRADE suggests rating down for imprecision unless the sample size is very large, at least 2,000 or perhaps even 4,000 subjects.
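The OIS rule described above can be sketched as a small Python helper. Here `ois` is assumed to have already been obtained from a conventional sample size calculation, and the fallback "very large sample" threshold defaults to 2,000, with 4,000 as the more cautious option. The function name and example numbers are hypothetical; treat this as a screening aid, not a substitute for the GRADE Handbook.

```python
def ois_screen(total_n: int, ois: int, very_large_n: int = 2000) -> str:
    """Screen the total sample size of a body of evidence against its OIS.

    `ois` is assumed to come from a conventional sample size calculation for an
    adequately powered trial; `very_large_n` is the fallback threshold (2,000
    by default, 4,000 for a more cautious reviewer).
    """
    if total_n >= ois:
        # OIS met: the confidence interval still has to be examined
        return "OIS met: judge imprecision from the confidence interval"
    if total_n >= very_large_n:
        return "OIS not met, but sample is very large: rating down may not be needed"
    return "OIS not met: rate down for imprecision"

# Hypothetical reviews, each with an OIS of 3,000 subjects
print(ois_screen(3200, ois=3000))
print(ois_screen(2500, ois=3000))
print(ois_screen(2500, ois=3000, very_large_n=4000))
```

The last two calls show how the choice between 2,000 and 4,000 can change the verdict for the same body of evidence.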

Remember, do not confuse a lack of statistically significant difference with imprecision. Studies can, and often do, precisely measure no statistically significant difference. There may simply be no important clinical difference in outcomes between the groups you are comparing.

One additional note: don't forget what we've learned so far. Suppose your evidence consists of many well-designed and well-conducted studies (low risk of bias, or ROB) that have similar findings (low inconsistency) and that match your PICO questions (low indirectness). Even if the individual studies are small, when combined they may have sufficient power (low imprecision) to provide a reliable estimate of effect. That is our goal in a systematic review of evidence.
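To illustrate how small studies can combine into a more precise estimate, here is a minimal fixed-effect inverse-variance pooling of log risk ratios in Python. It assumes the studies are similar enough to pool (low inconsistency); with meaningful heterogeneity a random-effects model would usually be preferred. The four studies and the function name `pooled_risk_ratio` are hypothetical.

```python
import math

def pooled_risk_ratio(studies, z: float = 1.96):
    """Fixed-effect inverse-variance pooling of log risk ratios.

    Each study is a tuple (events_tx, n_tx, events_ctl, n_ctl).
    Returns (pooled RR, lower 95% bound, upper 95% bound).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, n1, c, n2 in studies:
        log_rr = math.log((a / n1) / (c / n2))
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2   # variance of the log risk ratio
        weighted_sum += log_rr / var            # weight each study by 1 / variance
        total_weight += 1 / var
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Four small hypothetical trials with similar findings
studies = [(5, 50, 10, 50), (6, 60, 11, 60), (4, 45, 9, 45), (7, 55, 13, 55)]
for s in studies:
    print("single study:", ["%.2f" % x for x in pooled_risk_ratio([s])])
print("pooled:", ["%.2f" % x for x in pooled_risk_ratio(studies)])
```

Each hypothetical study on its own has a 95% interval that includes 1, while the pooled estimate is about 0.51 with an interval of roughly 0.32 to 0.83.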

As always, see the GRADE Handbook for more detail.

TheEvidenceDoc November 20, 2017