Judgements about the quality of evidence

A short glossary is included at the bottom of this page. For a more complete glossary, see the Glossary of terms.


In making health care treatment and delivery decisions, policymakers, patients and clinicians must trade off the benefits and downsides of alternative strategies. Decision-makers will be influenced not only by the best estimates of the expected advantages and disadvantages, but also by their confidence in these estimates; i.e. the quality of the evidence. The GRADE system, which we have used, provides a structured and transparent system for making judgements about the quality of evidence (1).

Using the GRADE system, we have made separate ratings of evidence quality for each important outcome. Like early systems of grading the quality of evidence, the GRADE system begins with the study design. Randomised trials provide, in general, stronger evidence than observational studies. Therefore, randomised trials without important limitations constitute high quality evidence. Observational studies without special strengths generally provide low quality evidence. However, there are a number of factors that can reduce or increase our confidence in estimates of effect.

The GRADE system considers five factors that can lower the quality of the evidence:

  1. Study limitations (risk of bias)
  2. Inconsistent results across studies
  3. Indirectness of the evidence
  4. Imprecision
  5. Publication bias

and three that can increase the quality of evidence:

  1. Large estimates of treatment effect
  2. A dose-response gradient
  3. Plausible confounding that would increase confidence in an estimate

Factors that can lower the quality of evidence

  1. Our confidence in estimates of effects decreases if studies suffer from major limitations that may bias their estimates of the treatment effect. These limitations include, for example, lack of allocation concealment; lack of blinding, particularly if outcomes are subjective and their assessment highly susceptible to bias; a large loss to follow-up; failure to adhere to an intention-to-treat analysis; or failure to report outcomes (typically those for which a “statistically non-significant” effect was observed).
  2. Widely differing estimates of effect across studies, for which there are no compelling explanations, reduce our confidence in what the true effect is. Variability may arise from differences in populations (e.g., drugs may have larger relative effects in sicker populations), from differences in interventions (e.g., larger effects with higher drug doses), or from differences in outcomes (e.g., diminishing treatment effect with time). When variability exists but investigators fail to identify a plausible explanation, the quality of evidence decreases.
  3. Decision-makers must consider two types of indirectness that can lower the quality of evidence. The first arises when considering, for instance, the choice between two interventions, A and B. Although randomised comparisons of A and B may be unavailable, randomised trials may have compared A with no intervention and B with no intervention. Such trials allow indirect comparisons of the magnitude of effect of A and B, but this evidence is of lower quality than head-to-head comparisons of A and B would provide. The second type of indirectness comprises differences between the population, intervention, comparator and outcome of interest and those included in the relevant studies. Most important here is whether important outcomes are measured directly or surrogate outcomes are used: a biochemical or process measure may or may not accurately reflect what can be expected for an important outcome, such as mortality or morbidity.
  4. When studies include relatively few patients and few events and thus have wide confidence intervals (or a large p-value), we are less confident in an estimate.
  5. The quality of evidence will be reduced if there is a high likelihood that some studies have not been reported (typically those with “statistically non-significant” results). The risk of such a “publication bias” is greater when published evidence is limited to a small number of trials, all of which are sponsored by people with a vested interest in the results.
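Point 4 above (imprecision) can be illustrated numerically. The sketch below uses the standard log-normal approximation for the confidence interval of a risk ratio; the patient numbers are invented for illustration. With few events, the same estimated relative risk comes with a much wider confidence interval, so our confidence in the estimate is lower:

```python
import math

def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Approximate 95% confidence interval for a risk ratio,
    using the usual log-normal (delta method) approximation."""
    rr = (events_a / n_a) / (events_b / n_b)
    se = math.sqrt(1/events_a - 1/n_a + 1/events_b - 1/n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Identical risks (20% vs 30%), but ten times more patients and events:
small = risk_ratio_ci(10, 50, 15, 50)      # roughly RR 0.67 (0.33 to 1.34)
large = risk_ratio_ci(100, 500, 150, 500)  # roughly RR 0.67 (0.53 to 0.83)
```

The small trial's interval spans 1.0 (no effect), so it is compatible with both benefit and harm; the larger trial gives the same point estimate with much greater precision.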

Factors that can increase the quality of evidence

  1. Even well-done observational studies generally yield only low quality evidence, because of the many potential confounders that are either unknown or unmeasured. However, they may occasionally provide moderate or even high quality evidence. The larger the magnitude of effect, the less likely it is that the effect could be explained by confounders, and thus the stronger the evidence. For example, a meta-analysis of observational studies found that bicycle helmets reduce the odds of head injuries in cyclists involved in a crash by about two-thirds, an effect that could not easily be explained by confounders, given the design of the studies.
  2. The presence of a dose-response gradient can increase our confidence in estimates of effects, for example if larger effects are associated with larger doses, as might be expected.
  3. When an effect is found, if all plausible confounding would decrease the magnitude of effect, this increases the quality of the evidence, since we can be more confident that an effect is at least as large as the estimate and may be even larger. Conversely, particularly for questions of safety, if little or no effect is found and all plausible biases would lead towards overestimating an effect, we can be more confident that there is unlikely to be an important effect.
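The stepwise logic described above (start from the study design, then move the rating down or up one level per factor) can be sketched as a simple function. This is a hypothetical simplification for illustration only: real GRADE judgements are qualitative, and a single factor may shift the rating by one or two levels.

```python
# Simplified sketch of the GRADE rating logic. The four-point scale is
# real GRADE; the one-level-per-factor arithmetic is an assumption made
# here for illustration.

LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(randomised, downgrades, upgrades):
    """Return a quality level for one outcome.

    randomised -- True for randomised trials, False for observational studies
    downgrades -- count of factors lowering quality (risk of bias,
                  inconsistency, indirectness, imprecision, publication bias)
    upgrades   -- count of factors raising quality (large effect,
                  dose-response gradient, plausible confounding)
    """
    level = 3 if randomised else 1          # start: high for trials, low for observational
    level = level - downgrades + upgrades   # one level per factor (simplification)
    level = max(0, min(3, level))           # clamp to the four-point scale
    return LEVELS[level]

# A randomised trial with serious imprecision drops one level:
print(grade_quality(randomised=True, downgrades=1, upgrades=0))   # moderate
# An observational study with a large effect rises one level:
print(grade_quality(randomised=False, downgrades=0, upgrades=1))  # moderate
```

Note how the same "moderate" rating can be reached from either direction, which is why separate ratings are made for each important outcome rather than for a study as a whole.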

GRADE provides a clearly articulated and comprehensive system for rating and summarising the quality of evidence supporting treatment and health care delivery recommendations. Although judgements will always be required for every step, the systematic and transparent GRADE approach allows scrutiny of and debate about those judgements.

[1] Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schunemann HJ, and the GRADE Working Group. What is ‘quality of evidence’ and why is it important to clinicians? BMJ 2008; 336:995-8.


Short glossary (for a more complete glossary, see the Glossary of terms)

Allocation concealment

The process used to ensure that the people deciding to enter a participant into a randomised controlled trial do not know the comparison group into which that individual will be allocated.


Blinding

The process of preventing those involved in a trial from knowing the comparison group to which a particular participant belongs.


Bias

A systematic error or deviation in results from the truth.

Confidence interval

A measure of the uncertainty around the main finding of a statistical analysis.


Confounder

A factor that is associated with both an intervention (or exposure) and the outcome of interest. For example, if people in the experimental group of a controlled trial are younger than those in the control group, it will be difficult to decide whether a lower risk of death in one group is due to the intervention or the difference in ages. Age is then said to be a confounder, or a confounding variable. Randomisation is used to minimise imbalances in confounding variables between experimental and control groups. Confounding is a major concern in non-randomised (observational) studies.


Intention-to-treat analysis

A strategy for analysing data from a randomised controlled trial. All participants are included in the arm to which they were allocated, whether or not they received (or completed) the intervention given to that arm.

Loss to follow-up

The loss of participants during the course of a study.

Observational study

A study in which the investigators do not seek to intervene and simply observe the course of events. Changes or differences in one characteristic (e.g. whether or not people received the intervention of interest) are studied in relation to changes or differences in other characteristic(s) (e.g. whether or not they died), without action by the investigator.

Randomised trial

An experiment in which two or more interventions, possibly including a control intervention or no intervention, are compared by being randomly allocated to participants. In most trials one intervention is assigned to each individual but sometimes assignment is to groups of individuals (for example, in a household) or interventions are assigned within individuals (for example, in different orders or to different parts of the body).