Department of Pediatrics, University of Hawaii John A. Burns School of Medicine

March 2003

A 3 year old boy presents to the clinic with a cough for 2 days and a temperature of 99 degrees. He is noted to have a barking cough and other clinical findings consistent with a diagnostic impression of laryngotracheobronchitis or croup. During a discussion of the case, the clinic attending mentions that dexamethasone may be a good treatment for this patient. You perform a literature search on PubMed and find an article entitled, "A prospective randomized double-blind study to evaluate the effect of dexamethasone in acute laryngotracheitis" (1).

One of the most exciting aspects of the practice of medicine is that it is continually evolving and changing. Every physician maintains the perpetual title of "Student of Medicine" as we are all constantly learning and absorbing new information. This, however, is also one of the most challenging and daunting aspects of the practice of medicine. Faced with thousands of articles every year, a practitioner can't help but feel overwhelmed at times. This is why the practice of evidence-based medicine is so important.

Evidence-based medicine (EBM) has been described as "a process of life-long, self-directed learning in which caring for our own patients creates the need for clinically important information about diagnosis, prognosis, [and] therapy" (2). It has also been described as "the process of systematically finding, appraising, and using contemporaneous research findings as the basis for clinical decisions" (3). The goals of evidence-based medicine are fourfold, and include: 1) improving the uniformity and standardization of care so that all patients receive optimal care; 2) helping providers make better use of limited resources by seeking the most effective treatments; 3) preventing harmful side effects or outcomes; and 4) making the literature accessible to all, thereby helping clinicians make the most informed decisions possible (3). Everyone, from the medical student to the most senior physician, can use the principles of evidence-based medicine. But, like any other worthwhile endeavor, it takes practice to become comfortable with and proficient in using these guidelines.

The basic tenets of evidence-based medicine are laid out in a series of articles published in JAMA, collectively entitled "Users' Guides to the Medical Literature" (4). There are over 25 different guides to EBM. The first two basic guidelines regarding articles on therapeutics (5, 6) and articles on diagnostic tests (7, 8) will be discussed here.

The basic process of evidence-based medicine involves seven steps (Table 1) (4). The first step occurs at the bedside, when a clinical question arises during the care of a patient. The question could be whether a test that was ordered is likely to help make a diagnosis, or whether the present medication is the most efficacious for the patient's condition. The second step involves searching for sources of information. This might be as simple as asking a knowledgeable physician or looking in a textbook, but for the most comprehensive and up-to-date source of information, physicians turn to the medical literature. The simplest means of accessing the medical literature is to conduct a Medline or PubMed search on the internet. The third step is to identify the sources that are found (i.e., the studies relating to the clinical question). The fourth step is to determine whether the results of the study being examined are valid. The specific guidelines for this will be outlined in the following paragraphs. The fifth step is to determine what the actual results are, for instance whether a test was able to accurately diagnose a particular condition. The sixth step is to determine whether the results are applicable to your patient, and thus helpful to you in caring for your patient. The last step is to resolve the clinical question.

Table 1. Evidence-Based Medicine Approach to Clinical Problems

. . . . . 1. Identify the clinical question.

. . . . . 2. Search for sources of information.

. . . . . 3. Identify the source(s) found (relevant articles).

. . . . . 4. Determine whether the results are valid.

. . . . . 5. Determine what the results are.

. . . . . 6. Determine whether the results will help you in caring for your patients.

. . . . . 7. Resolve the clinical question.

The steps involved in evaluating an article on therapy are outlined in Table 2 (5,6). The first steps involve determining whether the results of the study are valid. Toward this end the article should first be scrutinized for randomization of patients. Many factors (e.g., age, sex, ethnicity), the least of which may be the therapy being studied, affect patient outcome. If the study population is large enough, randomization ensures that both known and unknown factors are evenly distributed between the treatment and control groups, making it more likely that any difference in outcome between the two groups is due to the treatment effect alone. In the croup article, the patients were randomized, as is noted in the title.

Table 2. Guide to an Article About Therapy

I. Are the results of the study valid?

. . . . . A. Primary guides

. . . . . . . . . . 1. Was the assignment of patients to treatments randomized?

. . . . . . . . . . 2. Were all patients who entered the trial properly accounted for and attributed at its conclusion? a) Was follow-up complete? b) Were patients analyzed in the groups to which they were randomized ("intention-to-treat analysis")?

. . . . . B. Secondary guides

. . . . . . . . . . 1. Were patients, health workers, and study personnel "blind" to treatment?

. . . . . . . . . . 2. Were the groups similar at the start of the trial?

. . . . . . . . . . 3. Aside from the experimental intervention, were the groups treated equally? Co-interventions (see below)?

II. What were the results?

. . . . . A. How large was the treatment effect? (see Table 3)

. . . . . B. How precise was the estimate of the treatment effect? (95% confidence interval)

III. Will the results help me in caring for my patient?

. . . . . A. Can the results be applied to my patient care?

. . . . . B. Were all clinically important outcomes considered?

. . . . . C. Are the likely treatment benefits worth the potential harms and costs?

Next, it is important to ensure that all patients enrolled in the study were properly accounted for at the end of the study. If there were a large number of patients "lost to follow-up," the results of the study may be skewed. To avoid having a therapy appear more effective than it is, assume that any "lost" patients from the treatment group had a "bad" outcome and those lost from the control group had a "good" outcome. It is also important to then evaluate whether the authors preserved randomization by using an "intention-to-treat analysis." This means that during the analysis of the study results, patients remain in the groups to which they were randomized in the beginning of the study, even if they are unable or unwilling to complete the treatment. If patients from the treatment group who were unable to complete the treatment because they got sicker are transferred to the placebo (control) group, the treatment may show more effect than is truly present, just because the placebo group has sicker patients. In the croup article, of the 29 patients randomized to the study, 28 were assessed at the 12 hour post-treatment mark, and 25 patients were assessed at the 24 hour mark. The reasons for the loss of the patients were given in the article. An intention-to-treat analysis appears to have been carried out by the simple design of the study, although this fact was not spelled out as such in the text of the article.
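The worst-case assumption for patients lost to follow-up can be sketched in a few lines of Python. This is only an illustration: the function name and the trial counts below are hypothetical and are not taken from the croup article.

```python
def worst_case_event_rates(treat_events, treat_followed, treat_lost,
                           ctrl_events, ctrl_followed, ctrl_lost):
    """Return (treatment_rate, control_rate) under the worst-case assumption.

    Every patient lost from the treatment group is counted as having had
    the adverse ("bad") outcome; every patient lost from the control group
    is counted as having had a "good" outcome.
    """
    treat_rate = (treat_events + treat_lost) / (treat_followed + treat_lost)
    ctrl_rate = ctrl_events / (ctrl_followed + ctrl_lost)
    return treat_rate, ctrl_rate


# Hypothetical trial: 100 patients randomized to each arm, 10 lost from each.
observed_treat = 18 / 90   # adverse outcomes among treated patients followed up
observed_ctrl = 36 / 90    # adverse outcomes among control patients followed up
worst_treat, worst_ctrl = worst_case_event_rates(18, 90, 10, 36, 90, 10)

print(f"Observed:   treatment {observed_treat:.0%}, control {observed_ctrl:.0%}")
print(f"Worst case: treatment {worst_treat:.0%}, control {worst_ctrl:.0%}")
```

If the treatment group still has the lower event rate under this deliberately unfavorable assumption, the losses to follow-up are unlikely to explain the apparent benefit.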

The next step is to determine whether patients and study personnel were "blinded" to treatment. It is well known that if a patient or worker knows that a patient is receiving the study medication, this may bias their assessment of the patient's outcome. It is then important to determine whether the two groups were similar at the start of the trial. If they were significantly different in any aspect other than the therapy (e.g., age, gender, ethnicity), this difference, and not the therapy, may account for any difference in outcome between the two groups. Next, it is important to ensure that both the treatment and control groups were treated equally with regard to any "co-interventions." Again, if one group received more of a co-intervention than the other, the outcome may be due to the co-intervention and not the therapy of interest. In the croup article, the patients and study personnel were both blinded. The groups did appear similar at the start of the study. In this study the rate of co-intervention use was one of the secondary outcomes measured, and the use of racemic epinephrine was found to be lower in the treatment group, but there was no difference between the two groups in rate of supplemental oxygen use.

The next set of steps involves evaluating the results of the study. This includes the computation of several formulas, listed in Table 3. Most trials evaluating therapy consider whether the therapy had a beneficial effect on some adverse outcome or event, such as hospitalization. One of the ways to express the difference in outcome is to calculate the absolute difference between the treatment and control groups: the absolute risk reduction (ARR). If "X" is the number (or percentage rate) of patients in the control group who were hospitalized, and "Y" is the number (or percentage rate) of patients in the treatment group who were hospitalized, then the ARR for hospitalization is "X-Y".

Table 3. Measurements of treatment effect

. . . . . X = outcome in control group

. . . . . Y = outcome in treatment group

. . . . . Relative risk (RR) = Y/X

. . . . . Relative risk reduction (RRR) = 1- Y/X

. . . . . Absolute risk reduction (ARR) = X-Y

. . . . . Number needed to treat (NNT) = 1/ARR

In the croup article, the primary endpoint was improvement in total croup score at 12-hour intervals after treatment. The severity of illness was measured using a "croup score," which was based on retractions, stridor, air entry, cyanosis, and level of consciousness. It was determined prior to the start of the study that an improvement in the total croup score of at least 2 points (out of a possible total of 17 points) would be clinically significant. At 12 hours after treatment, 13 of 16 patients (81%) in the treatment group had at least a 2 point improvement in their croup score, while only 4 of 12 patients (33%) in the placebo group had a similar improvement. A secondary endpoint was the need for racemic epinephrine aerosols, and whether there was a decreased need in the treatment versus the placebo group. In the placebo group 8/13 or 62% (X) of patients required an aerosol, while in the dexamethasone group 3/16 or 19% (Y) required a similar co-intervention. Using racemic epinephrine utilization as the outcome, the ARR was (62%-19%), or 43%.

Another way to express the difference between the two groups is to calculate the relative risk (RR). The relative risk is the proportion of patients who experienced the adverse outcome in the treatment group as compared to the control group and is expressed as "Y/X". But the more common usage of RR is as the relative risk reduction (RRR). This is presented as a percentage and is calculated as [1-(Y/X)]. The larger the RRR, or the ARR, the more effective the treatment. However, it is important to understand the difference between the two values. If the results of a trial showed that 10 patients who received a placebo were hospitalized and only 5 patients who received a medication were hospitalized, the ARR would be (10-5) or 5. But the RRR would be [1-(5/10)], or 50%. A 50% reduction sounds better to most people than a reduction of 5, but in this scenario, the two results represent the same information. In the croup article, the RR for the requirement of racemic epinephrine aerosols would be calculated as 0.19/0.62, or 30%. The RRR would then be calculated as [1-0.30] or 70%.
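The Table 3 formulas can be checked against the racemic epinephrine figures quoted above (8 of 13 placebo patients, 3 of 16 dexamethasone patients) with a short Python sketch. The function name is illustrative only. The number needed to treat is not reported in the text above; it is shown here simply as 1/ARR.

```python
def treatment_effect(control_events, control_n, treated_events, treated_n):
    """Apply the Table 3 formulas to event counts from the two groups."""
    x = control_events / control_n    # X: event rate in the control group
    y = treated_events / treated_n    # Y: event rate in the treatment group
    rr = y / x                        # relative risk
    rrr = 1 - rr                      # relative risk reduction
    arr = x - y                       # absolute risk reduction
    nnt = 1 / arr                     # number needed to treat (undefined if ARR is 0)
    return {"X": x, "Y": y, "RR": rr, "RRR": rrr, "ARR": arr, "NNT": nnt}


# Need for racemic epinephrine in the croup article: 8/13 placebo, 3/16 dexamethasone.
croup = treatment_effect(control_events=8, control_n=13,
                         treated_events=3, treated_n=16)
for name, value in croup.items():
    print(f"{name}: {value:.2f}")
```

With these counts the NNT works out to about 2.3, which by convention would be rounded up to 3 patients treated to spare one patient a racemic epinephrine aerosol.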

The next step in evaluating a study's results is to determine how precise they are. This involves calculation of the confidence interval (CI). The CI is usually calculated as the "95% CI," which means that the true RRR lies within the range of the confidence interval 95% of the time. The CI speaks to the power of a study, and the factor which has the most impact on a study's power is its sample size. A study with 100 participants may have the same RRR as a study with 1000 participants, but the latter will generally have a narrower CI and thus be more precise and the results more powerful. The 95% CI can be applied to absolute and relative values. Since a treatment would be deemed to be beneficial if the RRR (relative risk reduction) was greater than zero, the 95% CI would have to exclude zero in its range if the treatment is beneficial. For example, a treatment whose RRR has a 95% CI of -0.1 to 0.4 cannot be statistically concluded to be beneficial, since the value zero is contained within its confidence limits. However, a 95% CI of 0.1 to 0.2 describes a statistically beneficial treatment, since zero is not included in the range (i.e., there is less than a 5% chance that the treatment has no benefit).
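As a rough illustration of how sample size drives the width of a confidence interval, the sketch below computes a 95% CI for the ARR using the simple normal (Wald) approximation. This method is an assumption made for illustration; it is not necessarily the method used in the croup article, and it is crude for groups this small. The second call uses a hypothetical trial ten times larger with the same event rates.

```python
import math


def arr_confidence_interval(control_events, control_n,
                            treated_events, treated_n, z=1.96):
    """Approximate 95% CI for the absolute risk reduction (Wald method)."""
    x = control_events / control_n    # event rate in the control group
    y = treated_events / treated_n    # event rate in the treatment group
    arr = x - y
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(x * (1 - x) / control_n + y * (1 - y) / treated_n)
    return arr - z * se, arr + z * se


# Racemic epinephrine use in the croup article: 8/13 placebo, 3/16 dexamethasone.
low, high = arr_confidence_interval(8, 13, 3, 16)
print(f"95% CI for ARR: {low:.2f} to {high:.2f}")
print("Excludes zero:", low > 0 or high < 0)

# Hypothetical trial ten times larger with the same event rates.
low_big, high_big = arr_confidence_interval(80, 130, 30, 160)
print(f"Larger trial:   {low_big:.2f} to {high_big:.2f}")
```

The point estimate is identical in both calls; only the interval narrows as the number of patients grows.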

The last set of steps involves determining whether the study you have just reviewed will help you to care for your patient. It is important to determine whether your patient is similar to the patients who were in the study you are investigating. If your patient would have met all the inclusion and exclusion criteria for the study, the results are likely applicable to your individual patient. It is also important to evaluate whether all significant outcomes were measured. The endpoint of decreased mortality as a result of treatment is always significant, but other outcomes, such as rate of hospitalization or subsequent morbidity, can also impact your patient's care. And lastly, the benefits and risks of the proposed treatment must be weighed for the individual patient. For the article on croup, you've decided that the results of the study are valid based on the study design, and you've evaluated the results of the study. You now determine that your patient is similar to those enrolled in the study, so the results can be applied to him. The study did not discuss any side effects or risks to the treatment, so the benefits of the treatment seem to outweigh the risks. You decide to treat your patient with a dose of dexamethasone.

The second set of guidelines entails the appraisal of articles on diagnostic tests. Table 4 outlines the steps involved (7,8). Again the first of these involves determining the validity of the study results. This includes evaluating whether there was a blind comparison of the test in question with a reference standard. This is important to determine how a new test measures up to the current "gold standard." Next it is important to determine whether the study included a sample of patients that is representative of the type of patients the test would be performed on in clinical practice. If the patients in the study differ from the type of patient who would require the test, the study may not be useful.

Table 4. Guide to an Article About a Diagnostic Test

I. Are the results of the study valid?

. . . . . A. Primary guides

. . . . . . . . . . 1. Was there an independent, blind comparison with a reference standard?

. . . . . . . . . . 2. Did the patient sample include an appropriate spectrum of patients to whom the diagnostic test will be applied in clinical practice?

. . . . . B. Secondary guides

. . . . . . . . . . 1. Did the results of the test being evaluated influence the decision to perform the reference standard?

. . . . . . . . . . 2. Were the methods for performing the test described in sufficient detail to permit replication?

II. What were the results?

. . . . . A. Are likelihood ratios for the test results presented or data necessary for their calculation provided? (see Table 5)

III. Will the results help me in caring for my patient?

. . . . . A. Will the reproducibility of the test result and its interpretation be satisfactory in my setting?

. . . . . B. Are the results applicable to my patient?

. . . . . C. Will the results change my management?

. . . . . D. Will patients be better off as a result of the test?

The next step is to ensure that all patients in the study underwent both the test in question and the reference standard. If only patients with abnormal results on the test being evaluated then underwent the reference standard, this would unfairly bias the results of the study, which is known as a "work-up bias." It is also vital to determine whether the methods used to perform the test were described with enough detail so that the results could be confirmed with a second study if necessary. If the test cannot be duplicated, it may be difficult to use in clinical practice.

The second set of steps involves evaluating the results of the study. The traditional method of defining the strength of a test is to determine its sensitivity and specificity. These are calculated using a "2x2 table" of the study results (Table 5). Sensitivity indicates the probability that a patient with a particular disease (as defined by an established reference method, commonly called a "gold standard") will have a positive test. Specificity indicates the probability that a patient without a disease will have a negative test (think of this as the true negative rate). The "2x2 table" can also be used to calculate positive and negative predictive values. Positive predictive value indicates the likelihood that a positive test will indicate the presence of a disease in a patient. Negative predictive value indicates the likelihood that a negative test will indicate the absence of a disease in a patient.

Table 5. Formulas for sensitivity, specificity, predictive value, likelihood ratios

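For the given 2x2 table of results:

. . . . . a = test positive, disease present (true positive)

. . . . . b = test positive, disease absent (false positive)

. . . . . c = test negative, disease present (false negative)

. . . . . d = test negative, disease absent (true negative)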

Sensitivity = a/(a+c)

Specificity = d/(b+d)

Positive predictive value (PPV) = a/(a+b)

Negative predictive value (NPV) = d/(c+d)

Positive likelihood ratio (+LR) = [a/(a+c)]/[b/(b+d)] = sensitivity/(1-specificity)

Negative likelihood ratio (-LR) = [c/(a+c)]/[d/(b+d)] = (1-sensitivity)/specificity
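The Table 5 formulas translate directly into code. The sketch below assumes the cell layout implied by those formulas (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def diagnostic_metrics(a, b, c, d):
    """Compute the Table 5 measures from the cells of a 2x2 table.

    a = test positive, disease present    b = test positive, disease absent
    c = test negative, disease present    d = test negative, disease absent
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "+LR": sensitivity / (1 - specificity),
        "-LR": (1 - sensitivity) / specificity,
    }


# Hypothetical study: 100 patients with the disease, 200 without.
metrics = diagnostic_metrics(a=90, b=20, c=10, d=180)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV depend on how common the disease is in the study sample, whereas sensitivity, specificity, and the likelihood ratios are computed within the diseased and non-diseased columns separately.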

Another method of evaluating a diagnostic test is the likelihood ratio (LR). LRs indicate the accuracy with which the test in question confirms the diagnosis of a particular condition. The first step in using the LRs requires the determination of a pretest probability, which is the clinician's "gestalt" about the chances that a patient has a particular condition based on clinical information such as symptoms, risk factors, and physical examination. The LR then determines how a diagnostic test will affect the pretest probability, making a disease more or less likely; the result is called the posttest probability. This can be calculated using Bayes' theorem (rather difficult), but an easier way to determine the posttest probability from the LR is via a nomogram described by Fagan (9). Although the concept is useful and clinically important, it can be mathematically difficult to apply, even with the nomogram, and is best understood with an example. If a patient with worsening right lower quadrant (RLQ) abdominal pain and classic symptoms/signs of appendicitis undergoes an ultrasound which is "negative for appendicitis", a clinician would be wise to ignore the ultrasound result and still suspect appendicitis as an etiology. If the clinical risk is low, however, such as in a fully ambulatory patient with minimal abdominal pain and the same negative ultrasound, appendicitis is very unlikely. Essentially, the diagnostic certainty is improved when the clinical impression is confirmed by the diagnostic test. In other words, if there is a high clinical probability and a positive test, then the patient most likely has that diagnosis. If there is a low clinical probability and a negative test, then the patient is not likely to have that diagnosis. If the clinical probability and the diagnostic test do not agree, then the diagnostic certainty is intermediate. In most situations, clinicians have an appreciation of these probabilities, but the numerical values can be difficult to measure. Bayes' theorem and Fagan's nomogram, which are used to calculate a posttest probability, can be difficult concepts to grasp and cumbersome to use for those not familiar with them, but their advantage over the more widely used "sensitivity" and "specificity" is that they allow the clinician to apply the results from a research study to his or her individual patient.
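The calculation that Fagan's nomogram performs graphically can be sketched with the odds form of Bayes' theorem: convert the pretest probability to odds, multiply by the LR, and convert back to a probability. The pretest probabilities and the negative LR of 0.2 below are hypothetical numbers chosen to mirror the appendicitis example; they are not measured properties of ultrasound.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Posttest probability from a pretest probability and a likelihood ratio."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical negative test with a negative LR of 0.2.
high_suspicion = posttest_probability(0.80, 0.2)  # classic signs of appendicitis
low_suspicion = posttest_probability(0.05, 0.2)   # ambulatory, minimal pain

print(f"High pretest probability (80%): posttest {high_suspicion:.0%}")
print(f"Low pretest probability (5%):   posttest {low_suspicion:.0%}")
```

Under these assumed numbers, the same negative result leaves a substantial residual probability of disease in the high-suspicion patient while making it very unlikely in the low-suspicion patient, which is the behavior described in the example above.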

An LR of 1 means the test offers no help in making the diagnosis, since the pretest and posttest probabilities are the same. The magnitude of the LRs affects their power to influence the posttest probability, i.e., the larger a positive LR the greater the likelihood the disease is present, and the smaller the negative LR the less likely a disease is present. See Table 6 for relative strengths of different LRs. LRs can be calculated via different means, including from the sensitivity and specificity of a test, as in Table 5. LRs are different from sensitivity and specificity because they take into account each individual patient, using the pretest and posttest probabilities.

The last set of steps again involves determining whether the results of the study will help you care for your individual patient. It is important to determine whether the test in question is feasible to perform and interpret in your setting. If a test requires special expertise to perform or interpret, the test may be less useful to you and your patient. It is also important to determine whether the results are applicable to your particular patient. If your patient has different co-morbidities or a different severity of disease, the results of the study may be less applicable, and the diagnostic test less useful.

It is also important to determine whether the results of the test will change your management. If you will not use the test to initiate treatment or determine prognosis, depending on the test's risk:benefit ratio, cost, and complexity, you may decide against performing it. Ultimately you must determine if performing the test will benefit the patient and whether the patient will be better off as a result.

Evidence-based medicine is a method for critically appraising and applying the medical literature. It is a tool, just like a stethoscope or history-taking skills, and can be immensely helpful in the day-to-day care of patients. No one can ever master all there is to know in medicine, but the principles of evidence-based medicine can get you one step closer, one article at a time.

Table 6. Relative strength of Likelihood Ratios

LR > 10 or < 0.1 : Large change from pretest to posttest probability.

LR 5-10 or 0.1-0.2 : Moderate change from pretest to posttest probability.

LR 2-5 or 0.2-0.5 : Small, but sometimes important, changes in probability.

LR 1-2 or 0.5-1 : Small, rarely significant, changes in probability.

LR = 1 : Pretest probability = posttest probability.

Questions

1. What are the 7 basic steps outlining the evidence-based medicine approach to clinical problems?

2. Why is randomization important?

3. What is an "intention-to-treat analysis?"

4. How do you calculate relative risk, relative risk reduction (RRR), absolute risk reduction (ARR), and number needed to treat (NNT), and what do these values mean?

5. What is the "95% confidence interval?"

6. Why is "blinding" important?

7. What are sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) and how do you calculate these values?

8. What are positive and negative likelihood ratios, and how do they differ from sensitivity and specificity?

9. How are pretest and posttest probabilities calculated and applied?

10. How can evidence-based medicine help you in your practice of medicine?

References

1. Super DM, et al. A prospective randomized double-blind study to evaluate the effect of dexamethasone in acute laryngotracheitis. J Pediatr 1989;115(2):323-329.

2. Sackett D. Evidence-based medicine. Sem Perinatology 1997;21(1):3-5.

3. Rosenberg W, Donald A. Evidence based medicine: an approach to clinical problem-solving. BMJ 1995;310:1122-1126.

4. Oxman A, et al. Users' Guides to the Medical Literature: I. How to get started. JAMA 1993;270:2093-2095.

5. Guyatt G, et al. Users' Guides to the Medical Literature: II. How to use an article about therapy or prevention; A. Are the results of the study valid? JAMA 1993;270:2598-2601.

6. Guyatt G, et al. Users' Guides to the Medical Literature: II. How to use an article about therapy or prevention; B. What were the results and will they help me in caring for my patients? JAMA 1994;271:59-63.

7. Jaeschke R, et al. Users' Guides to the Medical Literature: III. How to use an article about a diagnostic test; A. Are the results of the study valid? JAMA 1994;271:389-391.

8. Jaeschke R, et al. Users' Guides to the Medical Literature: III. How to use an article about a diagnostic test; B. What were the results and will they help me in caring for my patients? JAMA 1994;271:703-707.

9. Fagan TJ. Nomogram for Bayes' theorem. NEJM 1975;293(5):257.

Evidence Based Medicine Resources

1. Clinical Evidence, Issue 6. BMJ Publishing Group. December 2001. (Updated with a new issue biannually)

2. Clinical Evidence, Pediatrics. BMJ Publishing Group. July 2002.

3. Moyer VA, Elliott EJ, et al (eds). Evidence Based Pediatrics and Child Health. 2000, London: BMJ Books.

4. www.aap.org American Academy of Pediatrics official website with access to practice guidelines.

5. www.guideline.gov National Guideline Clearinghouse with access to guidelines from multiple medical agencies and societies.

6. http://depts.washington.edu/pedebm University of Washington and Harborview Injury Prevention and Research Center website with pediatric CATs (critically appraised topics) available for review.

7. www.urmc.rochester.edu/medicine/res/CATS/ped.html University of Rochester Combined Internal Medicine and Pediatrics Program website with pediatric CATs (critically appraised topics) available for review.

8. www.ped.med.umich.edu/ebm University of Michigan website with pediatric CATs available for review.

9. www.pedsccm.org Pediatric Critical Care Medicine website with various resources and references for general evidence-based medicine.

10. The Cochrane Database may be accessed through the Hawaii Medical Library website (www.hml.org) and contains systematic reviews of topics in various medical fields, including Pediatrics.

Answers to questions

1. I) Identify the clinical question. II) Search for sources of information. III) Identify the source(s) found. IV) Determine whether the results are valid. V) Determine what the results are. VI) Determine whether the results will help you in caring for your patients. VII) Resolve the clinical question.

2. Randomization ensures that both known and unknown factors are evenly distributed between the treatment and control groups, making it more likely that any difference in outcome between the two groups is due to the treatment effect alone.

3. This means that during the analysis of the study results, patients remain in the groups to which they were randomized in the beginning of the study, even if they are unable or unwilling to complete the treatment.

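4. Relative risk (RR) = Y/X. Relative risk reduction (RRR) = 1- Y/X. Absolute risk reduction (ARR) = X-Y. Number needed to treat (NNT) = 1/ARR, i.e., the number of patients who must be treated to prevent one adverse outcome. See Table 3.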

5. The 95% CI is the range within which the true RRR lies 95% of the time. The CI speaks to the power of a study, and the factor that has the most impact on a study's power is its sample size.

6. It is well known that if a patient or worker knows that a patient is receiving the study medication, this will bias their assessment of the patient's outcome.

7. Sensitivity = a/(a+c). Specificity = d/(b+d). Positive predictive value (PPV) = a/(a+b). Negative predictive value (NPV) = d/(c+d). See Table 5.

8. LR for a positive test result (+LR) = [a/(a+c)]/[b/(b+d)] = sensitivity/(1-specificity). LR for a negative test result (-LR) = [c/(a+c)]/[d/(b+d)] = (1-sensitivity)/specificity. LRs are different from sensitivity and specificity because they take into account each individual patient, using the pretest and posttest probabilities.

9. The pretest probability is the clinician's "gestalt" about the chances that a patient has a particular condition based on clinical information such as symptoms, risk factors, and physical examination. The LR then determines how a diagnostic test will affect the pretest probability, making a disease more or less likely, the outcome of which is called the posttest probability.

10. a) Improves the uniformity and standardization of care so that all patients receive optimal care; b) Helps providers make better use of limited resources by seeking the most effective treatments; c) Prevents harmful side effects or outcomes; and d) Makes the literature accessible to all, thereby helping clinicians make the most informed decisions possible.