June 2022

The editors and current author would like to thank and acknowledge the significant contribution of the previous author of this chapter from the 2004 first edition and subsequent second edition, Dr. Claudine Kimura. This current third edition chapter is a revision and update of the original author’s work.

*A 3-year-old boy presents to the clinic with a cough for 2 days and a temperature of 99 degrees. He is noted to have a barking cough and other clinical findings consistent with a diagnostic impression of laryngotracheobronchitis or croup. After a discussion with the clinic attending, she mentions that dexamethasone or prednisolone may be a good treatment for this patient. You perform a literature search on PubMed and find an article entitled, "Prednisolone Versus Dexamethasone for Croup: A Randomized Controlled Trial" (1).*

One of the most exciting aspects of the practice of medicine is that it is continually evolving and changing. Every physician maintains the perpetual title of Student of Medicine, as we are all constantly learning and absorbing new information. This, however, is also one of the most challenging and daunting aspects of the practice of medicine. Faced with thousands of articles every year, a practitioner can't help but feel overwhelmed at times. This is why the practice of evidence-based medicine is so important.

Evidence-based medicine (EBM) adds to the traditional definition of clinical medicine, which has been defined as the combination of medical knowledge, intuition, and judgment, by focusing on the life-long, self-directed process by which clinicians grasp and determine the efficacy of the most up-to-date and relevant information regarding diagnosis, prognosis, and therapy (2). It has also been described as "the integration of the best research evidence with our clinical expertise and our patient’s unique values and circumstances" (3). The common use of EBM has arisen from the recognition of a few points: 1) the daily need for well-founded quantitative data regarding diagnosis, prognosis, therapy, and prevention of disease; 2) the paucity of this information from traditional sources such as textbooks or experts, or its overwhelming volume within medical journals; 3) the discrepancy between diagnostic skill, clinical judgment and our up-to-date knowledge; 4) the incapacity to spend extended time finding and understanding data; 5) the gap between evidence and practice (3). Everyone, from the medical student to the most senior physician, can use the principles of evidence-based medicine. But, like any other worthwhile endeavor, it takes practice to become comfortable with and proficient in using these guidelines.

The basic tenets of evidence-based medicine are laid out in a series of articles published in JAMA, collectively entitled "Users' Guides to the Medical Literature" (4). The original guides to EBM have since been updated. This chapter describes updated guidelines to therapeutics and diagnostic tests.

The basic process of EBM has been refined to five steps described in Table 1 (3). The first step occurs at the bedside, when a clinical question arises during the care of a patient. The question could be whether a test that was ordered is likely to help establish a diagnosis, or whether the present medication is the most efficacious for the patient's condition. The second step involves searching for sources of information. This might be as simple as asking a knowledgeable physician or looking in a textbook, but for the most comprehensive and up-to-date source of information, physicians turn to the medical literature. The simplest means of accessing the medical literature is a Medline or PubMed search on the internet. The third step is to appraise the sources that are found by determining whether the results of the study being examined are valid, impactful, and applicable. The specific guidelines for this will be outlined in the following paragraphs. The fourth step is to apply the evidence to our patient. The fifth step is to assess the effect of applying this knowledge on our patient, as well as the efficiency of the EBM process. These five steps ensure life-long learning for each clinician and the most up-to-date care for their patients.

Table 1. Evidence-Based Medicine Approach to Clinical Problems

1. Identifying and asking a clinical question.
2. Searching and acquiring the best source(s) of information.
3. Appraising the evidence for its validity, impact, and applicability.
4. Applying the evidence to our clinical expertise and our patient’s unique circumstance.
5. Assessing the effectiveness and efficiency of steps 1 to 4.

The steps involved in evaluating an article on therapy are outlined in Table 2 (3). The first steps involve determining whether the results of the study are valid. Toward this end, the article should first be scrutinized for randomization of patients. Many factors (e.g., age, sex, ethnicity), the least of which may be the therapy being studied, affect patient outcome. If the study population is large enough, randomization makes it likely that both known and unknown factors are evenly distributed between the treatment and control groups, so that any difference in outcome between the two groups is more likely due to the treatment effect alone. In the croup article, the patients were randomized, as is noted in the title.

Table 2. Guide to an Article About Therapy

I. Are the results of the study valid?
   A. Was there an equitable start?
      1. Was the assignment of patients to treatments randomized?
      2. Was the randomization hidden?
      3. Were groups similar (demographics, comorbidities, etc.)?
   B. Was there an equitable race?
      1. Was follow-up sufficient (method of follow-up, length, completeness)?
      2. Were patients analyzed in the groups to which they were randomized (intention-to-treat analysis)?
   C. Secondary guides
      1. Were patients, health workers, and study personnel blind to treatment?
      2. Aside from the experimental intervention, were the groups treated equally? Co-interventions (see below)?

II. What were the results and are they important?
   A. How large was the treatment effect? (see Table 3)
   B. How precise was the estimate of the treatment effect? (95% confidence interval)

III. Will the results help me in caring for my patient?
   A. Is our patient vastly different from the study population, so that the results of the study are inapplicable?
   B. Can the treatment be applied to my patient’s care based on the clinical setting?
   C. Are the likely treatment benefits worth the potential harms and costs?
   D. What are our patient’s values and expectations regarding the treatment and the outcome we are trying to prevent?

Next, assess whether all patients enrolled in the study were properly accounted for at the end of the study. If a large number of patients were lost to follow-up, the results of the study may be skewed. To avoid having a therapy appear more effective than it is, assume that any lost patients from the treatment group had a bad outcome and those lost from the control group had a good outcome. Ideally, the authors should have preserved randomization by using an intention-to-treat analysis. This means that during the analysis of the study results, patients remain in the groups to which they were randomized in the beginning of the study, even if they are unable or unwilling to complete the treatment. If patients from the treatment group who were unable to complete the treatment because they got sicker are transferred to the placebo (control) group, the treatment may show more effect than is truly present, just because the placebo group has sicker patients. In the croup article, of the 1252 patients enrolled in the study, 1231 were randomly assigned to dexamethasone, low-dose dexamethasone, or prednisolone treatment. All patients received and completed the allocated intervention, with no patients excluded from analysis. As a result, an intention-to-treat analysis was carried out.
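The worst-case assumption for patients lost to follow-up can be sketched with a few lines of arithmetic. The trial below is entirely hypothetical (the counts are invented for illustration and are not taken from the croup article), and the variable names are assumptions of this sketch.

```python
def event_rate(events, total):
    """Proportion of patients who had the adverse outcome."""
    return events / total

# Hypothetical trial: 100 patients randomized to each arm, some lost to follow-up.
treat_randomized, treat_followed, treat_events = 100, 90, 9
ctrl_randomized, ctrl_followed, ctrl_events = 100, 92, 18

# Completers-only analysis: patients lost to follow-up are simply ignored.
observed_treat = event_rate(treat_events, treat_followed)
observed_ctrl = event_rate(ctrl_events, ctrl_followed)

# Worst case for the therapy: every lost treatment patient had a bad outcome,
# and every lost control patient had a good outcome.
treat_lost = treat_randomized - treat_followed
worst_treat = event_rate(treat_events + treat_lost, treat_randomized)
worst_ctrl = event_rate(ctrl_events, ctrl_randomized)

print(f"Completers only: treatment {observed_treat:.1%}, control {observed_ctrl:.1%}")
print(f"Worst case:      treatment {worst_treat:.1%}, control {worst_ctrl:.1%}")
```

In this invented example the therapy appears to halve the event rate among completers, but the apparent benefit disappears under the worst-case assumption, so the conclusion would not be robust to the losses.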

In many trials, clinicians or study subjects will violate the treatment assignment. For example, if 20 of the patients who were assigned to prednisolone were instead treated with dexamethasone because the clinician who was managing these patients decided that dexamethasone was a better treatment, this would result in a protocol violation. Should these 20 patients be analyzed as part of the prednisolone group (their assigned group) or as part of the dexamethasone group (the treatment that they actually received)? It could also be that for similar reasons, some of the dexamethasone-assigned patients were treated with prednisolone. Perhaps the pharmacy ran out of dexamethasone on a particular day, forcing the patient to be treated with prednisolone. If the analysis is done based on the group they were assigned to, this is called an intention-to-treat analysis because that is the treatment that the randomization intended the patient to receive. If the analysis is done based on the treatment that the patient actually received, this is called a per-protocol analysis. This latter term can be confusing because according to the protocol (per protocol), the patient was supposed to receive prednisolone but instead received dexamethasone; so perhaps this should have been called an "actual-treatment-received" analysis (some authors call this an as-treated analysis). This would be a better name, but in order to understand the medical literature, readers must understand the difference between intention-to-treat and per-protocol since these are the terms that are used. There are pros and cons of both approaches. It seems that in this particular croup study, if such violations occurred, it would be better to use a per-protocol (actual treatment received) analysis.

In other trials, however, the best analysis approach is often not obvious. For example, in a hypothetical study of the use of agent X in very severe asthma bordering on respiratory failure, randomization would have to take place in a very short period of time. Due to the critical nature of the patient’s condition, the treating physician might decide to treat the patient with agent X even if the patient was randomized to standard treatment. Another possibility is that the patient might specifically ask for agent X because the last time this occurred to him/her, agent X was used (in the trial) and it was very successful. In such instances, the more severe patients are more likely to have a protocol violation and are treated with agent X. As such, the outcomes for agent X will look worse because it is being used for more of the very severe patients and just randomly so (based on randomization) in the not so severe patients. An intention-to-treat analysis will preserve the original randomization and is more likely to result in equal numbers of very severe patients in each group. A per-protocol analysis will have a disproportionately high number of very severe patients in the agent X group. Even if protocol violations are removed from the trial, it removes a disproportionate number of very severe patients that should have been treated with standard treatment from the study cohort, resulting in the opposite type of bias. Such protocol violations are a reality of clinical trials. Many clinical trials have crossover treatment options making the comparison of the two groups more complex.
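The bias described above can be made concrete with a small, fully hypothetical cohort. In this sketch agent X is assumed to have no true effect: 10% of non-severe and 50% of very severe patients do badly on either treatment, and ten very severe patients assigned to standard treatment are given agent X anyway. All counts, field names, and helper functions are assumptions made for illustration.

```python
from collections import defaultdict

def make_patients(n, assigned, received, severe, bad_outcomes):
    """Build n patient records; the first `bad_outcomes` of them had a bad outcome."""
    return [
        {"assigned": assigned, "received": received,
         "severe": severe, "bad": i < bad_outcomes}
        for i in range(n)
    ]

# Hypothetical cohort: 100 patients randomized to each arm, 20 very severe per arm.
patients = (
    make_patients(80, "X", "X", False, 8)
    + make_patients(20, "X", "X", True, 10)
    + make_patients(80, "standard", "standard", False, 8)
    + make_patients(10, "standard", "standard", True, 5)
    # Protocol violations: very severe patients assigned to standard treatment
    # who were given agent X by the treating physician.
    + make_patients(10, "standard", "X", True, 5)
)

def bad_outcome_rate(patients, key):
    """Bad-outcome rate per group, grouping by 'assigned' or by 'received'."""
    totals, bad = defaultdict(int), defaultdict(int)
    for p in patients:
        totals[p[key]] += 1
        bad[p[key]] += p["bad"]
    return {group: bad[group] / totals[group] for group in totals}

itt = bad_outcome_rate(patients, "assigned")          # intention-to-treat
as_received = bad_outcome_rate(patients, "received")  # actual treatment received

print("Intention-to-treat:       ", {g: round(r, 3) for g, r in itt.items()})
print("Actual treatment received:", {g: round(r, 3) for g, r in as_received.items()})
```

Grouping by assigned treatment gives the same bad-outcome rate in both arms, which matches the assumption of no true effect. Grouping by treatment actually received makes agent X look worse, only because it was given to more of the very severe patients.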

The next step is to determine whether patients and study personnel were blinded to treatment. It is well known that if a patient or worker knows that a patient is receiving the study medication, this will bias their assessment of the patient's outcome. Determine whether the two groups were similar at the start of the trial. If they were significantly different in any aspect other than the therapy (e.g., age, gender, ethnicity), this difference, and not the therapy, may account for any difference in outcomes between the two groups. Confirm that both the treatment and control groups were treated equally with regard to any other interventions (co-interventions). Again, if one group received more of a co-intervention than the other, the outcome may be due to the co-intervention and not the therapy of interest. In the croup article, the staff responsible for administering and assessing treatments, as well as the patients, were blinded. The groups were demographically and symptomatically similar at the start of the study. In this study, the rate of intervention via nebulized epinephrine or additional corticosteroid use was one of the secondary outcomes measured. The use of nebulized epinephrine was not significantly different between the three groups; however, the dexamethasone patients received the fewest additional corticosteroid doses, while the prednisolone patients received the most.

The next set of steps involves evaluating the results of the study. This includes the computation of several formulas, listed in Table 3. Most trials evaluating therapy consider whether the therapy had a beneficial effect on some adverse outcome or event, such as hospitalization. One of the ways to express the difference in outcome is to calculate the absolute difference between the treatment and control groups: the absolute risk reduction (ARR). If X is the percentage rate of patients in the control group who were hospitalized, and Y is the percentage rate of patients in the treatment group who were hospitalized, then the ARR for hospitalization is X-Y.

Table 3. Measurements of treatment effect

X = outcome in control group

Y = outcome in treatment group

Relative risk (RR) = Y/X

Relative risk reduction (RRR) = 1 - Y/X

Absolute risk reduction (ARR) = X - Y

Number needed to treat (NNT) = 1/ARR

In the croup article, the primary endpoint was hourly improvement in a standardized croup score up to 6 hours after treatment and additionally at 12 hours for patients not yet discharged. The croup score is a validated measure of croup severity, which creates an objective score based on retractions, stridor, air entry, cyanosis, and level of consciousness. At 1 hour after treatment, the croup score decreased by 1.05 points in the dexamethasone group, 1.07 points in the low-dose dexamethasone group, and 1.03 points in the prednisolone group. A secondary endpoint was the use of additional corticosteroid doses, and whether the need for them was lower in any of the treatment groups. In the dexamethasone group (standard treatment), 32/283 or 11.3% (X) of patients required additional corticosteroids, while in the prednisolone group 53/280 or 18.9% (Y) required additional corticosteroids. Using the Table 3 definition (X-Y), the ARR for additional corticosteroids was (11.3%-18.9%), or negative 7.6%; that is, prednisolone was associated with a 7.6% absolute increase in the need for additional corticosteroids.

Another way to express the difference between the two groups is to calculate the relative risk (RR). The relative risk is the proportion of patients who experienced the adverse outcome in the treatment group as compared to the control group and is expressed as Y/X. But the more common usage of RR is as the relative risk reduction (RRR). This is presented as a percentage and is calculated as [1-(Y/X)]. The larger the RRR, or the ARR, the more effective the treatment. If the results of a trial showed that 10 out of 100 patients who received a placebo were hospitalized and only 5 out of 100 patients who received a medication were hospitalized, the ARR would be (0.10–0.05) or 0.05. But the RRR would be [1-(0.05/0.10)], or 50%. A 50% reduction sounds better to most people than a reduction of 5 out of 100 patients, but in this scenario, the 50% difference represents a difference of just a few patients and the two results represent the same information. These numbers can be deceiving. In the croup article, the RR for the requirement of additional corticosteroids would be calculated as 0.189/0.113, or 1.67 (167%). The RRR would then be calculated as [1-1.67], or negative 0.67 (negative 67%). The ARR is 0.113 minus 0.189, or negative 0.076. The negative values indicate that the use of prednisolone increases the chance of needing additional corticosteroids.

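In therapeutic trials comparing drug to placebo, the number needed to treat (NNT) is the number of patients who must be treated in order for one patient to benefit from the treatment. An NNT of 6 means that 6 patients must be treated in order for one patient to receive a benefit (i.e., 5 patients are treated who receive no benefit). In the croup article, the calculation (1/0.076) gives 13. Because this trial compared two active drugs and the ARR is negative, this is not a conventional NNT. It is better read as follows: for about every 13 patients treated with prednisolone instead of dexamethasone, one additional patient required additional corticosteroids.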
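The Table 3 formulas can be worked through in a short sketch, using the hypothetical placebo example (10/100 versus 5/100) and the additional-corticosteroid counts quoted from the croup article (32/283 versus 53/280). The function name and dictionary keys are assumptions of this sketch.

```python
def treatment_effect(control_events, control_total, treated_events, treated_total):
    """Return RR, RRR, ARR, and NNT using the Table 3 definitions."""
    x = control_events / control_total   # X: event rate in the control group
    y = treated_events / treated_total   # Y: event rate in the treatment group
    arr = x - y
    return {
        "RR": y / x,
        "RRR": 1 - y / x,
        "ARR": arr,
        # 1/ARR; a negative value means the treatment group did worse, so its
        # magnitude reads as a number needed to harm. Undefined when ARR is 0.
        "NNT": 1 / arr if arr != 0 else float("inf"),
    }

# Hypothetical placebo example from the text: 10/100 vs 5/100 hospitalized.
placebo_example = treatment_effect(10, 100, 5, 100)

# Croup article, additional corticosteroid doses:
# dexamethasone 32/283 (X), prednisolone 53/280 (Y).
croup = treatment_effect(32, 283, 53, 280)

for name, result in [("placebo example", placebo_example), ("croup", croup)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

With unrounded proportions the croup values differ slightly in the third decimal place from the hand calculation in the text, which starts from the rounded figures 0.113 and 0.189.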

The next step in evaluating the validity of a study's results is to determine how precise they are. This involves calculation of the confidence interval (CI). The CI is usually calculated as the 95% CI, which means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true RRR. The width of the CI reflects the precision of the estimate, which is closely tied to the power of a study, and the factor which has the most impact on both is sample size. A study with 100 participants may have the same RRR as a study with 1000 participants, but the latter will generally have a narrower CI and thus be more precise. The 95% CI can be applied to absolute and relative values. Since a treatment would be deemed to be beneficial if the RRR (relative risk reduction) was greater than zero, the 95% CI would have to exclude zero in its range if the treatment is beneficial. For example, for a RRR study, a treatment with a 95% CI of -0.1 to 0.4 cannot be statistically concluded to be beneficial since the value zero is contained within its confidence limits. However, a 95% CI of 0.1 to 0.2 describes a statistically significant benefit, since zero is not included in the range (i.e., a true RRR of zero is not compatible with the data at the 5% significance level), even though the magnitude of the benefit is relatively small.
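The effect of sample size on the CI can be illustrated with a simple normal-approximation (Wald) interval for the ARR. This is a minimal sketch under that assumption, not the method of any particular article; published trials may use other interval methods. The event rates are the hypothetical 10% versus 5% from the earlier example.

```python
import math

def arr_confidence_interval(control_events, control_total,
                            treated_events, treated_total, z=1.96):
    """Approximate 95% CI for the ARR (Wald interval for a risk difference)."""
    x = control_events / control_total
    y = treated_events / treated_total
    arr = x - y
    se = math.sqrt(x * (1 - x) / control_total + y * (1 - y) / treated_total)
    return arr - z * se, arr + z * se

# Same event rates (10% vs 5%), two different sample sizes per group.
small = arr_confidence_interval(10, 100, 5, 100)
large = arr_confidence_interval(100, 1000, 50, 1000)

print(f"100 per group:  {small[0]:.3f} to {small[1]:.3f}")
print(f"1000 per group: {large[0]:.3f} to {large[1]:.3f}")
```

Both hypothetical trials have an ARR of 0.05, but only the larger one has an interval that excludes zero.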

The last set of steps involves determining whether the study you have just reviewed will help you to care for your patient. Determine whether your patient is similar to the patients who were in the study you are investigating. If your patient would have met all the inclusion and exclusion criteria for the study, the results are likely applicable to your individual patient. It is important to evaluate whether the treatment is feasible in the current setting. If the treatment is not available in the current setting or if the patient and health care system cannot pay for the treatment, its administration, and monitoring, the study will not be applicable. Furthermore, the benefits and risks of the proposed treatment must be weighed for the individual patient. If a proposed treatment carries a heightened short- or long-term risk to the patient, better options may be available. Lastly, the patient’s values and expectations regarding the treatment and the outcome must be identified. By facilitating and supporting shared decision making, we respect patient autonomy. For the article on croup, you've decided that the results of the study are valid based on the study design, and you've evaluated the results of the study. You now determine that your patient is similar to those enrolled in the study, so the results can be applied to him. The study did not discuss any side effects or risks of the available treatments, so the benefits of the treatment seem to outweigh the risks. You decide to treat your patient with a dose of dexamethasone.

The second set of guidelines entails the appraisal of articles on diagnostic tests. Table 4 outlines the steps involved (3). The first of these involves determining the validity of the study results. This includes evaluating whether the study included a sample of patients that is representative of the type of patients the test would be performed on in clinical practice. If the patients in the study differ from the type of patient who would require the test, the study may not be useful. The next step is to ensure that all patients in the study underwent both the test in question and the reference standard. If only the patients with abnormal results on the test being evaluated went on to the reference standard, the results of the study would be biased; this is known as work-up bias. Next, it should be evaluated whether there was a blind comparison of the test in question with a reference standard. This is important to determine how a new test measures up to the current gold standard. Lastly, the methods used to perform the test should be described with enough detail so that the results could be confirmed with a second study if necessary. If the test cannot be duplicated, it may be difficult to use in clinical practice.

Table 4. Guide to an Article About a Diagnostic Test

I. Are the results of the study valid?
   A. Patient representation
      1. Did the patient sample include an appropriate spectrum of patients to whom the diagnostic test will be applied in clinical practice?
   B. Use of the reference standard
      1. Regardless of the diagnostic result, was the reference standard used?
   C. Comparison
      1. Was there an independent, blind comparison with a reference standard?
   D. Secondary validation (for clusters of tests or clinical prediction rules)
      1. Were the methods for performing the test described in sufficient detail to permit replication?

II. What were the results?
   A. Accuracy
      1. What is the sensitivity and specificity of the test? (see Table 5)
      2. Are likelihood ratios for the test results presented or data necessary for their calculation provided? (see Table 5)

III. Will the results help me in caring for my patient?
   A. Setting
      1. Is the test available and affordable in my setting?
      2. Is the reproducibility of the test and its interpretation accurate and precise in my setting?
   B. Pretest probability
      1. Is the pretest probability based on personal experience, prevalence statistics, practice databases, or primary studies?
      2. Is our patient so vastly different to the study participants that the test is not applicable?
      3. What is the likelihood that the disease probabilities have changed since the evidence was gathered?
   C. Management of the patient
      1. Will the test affect the test-treatment threshold?
      2. Does our patient want to carry out the test?
      3. Will our patient be better off as a result of the test?

The second set of steps involves evaluating the results of the study. The traditional method of defining the strength of a test is to determine its sensitivity and specificity. These are calculated using a 2 by 2 table of the study results (Table 5). Sensitivity indicates the probability that a patient with a particular disease (as defined by an established reference method, commonly called a gold standard) will have a positive test. Specificity indicates the probability that a patient without a disease will have a negative test (think of this as the true negative rate). The 2 by 2 table can also be used to calculate positive and negative predictive values. Positive predictive value indicates the likelihood that a positive test will indicate the presence of a disease in a patient. Negative predictive value indicates the likelihood that a negative test will indicate the absence of a disease in a patient.

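Table 5. Formulas for sensitivity, specificity, predictive value, likelihood ratios

In the 2 by 2 table, as implied by the formulas below: a = true positives (disease present, test positive); b = false positives (disease absent, test positive); c = false negatives (disease present, test negative); d = true negatives (disease absent, test negative).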

Sensitivity = a/(a+c)

Specificity = d/(b+d)

Positive predictive value (PPV) = a/(a+b)

Negative predictive value (NPV) = d/(c+d)

Positive likelihood ratio (+LR) = [a/(a+c)]/[b/(b+d)] = sensitivity/(1-specificity)

Negative likelihood ratio (-LR) = [c/(a+c)]/[d/(b+d)] = (1-sensitivity)/specificity
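The Table 5 formulas can be checked against a hypothetical 2 by 2 table. The cell labels follow what the formulas imply (a = true positives, b = false positives, c = false negatives, d = true negatives), and the counts below are invented for illustration.

```python
def diagnostic_metrics(a, b, c, d):
    """Table 5 formulas; a = TP, b = FP, c = FN, d = TN."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": a / (a + b),
        "NPV": d / (c + d),
        "LR+": sensitivity / (1 - specificity),
        "LR-": (1 - sensitivity) / specificity,
    }

# Hypothetical study: 100 patients with the disease, 300 without.
metrics = diagnostic_metrics(a=90, b=30, c=10, d=270)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this invented table the specificity is 90% while the PPV is only 75%: the two use the same numerator logic differently, since specificity divides by all patients without the disease and PPV divides by all patients with a positive test.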

Another method of evaluating a diagnostic test is the likelihood ratio (LR), which indicates the accuracy with which the test in question confirms the diagnosis of a particular condition. The first step in using the LRs requires the determination of a pretest probability, which is the clinician's gestalt about the chances that a patient has a particular condition based on clinical information such as symptoms, risk factors, and physical examination. The LR then determines how a diagnostic test will affect the pretest probability, making a disease more or less likely, the outcome of which is called the posttest probability or predictive value. This can be calculated using Bayes' theorem, a rather difficult equation that uses the pretest probability, test sensitivity, and test specificity (2). An easier way to determine the posttest probability by applying the LR is via a nomogram (2). Although this concept is very useful and clinically important, it is mathematically (even with the nomogram) difficult to determine. The nomogram of Bayes' theorem predicts the posttest probability of disease via the right side of the nomogram, after it is lined up with the pretest probability via the left side of the nomogram, and the LR for a positive or negative test via the middle scale (2). For example, if a patient with worsening right lower quadrant (RLQ) abdominal pain and classic symptoms/signs of appendicitis undergoes an ultrasound which is negative for appendicitis, a clinician would be wise to ignore the ultrasound result and still suspect appendicitis as an etiology since the pretest probability of appendicitis is high. If the clinical risk or pretest probability is low, however, such as in a fully ambulatory patient with minimal abdominal pain, appendicitis is very unlikely. Essentially, the diagnostic certainty is improved when the clinical impression is confirmed by the diagnostic test.

When there is a high clinical probability (left scale) and a positive test (middle scale), then the patient most likely has that diagnosis, as evidenced by a greater posttest probability. When there is a low clinical probability and a negative test, then the patient is not likely to have that diagnosis. If the clinical probability and the diagnostic test do not agree, then the diagnostic certainty is intermediate. In most situations, clinicians have an appreciation of these probabilities, but the numerical values can be difficult to measure. Bayes' theorem and its nomogram version used to calculate a posttest probability can be difficult concepts to grasp and cumbersome to use for those not familiar with them.

An LR of 1 means the test offers no help in making the diagnosis since this means that the pretest and posttest probabilities are the same. The magnitude of the LRs affects their power to influence the posttest probability, i.e. the larger a positive LR the greater the likelihood the disease is present, and the smaller the negative LR the less likely a disease is present. A very high (>10) or very low (<0.10) LR indicates a high positive or negative predictive value, respectively (2). See Table 6 for relative strengths of different LRs. LRs can be calculated via different means, including from the sensitivity and specificity of a test, as in Table 5. LRs are different from sensitivity and specificity because they take into account each individual patient, using the pretest and posttest probabilities.
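For readers who want to see the arithmetic that the nomogram performs, the odds form of Bayes' theorem (posttest odds = pretest odds x LR) can be sketched as follows. The pretest probabilities and the negative LR of 0.2 are hypothetical values chosen for illustration; they are not measured properties of ultrasound or of any other test.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# Hypothetical negative test with a negative LR of 0.2.
high_suspicion_negative = posttest_probability(0.80, 0.2)  # classic presentation
low_suspicion_negative = posttest_probability(0.05, 0.2)   # minimal symptoms

# An LR of 1 leaves the probability unchanged.
no_information = posttest_probability(0.30, 1.0)

print(f"High suspicion, negative test: {high_suspicion_negative:.1%}")
print(f"Low suspicion, negative test:  {low_suspicion_negative:.1%}")
print(f"LR of 1:                       {no_information:.1%}")
```

With these assumed numbers, the same negative result leaves a substantial probability of disease when clinical suspicion is high and a very small one when suspicion is low, which is the pattern described for the appendicitis example.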

After reading the above discussion, it should be evident that LRs are not very useful clinically. While LRs are often advocated by evidence-based medicine proponents, they are too difficult for most clinicians to calculate mathematically or by nomogram at the bedside. The concept of Bayes' theorem is that if your clinical suspicion is confirmed by imaging and laboratory studies, then the clinical certainty is high, but when clinical suspicion and supporting studies do not agree, then clinical certainty is low. In clinical practice, it is more important to apply this concept than to calculate LRs. For research purposes, it is perfectly acceptable to calculate PPV and NPV instead of positive and negative LRs.

The last set of steps involves determining whether the results of the study will help you care for your individual patient. Assess whether the test in question is feasible to perform and interpret in your setting. If a test requires special expertise to perform or interpret, the test may be less useful to you and your patient. Given your pretest probability (gestalt of whether your patient has a particular disease), will this test help you to raise or lower the clinical probability based on your patient’s clinical factors such as symptoms, signs, severity, and co-morbidities?

Assess whether the results of the test will change your management. If you will not use the test to initiate treatment or determine prognosis, depending on the test's risk:benefit ratio, cost, and complexity, you may decide against performing it. Ultimately you must determine if performing the test will benefit the patient and whether the patient will be better off as a result.

Evidence-based medicine is a method for critically appraising and applying the medical literature. It is a tool, just like a stethoscope or history-taking skills, and can be immensely helpful in the day-to-day care of patients. No one can ever master all there is to know in medicine, but the principles of evidence-based medicine can get you one step closer, one article at a time.

Table 6. Relative strength of Likelihood Ratios

LR > 10 or < 0.1: Large change from pretest to posttest probability.

LR 5 to 10 or 0.1 to 0.2: Moderate change from pretest to posttest probability.

LR 2 to 5 or 0.2 to 0.5: Small, but sometimes important, changes in probability.

LR 1 to 2 or 0.5 to 1.0: Small, rarely significant, changes in probability.

LR = 1: Pretest probability = posttest probability.

Questions

1. What are the 5 basic steps outlining the evidence-based medicine approach to clinical problems?

2. Why is randomization important?

3. What is an intention-to-treat analysis?

4. How do you calculate relative risk, relative risk reduction (RRR), absolute risk reduction (ARR), and number needed to treat (NNT), and what do these values mean?

5. What is the 95% confidence interval?

6. Why is blinding important?

7. Define specificity using English words. In other words, if you want a specific test, what do you mean by this? Is this consistent with the calculation described in Table 5?

8. As a clinician, I want a very specific test and by this I mean a test that if it is positive, the patient very likely has the disease. This might be an influenza PCR test. If it’s positive, the patient has a very, very high likelihood of having influenza and not some other disease. Is the influenza PCR test very specific? Is this the specificity calculation, or is this the positive predictive value calculation?

References

1. Parker CM, Cooper MN. Prednisolone Versus Dexamethasone for Croup: A Randomized Controlled Trial. Pediatrics 2019;144(3):e20183772. doi:10.1542/peds.2018-3772

2. Mark DB, Wong JB. Decision-Making in Clinical Medicine. In: Loscalzo J, Fauci A, Kasper D, Hauser S, Longo D, Jameson J (eds). Harrison's Principles of Internal Medicine, 21st edition. 2022. McGraw Hill; Accessed May 31, 2022. https://accessmedicine-mhmedical-com.eres.library.manoa.hawaii.edu/content.aspx?bookid=3095&sectionid=261076027

3. Straus SE, Glasziou P, Richardson WS, et al. Evidence-Based Medicine: How to Practice and Teach EBM. 5th edition. 2019. Elsevier.

4. Oxman A, et al. Users' Guides to the Medical Literature: I. How to get started. JAMA 1993;270:2093-2095.

Evidence Based Medicine Resources

1. www.dynamed.com DynaMed website provides accurate, evidence-based content for practicing physicians.

2. www.clinicalkey.com ClinicalKey: Clinical overviews (formerly First Consult) is an evidence-based clinical information resource for healthcare professionals. It specializes in providing data for use at point of care evaluation, diagnosis, clinical management, prognosis, and prevention.

3. www.aap.org American Academy of Pediatrics official website with access to practice guidelines.

4. www.guideline.gov National Guideline Clearinghouse with access to guidelines from multiple medical agencies and societies.

5. The Cochrane Database may be accessed through the Hawaii Medical Library website (www.hml.org) and contains systematic reviews of topics in various medical fields, including Pediatrics.

Answers to questions

1. 1) Identifying and asking a clinical question. 2) Searching for and acquiring the best sources of information. 3) Appraising the source(s) found. 4) Applying the evidence. 5) Assessing the effectiveness and efficiency of steps 1 to 4.

2. Randomization ensures that both known and unknown factors are evenly distributed between the treatment and control groups, making it more likely that any difference in outcome between the two groups is due to the treatment effect alone.

3. This means that during the analysis of the study results, patients remain in the groups to which they were randomized in the beginning of the study, even if they are unable or unwilling to complete the treatment. In other words, the groups are defined by the treatments that they were supposed to get (not the treatments that they actually received).

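4. With X as the outcome rate in the control group and Y as the outcome rate in the treatment group: Relative risk (RR) = Y/X. Relative risk reduction (RRR) = 1 - Y/X. Absolute risk reduction (ARR) = X - Y. Number needed to treat (NNT) = 1/ARR. The RR compares the rate of the adverse outcome in the treatment group to that in the control group, the RRR expresses the reduction as a proportion of the control rate, the ARR is the absolute difference between the two rates, and the NNT is the number of patients who must be treated for one patient to benefit. See Table 3.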

5. The 95% CI is a range calculated so that, if the study were repeated many times, about 95% of such intervals would contain the true RRR. A narrower CI indicates a more precise estimate. Precision is closely tied to the power of a study, and the factor that has the most impact on both is sample size.

6. It is well known that if a patient or worker knows that a patient is receiving the study medication, this will bias their assessment of the patient's outcome.

7. Specificity in English means that a test improves the likelihood of a specific diagnosis based on the test’s result. Table 5 calculates specificity as the percentage of those who do not have disease, who have a negative test. In other words, specificity is equal to those who test negative and don’t have the disease (the true negatives), divided by all those who don’t have the disease (the true negatives plus the false positives). This requires a lot of deep thought, but it is probably not what you meant by the word, specific.

8. The influenza PCR test is very specific for influenza. But what was described is the positive predictive value (PPV), not the specificity. The PPV is the percentage of patients who actually have the disease, if the test is positive. Perhaps what we mean by "specific" (in English), is really the PPV and not the specificity.