Focus on Research:  Understanding and improving calibration accuracy

 

Linda Bol

Old Dominion University

 

Douglas J. Hacker

University of Utah

 

 

Kruger and Dunning (1999) argue “that when people are incompetent in the strategies they adopt to achieve success and satisfaction, they suffer a dual burden:  Not only do they reach erroneous conclusions and make unfortunate choices, but their incompetence robs them of the ability to realize it” (p. 3).  This conclusion rings true for those of us who are teachers.  The lowest achieving students are most surprised when their grades are dramatically lower than they anticipated.  And we continue to be surprised by their surprise.  The obvious research implications are to explore factors that contribute to calibration accuracy and investigate the effectiveness of interventions designed to improve accuracy. 

Previous attempts to improve students’ calibration accuracy have had mixed success.  Some studies have reported modest gains in participants’ ability to predict and postdict performance (e.g., Horgan, 1990; Koriat & Goldsmith, 1994, 1996; Pressley, Snyder, Levin, Murray, & Ghatala, 1987; Walczyk & Hall, 1989).  Other researchers have reported no significant change in calibration accuracy after practice or other interventions (e.g., Gigerenzer, Hoffrage, & Kleinbolting, 1991; Koriat, 1997; Koriat, Lichtenstein, & Fischhoff, 1980).  The discrepancies in findings may be partially attributed to whether the studies were conducted in controlled laboratory settings or in classroom settings.  Only two studies were identified that asked students to make metacognitive judgments about actual classroom tests (Shaughnessy, 1979; Sinkavich, 1995).  These researchers reported greater gains for higher versus lower performing students.

Our first study conducted within a college classroom highlighted the importance of ability level on calibration accuracy (Hacker, Bol, Horgan, & Rakow, 2000).  We had predicted significant improvements in students’ ability to predict and postdict their exam performance because the course content emphasized self-monitoring strategies and included assignments that required practice tests and other self-assessments.  Our hypothesis was confirmed only for the high-achieving students.  Low-achieving students did not show significant gains across three exams on either predictive or postdictive accuracy.  The lowest achieving students were grossly overconfident, with mean judgments exceeding actual scores by as much as 31 percentage points.  
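To make the arithmetic behind such figures concrete, the following is a minimal sketch, assuming calibration is scored as the difference between a judged and an actual exam score in percentage points.  The function names and the scores are hypothetical illustrations, not the measures or data from the study.

```python
from statistics import mean


def calibration_bias(predicted, actual):
    """Signed difference in percentage points; positive means overconfidence."""
    return predicted - actual


def calibration_error(predicted, actual):
    """Absolute difference in percentage points; 0 means perfect calibration."""
    return abs(predicted - actual)


# Hypothetical (predicted %, actual %) pairs; NOT data from the study.
students = {
    "high_achieving": [(90, 88), (85, 89), (92, 90)],
    "low_achieving": [(80, 52), (75, 45), (78, 50)],
}


def summarize(pairs):
    """Average the signed bias and the absolute error over a group."""
    return {
        "mean_bias": mean(calibration_bias(p, a) for p, a in pairs),
        "mean_abs_error": mean(calibration_error(p, a) for p, a in pairs),
    }


summary = {group: summarize(pairs) for group, pairs in students.items()}

for group, stats in summary.items():
    print(group, stats)
```

Reporting both values is a design choice in this sketch: the signed bias shows the direction of the misjudgment (over- or underconfidence), while the absolute error shows its size even when over- and underestimates cancel out within a group.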

These findings inspired us to systematically investigate the impact of a specific intervention, namely practice tests, on calibration accuracy and exam performance (Bol & Hacker, 2001).  The literature supports the effectiveness of practice tests for improving calibration (e.g., Maki, 1998; Pressley, Van Etten, Yokoi, Freebern, & Van Meter, 1998), leading us to conclude that practice tests may have contributed to the improvements in calibration observed among higher achieving students in our previous study.  Using two sections of a graduate research methods course, we compared exam performance and calibration for students who received either practice tests or more traditional review to help them prepare for the exams.  Students who received the practice tests scored lower on the midterm multiple-choice items and were less accurate in their predictions and postdictions than students who received review.  We speculated that this unexpected finding might have been due to students limiting their study to only the content covered on the practice tests.  This narrower focus may have led to overconfidence in calibration and depressed exam scores.  Furthermore, previous findings about the importance of student ability were replicated.  Low-achieving students were significantly less accurate in their predictions and postdictions than were high-achieving students.  This was especially true for predictions on multiple-choice versus essay exams.

Given our counterintuitive findings with respect to practice tests, we have planned a series of future studies to investigate the effectiveness of other kinds of interventions.  In the first of these studies, we will ask one group of students to reflect on why their predictions were inaccurate and how they can increase their accuracy on the next exam.  Our measures will also focus on attributional styles for academic achievement and for calibration accuracy.  It seems plausible that attributional styles and calibration are correlated, which could explain why accuracy is difficult to improve.  Our next study will include both reflection and incentives (bonus points) in a factorial design that allows us to investigate the individual as well as the combined or interactive impact of the two interventions.  Clearly, previous findings warrant comparing results for high- versus low-achieving students on all measures.  Interventions for improving metacognitive judgments may need to be tailored to meet the needs of different types of students.  Finally and most importantly, we will empirically link calibration accuracy with performance in actual classroom contexts.

In their lab study, Kruger and Dunning (1999) showed that subjects came to recognize their incompetence once their metacognitive skills had improved.  This finding raises a paradox for educational practice.  On the one hand, we want students to have a healthy academic self-concept and to persist in their educational endeavors.  On the other hand, we hope that a more realistic understanding of their limitations will be the impetus for educational improvement.  The challenge for educators is to implement constructive interventions that lead to improved calibration and performance without destroying students’ self-esteem and confidence.

 

 


References

 

            Bol, L., & Hacker, D. J.  (2001).  A comparison of the effects of practice tests and traditional review on performance and calibration.  Journal of Experimental Education, 69, 133-151.

            Gigerenzer, G., Hoffrage, U., & Kleinbolting, H.  (1991).  Probabilistic mental models:  A Brunswikian theory of confidence.  Psychological Review, 98, 506-528.

            Hacker, D. J., Bol, L., Horgan, D., & Rakow, E.  (2000).  Test prediction and performance in a classroom context.  Journal of Educational Psychology, 92, 160-170.

            Horgan, D. (1990).  Competition, calibration, and motivation.  Teaching Thinking and Problem Solving, 12, 5-10.

            Koriat, A. (1997).  Monitoring one’s own knowledge during study:  A cue-utilization approach to judgments of learning.  Journal of Experimental Psychology:  General, 126, 349-370.

            Koriat, A., & Goldsmith, M.  (1994).  Memory in naturalistic and laboratory contexts:  Distinguishing the accuracy-oriented and quantity-oriented approaches to memory assessment.  Journal of Experimental Psychology:  General, 123, 297-316.

            Koriat, A., & Goldsmith, M.  (1996).  Monitoring and control processes in the strategic regulation of memory accuracy.  Psychological Review, 103, 490-517.

            Koriat, A., Lichtenstein, S., & Fischhoff, B.  (1980).  Reasons for confidence.  Journal of Experimental Psychology:  Human Learning and Memory, 6, 107-118.

            Kruger, J., & Dunning, D.  (1999).  Unskilled and unaware of it:  How difficulties in recognizing one’s incompetence lead to inflated self-assessments.  Journal of Personality and Social Psychology, 77, 1121-1134.  Retrieved March 1, 2000, from the World Wide Web: http://www.apa.org/journals/psp/psp//01121.htm

            Maki, R. H.  (1998).  Test predictions over text material.  In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 117-144).  Mahwah, NJ: Erlbaum.

            Pressley, M., Snyder, B. L., Levin, J. R., Murray, H. G., & Ghatala, E. S.  (1987).  Perceived readiness for examination performance (PREP) produced by initial reading of text and text containing adjunct questions.  Reading Research Quarterly, 22, 219-236.

            Pressley, M., Van Etten, S., Yokoi, L., Freebern, G., & Van Meter, P.  (1998).  The metacognition of college studentship:  A grounded theory approach.  In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 347-366).  Mahwah, NJ: Erlbaum.

            Shaughnessy, J. J.  (1979).  Confidence-judgment accuracy as a predictor of test performance.  Journal of Research in Personality, 13, 505-514.

            Sinkavich, F. J.  (1995).  Performance and metamemory:  Do students know what they don’t know?  Journal of Instructional Psychology, 22, 77-87.

            Walczyk, J. J., & Hall, V. C.  (1989).  Effects of examples and embedded questions on the accuracy of comprehension self-assessments.  Journal of Educational Psychology, 81, 435-437.