Research School Network: Calibration Accuracy: What is it and does it matter? Is there any merit in asking students to estimate their assessment grade before receiving the marked total?


Calibration Accuracy: What is it and does it matter?

Is there any merit in asking students to estimate their assessment grade before receiving the marked total?

by Durrington Research School

Mock exams have either been completed or are being completed up and down the country, and we as teachers are starting to think about what we do with the grades and the papers themselves to maximise the learning from them. You only need to scroll through edu-twitter to see various forms of post-mock self-evaluation worksheets that get students to reflect on their performance and preparation for the exam, with the obvious intention of creating more self-regulatory learners. Such learners can reflect on the strategies they used in the build-up to and during their exams, determine where their own strengths and weaknesses lie, and plan accordingly to improve performance next time.

I have always believed that such tasks can hold great potential, but I have also always worried about their delivery and whether we actually get the most from them. One thing I have always wondered about when seeing such tasks is the merit of getting students to estimate or predict their performance before receiving the actual grade. After all, as many researchers including Dunlosky, Kruger and Dunning have noted, students often provide estimates far off their actual outcomes. If this is the case, is asking them to estimate their score a worthy task or a pointless waste of time?

Wanting to find out more, I stumbled across a paper by Nederhand et al. (2020) entitled “Reflection on exam grades to improve calibration of secondary school students: a longitudinal study” – you can read the full paper for free here. This paper claims to be one of the first to look at how calibration accuracy (the accuracy of estimates) can be improved in a secondary setting, and at the implications this has for developing self-regulatory students.

It is important to note that the researchers themselves admit that there is much more work to do in this area, and that the study only involved 219 students from a Dutch secondary school. As is the case in much edu-research, the study was not conducted in a vacuum and had no control group, so its findings should not be taken as gospel. However, it does add to the body of literature that already exists on the benefits of grade estimation.

If we return to the point regarding the inaccuracy of student estimates, we need to consider the implications such a “mismatch” has. Most importantly, students may not recognise the need to change ineffective learning strategies, or may fail to ask for help, leading to underachievement. This is a significant problem as we aim to develop more independent learners, and as such many researchers have called for more work on how we can improve student calibration accuracy, defined as the match between estimated and actual grade. A student whose estimate matches their actual grade is perfectly calibrated, while a student who misjudges their performance is said to be miscalibrated. Supporting students to think about their performance may allow for more accurate calibration for the rest of their academic career.
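To make the definition concrete, here is a minimal sketch of one common way to put a number on calibration: the absolute gap between the estimated and the actual grade, where zero means perfectly calibrated. The function names and the example grades are invented for illustration and are not taken from the paper.

```python
def calibration_error(estimated_grade: float, actual_grade: float) -> float:
    """Absolute gap between estimate and result; 0 means perfectly calibrated."""
    return abs(estimated_grade - actual_grade)


def is_perfectly_calibrated(estimated_grade: float, actual_grade: float) -> bool:
    """True only when the estimate matches the actual grade exactly."""
    return calibration_error(estimated_grade, actual_grade) == 0


# Hypothetical student: predicted 7.5, was awarded 6.0.
print(calibration_error(7.5, 6.0))        # 1.5
print(is_perfectly_calibrated(6.0, 6.0))  # True
```

A smaller value means a better calibrated student, so a shrinking gap across successive exams would be the kind of improvement the study is looking for.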

Many of the self-regulatory worksheets I have seen and used ask students to make a global judgement, such as the grade or number of marks they received on the entire exam. Local estimations are less common; these ask students to predict performance on individual tasks within the assignment. In the case of mock exam reviews, most estimates will be postdictions: estimates of performance made after the exam, which allow students to incorporate knowledge about the test into their estimate. Unfortunately, it has been argued that when estimating performance, students often use poor cues such as how familiar the test felt.

This particular study looked at how students could potentially be guided to reflect on their performance with more accuracy and through better cues. The 219 students involved were divided into three groups: practice only, grade comparison and reflection. The practice only group were asked to estimate their grade after each French exam they completed over the course of the academic year. They were also asked to give a second-order judgement, in which they rated how confident they were in their estimate. The grade comparison group had to complete a similar form, but once they had received their actual grades they were asked to calculate the difference. Finally, the reflection group did the same as the comparison group, but were then also asked to reflect on the cues they had used when estimating their grade. This was achieved through questions such as “how did you come up with your grade estimate?” and “how do you explain the mismatch between your estimated grade and actual grade?”, and by asking whether they would change their study behaviour and preparation for the next exam.
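The grade comparison step described above can be sketched as a simple record that stores the estimate, the confidence rating and the actual grade, then reports the difference. This is an illustration only: the field names, the 1 to 5 confidence scale and the example grades are assumptions, not the form used in the study.

```python
from dataclasses import dataclass


@dataclass
class EstimateRecord:
    exam: str
    estimated_grade: float
    confidence: int        # hypothetical 1-5 rating of how sure the student feels
    actual_grade: float

    def difference(self) -> float:
        """Signed gap: positive means the student expected more than they got."""
        return self.estimated_grade - self.actual_grade

    def direction(self) -> str:
        """Label the mismatch so it can be discussed with the student."""
        gap = self.difference()
        if gap > 0:
            return "overestimate"
        if gap < 0:
            return "underestimate"
        return "match"


# Invented example: a student who predicted 8.0 and was awarded 6.5.
record = EstimateRecord("French exam 1", 8.0, 4, 6.5)
print(record.difference(), record.direction())  # 1.5 overestimate
```

Keeping the sign of the difference, rather than only its size, shows whether a student tends to over- or underestimate, which gives the follow-up reflection questions something specific to work with.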

The researchers had hypothesised that the students in the reflection group would improve their calibration accuracy more than those in the practice only group, as they were being actively guided to monitor and evaluate their performance. Interestingly, the study did not find any differences in the improvement of calibration accuracy between the three groups, with students in the practice only group making similar improvements to those in the reflection group. It is argued that this may be due to the study using postdiction, meaning that all students had some knowledge of the task, plus some feedback from the class teacher, so reflection may have occurred unwittingly.

For all groups, calibration accuracy did improve after each exam throughout the year, regardless of starting ability. At the same time, when surveyed, students seemed to show evidence of using better cues to estimate their grades as the year progressed.

Implications for Practice


Student miscalibration is a widely demonstrated phenomenon in a variety of settings, and it contributes significantly to students’ inability to regulate their own learning and to their subsequent underachievement. The fact that all students seemed to improve their calibration accuracy after each exam suggests that asking students to estimate their grade, and then giving them feedback on that estimate after the assessment, is an easy-to-implement strategy to improve calibration accuracy and support self-regulation. The fact that calibration accuracy improved with practice suggests that, while it is great to see so many teachers thinking of doing similar activities post mock, doing this as a one-off is insufficient: it needs to become regular practice after all assessments. This points to the importance of regular formative assessments that give students the opportunity to become better calibrated and, as such, better monitors of their own performance.

While the study did not indicate that extra reflection support led to further improvements in calibration accuracy, this does not mean that including reflective tasks post assessment is without merit. It is important to include the grade estimate and actual grade so students can determine their miscalibration, but asking them to reflect on the reasons for their miscalibration and how they can improve it is likely, if done over a long enough period, to have a positive impact. The authors suggest that their own intervention in the reflection group may not have been strong enough, with 25% of the group not reflecting at all on the mismatch, while nearly 40% did not provide any in-depth reflection, suggesting that too many students did not use the prompts or reflect properly. In addition, it is likely that some reflection may occur at a later point, for example at home with parents.

Interestingly, the reflection prompts did have an impact on students’ second-order judgements, with the reflection group becoming more confident in their estimates as the year went on.

I started looking into this to see if there was any merit in getting students to complete post-assessment reflection sheets and, in particular, in getting them to estimate or predict their grade. The findings of this research are hardly conclusive; however, they do suggest that regularly asking students to estimate their grades at most or all assessment opportunities leads to improved accuracy and student reflection. While the reflection group did not make the expected extra gains, the authors suggest that carefully planned and sustained activities should positively impact student self-regulation. Such activities would ask students to determine how they devised their estimate, why there was a mismatch (if one existed) between their estimate and their actual grade, and how they will change strategies in the future.
