I am currently marking my way through 70 exam scripts for a couple of the introductory English Literature modules at my university. This blog post is the confession of an examiner, perhaps a bit risky if you find out who I really am, but nevertheless, I hope, worth making public as a way of demystifying the process between the end of the exam and the publication of marks, one that students do not often see or even understand.
If I remember my own, not-too-distant student days correctly, students might want to imagine that examiners treat their scripts as sacred objects. A student has attended numerous lectures and read numerous books over the year, pored over revision notes long into the night, and then spent a few hours hunched over a desk in some dismal hall, frantically trying to pour out knowledge in the hope that that brief exam will do justice to all the hours of work put in over the previous year. With this much invested in a few sheets of paper, surely examiners deal with them reverently, in a darkened room, the white paper subject to the glare of an anglepoise lamp, as the examiner interrogates and teases that script to give up its worthy marks?
The reality is somewhat different. Naturally, I look after exams with the utmost care, and mark them as conscientiously as I can. However, there are certain unavoidable practicalities of marking, and of human psychology, that militate against any such pure, religious process as the one described above.
The big practical issue is time. With a large number of scripts to be marked in a brief period, it is simply not possible to spend hours on each one. It would be nice if I could read an essay carefully, and then go for a walk, take a shower, and massage my temples as I try to weigh up whether to give it a 66 or a 67. But that does not - it cannot - happen. Even in a qualitative, essay-based subject like English, having read an essay I tend to place my mark quickly and instinctively. At my university, we work from very detailed guidelines that explain the characteristics an essay should display to merit a First, 2:1, 2:2 or lower, with each band sub-divided into two: for example, a high 2:1 (65 to 69 percent) or a low 2:1 (60 to 64 percent). It is very rare that I ponder deeply what percentage to give an essay. Essays usually fall easily into a band, and the pressure of having perhaps a week to mark 50 scripts leaves me little time to deliberate at length over whether one needs a 64 or a 63.
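For anyone who likes to see the mechanics spelled out, the banding works roughly like a simple lookup table. The little Python sketch below is mine and purely illustrative: the split of the 2:1 into high (65 to 69) and low (60 to 64) matches the ranges I have just described, but the First and 2:2 thresholds are my assumption of typical conventions, not a quotation from my university's actual guidelines.

# A purely illustrative sketch of mapping a percentage to a band.
# Thresholds outside the 2:1 sub-bands are assumptions, not my
# university's actual mark scheme.
def band(mark: int) -> str:
    if mark >= 70:
        return "First"
    if mark >= 65:
        return "High 2:1"
    if mark >= 60:
        return "Low 2:1"
    if mark >= 50:
        return "2:2"
    return "Below a 2:2"

print(band(69))  # "High 2:1" - the near-miss I grumble about below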
People often grumble that an essay-based exam cannot be marked as objectively and as fairly as something like mathematics, with its right or wrong answers. Certainly the personality of the marker may have an effect on a percentage point here or there. But on the whole it is always surprising, from my perspective as an examiner, how easily papers drop into one of these bands. The moral for university students, then, is not to lose sleep over percentages. It is the band that says everything about what sort of student you are, even if you are frustratingly just on the borderline. In many ways, a 69 percent is the most horrible mark an examiner has to give - and in the last few days I have been heard shouting at papers, frustrated that a good student was not quite there, because I could see that with a little nudge and feedback the student could go on to improve in subsequent essays. But my 69s sit below that glass ceiling not because a few tiny details were overlooked by me, the examiner, not because I was tired, or because my football team had just lost, but because the essay read, argued, reasoned, discussed and evidenced in ways which said 2:1.
With this caveat about the band being everything, I will admit to some of the other factors that an examiner faces that may well lead to small variations in marks.
Imagine this scenario. I have just read two First-class essays. The third essay I mark is going to have to do something impressive not to look weaker in comparison (for those of a mathematical bent, this is a contrast effect, sometimes loosely described as regression to the mean). Perhaps I will dock it a few more marks than I might have done if marking it in isolation, because it compares worse against the previous efforts. But in the alternative scenario, marked after two solid but not particularly remarkable 2:1 essays, perhaps essay three suddenly looks better than that localised average. I know that I must be guilty, at times, of marking relative to other essays, rather than against the single standard of the mark scheme.
Luckily, there are a few ways to negate this effect. One of the most hotly debated is that fad of the 1990s, the bell curve. Perhaps I get a run of three weak essays before lunch, and then suddenly give three Firsts after lunch. Is it that I am in a better mood after my break? Is it that I have remembered those three earlier, average essays, so that the ones that come later are bound to look better in their light? I do get anxious when runs of unusually high or low results happen - as they have done this year - and that is why I find the bell curve a useful check. I may perceive that my marks are being affected by local circumstances, but by taking a larger sample of my marks I can see that they have fallen out in a normal distribution. Usually, there is a statistically healthy range, with a smattering of 2:2s and Firsts, and the majority bunching around the mid 2:1.
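To make that check concrete, here is a minimal sketch of the sort of after-the-fact tally I have in mind, again in Python. The marks are invented for illustration and the five-point buckets are my own simplification; nothing here reflects my institution's actual procedures.

import statistics
from collections import Counter

# A hypothetical sample of marks, invented purely for illustration.
marks = [52, 55, 58, 60, 61, 62, 63, 64, 65, 65, 66, 66, 66,
         67, 67, 68, 68, 69, 70, 71, 72, 74]

print("mean:", round(statistics.mean(marks), 1))
print("spread (standard deviation):", round(statistics.stdev(marks), 1))

# A crude histogram by five-point band: what I hope to see is a roughly
# bell-shaped pile-up around the mid-to-high 60s, with tails either side.
for low, count in sorted(Counter(5 * (m // 5) for m in marks).items()):
    print(f"{low}-{low + 4}: {'#' * count}")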
The reason that the bell curve, or normal distribution, comes in for debate is that it is tempting to mark for the curve, rather than to construct the curve on the basis of the marks. Out of ten essays I have given three 66s; better make the next one a 59 or a 71 just to smooth out the graph. This is a real risk for the individual examiner, whilst institutionally it may be tempting to adjust marks across the board to create a smooth curve with its apex at the point the university suspects most candidates should be at. In my institution, with most students arriving with excellent A-levels, we would expect more high 2:1s and Firsts than an institution with a lower-achieving intake, so our marks tend to peak around the high 60s.
Now I do not know - or have reason to believe - that my own institution does any sort of retrospective adjustment to bump our averages above the national baseline for English Literature degrees, but if it did the problem would be clear. Just as I get funny moments marking when there have been no Firsts for ages and then three come along at once, an institution could quite feasibly have consecutive year groups which seem to achieve comparable marks, until one year is made up of an unusually bright or a slightly less well-performing group. By shoving that bell curve around to fit expectations based on previous experience, the institution is engaging in a sort of social engineering, making the students fit the expected results rather than the results fit the students, so that the unusually bright or underperforming group is unfairly downgraded or upgraded. This is precisely the sort of complaint about "grade inflation" long levelled at A-Levels and GCSEs, and increasingly at universities. But as an examiner, I can sympathise with the faith in statistics and the normal distribution, because it offers subjects like English an objective foundation for marking, helping to cancel out those personal factors that do come into play, no matter how hard one tries to contain them.
The bell curve aside, students need to remember that the mark they get does not depend on the individual examiner alone, because other, less controversial, controls are in place to restrict the impact any one examiner can have. I have admitted that time, my mood, marking an essay relative to previous results, and the pull of statistics can all affect what percentage an essay achieves, even though I would hope that none of these affects which broader band an exam falls into. But once they leave my hands, exams are filtered through layers of double-marking, moderation by other examiners within the institution, oversight by external examiners from outside the university, anonymous exam codes, board meetings, appeals procedures, and publicly displayed marks that make it possible to see how each year's exams compare with previous ones; finally, individual students can request copies of their exam papers and the examiner's comments under the Data Protection Act. These controls help ensure that, if even the best-willed examiner puts a mark an entire band out, it remains an isolated incident.
However, this last control - allowing students to see and hence to interrogate their own papers - is also controversial. My own university does not exactly make public the fact that students have a legal right to see their scripts after they have been marked. Personally, I think this right should become an expectation among students, who are still often fearful of approaching departments with what seem like trivial requests. The National Union of Students has a policy that feedback should be provided on exams, and has issued stickers for students to put on exam papers stating that "Exam Feedback Helps Me Learn." From an examiner's perspective, although in many cases it is not possible to indicate specific places where students might improve (again, partly because time pressure makes it impossible to write detailed comments), there are many papers on which I do note specific stylistic issues that could quite easily be addressed. Making these comments, though, feels like shouting into the wind if students are never going to get the opportunity to see them. Having gone to the effort of marking a script as an examiner, why not at least allow students to get as much from your work as possible?
Besides the administrative burden, the reason universities are reluctant to provide exam feedback is, I suspect, a fear of litigation, or of students picking examiners up on every point to gain even more marks. Even if the fear of litigation is a little hyperbolic, the idea of students challenging their papers may affect the exam process unduly. Those students prepared to go through the technical process of questioning their results may end up with better marks than those who are mostly concerned with studying their subject for the pleasure of it, who are not so end-focused, and who simply accept the results given to them and look to the following year. In a system where exams are always open to challenge, results might become partly determined by a student's ability to work the system, rather than by their ability in any given subject. On the other hand, is this not precisely the problem with exams overall: that they test not only knowledge but also one's ability to sit exams and to have good "exam technique" in the first place? Allowing students to interrogate and receive feedback on their own marks at the end of the process only mirrors the effect that happens in that artificial period called "exam season" at the start of it. At this time of year, students who may have done less work all year sit down to cram and prepare model answers just to pass the three essay questions on an exam, whilst students who have conscientiously studied broadly throughout the year continue in their usual approach to their subject in a way that does not always help them to focus on the specialised nature of an exam. As an examiner, I usually have a pretty good hunch which students have prepared to pass a few questions on the exam, and which have enjoyed studying their course as a whole, but it is a very difficult thing to prove, and it is not possible to adjust marks on a hunch.
From my examiner's perspective, then, encouraging students to seek written feedback on their exams would be a positive step, because it would add a qualitative report to the process, allowing those students who have worked well throughout the year, even if this is not reflected in the raw exam percentage, to seek guidance on how to improve. These students are more likely to incorporate such comments into their more holistic approach to the subject (such as their desire to write well) than those who simply aim to pass the exam as a technical challenge, and so some sort of levelling might hopefully be achieved.
If you are a student reading this post, then, I hope you feel some sense of schadenfreude. If you have been sat there feeling fed up that you have to work through exams which seem a disproportionate measure compared to the way you have worked throughout the year, it is worth knowing that this examiner at least feels the same way about marking them. It may be slightly disturbing that I have drawn attention to the human frailties of the marking process, but on the other hand I hope students appreciate, firstly, that it is bands, not single percentages, that are the most important indicator of ability, and, secondly, the lengths institutions go to in order to guard against marks being consistently misjudged on any wide scale, even though probably every examiner misplaces a percentage point here or there, and even occasionally gets a band wrong.
The trouble is, exams remain the most efficient system we have for testing even qualitative subjects like English. The good news is that even though there may be candidates who can work the exam system to their unrepresentative benefit, and even though examiners of essay-based subjects may be unable, as ordinary human beings, to mark every essay to its perfectly deserved percentage, on the whole, the system, tumultuous though it is during the early days of Summer, does work.
Labels: English Literature, essay marking, exams, University Life