Hi, I'm Tor Thompson. I'm the Assessment Training Manager here at AQA. Today, I'm going to be talking about the principles that underpin good assessment. The advice I'm giving today comes from academic literature on assessment. We've worked closely with leading academics, namely Alastair Pollitt, Ayesha Ahmed and Zeek Sweiry. I hope you enjoy these videos and you find them useful.
In this video, I'm going to be talking about the principles of assessment. These principles underpin good assessment, and they're our guiding practice for how to write good assessments. So, the first thing that we need to be mindful of when we're thinking about assessment is, fundamentally, what is the purpose of that assessment? Assessment is a form of measurement. So, much like you'd use a set of scales to measure your weight, we use an assessment to measure what students know, understand and can do. But there's a really important difference between that set of scales and a test paper. What we're doing in assessment is an indirect form of measurement. We can't directly measure all that's in a student's mind; instead, we have to infer what they know, understand and can do from the results of that test paper. This means that academic assessment is fundamentally prone to error, because it is an indirect form of measurement.
The other issue that we have with academic assessment is that we are always effectively just sampling what a student knows, understands and can do, because we can't possibly measure the full two years of study. So again, because it's a sample and because it's an indirect form of measurement, it is prone to error, and we have to work really carefully to make sure we minimise that error as much as humanly possible.
So, the most important principle when it comes to assessment is one called validity. And what we mean by validity is that an assessment measures, and only measures, what it intends to measure. To get a valid assessment, there are some other principles that we need to be aware of. So, we need to be thinking about differentiation; we need to be thinking about reliability; we need to be thinking about comparability; we need to make sure that we're avoiding bias; that we're avoiding construct irrelevant variance; and that we're avoiding construct under-representation.
So, in order to bring these principles to life, I have a task for you. What I'd like you to do in a minute is to see if you can marry up the scenarios on screen with the principles that they're problematic for. So, when you're ready, press pause on the video and see if you can work through marrying up the scenarios and the principles.
Okay, I hope you had a chance to run through all of those scenarios. I'm now going to match them up with the principles. So, in this first scenario, we have a problem with reliability. If that student is getting a different mark depending on who's marking them, then can we be really sure that that mark reflects what they know, understand and can do? There are a couple of factors that contribute to reliability. The first one is the students themselves: how they're feeling on the day of the exam. Are they feeling a bit anxious? Are they feeling a bit under the weather? And could that affect how they perform on the day? The next factor is who's marking them. If we find that we have a marker who is a bit generous, or a bit severe, it might be that the student's mark doesn't really reflect what they know, understand and can do.
So, in order to avoid this, here at AQA we're really hot on making sure that we train all our examiners. We have something called standardisation, where all of our examiners are brought together and shown exactly how to apply the mark scheme. We also supply them with mark schemes which give really clear guidance about what is and isn't worthy of credit, and I'm going to be talking a bit more about that in another video.
The last thing that can affect reliability is the assessment itself – the way that it is designed. So, if we have an assessment which is a modular exam, or which has optional routes, we want to make sure that students get the right score and that it doesn't depend on which optional route they take. So, we make sure we build that in when we design and deliver the assessment.
So, what we have in this scenario is a problem with differentiation. If lots of students are taking an assessment, you'd expect to see the full ability range. And if we have a scenario where we have lots of students but the marks are still quite bunched, what we might have is a problem with differentiation, because it will be hard to tease out who the strongest students are and who the weaker students are. We might also find it difficult to get them in the right rank order, and we're not making the best use of the marks available. Ideally, in assessment, what we want to see is what's called a normal distribution, where we have most of the students around the middle, tailing off at either end. So, we have a nice bell curve using the full mark range.
So, in this scenario, what we might have is a problem of bias. If we're setting the assessment and really what we're trying to measure is students' linguistic ability, but we've set it in the context of a polo match, what we might find is that we've got a bit of a problem of socioeconomic bias, where students who are more familiar with polo might have more to say about it, and might be able to access marks that students who, perhaps, haven't come across the game of polo before cannot. That's inherently a problem. We want to make sure that our assessments are equally open to all students regardless of their background, and it's something that we work really hard on here at AQA: making sure that the context in which we set the question is accessible to all, and that there's no systematic bias that can get in the way of students of any particular group, background or culture accessing those marks.
So, in this scenario, we have a problem with something known as construct irrelevant variance. I know that sounds really technical, but if I break it down it makes a lot of sense, and it's a really useful concept to think about with assessment. So, in this question, what we're trying to measure, ultimately, is students' ability in mathematics. That's the underlying construct. But, if we have the question written in a way where the language is inaccessible, and students who are perhaps strong mathematicians but have a slightly lower reading age can't access the question because they can't read it, it means that what you've ended up measuring is not only the student's ability in maths but also their reading level. Reading level is not what you're meant to be measuring; it is construct irrelevant variance. And that's where the term comes in.
Construct irrelevant variance can come in two forms. So, the scenario here is construct irrelevant difficulty, where there's something in that question that is getting in the way of students showing what they know, understand and can do. You can also have a scenario called construct irrelevant easiness, where there might be something in the question that gives the answer away to students who may not actually have a good knowledge of the underlying construct. So, for example, if you had a multiple-choice question with a grammatical clue – say, the stem ended in 'which of the following is an', and only one answer option worked grammatically with 'an' – a student could get the answer right even though they didn't know the underlying construct. What you have there is construct irrelevant easiness. So, it's something that we're always very mindful of when we're writing questions, and when we write question papers and mark schemes we work through them to make sure that we haven't done anything which would stop students getting marks – which wouldn't be fair, because it's not what we're trying to measure – or accidentally given marks away to students who don't know the underlying construct.
So, in this scenario, we have a problem with construct under-representation. Again, the construct is the thing that you're fundamentally trying to assess. If your assessment is about English ability, but it's only looking at spelling, then there are all sorts of elements of English that are not being measured in that assessment. You're only measuring one part of the skill set needed to be good at English, when actually there's a whole plethora of other things that you should be measuring. So, we need to make sure when we're writing our assessments that they sample all of the skills, knowledge and understanding related to that subject. The way we do that here at AQA is with things called Assessment Objectives, which outline the key skills that we're expecting students to demonstrate. And we're very mindful that we have the right weighting of those Assessment Objectives in our assessments. We're also quite mindful about sampling. We want to make sure that we sample the full specification across the question paper, so that we have a really good understanding of what students know, understand and can do across the whole of the course.
So, in this final scenario, we have a problem with comparability. That Grade Four needs to have a meaning, and in order for it to have a meaning, it needs to mean the same thing this year that it did last year. If we have a scenario where that Grade Four is easier to get this year than it was last year, then suddenly the meaning of a Grade Four changes. That means it's not as fit for purpose as it was before, because for universities and employers that qualification now has a slightly different currency. Of course, between 2020 and 2023, we saw grade standards shift from year to year. That was inevitable given the circumstances of the pandemic and the disruption to schooling and examinations. But it is something that we work hard to avoid in normal years.
We've also got to think about comparability not just across years, but also across awarding bodies, as AQA are not the only providers of GCSE and A-level qualifications. There are other awarding bodies, and we work incredibly collegiately to make sure that it's no easier or harder to get a grade depending upon which awarding body you sit the exam with. When we say we don't compete on standards, we really mean it. It doesn't make sense for us to do that, because as soon as we do, the meaning is eroded from our qualifications, and they're not doing what students need them to do. So, that's something that we're very committed to.
The other thing that you need to think about with comparability is across optional routes, where you have a question paper which has either optional questions within the paper, or perhaps optional papers within the qualification. We need to make sure that students would get the same grade regardless of which optional route they take, and that they are comparable in difficulty and demand. And that's something that we really focus on when we are writing assessments.
So, whilst I've listed out some distinct principles, you might have discovered when you were doing the exercise that those scenarios actually fit more than one of the principles. And that's because, whilst I've described them as neatly distinct, actually they're really messy. They do interact with each other. And sometimes we find situations where some of those principles are almost at odds with one another. So, for example, if you wanted a really reliable assessment, you might go, 'Ooh, let's have lots of multiple-choice questions where there's only one right answer and we can get some really consistent marking'. But, if you have an assessment made just of multiple-choice questions, you might have a problem of construct under-representation, because you wouldn't necessarily be assessing all of the skills you need to in that qualification. Multiple-choice questions, ultimately, can't be used to measure a student's ability to construct an argument, for example.
So, we're always doing a bit of a balancing act when we're writing assessments, and it is actually quite a tricky thing to do. But that fundamental guiding principle is validity, and being really mindful of all those factors and facets that contribute to validity helps make sure that we're doing this in the best way possible, and minimising any measurement error that may arise from the nature of these assessments.
So, when it comes to the principles of assessment, there are a couple of things you might need to be mindful of as a teacher writing your own tests. Be clear on the purpose of the test: what exactly do you want to assess? Assess enough of it, and avoid assessing other things. Take measures to ensure teachers of the same subject are marking to the same standard consistently. Create assessments that will distinguish between students, e.g. with questions of different levels of demand, and be mindful of the influence of bias that can warp assessment results.
So, I hope this video's given you some insight into the assessment principles that we really focus on when we put assessments together here at AQA. We work really hard to make sure that the assessments we provide are valid and fit for purpose for our students.