Thursday, December 20, 2007
Is 85% a B? Grading by percentages is not the way to go
In a strictly non-scientific survey, 89% of all students and teachers indicated that they believe in traditional percentage-based grading, where an 85% would be a middling B, a 75% a middling C, and so forth.
Actually, I just made up that 89% figure. But it’s probably in the right ballpark, and it makes just as much sense as the common belief about the meaning of that 89%: obviously a high B, and very close to an A–. Now I suppose that I could devise a test in which a student who earned 89% of the available points really was doing work at the top of the B range — maybe. But devising such a test would be ridiculously time-consuming, and the validity of its results would be highly questionable anyway. In reality, if the objective quality of a student’s work was at the top of the Bs, s/he might get a 92 on one test and a 79 on another, depending on the difficulty of the questions.
So what do most teachers do? There seem to be three solutions to this problem:
- Use percentages anyway, and live with the questionable validity.
- Grade on a curve.
- Use your professional judgment to determine an appropriate scale.
What about option #2 then? Grading on a curve lets the teacher ask challenging questions and still ensure that the median grade is a B– (or any desired median grade, as the case may be). But it results in a whole host of undesirable and probably undesired consequences. In particular, it discourages cooperative learning by pitting one student against another, and it makes the incorrect assumption that all groups of students are equivalent. In reality, all teachers know that it's perfectly possible for the majority of one class to deserve As and the majority of another to deserve Cs, whether by the luck of the draw or by the vagaries of the master schedule (where honors chemistry meets at the same time as one math class and conceptual chemistry meets at the same time as another). In the former case, grading on a curve unfairly gives excellent students low grades, and in the latter case it unfairly gives mediocre or poor students decent grades. Clearly not the way to go.
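To make the problem with curving concrete, here is a minimal sketch in Python. It assumes a deliberately crude curve (top quarter of the class gets an A, next quarter a B, and so on); that rule, the function name `curve_grades`, and both sets of class scores are invented for illustration and are not taken from any real gradebook. The point is only that the same raw score can land at opposite ends of the scale depending on who else is in the room.

```python
def curve_grades(scores, letters=("A", "B", "C", "D")):
    """Assign letters purely by rank within one class.

    Assumption for illustration: the class is split into equal-sized
    bands, one per letter. Real curves vary; this is only a sketch.
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    result = {}
    for position, score in enumerate(ranked):
        band = min(position * len(letters) // n, len(letters) - 1)
        # Tied scores keep the letter of their highest-ranked occurrence.
        result.setdefault(score, letters[band])
    return result


# Hypothetical classes: one unusually strong, one unusually weak.
strong_class = [95, 93, 91, 90, 88, 87, 85, 84]
weak_class = [85, 72, 70, 66, 64, 61, 58, 55]

print("85 in the strong class:", curve_grades(strong_class)[85])
print("85 in the weak class:", curve_grades(weak_class)[85])
```

Under these made-up numbers, a raw 85 sits in the bottom band of the strong class and the top band of the weak one, even though the work behind the two scores could be identical.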
So we must go with option #3. It clearly allows for challenging questions and variations in the distribution of student populations. This choice in turn branches into three sub-options: determining a scale in advance, determining a scale after looking at students’ raw scores, and determining a scale after looking at student work. Combinations of these three are also possible.
Determining a scale in advance has certain attractions: principally, it keeps the teacher honest by preventing excessive generosity when student results are disappointing. Ideally I think this is the way to go, but it also requires unrealistic amounts of forethought and accuracy in predicting what good students will do; I like it in theory, but I have had to abandon it in practice. Going through each problem and deciding in advance how many points a B student is likely to earn feels too much like guessing.
The second sub-option, determining a scale after looking at the raw scores, is deservedly popular with many teachers, and it's what most of my colleagues and I have done with final exams for decades. It lets one draw lines between the As and the Bs, between the Ds and the Fs, and so forth, without regard to raw percentages, while conscientiously avoiding excessive harshness or excessive generosity. But it still tends to even out the true differences in populations, so I’m not convinced by it.
My choice is the third sub-option, looking at student work in order to determine the scale. This is what we’ve been encouraged to do at Weston, and I’m sold on the idea. Based on research and recommendations of the Annenberg Foundation, Harvard Project Zero, and others, this approach takes the most time but has the biggest payoff. The idea is to determine raw scores first without regard to letter grades, and then to examine in detail a reasonable sample of student papers. For instance, sort them by raw scores and then pick three from the middle of the pack, three around the third quartile, and three around the first quartile; go through them problem by problem, and use professional judgment to determine whether those students have “got it” or not. Sometimes it’s hard to sort out conceptual misunderstandings from skill-based errors, but it’s always informative. In this way, we can make an informed decision that on this particular test a raw score of 68 is worth a low B, while on another test a low B might be a 79. That makes it possible to give truly challenging problems while assigning fair and meaningful grades for them. I think it’s a clear win all the way around, except for the fact that it takes more time for the teacher, as do many good ideas.
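The sampling step is mechanical enough to sketch in code. The following Python is only an illustration under my own assumptions: the function name `sample_papers`, the student identifiers, and the rule for locating each quartile (a simple rank position in the sorted list) are all hypothetical. It picks out which papers to read; the judgment about whether those students have "got it" still belongs to the teacher.

```python
def sample_papers(raw_scores, per_group=3):
    """Pick papers nearest the first quartile, median, and third quartile.

    raw_scores maps a (hypothetical) student identifier to a raw score.
    Returns a dict from position label to a list of (student, score) pairs.
    In a very small class the three groups may overlap.
    """
    ranked = sorted(raw_scores.items(), key=lambda item: item[1])
    n = len(ranked)
    targets = (("first quartile", 0.25), ("median", 0.50), ("third quartile", 0.75))
    picks = {}
    for label, fraction in targets:
        center = round(fraction * (n - 1))  # rank position of the target
        # Centre a window of per_group papers on that rank, kept in bounds.
        start = max(0, min(center - per_group // 2, n - per_group))
        picks[label] = ranked[start:start + per_group]
    return picks


# Invented raw scores for a class of 21 students.
scores = {f"student{i:02d}": 40 + 2 * i for i in range(21)}

for label, papers in sample_papers(scores).items():
    print(label, [score for _, score in papers])
```

Once the nine papers have been read problem by problem, the teacher sets the cut scores by hand; nothing in the sketch decides what a low B is.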
Labels: teaching and learning, Weston