grading systems

Wouter Groeneveld 2022-07-21 18:07:26 +02:00
parent f4e78d991d
commit af2f5e3ba7
1 changed files with 50 additions and 0 deletions

@@ -0,0 +1,50 @@
---
title: Grading Systems
date: 2022-07-21T18:07:00+02:00
categories:
- education
- braindump
tags:
- lists
---
Our latest research had us draw yet another normal distribution graph of grades from 1 to 10. We've [done that before](/post/2022/03/creativity-equals-messy-code/), and it's always interesting to see how uninteresting such a graph can be. Wait, what? Those bell curves, since they're _normal_ distributions, tend to concentrate around the average, which is usually in the 5--6--7 range. When we grade students' exams and course work, it's on a `/20` scale, which makes things even worse: nobody on the teaching staff likes to give very low or very high grades, so most of that scale goes unused. In other words, grading on a big scale is useless.

The same problem occurs when trying to rate your favorite book, video game, ... We've all seen the flame wars when GameSpot dealt out an eight to a certain Zelda game when of course it should have been nine _point_ eight. I struggle to come up with a number on a scale of 10 for a book or a game. What's the difference between a 5 and a 6? When to use 1 or 2, or possibly 3? Since I love lists, and therefore grading and sorting the items on those lists, I'm always interested in how others grade their things. Let's take a quick look.

The first thing that springs to mind is the social reading website [GoodReads](https://goodreads.com), where the scale is quite interesting. You're supposed to deal out stars, which of course get converted into a number for easier integer database storage (really?), but to me, the number distracts from the most interesting part, the labels:

1. Did not like it;
2. It was OK;
3. Liked it;
4. Liked it a lot;
5. It was amazing!

Note that awarding a book a 2 out of 5 means _it was OK_; it does _not_ mean the book failed the test just because the score falls on the left-hand side of the distribution graph. I like this scale and use it myself to grade games, because it's much easier to think about those labels than it is to think about a number from 1 to 10. When you read a book or play a game, you instinctively know whether or not you _disliked_ it: there's the 1. You also instinctively know when it was one of the best books you've read that year: there is the 5. Did you like it (a lot), or was it meh but okay? There are the numbers in between the 1 and 5.

Sometimes though, the difference between a 3 and 4, a 2 and 3, or even a 4 and 5 is confusing or hard to pinpoint. Brian Bankler of The Tao of Gaming explains in [a brief thought on game ratings](https://taogaming.wordpress.com/2011/12/03/a-brief-thought-on-game-ratings/) how he rates board and card games, using a scale of only 4 items instead of GoodReads' 5:

1. Avoid---won't play this;
2. Indifferent---I'll play this out of politeness, but won't suggest it;
3. Suggest---I like this game, and suggest it;
4. Enthusiastic---Play this often, suggest it.

His explanation of why he uses the above system:

> The great thing about the guide (for me) is that I'm constantly thinking “Is this game a 6 or 7?” but I have no trouble at all looking lumping games into those four categories. (I'm pretty quick to avoid a game; but I have a large, varied game group where people don't take offense …)

Exactly.

In fact, that's even better, and it fixes my problem with the GoodReads system. I guess these could be mapped as follows: the 4 and 5 are _enthusiastic_, the 3 is _suggest_, the 2 is _indifferent_, and the 1 is _avoid_.

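
As a minimal sketch, that mapping could be written down like this. The names `STARS_TO_LABEL`, `label_for`, and `stars_for` are my own inventions for illustration; this is just my guess at a correspondence, not anything GoodReads or The Tao of Gaming provides.

```python
# Hypothetical mapping from a 1-5 star rating to the four labels,
# following the rough correspondence guessed at in the text above.
STARS_TO_LABEL = {
    1: "avoid",
    2: "indifferent",
    3: "suggest",
    4: "enthusiastic",
    5: "enthusiastic",
}


def label_for(stars: int) -> str:
    """Translate a 1-5 star rating into one of the four labels."""
    if stars not in STARS_TO_LABEL:
        raise ValueError(f"expected a star rating from 1 to 5, got {stars!r}")
    return STARS_TO_LABEL[stars]


def stars_for(label: str) -> list:
    """Return every star rating that maps onto the given label."""
    return [stars for stars, name in STARS_TO_LABEL.items() if name == label]


print(label_for(2))               # indifferent
print(stars_for("enthusiastic"))  # [4, 5]
```

Going from stars to labels is straightforward, but the way back is ambiguous: _enthusiastic_ covers both the 4 and the 5, so the distinction I found hard to make simply disappears.
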
The most important part is that Brian never uses numbers on his blog. At the end of a review, he writes "suggest", for instance. There are no "`x` stars". I still hand out stars myself, but might have to rethink my approach, as in the end, your readers will simply scan the text and remember the number, thereby throwing much-needed _context_ into the bin.

There's a bit of a hiccup though: everyone even remotely related to board games ends up at the [BoardGameGeek](https://boardgamegeek.com) (BGG) community, which requires a score out of 10, and displays averages rounded to one decimal place. If you're inclined to buy something but you're not sure, you check out BGG's average score. Guess which ones to watch out for? Indeed, 7+ = good, 8+ = amazing, 6-ish = meh. There's never a 9 or a 10, and almost never anything below 6 (or 5). So, again, what's the point? Even more troublesome, Brian has to translate his rating system into BGG's, for which he has a rough mapping, explained in his article.

If you want to trim grades or ratings even more aggressively, you could go the route of many contemporary video game review sites such as [EuroGamer](https://www.eurogamer.net/). They ditched the classic 10-point scale a long time ago, in favor of something rather minimalistic: a game is either not worth it or average, in which case there's no grade, or it's _Recommended_, which would translate to Brian's _suggest_. If it's really really good, then it'll be awarded the label _Essential_. The difference between _avoid_ and _indifferent_ has to be worked out by actually reading the review.

Some of our courses at the faculty are graded using a binary pass/fail system, but I'm not a fan: that way, there is no distinction between average and better work. If I were to clean out my board game closet (like I promised I would) using pass/fail, I'd have no way of expressing my enthusiasm for one game while "just" agreeing to play another.

And then there are specialized ranking systems, such as [the cRPG Addict's GIMLET](http://crpgaddict.blogspot.com/2010/04/ranking-and-rating-crpgs.html), where he judges each game in different categories: game world, character creation, NPC interaction, encounters & foes, ... Each category gets a score out of 10, and there are 10 categories, so the sum is a global score out of 100. Which, again, is totally useless, as admitted by the author: it reduces the rich information from the different categories into a single context-free number. His best games, such as Ultima Underworld, are awarded a 63. But what does that say about NPC interaction? You can't untangle those numbers once they're summed. Of course, at university, we do this all the time, and in the end, the administration office expects a number out of 20, so we summarize and re-calibrate dutifully.

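
To make the "can't untangle" point concrete, here is a small sketch. The two games and all of their scores are made up for illustration, and I only use four of the categories; these are not actual GIMLET ratings of anything.

```python
# Made-up per-category scores out of 10 for two imaginary games.
# Only four categories are used here to keep the example short.
game_a = {
    "game world": 9,
    "character creation": 3,
    "NPC interaction": 8,
    "encounters & foes": 4,
}
game_b = {
    "game world": 6,
    "character creation": 6,
    "NPC interaction": 2,
    "encounters & foes": 10,
}


def total(scores: dict) -> int:
    """Collapse the per-category scores into one global number."""
    return sum(scores.values())


# Both games end up with the same global score...
print(total(game_a), total(game_b))  # 24 24

# ...even though they differ wildly on NPC interaction.
print(game_a["NPC interaction"], game_b["NPC interaction"])  # 8 2
```

Once only the `24` is left, there is no way to tell these two games apart, let alone to recover what either of them scored on NPC interaction.
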
The more I think about grading, the more I'm inclined to pass on the numbers game as well, and instead focus on labels. I'll let this sink in for a while and come back to it to implement in my own future systems. Great talking to you, internet! Definitely a _suggest_ for you!