Is this the silliest scoring system ever?

In 2020, for the first time, the Olympics welcomed the sport of rock climbing. Alas, it wasn’t the warmest welcome, because for the three highly disparate events — speed climbing, lead climbing, and bouldering — the sport was allocated just one set of medals.

How to distribute one gold medal for three distinct achievements?

The committee hit on a peculiar solution (which I learned about from one of my fabulous students at Macalester College). In each competition, the competitors were ranked from 1st to 8th, and then those rankings were multiplied together. Lowest product wins.
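
In code, the rule is a one-liner. A minimal sketch in Python, with made-up placements purely for illustration:

```python
from math import prod

# Hypothetical placements for illustration: (speed, lead, bouldering).
rankings = {
    "Climber A": (1, 5, 4),   # product = 20
    "Climber B": (3, 2, 2),   # product = 12
    "Climber C": (5, 1, 6),   # product = 30
}

# The 2020 rule: multiply the three placements; lowest product wins.
scores = {name: prod(ranks) for name, ranks in rankings.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(name, score)
```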

More predictable, and less remarkable, would have been to add the rankings. It would still have been a bit troubled — such an approach can exaggerate tiny absolute differences (for example, if I beat you by a hundredth of a second), or suppress huge ones (for example, if you beat me by a full ten minutes). Pro tip: if you’re aggregating scores, wait until the end to collapse them down to rank-order.
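
A toy example (invented times) shows the distortion: collapsing raw results to ranks turns a hundredth of a second and ten full minutes into the same single rank step.

```python
# Invented finishing times, in seconds. A beats B by 0.01 s;
# B beats C by a full 600 s. Once collapsed to ranks, both
# gaps become one identical rank step.
times = {"A": 360.00, "B": 360.01, "C": 960.01}
for place, name in enumerate(sorted(times, key=times.get), start=1):
    print(place, name, times[name])
```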

But that weirdness is nothing compared to the effects of multiplication, which “cares” much more about differences at the top than differences at the bottom. Thus, 1st is much better than 2nd, but 7th is scarcely better than 8th.
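
To put numbers on it: improving from rank $r$ to rank $r-1$ multiplies your product by $(r-1)/r$, so

$$2\text{nd} \to 1\text{st}: \times \tfrac{1}{2}, \qquad 8\text{th} \to 7\text{th}: \times \tfrac{7}{8}.$$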

I marveled at this oddity to my father, and he pointed out an even stranger effect: whether you outperformed me, or vice versa, depends on how other people performed! For example, say I finish 1st, 1st, and 7th, while you finish 2nd, 2nd, and 2nd. By a narrow margin, I take the gold, and you take the silver.

But wait! A secret wizard enters the competition at the last moment, and finishes 1st, 1st, and 1st. I am bumped to 2nd, 2nd, and 8th, and you to 3rd, 3rd, and 3rd. The wizard now takes the gold… and suddenly, without lifting a finger, you have retroactively become better than me, retaining the silver while I drop to bronze.

But hold on again! It turns out the wizard cheated in bouldering (the third competition), and is disqualified in that ranking.

So, do we keep the wizard’s ranking in the other two competitions, and simply ignore him for the final ranking? If so, your scores (3rd, 3rd, and 2nd) defeat mine (2nd, 2nd, 7th).

Or do we eliminate the wizard altogether? If so, we’re back to the original scenario, and my scores (1st, 1st, 7th) defeat yours (2nd, 2nd, 2nd).

The gold medal is dancing on the head of a pin, entirely dependent on the question of when and how a disqualified competitor is removed from the scoreboard.
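
The whole drama fits in a few lines of arithmetic. A quick sketch checking each scenario above:

```python
from math import prod

# Original field: me (1, 1, 7) vs. you (2, 2, 2).
print(prod((1, 1, 7)), prod((2, 2, 2)))  # 7 vs. 8 -> I take gold

# Wizard sweeps all three events: me (2, 2, 8) vs. you (3, 3, 3).
print(prod((2, 2, 8)), prod((3, 3, 3)))  # 32 vs. 27 -> you beat me

# Wizard disqualified in bouldering only: me (2, 2, 7) vs. you (3, 3, 2).
print(prod((2, 2, 7)), prod((3, 3, 2)))  # 28 vs. 18 -> you still win

# Wizard erased entirely: me (1, 1, 7) vs. you (2, 2, 2) again.
print(prod((1, 1, 7)), prod((2, 2, 2)))  # 7 vs. 8 -> I win after all
```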

And yet, guess what? Such theoretical oddities didn’t wind up mattering. The 2020 medals were awarded without controversy.

It reminds me of an odd pattern I’ve observed, when poking at spreadsheets of student scores at the end of a semester, before assigning final grades: Weighted averages are often surprisingly insensitive to re-weighting.

How you combine the sub-scores may seem like a matter of real importance. Perhaps even a matter of justice. Should good quizzes compensate for spotty tests? Should a great final exam make up for missed homework? And yet, trying the calculation various ways, you rarely see a major change. Sometimes, only one or two students are affected at all. The data is all more correlated than you’d expect.
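
Here’s a toy simulation of that effect, under the strong (and entirely assumed) model that quiz, exam, and homework scores are all noisy copies of a single latent ability:

```python
import random

random.seed(0)

# Synthetic class of 30 students: each category score is one latent
# ability plus noise. Made-up data, chosen only to mimic the
# correlation described above.
students = []
for _ in range(30):
    ability = random.gauss(75, 10)
    students.append(tuple(ability + random.gauss(0, 5) for _ in range(3)))

def ranking(weights):
    """Rank students (best first) under a given category weighting."""
    totals = [sum(w * s for w, s in zip(weights, scores))
              for scores in students]
    return sorted(range(len(students)), key=lambda i: -totals[i])

# Two very different weightings tend to produce nearly the same order.
print(ranking((0.2, 0.6, 0.2)))
print(ranking((0.4, 0.2, 0.4)))
```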

So maybe the Olympics didn’t get it so wrong?

No. I cannot but laugh and grimace at the multiply-the-rankings system. It offends my mathematical sensibilities, in the same way that bad grammar offends some people’s ears — but the preference, I must confess, is largely a matter of aesthetics.

14 thoughts on “Is this the silliest scoring system ever?”

    1. Ah, that’s great! That comment nicely captures the issue I’d been thinking about — independence of irrelevant alternatives (as in Arrow’s Theorem) is clearly too strong a criterion, since if you finish 1st, those “irrelevant alternatives” can be the difference between my finishing 2nd and 8th, which will obviously matter. But the comment gestures at a weaker criterion, “independence of unanimous winners,” and the multiply-the-rankings system’s failure to achieve that criterion does seem like a real drawback.

  1. The scoring is equivalent to adding up the logarithms of the rankings, which highlights even more starkly how much more heavily first is weighted than eighth.
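
     In symbols: since the logarithm is strictly increasing, the two orderings always agree:

     $$\prod_j r_j < \prod_j s_j \iff \sum_j \log r_j < \sum_j \log s_j.$$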

    1. Yes, my dad pointed this out to me! Log-summing the rankings remains a funny idea. You could just as easily square-sum the rankings, and super-exaggerate differences at the bottom instead!

  2. I like the way the last sentence mentions some people’s sensitivity to bad grammar given that the second-to-last sentence uses the stuffy-but-grammatical construct “cannot but” in place of the ungrammatical-but-common construct “cannot help but”.

    1. I was going to say the same thing. It’s as if each event is a voter, and each athlete a candidate.

      More abstractly, Arrow’s theorem will apply anytime you need to aggregate a bunch of ordinal rankings into one overall ranking.

      The simpler system of just adding the rankings and taking the lowest is like the Borda count in voting. I haven’t seen multiplying the rankings discussed before, so this was a fun post!

      The point in the post, that who wins between a pair of athletes sometimes depends not only on their ranking relative to each other but also on their rankings relative to the other athletes who happen to be competing, is a failure of the independence of irrelevant alternatives.

  3. I dispute that the 2020 medals were awarded without controversy. I seem to remember that whether Adam Ondra came 1st or 6th in the men’s event ended up being determined by Jakob Schubert’s lead climb. Schubert ended up beating Ondra, putting Ondra into 6th place. Schubert came 3rd.

  4. Your discussion about weighted averages is why I have tried to move away from that kind of grading scheme. Final exam scores correlate very tightly with quiz scores, so I end up feeling a bit bad grading a bunch of quizzes which ultimately don’t seem to matter much for the final grade in the class. I also get the impression that this kind of schmearing out of scores obfuscates performance: it is harder for students to have a good sense of what they have done well or done poorly.

    Currently, I have a scheme that is closer to a “standards based” approach:

    1. Our curriculum forms indicate that (for example) our precalculus class has about 25 learning outcomes. So I quiz students on each of these objectives throughout the semester. If you want to get a C in the class, you must demonstrate C-level proficiency (or better) in 20 of the 25 objectives by the end of the semester (by answering quiz questions correctly, or by completing written assignments, or via oral quizzing in office hours, etc); to get a B in the class, you must demonstrate B-level proficiency in 10 of the objectives, and C-level proficiency in an additional 10 objectives. For an A, you need A-level proficiency in 5, B-level proficiency in 5 more, and C-level proficiency in 10 more.

    The level of proficiency is related to the kind(s) of question(s) you can answer, relative to a simplified version of Bloom—a C-level question might be “Give an equation for the line which passes through (1,-2) and (-3,2),” while an A-level question might be something like “Given a line with equation 4x+3y = 10, give the vertices of a square with one side along the line and a vertex at the origin.”

    2. I assign a somewhat more in-depth written assignment every week. To get a C, satisfactorily complete 5 of those assignments (out of 15); to get a B, complete 10 of them; to get an A, complete 14 of them (you get one “bye”).

    3. I place a lot of value on attending office hours. To get a C, attend office hours at least 5 times during the semester; for a B, attend 10 times; for an A, attend every week. Alternatively, since a lot of my teaching is currently online / asynchronous, office hours can be replaced by any kind of meaningful mathematical interaction, e.g. via email. I’ve checked off this requirement for students who just email me interesting magazine articles which have some mathematical content.

    4. And so on.

    In order to get an A in the class, you have to get an A in each category (though I will sometimes fudge things a little if a student has As in all categories except one, and is close in that one category). To get a B, you have to get a B or better in each category (again, I might fudge things a bit for a student who is close to a cutoff in just one category, or who has As across the board except for one C). And so on.
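
    Stripped of the fudging, that combination rule is just a minimum over categories. A tiny illustrative sketch (hypothetical names, nothing more):

    ```python
    # Final grade = weakest category grade, per the rule above
    # (before any fudging at the boundaries).
    GRADE_ORDER = {"A": 3, "B": 2, "C": 1, "F": 0}

    def final_grade(category_grades):
        return min(category_grades, key=GRADE_ORDER.get)

    print(final_grade(["A", "A", "B"]))  # -> B
    print(final_grade(["B", "C", "B"]))  # -> C
    ```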

    Of course, the difference between this and rock climbing competitions is that I am not trying to rank the students—each student’s grade depends only on their own work, and not on how they compare to the other students in the class. So I suppose that this is rather impractical for judging a bouldering contest. 😀

    1. Hey, I’m sure the rock climbers would benefit from office hours, too!

      Your SBG-inspired approach sounds nice to me. These days I use a still-SBG-inspired-though-even-further-from-SBG approach myself, mostly built around making the grading less labor intensive for me (which seems like the perpetual complaint of standards-based-grading practitioners).
