The Quixotic Search for a “Fair” Math Test

It’s happened again: a math question made students cry.

This time it was in Scotland—very discouraging, as I’ve always assumed the Scottish raise a tougher, more stoic, northerly breed of mathematician. Alas; it seems they’re as skittish and frightened as the rest of us.

Here’s the offending question:


(My two cents? This question is more than just fair; it’s really good!)

But students panicked. Then they tweeted their panic. The BBC quoted a former examiner denouncing this question as “unfit for purpose.” And commentators leered at the spectacle—by this point routine—of students freaking out about a hard math question.

This sort of ordeal threatens to confirm our darkest and most cynical suspicions about students. They’re incurious. They’re mercenaries. They’re on a witch hunt for anything that pushes them out of their comfort zone. They worship at the Church of the Right Answer.

So who do we blame? The students? Their parents? Their teachers? Educational bureaucrats, who are always a fun punching bag?

I believe that this flawed state of affairs emerges not through someone’s sinister design, but by well-intentioned increments. The problem isn’t that we want too little from our tests.

It’s that we want too much from them.

We want our tests to be objective. So we stop testing fuzzy, hard-to-measure things like creativity, insight, and broader perspective.

We want our tests to be consistent. So we stop asking questions with any degree of novelty or surprise.

We want our tests to be fair. But deep and authentic understanding is hard to measure fairly—much harder than procedural fluency—and so in the end we abandon that, too.

Every step of the way, we’re trying to sanitize our tests, to cleanse them of idiosyncrasy and irregularity. But those rough edges—surprising questions, strange twists, subversion of expectations—are what make tests work. They’re how we measure actual thinking and understanding, rather than just the narrow ability to carry out prescribed steps in a mechanical way.

Eliminating the surprises from a math test is like eliminating the bacteria from your stomach. It might sound like a good idea, but only if you don’t know how a stomach works. Without those bacteria, you die. They’re the active ingredient, the metabolic engine, the weird but powerful secret to the organ’s entire functioning.

And, if you’ll forgive the somewhat gross analogy, the same is true in math education. The attempt to sanitize a math test is precisely what kills it.


My colleague Richard recently remarked that every math test is, at its heart, a Turing test. Is there a thinking intelligence behind those answers? Or are they just the mechanical replies of a robot, blindly executing an algorithm? Can the test-taker really reason about mathematics, or can they merely fill a few pages with the right symbols?

Such assessment is inherently messy, incapable of perfection. You need to ask students questions they’ve never seen before. You need to surprise them, to provoke them, to challenge them, to push them, to…

Well… to test them.

I’m saying nothing new, of course. Ever since we’ve had schools, we’ve had folks like me, wringing our hands and shouting through clenched teeth that “This emphasis on rote learning has got to stop!”

But perhaps there’s something different about this moment in history.

The UK now has many exam boards, each competing for customers. This seems to have created a “race to the bottom.” Nobody has any vested interest in making the tests authentic and challenging. Instead, everybody—parents, teachers, students, and administrators—wants tests that are predictable and coachable. The exam boards, forever in search of new clients, are only too happy to oblige.

In the US, meanwhile, the landscape is shifting. For more than a century, we’ve had a deeply idiosyncratic and localized system, suspicious of standardized tests. But now, before our eyes, it’s coalescing into something more unified and national. The accountability movement has placed a sharp new focus on big, standardized exams.

And so we, too, hunger for assessments that are objective, consistent, and fair. We’re designing our whole system around the belief that such things exist.

And hey, maybe they do exist. I’ve always respected the AP exams. I teach now at an IB school, which I love. Even the English A-Levels of a generation ago seem to have done a better job of assessing real mathematical thinking. I certainly don’t mean to oppose all testing or centralization. Goodness knows that, at times, the US could do a lot more to live up to its “U.”

But we need to temper our expectations for these tests. We seem to want from our exams the diagnostic precision you’d get from an X-ray scan or a blood test, but education isn’t like that. Our kinds of tests seek to measure something subtle and slippery, a nebulous but absolutely vital thing called “understanding.”

These tests won’t be perfectly objective.

They won’t be totally consistent.

They might not even be 100% “fair” (however you define that word).

Instead, think of a math test like an interview with a politician. You ask a question, and from the way they respond, you learn something about them—how they reason, how they frame issues. Ask easy, familiar questions, and you’ll get prepped, robotic answers. But ask something fresh, thought-provoking, and just a little bit weird, and you might get a glimpse of what’s really going on inside.

Are these methods perfect? Of course not. But, short of telepathy, they’re the best we’ve got. They’re far better than nothing, and far, far better than the lifeless, sanitized spectacle that British exams are in danger of becoming.

48 thoughts on “The Quixotic Search for a ‘Fair’ Math Test”

    1. Agreed. For what it’s worth, the English (and presumably Scottish) systems are less calculus-focused: they do some stats, probability, vector stuff, etc. unlike we calculus-obsessed Americans. But still, I’m with you.

    2. My thought exactly. I remember this question in my first college calculus class. Of course the crocodile was a human, and the zebra was a pub, but hey, same math. I loved that class.

    3. I think it is a very bad question: The formula says that the time for crossing the river is the same whether you swim upstream or downstream. That’s not my experience.

  1. > “My colleague Richard recently remarked that every math test is, at its heart, a Turing test. Is there a thinking intelligence behind those answers? Or are they just the mechanical replies of a robot, blindly executing an algorithm? Can the test-taker really reason about mathematics, or can they merely fill a few pages with the right symbols?”

    Thank you for the biggest smile I’ve had this week.

  2. I liked the question, but that might be because I am pleased that I got it mostly correct (forgot the units) after not having done anything similar for over 10 years.
    Since UK schools are ranked in league tables and get to choose which exam board their pupils sit with, there is pressure on a school to boost its league position simply by choosing a simpler exam. This year’s school league tables were a particularly egregious example: the maths iGCSE (traditionally considered tougher and better preparation for A-Level) was excluded from the league tables, with the result that many schools that focused on learning ideas, as opposed to passing exams, found themselves at the bottom of the table with 0% pass rates.

    1. That’s bizarre! We teach the IGCSE, and frankly I think the maths there is a bit superficial. I can only imagine the state of the regular GCSE. Both seem to suffer from the “a mile wide and an inch deep” problem – lots of topics, but not much meat to any of ’em.

  3. I agree. It’s a good question if all you want to see is can they do some calculus which is independent of the situation. I would add part c: How wide is the river?……10 marks

    1. Yeah, I like that. (Maybe for 3 marks, since it can be deduced fairly quickly from the scenario where x = 0.)

      Personally, I think it’s missing the appropriate part a: given the width of the river, show that this is the correct formula.

      1. I actually think that, as part (a), they should be asked to find the formula for the time. What is interesting is that depending on the relative speeds (which are buried in this formula) the optimum value for x may be the zebra point and not the calculus point.

  4. It’s rather funny that in the mathematical upbringing I had, through high school Mu Alpha Theta competitions, questions like this one were quite common on calculus exams, and would be lumped into the category of “rote problems with an algorithmic solution”.

    1. Actually, the fact that the function governing the total travel time is explicitly given should make this an incredibly straightforward optimization problem.

      Though I suppose if students hadn’t been taught the concept of the calculus of “optimization problems” it could be quite difficult to recognize.

      Regardless, my point isn’t to shame people who were taken aback by the problem, but rather to point out the ironic difference in perspective. What one group considers a straightforward, rote algorithmic problem, another considers “bamboozling.”

    2. Yes, the optimization part of the problem is pretty straightforward, although it’s not the easiest derivative, and it requires solving a radical equation (which I find trips many students up).

      The beauty of the problem comes from the first two questions in part a, which I think should be given more weight than they are, since they require the most insight and thought in the problem. Perhaps a (2, 2, 6) split? I do agree with an earlier commenter that asking about the width of the river would be a great addition to the question.

      1. Maybe I’m missing something, but the first two parts simply require plugging in 20 and 0, respectively, to the function given. Obviously there’s a small amount of interpretation necessary, but not enough that I’d consider it “the beauty of the problem” (unless I happened to be using the problem to teach the concept, in which case I think showing the end points of the interval we’re optimizing on is very valuable, instructively)

        1. Technically, because you are optimizing on a closed interval you MUST check the endpoints (unless you have an urge to test for convexity).

  5. FWIW, I don’t agree this is a good question because the context conflicts with the math. This is a common issue with artificially imposed “real world” contexts for math problems, so I’ve written a more detailed note about this particular problem and saved here for posterity: Crocodile Tears for Real World Math

    That disagreement about the quality of the question aside, the point in your post is excellent: tests are required to have a collection of impossible to achieve and self-contradictory properties. A related issue: standardized tests are required to simultaneously serve too many purposes.

    1. I have to agree with Joshua’s first paragraph. I couldn’t get past the question, truth be told, to read or follow the main point of the post. I couldn’t stop thinking about the problem and whether as a teacher I would accept an answer to the problem (one that I would likely give as a student).

      1. Oops, hit enter before completing…
        I would likely have answered the problem with something along the lines of, “None of the above. A crocodile doesn’t stalk its prey like that. It would wait underwater at the river’s edge and then, surprising the zebra, come up and grab the zebra by the face.”
        And while it might seem a bit too focused on the realities (mis)represented in the question, I can see students getting stuck on the specifics, the realities of the question, especially if students know that’s not typically how things work between zebras and crocodiles.

        1. Variation on the idea being tested that I prefer: Replace the zebra with a power station, and the crocodile with an island in the river. Running power lines on land is cheaper than running them underwater, $140 per foot vs. $290 per foot. Draw the path (and label the distances) that the power lines should follow to minimize the cost of running power to the island.

  6. Reading the article, the “unfit for purpose” verdict seems to be because of an unprecedented cut score of 34% as opposed to some specific worries about creativity.

    I’d like to see more data on the crocodile. As much as the problem looks fine, if only (let’s say) 5% of students got it correct there might be something fishy going on.

  7. Minor correction: you don’t have stomach bacteria that you rely on. The essential digestive bacteria are in the small intestine, while the stomach tries to kill off any bacteria that come in. (The main remaining bacteria in the stomach are Helicobacter pylori, which are responsible for ulcers and stomach cancer.)

  8. I only recently discovered your blog and I’m loving it. I also want to add that I had these kind of questions in pre-calculus, which I took just last year. They involved running to a friend’s house by sidewalk or sand but same principle. Also, if it’s on the test, then wasn’t it covered in class? Or did the class problems use different animals, thereby confusing the kids?

  9. The problem is the test itself, not whether it is hard, fair, or anything else. The kids learn that they have to pass tests to be fit for society, to get a job, to participate. Failing means failing life. It is thus no surprise to see them optimize their efforts. All your good teaching is judged only by whether it helps to pass the test or not. And you join the game. You have to. Because otherwise the other teacher is absorbing the kids with his test pressure. It’s a problem of our system.

  10. Okay, this totally geek-sniped me. I tried to figure this out before I read the accompanying post. I figured out the answer pretty quick, but for the wrong reasons. Of course they’d used a 3-4-5 triangle. Who could resist? So the answer must be 8. Quick check, 7.9 and 8.1 are higher, done.

    But I had no idea how to go about getting the answer when it didn’t happen to appear on a pretty test. Turns out, I took algebra to a calculus fight (I’d thought this was an algebra test). I hope my thought that “I need to find the point where the slope is 0” would have pinged me to calc IRL. But IRL, I’d graph it and skip the whole solving thing.

    So the moral of the story is to know what test you are taking?
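
The quick check above (x = 8 against 7.9 and 8.1), and the "graph it" shortcut, can be sketched numerically. The exam's function is not reproduced in this text, so the form below is an assumption: it is the one widely reported for this question, and it is consistent with the 3-4-5 triangle and the answer of 8 mentioned above. The name `travel_time` and the units are likewise illustrative.

```python
import math

# ASSUMPTION: the exam's formula is not shown in this text. This is the
# widely reported form, with x in metres and time in tenths of a second.
def travel_time(x: float) -> float:
    """Assumed total time for a crocodile that lands x metres upstream."""
    swim = 5 * math.sqrt(36 + x ** 2)   # slower leg, through the water
    walk = 4 * (20 - x)                 # faster leg, along the bank
    return swim + walk

# The quick check: x = 8 should beat its near neighbours.
candidate = 8.0
is_local_min = all(travel_time(candidate) < travel_time(n) for n in (7.9, 8.1))

# On a closed interval the endpoints (0 and 20) need checking too.
beats_endpoints = all(travel_time(candidate) < travel_time(e) for e in (0.0, 20.0))

# Coarse grid search over [0, 20] in steps of 0.1, standing in for a graph.
grid = [i / 10 for i in range(201)]
best_x = min(grid, key=travel_time)

print(f"T(8) = {travel_time(candidate):.2f}")
print(f"beats 7.9 and 8.1: {is_local_min}; beats endpoints: {beats_endpoints}")
print(f"grid minimum at x = {best_x}")
```

A grid search like this only finds the minimum to within the grid spacing; it happens to land on the answer here because the assumed numbers were chosen to be tidy.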

    1. “But IRL, I’d graph it and skip the whole solving thing.”

      Given that in many US high school math courses and tests the syllabus requires procurement and use of graphing calculators, it might seem the point of the question was to create, inspect and interpret such a graph.

    2. In the UK then maths is maths (horrible grammar for you US folks, but correct here, I promise). We don’t have algebra tests or calculus tests (even at 18 they are grouped together as Pure maths).

      And this is what life is like!!!! Aside from the point about crocodiles stalking within a river, a real world maths problem doesn’t tell you whether it is algebra or calculus; you have to understand the problem to know which maths weapon to use.

      [Excellent use of geek-snipe as a verb btw!]

  11. I think it’s a good question, with a good distribution of marks. I assume this is for GCSE Ordinary levels? What bothers me is that where I’m from (Singapore) this question would have an additional part (c) that asks
    “Now, show that your answer to (b) is accurate without using calculus or algebra in your answer” for a further 8 marks and that it is normal and expected to do so.

  12. Afraid you’re wrong about the UK – the regulatory body Ofqual monitors the tests set by the examining body and has done an excellent job of ensuring that there is not a race to the bottom. Indeed, the very existence of the maths question that you cite is proof itself that claims of a race to the bottom are unfounded.

  13. Fun fact: you can do this problem with Snell’s law, from physics, sin(a)/sin(b) = v_a/v_b. Because the angle of refraction (shore) is 90 degrees, and the velocity ratio is 4/5, you can find the angle of incidence is 53 degrees in a snap and use a little trig to find x.
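
Spelled out, the Snell's law shortcut might run as follows. This is a sketch under two assumptions not stated in the text: a is measured from the line straight across the river, and w stands for the river's width.

```latex
% Assumptions: a is the angle between the swim path and the perpendicular
% to the bank; w is the river width (a symbol introduced here).
\begin{align*}
\frac{\sin a}{\sin b} &= \frac{v_a}{v_b},
  \qquad b = 90^\circ,\quad \frac{v_a}{v_b} = \frac{4}{5} \\
\sin a &= \frac{4}{5}
  \;\Longrightarrow\; a = \arcsin\frac{4}{5} \approx 53.13^\circ \\
\tan a &= \frac{\sin a}{\sqrt{1 - \sin^2 a}} = \frac{4/5}{3/5} = \frac{4}{3} \\
x &= w \tan a = \frac{4}{3}\,w
\end{align*}
```

If x = 8, as reported in an earlier comment, this would correspond to w = 6: the 3-4-5 triangle scaled by two.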

    Ben, I totally agree with your points about testing. Which is why I think this question isn’t a great question, though not an unfair or bad one. A better question would have required students to set up the equation in the first place. In my experience, THAT’S the part calculus students struggle with the most.

    1. Having taken UK exams, the question IS asking you to set up the equation. It is also giving you the equation so that when you get the answer you know that it is right.

      It is an odd phrasing if you haven’t seen it before, but this won’t have thrown students that have seen a couple of past papers.

  14. I’m a student who just took the AP Calc test last year and I actually sort of disagree with you about the AP test. I think there were probably only one or two problems that forced me to think, and the rest weren’t difficult but rather confusingly worded. Our teacher drilled us in being able to do calc quickly and efficiently, but we never did proofs and never really had to think for ourselves in the class. I did get good at arithmetic, and at taking integrals, derivatives, and their cohorts quickly, but I felt like the AP test was literally just more of that (besides, again, one [or maybe even two!] free response questions which actually forced me to think). We never even touched epsilon delta proofs.

    As for this problem, I’m not sure why there’s any fervor about it. I mean it’s literally a minimization problem, and an easy one at that. If your calc class is any good, you should be able to figure it out, and you don’t even really need to think.

    Side note: The sad part is, most US students never even get to calculus in high school. The average student knows remarkably little calc by the end of HS.

    1. What, exactly, is sad about students not getting to calculus in high school? 80 years ago, give or take, algebra was a college course that relatively few students learned in any depth, if at all, in high school (which means few learned it at all given the percentage of students who attended college before the end of WWII). The country managed to do quite swimmingly in a host of ways in the following decades. So I’m not sure why anyone would feel this was “sad.”

      I graduated high school in 1968 and knew zero calculus. When I decided I was interested in mathematics sufficiently to learn the subject, around 1985, I was able to do so without major difficulty. I was about 35 years old. Today, I have a masters in mathematics education from the University of Michigan. I’ve taught & coached mathematics teachers starting around 1989.

      Don’t overrate calculus. Most people who take it in college leave knowing just enough to have passed the classes. They never take more mathematics and they rarely, if ever, use the calculus they learned. Ask any MD or veterinarian or dentist, for instance, when they last used calculus. If it was after they graduated college, I’ll be very much surprised.

  15. I saw a video by MindYourDecisions on YouTube that used optics (i.e., Snell’s law) to solve this one. It was cool!

  16. I think it is a great, standard calculus question, but honestly they should have just given the rates that he swims and walks on land so that the students could derive the time function in terms of x. Giving them the function reduces this to a simple plug-n-chug, optimization problem.

  17. Thanks for the details. It’s easily understandable, You’ve explained very good. I’ll try this and see.
