Everything Is Linear (Or, the Ballad of the Symbol Pushers)

What is the biggest problem facing humanity this week?

  • A. The threat of Grexit
  • B. The bittersweet knowledge that someday, when all of this has passed, we’ll have fewer opportunities to use the amazing word “Grexit”
  • C. People thinking functions are linear when they’re SO NOT LINEAR
  • D. Other (e.g., cat bites)

If you answered C, then congratulations! You are probably a teacher of math students ages 13 to 20, and we all share in your pain.

For everyone else (including you poor cat-bitten D folk), what are we talking about? We’re talking about errors like these (warning—mathematical profanity ahead):

What’s wrong with these statements? Well… everything.

I don’t blame the kids, of course. These errors are a natural—perhaps even inevitable—byproduct of the way we teach mathematics.

Every rule in secondary school math is actually a statement about objects—shapes, numbers, whatever. But they don’t feel like that to students. They feel like purely symbolic manipulations, rules for how x’s and y’s move around a page, not for how actual things relate to each other. When you see rules that way, it’s easy to fall prey to certain systematic errors.

Why is this particular error so tempting? Perhaps because it closely resembles a real rule students have learned:

    a(b + c) = ab + ac


This is called the Distributive Law, and it’s actually a deep fact about the numbers, an essential link between addition and multiplication.

You can discover it through numerical examples:


Or you can understand it with arrays:


Or you can use my colleague’s wonderful phrasing and think of an expression in parentheses as a “mathematical bag”:

The distributive law is crucial. It underlies most of what we do in algebra, such as factoring and “gathering like terms.” So what’s the problem?

It’s that students don’t learn the distributive law as a fact about numbers. They learn it as a fact about parentheses.

And unfortunately, mathematicians use parentheses in two very different ways: first, to group numbers; and second, to designate the inputs of a function.


The notation “f(x)” doesn’t mean “f times x.” It means “what I get when I put x into the function.”

But our faithful symbol-pushers don’t always catch the distinction. They see parentheses, and think, “Oh yeah! That rule I learned should apply here.”
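
One way to make the distinction concrete is to plug in numbers. Here is a minimal sketch in Python (my choice of language, not something the post itself uses); the helper name `is_additive`, the sample functions, and the inputs 9 and 16 are all made up for illustration. It compares f(a + b) with f(a) + f(b) for a few familiar functions.

```python
import math

def is_additive(f, a, b):
    """Return True if f(a + b) equals f(a) + f(b) for this one pair of inputs."""
    return math.isclose(f(a + b), f(a) + f(b))

# Hypothetical sample functions: one genuine multiplication, three ordinary functions.
candidates = {
    "times 3": lambda x: 3 * x,
    "square": lambda x: x ** 2,
    "square root": math.sqrt,
    "sine": math.sin,
}

# Test each candidate on the single pair (9, 16).
results = {name: is_additive(f, 9.0, 16.0) for name, f in candidates.items()}
for name, ok in results.items():
    print(f"{name}: f(9 + 16) == f(9) + f(16)? {ok}")
```

Only “times 3” passes. A single passing pair would not prove that a function distributes over addition, but a single failing pair is enough to show that it does not.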

What’s the solution? Here’s my current game plan:

  1. Teach the distributive law more carefully. Draw pictures. Work examples. Talk about “bags.” Make sure they understand the meaning behind this symbolism.
  2. Teach function notation much more carefully. Give them the chance to practice it. Think like Dan Meyer and seek activities that create the intellectual need for function notation.
  3. Keep stamping out the “everything is linear” error when it crops up. Like the common cold, it’ll probably never be entirely eradicated, but good mathematical hygiene should reduce its prevalence.

What’s the only thing giving me pause here, making me doubt my nice, tidy theory of this error? In two words: Jordan Ellenberg.

In How Not to Be Wrong, he catalogs a variety of cases where people falsely assume “all curves are lines.” His examples are never just symbolic. Each time, people are messing up on a conceptual level. They’re not just pushing x’s and y’s around a page, but genuinely believing their own wrong statements.

So is my approach off-base? Are people actually acting on false beliefs about functions, or (as I’ve posited) are they failing to think about them as “functions” at all?

I’d love to hear what you think. And if you’ve got solutions for the Greek economy or cat bites, well, I’m all ears.

UPDATED: edited to correct the fact that I do not know the difference between numbers and letters. Oh well, like I always say, “C strikes and you’re out.”

32 thoughts on “Everything Is Linear (Or, the Ballad of the Symbol Pushers)”

  1. They wouldn’t write it if they didn’t believe it !
    Of course, the evidence for their belief is feeble in every case.
    Linearity is “nice”, but also there are situations with multiplication where it works:
    root(a x b) = root(a) x root(b)
    (a x b)^3 = a^3 x b^3
    Also at the back of their minds is Pythagoras
    a^2 = b^2 + c^2 , overlooking the fact that a is not equal to b + c

    I think you are right about the overloading of the bracket signs, and the confusion stemming from the distributive law, which I personally think should be written as (a + b)c = ac + bc ( I am on my own here!).

    A major problem in algebra generally, which kicks in hard with this stuff, is that the kids are not in the habit of asking “What does it look like?” and “What does it mean?” and “What does it say in ordinary language?”. For example, when looking at sin(x + y) do they see a circle with added angles, or do they just see the symbols?

    Of course, a partial cure in each individual case is “Shove some numbers in and see what you get”.

  2. The thing about math is that similar notations can have vastly different meanings in different contexts. All students of math need to be able to catch that and distinguish between the different notations. For example, in the Leibniz notation of differentiation dy/dx, I had a tough time getting someone to understand that the letter “d” is not another unknown which will cancel itself out giving only y/x. See, it is these things that students do not fully understand, that can possibly impede their learning. Of course, the linearity error is the most common, and yes, we will never be able to eradicate it.

  3. Part of the difficulty is that we spend so much time in mathematics explaining what is right and we do not train students well enough to recognize what is wrong, except to point out their own mistakes. We do point out fallacies when we discuss logic, but not so much in other parts of math. Perhaps we should devote much more of the curriculum (say 20%) training students to understand and recognize the sort of conceptual misunderstandings that can and do occur.

  4. As unofficial representative of the “I don’t get math” contingent math teachers keep insisting doesn’t/shouldn’t exist – let me say that your diagram of x(y) as product vs function is… brilliant.

    Such simple things help so much. y(x) can be two things, and I know what to do with both things, but realizing it’s the same symbol for two things, well, why did no one tell me this before? 5678 changed my life: 56=7*8 and why did no one ever tell me this before??? The log is the exponent: hey, now I don’t need a cheat sheet to figure out what to do with log_10(100)=x because the log is the exponent, and again, why did no one ever tell me this before?
    (now do you believe I don’t get math?)

  5. There is a related article I read, summarizing some recent research:
    “Solid Findings: Students’ Over-reliance on Linearity” at
    (page 53 of the PDF, numbered page 51).

    For example, even with word problems like “Sue and Julie were running equally fast around a track. Sue started first. When she had run 9 laps, Julie had run 3 laps. When Julie completed 15 laps, how many laps had Sue run?”, the students (who were actually teachers in this example) tended to solve them linearly (e.g. answer 45 for this case instead of 21).
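
    The two lines of reasoning in that track problem can be put side by side. This is only a sketch in Python; the function names are hypothetical, and the numbers are the ones from the problem as quoted above.

```python
def sue_laps_additive(sue_then, julie_then, julie_now):
    """Equal speeds mean the gap in laps between the runners never changes."""
    gap = sue_then - julie_then
    return julie_now + gap

def sue_laps_proportional(sue_then, julie_then, julie_now):
    """The tempting assumption (wrong here) that the ratio of laps stays fixed."""
    ratio = sue_then / julie_then
    return julie_now * ratio

correct = sue_laps_additive(9, 3, 15)        # 15 + (9 - 3) = 21
tempting = sue_laps_proportional(9, 3, 15)   # 15 * (9 / 3) = 45.0
print(correct, tempting)
```

    The proportional answer would be right if Sue ran three times as fast as Julie; with equal speeds, only the constant gap of 6 laps carries over.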

  6. I teach my students that they should rescale their plots (often using log axes) so that their data falls along straight lines on the plot, since that is all anyone understands visually.

    1. Indeed, if your students can find a reparameterisation of their data that leads to a linear plot, the choices of parameterisation (they need not be linear) shall reveal something important about their data.

  7. You know, if you answer a question that asks you to pick 1, 2, 3, or 4 by using a letter, you’re definitely math-challenged in a different way.

  8. The ubiquitous fallacy of universal linearity is the source of many buggy algorithms (student misconceptions) in K-12 mathematics and beyond. One of the two key properties of linearity is that f(x + y) = f(x) + f(y). This fact is, of course, very important. . . as long as it’s being applied to things that are in fact linear. But as pointed out repeatedly in this blog piece, loads of things at the not very advanced level of mathematics are not in fact linear. And when students make the opposite assumption, they start creating rules for how things work in math that simply aren’t so.

    One thing that strikes me as fascinating when dealing with students who’ve “learned” logarithms is that they are mathematical beasties that are explicitly NOT linear in their very essence (probably we can say that about everything that isn’t linear, but hear me out). It seems to me that one of the main things that should grab students by the intellectual throat when they study logarithms is first that there’s no shortcut for dealing with the logarithms of sums. That is, log(a + b) = log(a + b). Not log(a) + log(b). But there IS something that equals log(a) + log(b). And that is the logarithm of a PRODUCT: log (a*b) = log(a) + log(b). And that is in part the whole POINT of logarithms. They allow us to turn “big” calculations (products) into more manageable calculations (sums).

    Of course, students should have seen this before, when learning the closely related laws for exponents. There’s no fast way to calculate sums of numbers raised to powers: x^a + x^b is just that, and you need to know x, a, & b if you’re going to be able to arrive at a numerical answer or simplify the expression. But with products, we do have a rule: (x^a)*(x^b) = x^(a+b) – that is, a given base (x) raised to a power multiplied by that same base to a power equals that base raised to the SUM of the powers. Multiplication gets turned into addition (sort of) again. And things that behave like that in mathematics are most decidedly NOT linear.

    So why do students miss this point? My guess is that teachers don’t stress adequately this aspect of the difference between linear and non-linear beasties. And we should. It’s not sufficient to look at graphical differences (though that is of course important and closely related). When kids are just looking at symbols, they need to perceive salient differences even when they don’t have graphs in front of them or readily accessible. I suspect that repeated explicit visits to that key property of linearity mentioned in my first paragraph and contrasts to it whenever non-linear things are being investigated would be very helpful indeed.
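
    The contrast between the logarithm of a product and the logarithm of a sum is quick to check with numbers. A small sketch in Python, assuming base-2 logs and the sample values 8 and 32 (picked only because their logs come out to whole numbers):

```python
import math

a, b = 8.0, 32.0

log_of_product = math.log2(a * b)            # log2(256) = 8
sum_of_logs = math.log2(a) + math.log2(b)    # 3 + 5 = 8
log_of_sum = math.log2(a + b)                # log2(40), roughly 5.32

# The product rule holds; the "linear" guess about sums does not.
print(math.isclose(log_of_product, sum_of_logs))
print(math.isclose(log_of_sum, sum_of_logs))

# The matching law for exponents: multiplying powers adds the exponents.
x, m, n = 2.0, 3, 5
print(x**m * x**n == x**(m + n))             # 8 * 32 == 2**8
```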

  9. If you answered “C”, you didn’t realize the list was numbered, not lettered 🙂

    And as a random fun fact, when working modulo a prime number p, we actually do get the lovely relation that (x + y)^p = x^p + y^p.
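
    That identity is easy to spot-check by brute force. A minimal sketch in Python, where `freshman_dream_holds` is a hypothetical helper name and the lists of moduli are arbitrary small examples:

```python
def freshman_dream_holds(p):
    """Check (x + y)**p == x**p + y**p (mod p) for every pair of residues x, y."""
    return all(
        pow(x + y, p, p) == (pow(x, p, p) + pow(y, p, p)) % p
        for x in range(p)
        for y in range(p)
    )

primes = [2, 3, 5, 7, 11, 13]
prime_results = {p: freshman_dream_holds(p) for p in primes}

# For comparison, the same check with a few composite moduli.
composite_results = {n: freshman_dream_holds(n) for n in [4, 6, 9]}

print(prime_results)
print(composite_results)
```

    The check passes for each of these primes and fails for these three composites. That is a spot check, not a proof, and it says nothing about composites outside this short list.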

  10. The more I think about it, the more I start to believe that learning algebra in secondary school is a bit like learning to ride a bicycle. You can learn the theory all you want, but you can’t do it until you gain the muscle memory. In the case of Algebra in secondary school, you gain the muscle memory from learning to take timed tests.

    Regardless of how dutiful our students may be in reading the section and learning what functions are, the poor souls are stuck between a tentative understanding of the concepts and an immense pressure to “find the answer” very quickly. If we’re honest, we don’t give our students time during tests to ponder the great universal truths of addition and multiplication when they are solving algebra problems. They just know that a D on a test will feel like a face full of gravel, so they develop reflexes.

    When we show them functions, suddenly their reflexes only make matters worse. I think this is much like learning to ride a backwards bike (like the smarter every day guy https://www.youtube.com/watch?v=MFzDaBzBlL0). I think we just have to be honest with our students. We forced you to learn to use the tricks of algebra until it was as natural as walking. Now we need you to learn to walk all over again, sorry.

  11. Or you simply should teach multiplication with the sign of middle dot: a · b
    a · (b+c)=a·b+a·c

    problem solved.

    1. Indeed, conflating the notation for multiplication (when the right operand is an additive expression, so needs parentheses) with that for function application is clearly one of the causes of the problem; and making multiplication overt (with a symbol for its binary operator, rather than mere juxtaposition) is a good thing in its own right.

      It might also be worth introducing the idea of homomorphism sooner; an exponential function is a homomorphism from addition to a multiplication, a logarithmic function is a homomorphism from a multiplication to an addition, a power is an automorphism of a multiplication, linear maps are homomorphisms of addition that respect scaling and so on. When the student understands that there *are* functions with a property, but it’s a special property and only some functions do have it, they may be better prepared to pause and ask “is this linear” before applying linearity’s rule to a function.

      It probably also doesn’t help that we use “linear” in two related but distinct ways. One of them has y = a.x +b as exemplar; the other rejects its +b. Indeed, f(x +y) = f(x) +f(y) isn’t true when f is “linear” in that first sense, as a.(x +y) +b isn’t equal to (a.x +b) +(a.y +b) unless b = 0.

      I suspect an important part of this is to make it routine, when introducing a new property that *some* functions possess (e.g. linearity), to spend a bit of time exhibiting functions that *don’t* possess the property. Seeing a few counterexamples can make it much clearer that the property is a rare and special thing, counteracting the fact that – because they’re going to be learning about this property – they’ll tend to only see functions that do have it, which makes it easy to just assume any other functions they meet shall have it.
      In the course of teaching the consequences of a particular property, we tend to give lots of *examples* of it working; it’s important to pepper that story with some *counter*examples, that show how things lacking the property *don’t* behave the same way as things that do.
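
      The point about the two senses of “linear” makes a ready-made counterexample of exactly this kind. A small sketch in Python, with hypothetical helper names and arbitrary sample values a = 2, b = 5:

```python
def make_affine(a, b):
    """Return the function x -> a*x + b."""
    return lambda x: a * x + b

def additivity_gap(f, x, y):
    """f(x + y) - (f(x) + f(y)); zero when f respects addition at x, y."""
    return f(x + y) - (f(x) + f(y))

through_origin = make_affine(2, 0)   # linear in the strict sense (b = 0)
shifted = make_affine(2, 5)          # straight-line graph, but b != 0

gap_origin = additivity_gap(through_origin, 3, 4)   # 14 - (6 + 8) = 0
gap_shifted = additivity_gap(shifted, 3, 4)         # 19 - (11 + 13) = -5
print(gap_origin, gap_shifted)
```

      For a.x + b the gap always comes out to -b, whatever x and y are, so it vanishes only when b = 0.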

  12. It’s not just when a student sees parentheses that they decide to use nonexistent linearity. You already gave the example of sqrt(x+y)=sqrt(x)+sqrt(y) when it is written as it usually is in a non-ascii restricted way. Another example I’ve seen in grading is 1/(x+y)=1/x+1/y.

    My theory is that most students just never learned all the rules so that when they come to an expression they need to work, they guess on tests that the same rule should apply as in the distributive law. When they are doing homework they guess as well because they are too lazy to look up the law.

    In a way it’s not a dumb idea to guess linearity. Goodness knows physicists have gotten tons of mileage discarding nonlinear terms. Of course the educated person knows when we can get away with this and when we cannot! Bayes’ theorem is a classic example.

  13. For my last two years teaching, when I introduced functions I taught it with my own notation. I THINK it helped avoid the problems you discussed.

    After teaching functions as machines with inputs and outputs, I argued for the need for a more convenient notation than drawing little machines all the time, and introduced the notation f<>, where the <> represents the input slot on the machine. For quite a while we did everything with this notation. Eventually I told them what the real notation was and discussed why I thought it was poor notation. As I say I think it helped their understanding and helped avoid errors.

  14. I liked Jordan Ellenberg’s chapter on “all curves are lines,” but I think it’s unrelated to the problems you discussed. I think you’ve correctly diagnosed it as the result of students learning math as symbol manipulation not meaning, and parentheses being used for two unrelated things. To which I would add the multiplication in 3(4+5) which is implied but not made explicit. If students only ever saw 3x(4+5) or 3.(4+5) then f(a+b) would not look as similar. But as Bryan’s examples show, that by itself will not solve the problem.

  15. This is just one example of the many notation issues people make. We’ve all seen them: the division sign that grows or shrinks so other terms are now part of the fraction, the exponent that becomes another digit in the base number, the dot multiply that becomes a decimal point, etc. It could be argued that many notations should be rethought… but that’s too much work.

    What is very interesting is teaching mathematics alongside computer programming, where the concept of function is very well defined and becomes concrete. Have students play around with functions they write themselves and they’ll quickly see that f(g(x)) ≠ g(f(x)) and certainly f(x+y) ≠ f(x)+f(y).

    1. You’ll also notice that the designers of computer programming languages have consistently forced a notational difference between “action of a function on its parameters” and multiplication. If you write f(g(x))f(g(x)) you’ll just get a syntax error. f(g(x))*f(g(x)) interposes * as a binary operator to combine two values. f(…) says to call function f with some given parameters. The syntax of programming languages does not leave open the possibility that a(b+c) might express an attempt to multiply a by b+c; a had better be a function, and b+c had better be a value a is happy to accept as an input. Designers of programming languages have had to be ruthlessly honest about what a text is meant to mean, without taking for granted that the reader shall be smart enough to “know what the author meant” – and attempts to design languages that “guess what the author meant” have taught us Not To Do That. Juxtaposition as multiplication, while parenthesised expression means a value, should not live in the same notation as expresses the action of a function by juxtaposing, after the function, parentheses that enclose its argument(s); doing so creates ambiguities that preclude reliable parsing by dumb software; and that fact on its own shows that it is inevitably a bad notation to teach to students new to the subject.
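
      Python is one language where this can be seen directly. The sketch below is only an illustration: the function `f` and the values 3, 4, 5 are made up, and other languages report these mistakes in their own ways.

```python
def f(x):
    """A hypothetical function: add one to its input."""
    return x + 1

a, b, c = 3, 4, 5

product = a * (b + c)   # explicit multiplication: 3 * 9 = 27
applied = f(b + c)      # function application: f(9) = 10

# Writing a(b + c) with a = 3 asks Python to *call* the integer 3.
try:
    a(b + c)
    call_error = None
except TypeError as err:
    call_error = err

# Two calls written side by side, with no operator between them, do not parse.
try:
    compile("f(1)f(1)", "<example>", "eval")
    juxtaposition_error = None
except SyntaxError as err:
    juxtaposition_error = err

print(product, applied)
print(type(call_error).__name__, type(juxtaposition_error).__name__)
```

      Neither a(b + c) nor juxtaposed calls are quietly read as multiplication: the first fails when run, and the second is rejected before it runs at all.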

  16. I just finished How Not to Be Wrong today!

    I’m a science teacher, not a math teacher, but I do remember making this kind of error in middle school. I don’t think it was function notation that confused me, though – just an over-application of the distributive law. The kind of mistake my brain would make when it went on auto-pilot, like driving past the exit I want on the highway because normally I don’t take that exit.

  17. I like your idea that students think about it as a property of brackets, but it is also true that students do it when there are no brackets, such as with powers and fractions, so there is something more to it than just brackets. I think in general students just want to try *something* and it works in other situations, so why not give it a go?

    One thing that seems to help students is not (just) to point out the things that don’t distribute over +, but to point out that the *only* thing that *does* distribute over + is multiplication by a number! Students seem much more able to hold onto this one “this is where you can” than they are to hold onto all the separate “this is where you can’t”s.

    (And it is actually true that it’s the only place you can. It is possible to *prove* that if a continuous function f does the trick of f(x+y) = f(x) + f(y) then the only thing f can possibly be is f(x) = ax for some number a.)

    1. Oh, and PS: I think many students do actually believe that f(x) does mean “f times x”. Many of my students actually say the words “f times x”, especially with trig functions.

    2. Your “respects addition => is scaling” inference only works for the one-dimensional case and for continuous functions. In even two dimensions, f(x +y) = f(x) +f(y) works fine for shears and rotations as well as enlargements, after all.

      For the need for continuity, notice that {reals} are technically an infinite-dimensional vector space over {rationals}; so pick a basis, over {rationals}, of reals; and map each basis member to a random real; then extend this linearly to all reals by writing each in terms of the basis, times rational scalings, and scaling each component by its basis member’s separate factor, then adding the fragments back together. You now have a function from reals to reals, that technically *is* linear over rationals, but will typically be everywhere discontinuous, in which case it certainly isn’t linear as a mapping from reals to reals.

  18. I fear the worst case of this I’ve met is that economists assume the economy is governed by linear differential equations – mainly because they don’t know other kinds of dynamical system can exist. This may have improved in the last few decades, but it was the impression I got in the 1980s. They took for granted that systems tend towards an equilibrium, unaware of chaos, strange attractors and other weirdness that real dynamical systems (governed by non-linear differential equations) exhibit.

    Linearity is easy to reason about and easy to teach, so we teach it.
    We don’t expose students to how tiny a corner of the space of possibilities that actually describes.
    Unfortunately, this makes it very easy for students to end up just assuming it’s a property of all things.

    Almost all (in a measure-theoretic sense) of the reals are non-exhibitable.
    Almost all even of the exhibitable reals are irrational.
    Most of the reals we ever actually get to see in tests are whole numbers or, at worst, rational.
    Students tend, therefore, to be oblivious to just how rare and peculiar the cases they’re most familiar with really are.
