Is Algebra Just a Series of Footnotes to the Distributive Property?

Last month, I was helping some 6th-graders prepare for their final exam when it became clear that their teacher had utterly failed them. “You poor souls!” I said. “Orphaned by intellectual negligence!”

They blinked hopefully, as children do.

“Who was it?” I demanded. “Who taught you – or should I say, failed to teach you?”

“Um…” they hesitated. “You?”


“Well, whoever it was,” I said, “he denied you the chance to experience the beauty and centrality of the distributive property. For this, he shall have my undying scorn.”

The philosopher Alfred North Whitehead once described European philosophy as “a series of footnotes to Plato.” Twist his arm enough, and perhaps he’d have consented to describe algebra as “a series of footnotes to the distributive property.”


We use it frequently in arithmetic, but rarely name it as such. For example, try calculating 17 x 6 mentally.

One common strategy is this:


Another approach is this:


Both are just applications of the distributive property. You’re exploiting the fact that “a+b” groups is the same as “a groups” + “b groups.”
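
Written out, two such mental routes might look like the following. The particular splits (10 + 7 and 20 − 3) are assumed here for illustration; any decomposition of 17 works the same way.

```latex
\begin{align*}
17 \times 6 &= (10 + 7) \times 6 = 10 \times 6 + 7 \times 6 = 60 + 42 = 102 \\
17 \times 6 &= (20 - 3) \times 6 = 20 \times 6 - 3 \times 6 = 120 - 18 = 102
\end{align*}
```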

In fact, to avoid using the distributive property, you’d have to do something a bit unusual, perhaps like this:


Now, my 6th graders are pretty slick at the numerical version of the distributive property. They gobble up problems like this:



But they falter at the same step that haunts students worldwide: converting numerical instincts into algebraic generalities.



The distributive property is like most algebraic facts: just the crystallization of a familiar thought pattern from arithmetic. But that’s not how most students learn it. The easier short-term path is to see it as a new, disconnected rule for manipulating symbols: a(b+c) = ab + ac.

I fear this purely symbolic understanding is a broken futon, liable to collapse under enough strain. In particular, it seems to breed the “everything is linear” mistake. (One of my recurring nightmares.)
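
A quick numerical check shows why the “everything is linear” habit fails. This is a minimal Python sketch: the helper name `distributes_over_addition` and the sample values are made up for illustration. It tests whether f(a + b) = f(a) + f(b) for a few functions that students are tempted to “distribute.”

```python
import math

def distributes_over_addition(f, a, b):
    """Return True if f(a + b) equals f(a) + f(b) for this particular a, b."""
    return math.isclose(f(a + b), f(a) + f(b))

a, b = 9.0, 16.0  # arbitrary sample values, chosen for illustration

# Multiplying by a constant really does distribute: 3(a + b) = 3a + 3b.
triple = lambda x: 3 * x
# Squaring does not, however tempting (a + b)^2 = a^2 + b^2 may look.
square = lambda x: x ** 2

results = {
    "3x": distributes_over_addition(triple, a, b),
    "x^2": distributes_over_addition(square, a, b),
    "sqrt(x)": distributes_over_addition(math.sqrt, a, b),
}
print(results)  # {'3x': True, 'x^2': False, 'sqrt(x)': False}
```

A single counterexample is enough to sink a proposed rule, which is why one pair of numbers suffices for the two failures; the `True` for `3x` is only a spot check, not a proof.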

It’s crucial to get this right, because (to mangle the words of William Carlos Williams) “so much depends upon the distributive property.” Just look:
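
A few standard moves that lean on it, picked here as illustrations rather than an exhaustive list: combining like terms, expanding a product of binomials, and factoring out a common factor.

```latex
\begin{align*}
3x + 5x &= (3 + 5)x = 8x \\
(x + 2)(x + 3) &= x(x + 3) + 2(x + 3) = x^2 + 3x + 2x + 6 = x^2 + 5x + 6 \\
6x + 15 &= 3 \cdot 2x + 3 \cdot 5 = 3(2x + 5)
\end{align*}
```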



Now, most of my colleagues aren’t as worried about this as I am. They see a natural order for learning math: First, become adept at symbolic manipulations. Then, later, come to understand the deeper meaning. Trying to do it all at once – manipulation and meaning from day one – leaves students befuddled, befogged, and belligerent. Better to reach for the big ideas only once you’re comfortable with the mechanics.

They’re not wrong. Lots of effective mathematicians learned their craft like this.

But I can’t figure out how to teach that way. “No black boxes” has long been my motto. Intuition before formalism. Never charge forward with symbolic manipulations until you understand what those symbols actually symbolize.

This approach generally works for me. It means puzzling out notions of area before introducing compact formulas like “A=bh/2.” It means playing around with prime factorization before developing an algorithm for finding a highest common factor. It means baking understanding directly into a student’s thinking, rather than waiting until the cake is settled and then trying to sprinkle a little understanding over the top.

But when it comes to the distributive property, I still don’t have this figured out. Area models don’t seem to do much for my students. My wordy explanations (“it’s seven bags, each containing x + 7!”) are little better. I’m still seeking tasks that can sharpen and hone their thinking about distribution.

But I guess that’s the perpetual state of the teacher: still seeking.

On a different note:

A few weeks ago, the folks at asked me to help announce their “100 Day Summer Challenge,” a free collection of math puzzles targeted at high schoolers to keep them sharp and math-enthused over the summer. I was about to say “no thanks” (as I do to all such publicity requests) when I realized that the problems are actually quite slick and fun. And what is this blog for, if not peddling addictive stuff to high schoolers?

So here’s the cartoon I drew for them. I encourage you to check it out!

brilliant summer 1.jpg

14 thoughts on “Is Algebra Just a Series of Footnotes to the Distributive Property?”

  1. I think this one comic just taught me more about algebra than any of my years of high school or college. You’re kind of amazing like that.

  2. I’m not saying this to brag. My math skills pre-college (actually, pre Dr. James Kasum somehow deciding a kid who got correct answers about 25% of the time on good days had some extremely well-hidden math potential) were nothing to brag about.

    When I was learning algebra as a kid, my problem with the distributive property was that it is so bleeping simple and obvious that I couldn’t grasp that it was a thing worth thinking about. For a while, I thought I didn’t get it because it couldn’t possibly be that simple.
    So now I’m thinking about how I might be able to reveal the depth of the distributive property to some kids next year. So they can take the “why are you bugging me with obviousness” and turn it into “hey, I can do this!” I’ve got nothing so far, but I have until September to come up with something.

  3. The distributive property is at the heart of the very definition of multiplication. As I remember, the axioms are something along the lines of:

    a0 = 0
    a(b+1) = ab+a

    And that is it! Multiplication is an operation that distributes over addition.

    Perhaps that is how we should be teaching it in the second grade.

    And as we move into more abstract levels of mathematics, and devise wacky new operations of multiplication (vector products, matrix multiplication, etc) we still require that multiplication distribute over addition.

    1. Think of it another way.

      You have an addition. That’ll be a closed associative commutative binary operator that you can probably cancel. (So if you can add a+b then you can add it to stuff; when you do, (a+b)+c = a+(b+c), with a+b = b+a; and a+x = b+x implies a = b.) A mapping f from one addition to another is a *homomorphism* of additions precisely if it maps any sum to the sum of what it maps the parts to: f(a+b) = f(a) + f(b). A homomorphism from an addition to *itself* is an *endomorphism*. An addition induces an addition on its endomorphisms by (f+g)(a) = f(a) + g(a); and we can interpret composition of endomorphisms as a “multiplication” of endomorphisms, (f.g)(a) = f(g(a)). Repeated addition of a value gives us “whole scalings” as endomorphisms: once(a) = a, twice(a) = a+a, thrice(a) = a+a+a and so on; we can denote these as multiplications by counting numbers, 1.a = a, 2.a = a+a, 3.a = a+2.a, 4.a = a+3.a, etc. We thus see how endomorphisms can “feel like” multiplication; and when we “scale by” a whole number, construed as multiplying by that whole number, the fact that these whole number scalings are endomorphisms is represented by the multiplication distributing over the addition.

      Choose any collection of endomorphisms that commute with all the rest (i.e., for any endomorphism f, each chosen commuter c satisfies c.f = f.c) and be sure to include the identity as one of those chosen; extend your choice by additive and multiplicative closure; which, since you included the identity 1, means you also get every counting number scaling (which feels like multiplying by a whole number); since each of those you chose does commute with all endomorphisms, the same is true for any composition (i.e. product) of them and also for any sum of them; so you get yourself a bunch of endomorphisms (they feel like scaling) of your addition, that commute with all the endomorphisms of your addition; and you can interpret them as “numbers” by which you (by analogy with the whole number scalings, which they necessarily include) can “scale” the things you started out knowing how to add. You can “multiply” these numbers (and they commute with each other) by composing the scalings you interpret them as; this multiplication is, by construction, commutative. The multiplicative action of these “numbers” on the things you started out knowing how to add is the induced scaling of each; which, when interpreted as a multiplication (as is natural, now that you can multiply scalings), distributes over our original addition.

      So, in answer to this blog post’s title I’ll say: no, the distributive law is just a piece of notational sugar by which we represent certain endomorphisms of an addition as scaling by numbers (and such scaling constitutes a multiplication when we can identify each of the things scaled with one of the values we know how to add, as arises when dealing with the addition on whole numbers); but yes, most of algebra is just a series of elaborations of the inevitable properties of endomorphisms (most of which can be expressed in terms of their representation “as if” they were a multiplication that distributes over the addition). Within that, group theory is the study of the endomorphisms (of some mathematical structure) that shed no information (i.e. automorphisms).

  4. The TERMS of the “properties” like associative and distributive and so on always rather barricaded me from understanding — much the way terms like gerund and participle interfered with my appreciation of grammar and zeugma and metonymy restricted my understanding of poetry. It seemed like much of the end-of-lesson testing, at least, was about memorizing the glossary and quoting back the definitions rather than using the concepts to demonstrate their utility.

    I don’t know, though, how to teach that “this sort of thing” is useful in these ways, different from THAT sort of thing which is useful in OTHER sorts of ways, without assigning the sorts of things and ways distinct terms.

    Maybe “Bob” distributes values and “Sally” associates them?

  5. I loved to meet the distributive law again in Boolean algebra, where AND distributes over OR, but then again OR distributes over AND equally well.

    1. Similarly, set theory can distribute intersections over unions and unions over intersections.

  6. I find it helpful to have a setting where the property does NOT hold, which isn’t so obvious for the distributive property. But a/(b+c) != a/b + a/c works for early math.

  7. So, as a mathematician, I read Algebra to mean the larger field (get it, ha) of study, that is, Abstract Algebra, which I do not think is a footnote to the distributive property. Similarly, I found the use of the term “groups” to be a bit weird because in order to have the distributive property, you need two operators; you need a ring, not a group: one cannot reasonably speak about the distributive property in the dihedral groups, for example.
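
Several claims in the comments above can be checked mechanically. The following is a minimal Python sketch, not anything the commenters wrote. `mul` is a hypothetical helper that builds multiplication on non-negative integers from the two rules quoted in comment 3 (a0 = 0 and a(b+1) = ab + a). The remaining checks brute-force the Boolean claim from comment 5 and the division counterexample from comment 6, using small, arbitrarily chosen values.

```python
from fractions import Fraction
from itertools import product

def mul(a, b):
    """Multiply non-negative integers using only a*0 = 0 and a*(b+1) = a*b + a."""
    if b == 0:
        return 0
    return mul(a, b - 1) + a

# Comment 3: multiplication defined this way distributes over addition
# (checked on a small, arbitrary range; a spot check, not a proof).
small = range(6)
peano_distributes = all(
    mul(a, b + c) == mul(a, b) + mul(a, c)
    for a, b, c in product(small, repeat=3)
)

# Comment 5: in Boolean algebra, AND and OR each distribute over the other.
bools = (False, True)
and_over_or = all(
    (p and (q or r)) == ((p and q) or (p and r))
    for p, q, r in product(bools, repeat=3)
)
or_over_and = all(
    (p or (q and r)) == ((p or q) and (p or r))
    for p, q, r in product(bools, repeat=3)
)

# Comment 6: division does not distribute over a sum in the denominator...
a, b, c = Fraction(12), Fraction(2), Fraction(4)
left_fails = a / (b + c) != a / b + a / c    # 2 versus 6 + 3 = 9
# ...although it does when the sum sits in the numerator.
right_holds = (b + c) / a == b / a + c / a   # 1/2 versus 1/6 + 1/3

print(peano_distributes, and_over_or, or_over_and, left_fails, right_holds)
# True True True True True
```

The Boolean checks are exhaustive, since there are only eight combinations of truth values. The integer check covers only the sampled range.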
