*elaborations upon a blasphemous course review*

I am a pure mathematician by training, inclination, and marriage.

I am also a blasphemer, a heretic, and a traitor to my people.

What I’m saying is that I just finished taking a machine learning course full of rigorous mathematics, and in my course review, I advised the professor to stop worrying about all that rigor. (Sort of. What I said was more nuanced, but no less obnoxious.)

For the question “How can this course be improved?” I wrote:

I humbly propose rethinking the role of proof and mathematical derivations in the course.

A widely held view among mathematicians is that proofs deepen students’ understanding. In my opinion as a professional math communicator, this is wrong. Proof is better understood as the *last* step in mathematical work. As you know, a researcher attempts a proof only after a long process of probing examples and seeking intuitive principles. Until students have gone through a parallel process, seeing proofs will rarely benefit them.

I also believe that, for understanding most material, proof is neither necessary nor sufficient. In this course, when I would ask the TAs about the proofs, I would sometimes find myself answering their questions instead. Yet I know they understood the models themselves better than I did!

For these reasons, I propose moving proofs from the beginning of each topic to the end, and treating them as parenthetical to the work of understanding and implementing the models.

In my view, this would not constitute a loss of rigor or depth, but the opposite. It would embrace the true role of proof (as an act of consolidation and intra-mathematical communication), while shifting student focus toward understanding the logic and limitations of machine learning methods.

Let me elaborate upon my heresy.

It’s common to treat a proof as a kind of explanation: a careful, formal, highly detailed answer to the question “Why is this true?” Under this view, proof is the essence of understanding, and an unproven statement is a black box. A class full of unproven statements is even worse: a shallow ditch, a clown show.

I say bah.

I say humbug.

I say this whole worldview rests on a confusion about the word “why.”

When the referee of a math research journal asks “Why?”, then you should answer with a proof. That’s well and good. But when a human being asks “Why?”, then you should answer with something subtly yet profoundly different: an explanation.

A proof and an explanation share a similar structure. Both show how a new statement fits comfortably in a pre-existing structure of old statements.

The question is: what exactly is this structure?

In the case of proof, it’s *the body of shared mathematical knowledge*. It consists of accepted axioms, earlier proofs, and agreed-upon definitions. It’s a tower built by a community of researchers, and we’re supposed to be *very* careful when making additions. Nary a loose brick is safe.

In the case of explanation, the pre-existing structure is *my personal knowledge about the world*. It consists of experiences, beliefs, mental models, and earlier mathematical understandings. Logic and rigor may help hold it all together, but for the most part, it’s made of squishier stuff: approximations, rules of thumb, illustrative examples, and vivid, memorable images.

To contrast proof and explanation, let’s take a famous statement:

1^{2} + 2^{2} + … + n^{2} = n(n+1)(2n+1)/6

On the one hand, we can prove this by induction. Such a proof cements the statement’s place in the body of mathematical facts. It is now as truly immortal and immortally true as any fact can be.

But does anyone really feel like this *explains* it?

Meanwhile, here’s an explanation of that same fact. Your mileage may vary, but for my part, this explanation deepens my understanding and helps me embrace the statement’s truth.

Our strategy is to estimate the sum: it’s the length of the list multiplied by the average number on the list.

What is the average number? Tempting to think it’s 1/2 of n^{2}, but that’s off. It’s closer to 1/3 of n^{2}.

So that gives us our estimate: roughly n^{3}/3.

Have we proved the formula? Not at all. We’ve given a squishy argument for something vaguely similar to the formula. But that has its own benefits. Better yet, for those who know calculus, this sloppy formula hints at another explanatory connection: the sum of the first n squares closely resembles the integral of x^{2}.
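For anyone who wants to poke at the estimate numerically, here is a minimal Python sketch. The function names are mine and purely illustrative. It compares the directly computed sum with the exact formula n(n+1)(2n+1)/6 and with the rough guess n^{3}/3, so you can watch the ratio drift toward 1 as n grows.

```python
def sum_of_squares(n: int) -> int:
    """Add up 1^2 + 2^2 + ... + n^2 directly."""
    return sum(k * k for k in range(1, n + 1))


def closed_form(n: int) -> int:
    """The exact formula n(n+1)(2n+1)/6 (always an integer)."""
    return n * (n + 1) * (2 * n + 1) // 6


def rough_estimate(n: int) -> float:
    """The squishy estimate: list length n times an average of about n^2/3."""
    return n ** 3 / 3


if __name__ == "__main__":
    for n in (10, 100, 1000):
        exact = sum_of_squares(n)
        ratio = exact / rough_estimate(n)
        # Columns: n, direct sum, closed form, exact / estimate
        print(n, exact, closed_form(n), round(ratio, 4))
```

The estimate is never exactly right, which is rather the point: it is a sanity check on the shape of the answer, not a substitute for the formula.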

Explanation is not just a fuzzy proof, and proof is not just a rigorous explanation. They are different species entirely: one logical, the other psychological.

Am I saying proof is bad? No! Proof is the architecture of mathematics. It’s our castle walls. Without proof, the rain would soak our hair, the wind would snuff out our candles, and the wild animals (read: physicists) would come wandering inside to eat our throw rugs.

Don’t abandon proof. Just acknowledge its true purpose – or rather, purpose*s*. Proof finalizes and formalizes understanding. Proof enables generations of scholars to collaborate on a single intellectual project. Proof serves as math’s final arbiter of truth.

And, as a kind of side gig, proof sometimes helps to explain why things are true.

But if we want students to understand mathematics, we can’t expect proof to do the heavy lifting. We need worked examples. We need well-chosen counterexamples. We need pretty pictures. Heck, we need ugly pictures. We need analogies, heuristics, and loose connections to more familiar ideas.

We need, in a word, explanations.

I felt silly writing that course review. It was a lovely class whose primary shortcoming (an uncritical embrace of proof as the be-all-end-all for mathematical reasoning) is shared by approximately 99.737% of similar courses. More to the point, I’ll be shocked if my professor is at all persuaded. That’s the nature of explanation: it’s a gradual process, a social process, and it needs to meet people where they are.

Still, if nothing else, I hope that I managed to make my blasphemy a little more legible – and perhaps even a little less blasphemous.

As a physicist I just want to state that I have never eaten a throw rug.

That’s because we kept the walls strong, silly physicist. 😛

Many, many years ago I did my undergrad as a joint degree between math and physics, so I spent a lot of time in both places. The math building didn’t _have_ rugs. Maybe the _mathematicians_ ate them. On the other hand, the rugs in the physics building were intact.

Gasp! All this time I thought the physicists were *eating* our rugs, and really they were *stealing* them, so as to make their own floors all cozy and warm, leaving our floors as barren and austere as our subject!

Very clever, physicists. You win this round.

You should try one, they’re tasty.

— a Computer Scientist

I never thought about it this way, but it makes complete sense. As a student, I arrived at proofs by exploring and playing around with things. It was the exploration that helped me understand things more than the proof itself.

Isn’t it odd how becoming a teacher completely changes the way in which we are critical about our own teachers?

For example, I was in a lecture that was being teleconferenced and noticed the teacher do something that would have been completely off camera to the other room. I felt compelled to inform him of this and show him how to do it in a way that was visible on camera. I’m not sure my advice was well received. XD

I completely agree about the role of proofs in learning, and I’m glad you were able to put it into words so well.

My own personal intuition for the sum of squares fact is to stack the squares in a pyramid shape, then recall the formula for the volume of a pyramid! (Although why the volume of a cone-like shape is 1/3Bh probably requires its own intuition…)

Ooh, I like that!

Stacking pyramids is the way to go! Here is a 52-second video where I show the 4-layer case:

https://youtu.be/Q1LjDNHaXr8

Hope you enjoy!

The most beautiful proof I saw of the volume of a pyramid was in Lockhart’s Lament: Consider a cube, and connect each vertex to the centroid. You get 6 neat little pyramids whose volume is 1/6 the cube.

Using scaling and Cavalieri’s principle gets you the general rectangular pyramid and then arbitrary bases.

Three pyramids? But either way, yes, that’s lovely! I can’t remember if I saw it from Lockhart first or somewhere else, but either way I worked it into my book on calculus.

I took a fair amount of manifold theory/algebraic topology/differential topology classes in college, and one was by a physicist, and it was by far and away the most clear class. He really understood the whole gestalt of it all and could explain things.

Are you familiar with this paper? You might enjoy it!

When the Problem Is Not the Question and the Solution Is Not the Answer: Mathematical Knowing and Teaching

Author(s): Magdalene Lampert

https://mathed.byu.edu/kleatham/Classes/Fall2010/MthEd590Library.enlp/MthEd590Library.Data/PDF/Lampert%20(1990-3702203392/Lampert%20(1990.pdf /// https://www.jstor.org/stable/1163068?origin=JSTOR-pdf

Specifically as Lampert quotes Lakatos (1976):

“The zig-zag of discovery cannot be discerned in the end product” (Lakatos, 1976, p. 42). The product of mathematical activity might be justified with a deductive proof, but the product does not represent the process of coming to know.”

It’s easy to give a much better argument for the first-order term of the formula. Here it is: (x+1)^3-x^3 = 3 x^2 + …
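One way to read that identity, sketched in Python: this is my own filling-in of the steps, so treat it as an assumption about the commenter's intent. The full expansion is (x+1)^3 - x^3 = 3x^2 + 3x + 1. Summing the left side over x = 1..n telescopes to (n+1)^3 - 1, while the leading term on the right contributes three times the sum of squares. The function names are illustrative only.

```python
def telescoped_total(n: int) -> int:
    """Sum of (x+1)^3 - x^3 for x = 1..n; everything cancels except the ends."""
    return sum((x + 1) ** 3 - x ** 3 for x in range(1, n + 1))


def leading_term_total(n: int) -> int:
    """Sum of just the 3x^2 part of the expansion 3x^2 + 3x + 1."""
    return sum(3 * x * x for x in range(1, n + 1))


if __name__ == "__main__":
    n = 1000
    # The telescoping claim: the total equals (n+1)^3 - 1.
    print(telescoped_total(n) == (n + 1) ** 3 - 1)
    # The leading term accounts for nearly all of it when n is large.
    print(leading_term_total(n) / telescoped_total(n))
```

Since the 3x^2 part dominates, three times the sum of squares is roughly n^3, which gives the n^3/3 first-order term.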

Proofs, along with their explanations, are pretty central to understanding advanced courses. I don’t understand how proofs and explanations being different supports the argument that courses should forgo proofs as a centerpiece. It just means they should motivate and explain proofs better.

Perhaps, rather than ‘proof’, ‘derivation’?

Maybe, but I actually think a lot of illuminating explanations don’t really qualify as derivations, either.

Here’s one: how to explain the chain rule to calculus students?

My favorite approach is something like, “Imagine quantity supplied (Q) is a function of price (P) which is in turn a function of time (t). What’s dQ/dt?”

Well, it’ll be dQ/dP (units per dollar) * dP/dt (dollars per day) to give dQ/dt (units per day). Just by getting the units to work out, we can see how the chain rule needs to operate.

I always found the Chain Rule pretty intuitive, but the Larson & Hostetler Calc textbook has a good diagram involving gear ratios to illustrate it. Something like, if Gear A has a ratio of 2-to-1 and Gear B has a ratio of 3-to-1, then their joint ratio is 6-to-1. Or something like that.

This separation between proof and understanding has a lot in common with the paper “Mathematics, morally”, by Eugenia Cheng (http://eugeniacheng.com/wp-content/uploads/2017/02/cheng-morality.pdf). I suspect you’ll find it interesting.

Thanks – looking forward to checking it out!

This tallies with something I was thinking recently: I saw a YouTube video called “why every proof you’ve seen that 0.999… = 1 is wrong and mine is right” and perhaps predictably it was arguing that the classic proofs assume a lot of things about decimal expansions that are definitionally equivalent to saying 0.999… = 1, and that really we should be proving that decimal expansions work how we say they do and taking a step back into something more abstract. And like, fine, yes, mathematically that’s what you should do. But mathematicians aren’t the ones arguing that 0.999… < 1, and the people who are arguing that already agree that 0.333… = 1/3 and that 0.999… * 10 = 9.999… and so on, so assuming them is fine. The goal isn't to convince a sceptical mathematical journal, it's to persuade a layperson who's made a mistake. It's not enough to prove the opposite of what they believe, you have to show them why their belief is flawed. Otherwise they'll just have two competing beliefs that both seem convincing and will probably default to the one they held before.

To be fair, I think “Why Everyone is Wrong and I Am Right” makes an excellent title, but I’m totally with you.

(I have my best success when I try to tease out, Socratic-style, what people think “0.999…” means; often they’re implicitly picturing some very distant end to the sequence of 9’s. But it’s been a while since the last time I gave it a shot.)

yes but *they’re* wrong and *i’m* right and in *my* video i will—

I think the line between [proof] “it’s the body of shared mathematical knowledge … nary a loose brick” and [explanation], “the pre-existing structure is my personal knowledge about the world,” is pretty much wafer thin. Many authors share an understanding with their readers in a proof which detracts from objective precision. Besides that, many teachers hide behind lifeless explanations, often just mouthing the words of a proof, perhaps not really understanding it themselves, surprisingly often wrong (look for the words “trivially” or “clearly”; that is often where the author is glossing over stuff they half-understand). Many of the great mathematicians provide explanations as they prove things [read, say, Terry Tao’s blog], or perhaps with a brilliantly chosen example beforehand that unambiguously highlights the key ideas in the forthcoming proof [read Gauss]. Being unimaginative is just that, not a necessary feature of a formal proof.

Thanks for reading and posting!

I suspect the distinction I’m drawing (proof vs. explanation) tends to vanish as one pursues a career in research mathematics. I find it more relevant for thinking about undergraduate education, or professional education in adjacent disciplines (data analytics, economics, etc.)

One (imperfect) analogy I like: proof is to mathematics what the scientific method is to science. It’s the basic methodology of the discipline, and undergraduates should learn it for that reason. But it’s not obvious that the disciplinary methodology will help undergraduates understand any particular piece of content. I learned about the Krebs cycle and Newtonian mechanics without any explanation of the experiments that established their validity. It might have been interesting to learn, but probably not the best use of finite lesson time.

I’d argue that proof and explanation are the same. In your example, you showed us two example *proofs* of different statements – the first statement was (1+2^2+…+n^2) = n(n+1)(2n+1)/6 and the second statement was (1+2^2+…+n^2) ~ n^3/3. In the second proof there is a gap, when you say that the average element of the sequence of squares is n^2/3 – and for me, intuitively, there is also a gap – I don’t understand why the average element is equal to n^2/3! So it doesn’t explain (doesn’t prove) it.

Maybe explanation = proof sketch?

Yeah, that’s a fair complaint! I have this instinct that an explanation is *not* just a proof sketch. But I probably need a better example.

How about this: explaining the chain rule in calculus by just referring to the units.

E.g., suppose we’re inflating a balloon, and want to know how the radius is changing. We know dV/dt. We calculate dr/dV by inverting the formula for the volume of a sphere. Then we get dr/dt by just multiplying dr/dV (cm per cm^3) by dV/dt (cm^3 per second).

Done properly on a board, this feels like an “explanation” to me – it shows why the chain rule needs to be what it is – but I don’t think it can be cashed out into a proof.
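To see the units argument agree with the numbers, here is a small Python sketch of the balloon example. The constant inflation rate of 100 cm^3 per second is a hypothetical value chosen for illustration, and the function names are mine. It multiplies dr/dV by dV/dt and compares the result with a finite-difference estimate of dr/dt.

```python
import math

DV_DT = 100.0  # hypothetical inflation rate, in cm^3 per second


def volume(t: float) -> float:
    """Volume in cm^3 after t seconds, assuming a constant inflation rate."""
    return DV_DT * t


def radius(v: float) -> float:
    """Invert V = (4/3) * pi * r^3 to get the radius in cm."""
    return (3.0 * v / (4.0 * math.pi)) ** (1.0 / 3.0)


def dr_dv(v: float) -> float:
    """dr/dV in cm per cm^3, which works out to 1 / (4 * pi * r^2)."""
    return 1.0 / (4.0 * math.pi * radius(v) ** 2)


def dr_dt_chain(t: float) -> float:
    """(cm per cm^3) * (cm^3 per second) = cm per second."""
    return dr_dv(volume(t)) * DV_DT


def dr_dt_numeric(t: float, h: float = 1e-4) -> float:
    """Central-difference estimate of dr/dt, as an independent check."""
    return (radius(volume(t + h)) - radius(volume(t - h))) / (2.0 * h)


if __name__ == "__main__":
    t = 5.0
    print(dr_dt_chain(t), dr_dt_numeric(t))
```

The two printed values should agree to several decimal places, which is reassuring but, in keeping with the point above, is a check rather than a proof.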

Love this, Ben. Thank you. Our use of proof with students should be judicious, with our central focus being on student “sense-making.” We need to be explicit that the proof is what comes after the writer has figured out what’s going on. Mostly in the classroom we should ask why it makes sense that a theorem would be true and then move on to using what it tells us to ask some new questions and figure out some new things for ourselves. Introducing some elegant proofs, having students work from time to time on writing some of their own, and teaching them to focus on hypothesis and conclusion in any theorem they encounter helps them to understand that “proof is the architecture of mathematics,” and is that which distinguishes those of us who call ourselves mathematicians from “the wild animals (read: physicists).”

I am loving the initial exchange in the comments around the physicist paragraph, which is a truly fine piece of writing, and was my favorite part of the essay.

I’ve been a high school math teacher for 30+ years. My father was a physics professor and I was a physics major. Even way back in college, I said I became a physics major for the math. I wasn’t all that interested in the science, but I’d loved almost everything mathematical I’d ever done until my sophomore year of college when I found myself spending virtually all my time in math classes trying to parse (or, as it often turned out, memorize) proofs. The math I got to do in physics was much more fun. Throughout my schooling my father rarely answered my questions directly, wanting to ensure I was sense-making for myself. I do, however, remember him making an exception when I asked him why it was “wrong” to multiply both sides of an equation by dx. He answered with a scoff and something like, “That’s just those unreasonable mathematicians. Multiply by dx.”

A function is continuous if you can make the graph without lifting your pencil. Not bad, entirely intuitive.

A function is continuous if, everywhere the function is defined, the function equals its limit. Great, but what is this limit that you talk about?

For any epsilon greater than zero, there exists a delta greater than zero such that… Two Greek letters, a backward E, and an upside-down A! Ack!

Calculus did just fine in the 150 years that separate Newton and Leibniz from Cauchy and Riemann. Yes, there were a few controversies, but I think you can get through 2 years of calculus before you really need to get into the nitty-gritty of the theoretical underpinnings. If you are into that stuff, after you have a good feel for calculus, then go on and take Real Analysis, and shore up that foundation. For classes beyond this, the motivation, definition, proof, example format of lectures worked fine for me.

I will say that writing proofs did good things to establish my own knowledge, or lack thereof. If I can prove it (not just repeating the book’s proof, but assembling my own argument), I must have a good sense of the underlying ideas.

Regarding the series 1^2 + 2^2 + …., I liked this explanation.

Create a triangle like the one below.

1

2 2

3 3 3

The sum of the entries of the kth row is k^2. Clearly you can extend this to n rows. Make two more copies, one rotated 120 degrees clockwise and one rotated 120 degrees counter-clockwise. When we add the three triangles together, we get a triangle with the entry 2n+1 at every location.
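The rotation trick can be checked mechanically. Below is a minimal Python sketch, where the indexing scheme and function names are my own assumptions and not the commenter's. Position (i, j) is slot j of row i, both counted from 0. The original triangle holds i + 1 there, and the two rotated copies hold n - (i - j) and n - j. Every stacked entry comes out to 2n+1, and since there are n(n+1)/2 positions, three times the sum of squares is (2n+1) · n(n+1)/2.

```python
def stacked_triangles(n: int) -> list[list[int]]:
    """Add the triangle to its two 120-degree rotations, entry by entry.

    Position (i, j) means row i (counted from the apex, starting at 0) and
    slot j within that row. The original triangle holds i + 1 there; the two
    rotated copies hold n - (i - j) and n - j.
    """
    return [
        [(i + 1) + (n - (i - j)) + (n - j) for j in range(i + 1)]
        for i in range(n)
    ]


def sum_of_squares_via_triangles(n: int) -> int:
    """Each copy sums to 1^2 + ... + n^2, so the stacked total is three times that."""
    total = sum(sum(row) for row in stacked_triangles(n))
    return total // 3


if __name__ == "__main__":
    for row in stacked_triangles(3):
        print(row)
    print(sum_of_squares_via_triangles(3))
```

For n = 3 every entry of the stacked triangle is 7, and the recovered sum is 1 + 4 + 9 = 14.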

I will also note that I have met a few people who are entirely distrustful of intuition, even arguing against it: it is more likely to lead you astray than lead you to the correct path. I am not one of these people, but they definitely do exist.

I just stumbled over a website that tries to give explanations, not proofs, just like you ask for: https://betterexplained.com/articles/developing-your-intuition-for-math/

One of my favorite math quotes is “never attempt to prove something which isn’t almost obvious”. It’s a paraphrase of Grothendieck. If, after a grueling effort, I’m able to make it through a proof of a result I have no intuitive feel for then 9 times out of 10 I forget all the details by the next day, including the theorem statement 😅

I’m enjoying the discussion about sums of squares. I never had a good geometric understanding of this result. Here’s an algebraic approach I like. The sum of n squares is a cubic polynomial in n. Let’s call it P(n). Here are 4 things we immediately know about it (or any power sum)

1) A sum of 0 terms is 0, so P(0) = 0 and n is a factor of P(n)

2) In a sum of n terms, subtracting the n’th term gets you the sum of n-1 terms. Hence, P(n) – n^2 = P(n-1). In particular, P(0) – 0^2 = P(-1) and P(-1) = 0. Hence (n+1) is a factor of P(n)

3) P(1) = 1 so the sum of the coefficients of P(n) is 1

4) The coefficient of n^3 is 1/3 for the reason in the OP. This removes the last degree of freedom and lets you work out the formula.
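Here is one way the four facts can be cashed out, as a short Python sketch using exact fractions. The parametrization P(n) = n(n+1)(a·n + b) and the function names are my own framing of the comment. Facts 1 and 2 give the two factors, fact 4 sets a = 1/3, and fact 3 (P(1) = 1) forces 2(a + b) = 1.

```python
from fractions import Fraction


def solve_linear_factor() -> tuple[Fraction, Fraction]:
    """Find a and b in P(n) = n * (n + 1) * (a*n + b).

    Facts 1 and 2 supply the factors n and (n + 1); fact 4 fixes the leading
    coefficient a = 1/3; fact 3, P(1) = 1, means 2 * (a + b) = 1.
    """
    a = Fraction(1, 3)
    b = Fraction(1, 2) - a
    return a, b


def P(n: int) -> Fraction:
    """Evaluate the cubic pinned down by the four facts."""
    a, b = solve_linear_factor()
    return n * (n + 1) * (a * n + b)


if __name__ == "__main__":
    print(solve_linear_factor())
    for n in range(1, 6):
        # Columns: n, P(n), and the directly computed sum of squares
        print(n, P(n), sum(k * k for k in range(1, n + 1)))
```

With a = 1/3 and b = 1/6, the last factor is (2n+1)/6, which recovers n(n+1)(2n+1)/6.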

There’s another function of seeing (and writing and discussing and improving on) proofs in a math course: namely, being able to read an abstract proof and translate that into understanding via some process of “what is this really saying, let me fiddle with some examples” is an essential skill in any kind of advanced math. Good proof-based courses build this skill – although it’s best done by working a bunch of motivating examples and challenging problems alongside the proofs part. I’m easily swayed by a good heuristic argument, too easily though, and where real learning happens is when I have to take something that I can check is pretty line-by-line rigorous and ask myself what the deeper reason behind this is. Sometimes that leads to a new discovery!

But take this criticism as a compliment, I actually like this perspective enough that I may share your post as a discussion topic in my classes. 🙂

I see your point on the desire to skip proofs sometimes, but I kinda feel like your delivery was kinda out of it and would gladly read a second take on it if you want to write more or expand on it!

For example, I am not so sure about the meaning of teaching proofs in an ML class in the first place. Showing the idea of gradient descent is one thing, but unless the proof is a constructive way to get the solution with an explicit error estimate, it is probably meaningless.

Your example for the problem/useless proof feels weak because there are easier proofs of that formula, and the only reason we use induction there is because it is a nice example of where one can possibly use induction and thus understand how to use it and what its implications are.

Mathematics is a nice place where one doesn’t take anything for granted and proves one’s way through all the problems. Well, except axioms of course, but logic is a different universe.

The point is that your explanation should give me motivation to believe that the statement is true without proving it, and that would be a nice thing. In reality you state more facts: “the average is close to one third” and the reasoning “because the 7 numbers out of 10 are below half 100/2” doesn’t kinda cut it. Even if it did, why should I be happy knowing that $n^2/3$ is a rough estimate? Why would that explain anything? At that point one just gives the formula without any fuss and gets it over with, no?

A Russian professor called Burenkov wrote a book on Sobolev spaces where before each proof he wrote “idea of the proof” and explained what was happening there. I feel that that is the direction we are supposed to be going in.

At the end of the day, if we were to remove all the proofs from a book we would just have readers that say “it is so because my book said so.”

That is why I would like to see you expand on it a bit more!

I didn’t understand the OP as wanting to “skip proofs” so much as shift emphasis away from them. Sometimes a good proof *is* also an excellent explanation, so works fine for teaching: but it’s not unheard of for a proof to leave the student with no clue *why* the result is true (or even relevant), and in those cases (at least) it’s best to explore the subject and look at examples and counter-examples to reasonable-sounding conjectures, so as to give a sense of why a stronger claim than the theorem you’re about to prove isn’t true, or a weaker claim might seem to be as much as you can hope for, that the proof miraculously shows you can do better than. It’s as important to *motivate* the definitions and proofs of mathematics as to formally state them.

There is a time and place for formal definitions and proofs; but it is often *after* the student has got a reasonable sense of what’s going on and why it should work.

Can I get an example motivating your statement?

One common kind of proof in the ML class I took was demonstrating that, in trying to estimate a particular parameter or distribution, we had arrived at the MLE. Generally such proofs involved a lot of notational machinery (e.g., you need to start by formulating what a generic estimate looks like) and not much insight (in context, our solution was often the obvious move, such as assuming that the population mean equals the sample mean).

It seems to me that such proofs serve a purpose, but it’s the opposite of the purpose generally assumed: rather than helping to demonstrate that our solution is correct, they help to demonstrate that our proof framework is trustworthy. Which is great if you’re interested in the epistemology of ML, but not so great if you’re driven by an object-level interest in ML itself.

Proof by induction is similar, I think. If you’re looking to rigorously prove formulas about sums of integers, then induction is often a quick and efficient way to do it. But it’s not explanatory, in the psychological sense; it doesn’t give you much insight into the formulas themselves. Which was half of my point: a good proof can fail as explanation.

But as you point out, I failed in the other half of my example. I wanted to show how a good explanation can fail as a proof, but my handwaving about averages wasn’t a great explanation, so all I really showed was that a bad explanation can fail as a proof – not a very surprising result!

Perhaps it’s generally true that good explanations can be cashed out as proofs. If that’s the case, then I suppose my position becomes, “In some pedagogical contexts, it is okay to give an explanation that could in theory be developed into a proof, but not actually bother with the proof itself.”

Typo: “roughly n3/3” (should be squared, not cubed).

That is, “roughly n³/3”. (Apparently comments strip out the tag.)

The tag. (Why is there neither a preview nor edit function. Why this.)