Don’t freak out, but we’re surrounded by normal distributions.
They’re in our heights; our weights; our sampling means; our fever-dreams; our Galton Boards…
Every normal is a variation on the same bell-curved theme. Just specify two parameters—the mean, i.e., the center of the distribution, and the variance, which measures its breadth—and you’ve got a normal distribution. They’re one big clan, with a strong family resemblance.
But—for me, at least—this raises a question: Who is the matriarch of the family? Which normal distribution is the founding member, the Mitochondrial Eve, the universal common ancestor?
There’s an “official” answer, one that I’ve taught to students: The standard normal distribution has mean 0 and variance 1.
Why? Because zero and one are very simple numbers. (No offense meant, Zero and One! It’s a compliment.) Here, the “zero” ensures the distribution is symmetric around the y-axis (a very nice property), while the “one” ensures that the standard deviation (i.e., the square root of the variance) is also one. That’s cool, because a lot of our work with normals consists in counting standard deviations. It’s easy to count by ones.
But there’s a downside. This lovely, balanced graph can be described only by a lopsided, fraction-infested equation.
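For reference, that equation is the density of the standard normal (mean 0, variance 1):

```latex
f(x) \;=\; \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
```

The symmetric bell comes at the price of a $\sqrt{2\pi}$ in the denominator and a fraction in the exponent.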
It’s an all-too-common experience in math. Simplifying over here winds up complicating things over there. It’s like there’s a bubble under the rug of math: you can push it elsewhere, but you can’t make it go away.
Of course, “standards” are arbitrary. Some countries drive on the left; some, on the right; and in some, drivers weave like Quidditch players in pursuit of invisible, multitudinous snitches. Now, in the case of the normal distribution, some mathematicians have pushed for other standards. For example, Carl Gauss preferred the distribution with variance ½.
Why? Well, presumably because the equation for the graph is a good bit simpler:
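With variance ½ (so $\sigma = 1/\sqrt{2}$), the general normal density $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/(2\sigma^2)}$ collapses to:

```latex
f(x) \;=\; \frac{1}{\sqrt{\pi}}\, e^{-x^2}
```

The exponent loses its fraction entirely, and the $\sqrt{2\pi}$ shrinks to a $\sqrt{\pi}$.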
Stephen Stigler, who has written books on the history of statistics, proposes yet another definition. He wants to dub as “standard” the distribution with variance 1/(2π), which has the advantage of an even simpler equation:
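With variance 1/(2π), the normalizing constant $\sigma\sqrt{2\pi}$ becomes exactly 1, and the exponent’s fraction becomes a π:

```latex
f(x) \;=\; e^{-\pi x^2}
```

No constant out front at all: the whole fraction-bubble has been pushed into the variance.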
In all this, we’re playing with notation. Do you want your obnoxious bubble of extra symbols to pop up over here, in the variance, or over there, in the density function? It’s a frothy, fun, splashy game—but perhaps a bit kiddie. Symbols only symbolize. They aren’t the math itself.
Is there any true substance to the bubble-pushing game?
I am happy to report: Yes! It leads right to the heart of mathematics: the act of building from simple assumptions to impressive conclusions.
From Euclid’s day, mathematicians saw this as a start-to-finish process: assume first, conclude later. The choice of axioms was thus all-important. Am I making the right assumptions? What can I truly take for granted? Is Euclid’s fifth postulate really necessary, or can we prove it from the other four? If cogito, does it follow, ergo, that sum? You had to start with solid truth—or everything would crumble.
Our modern perspective, as Barry Mazur explains, is different:
> Nowadays, with Hilbert’s formulation of formal systems, we understand that we can start anywhere, provided – of course – that we don’t end up with a contradiction. This shift of emphasis is curious: the ancients worried so much about beginnings, the moderns about endings.
These days, we care less where we start. I can assume A to prove B, or assume B to prove A. I can choose rich, information-dense definitions, and then work hard to verify their consistency; or I can choose lean, minimalist definitions, and work hard to deduce the desired properties. The ancients built their logical towers upward from the foundation, but the moderns feel free to extrapolate downwards from the top floor—even to flip the structure upside down.
An example (again from Mazur—I’ve been feasting on his essays): There are two ways to define the bisector of an angle.
- The line that divides an angle into two equal angles.
- The set of points equidistant from the angle’s two rays.
I find the first definition more natural: to “bisect” something is to split it in half. I like it when the definition matches the name (a “walkie talkie” is for walking while talking; a “pooper scooper” is for scooping poop; a “candelabra” is a magic spell for lighting candles…).
But the second definition has its merits, too. It names a property just as essential as the “split an angle in half” property. The equidistance idea underlies the ruler-and-compass construction of the angle bisector, and lets you explain in a jiffy why a triangle’s three angle bisectors always meet at a single point:
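A sketch of that “jiffy” argument (my reconstruction; the original post illustrates it with a drawing). If a point $P$ lies on two of the bisectors, the equidistance definition forces it onto the third:

```latex
\begin{aligned}
P \text{ on bisector of } \angle A &\;\Rightarrow\; d(P, AB) = d(P, CA)\\
P \text{ on bisector of } \angle B &\;\Rightarrow\; d(P, AB) = d(P, BC)\\
&\;\Rightarrow\; d(P, CA) = d(P, BC)\\
&\;\Rightarrow\; P \text{ lies on the bisector of } \angle C.
\end{aligned}
```

Under the “split the angle in half” definition, the same fact takes considerably more work.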
Both definitions work. It’s a question of where you want to push your bubble.
This goes way beyond triangles. Take the calculus of exponentials and logarithms. One approach to the logical development of these ideas, roughly mirroring the curricular sequence, would be this:
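A sketch of that sequence (my reconstruction of the standard curricular route; details vary):

```latex
\begin{aligned}
&1.\ \text{Define } a^n := \underbrace{a \cdot a \cdots a}_{n \text{ factors}} \text{ for } n \in \mathbb{N}.\\
&2.\ \text{Extend: } a^{-n} := 1/a^n, \qquad a^{p/q} := \sqrt[q]{a^p}.\\
&3.\ \text{Extend to all real exponents by continuity.}\\
&4.\ \text{Compute } \tfrac{d}{dx}\,a^x = a^x \cdot \lim_{h \to 0}\tfrac{a^h - 1}{h}; \text{ define } e \text{ as the base making the limit } 1.\\
&5.\ \text{Define } \ln x \text{ as the inverse of } e^x \text{ and deduce its properties.}
\end{aligned}
```

The “daunting elevation gain” comes at steps 3 and 4, where continuity and that limit demand real analytic care.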
But a different axiomatic development works like this:
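A sketch of this alternative, integral-first route (again my reconstruction; details vary):

```latex
\begin{aligned}
&1.\ \text{Define } \ln x := \int_1^x \frac{dt}{t} \quad \text{for } x > 0.\\
&2.\ \text{Deduce } \ln(ab) = \ln a + \ln b \text{ and } \tfrac{d}{dx}\ln x = \tfrac{1}{x}.\\
&3.\ \text{Define } e^x \text{ as the inverse of } \ln.\\
&4.\ \text{Deduce } \tfrac{d}{dx}\,e^x = e^x \text{ from the inverse function rule.}\\
&5.\ \text{Define } a^x := e^{x \ln a} \text{ for any } a > 0.
\end{aligned}
```

Here the hardest climbing comes first: defining a logarithm as an integral is a strange opening move, but every later step follows smoothly.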
These two approaches arrive at the same mountaintop, but offer very different journeys. The first path starts flat and easy, but has a daunting elevation gain in the middle. The second path involves some tricky, technical climbing—but maintains a steadier level of difficulty.
What makes mathematics so cool—well, thing #467, anyway—is the perpetual process of bubble-smoothing. Some bubbles, with cleverness, can be smoothed away entirely. Others cannot; they are inescapable features of reality, permanent wrinkles in the rug of reason.
Who can help but love finding a knot in the fabric of logic?