Why Baseball Statistics Matter

Last year, an interviewer asked me why baseball statistics matter.

First thought: They don’t. I just love ’em.

Second thought: That’s not a very good answer, Ben.

Third thought, this one aloud: Mumble mumble. Words?


Now, having had a few months to indulge in l’esprit de l’escalier, I’ve got a tentative answer. It goes like this:

The field of data analytics is conquering the world. Our emotions, our behaviors, our precious bodily fluids – all are becoming subject to statistical analysis (and thence to algorithmic manipulation). It’s cool. It’s scary. And it raises questions.

Questions like: “What might our numbers leave out?”

Questions like: “Do the data confirm old wisdom, or upend it?”

Questions like: “Will experts become obsolete in an era of all-powerful, all-purpose machine learning?”

Questions like: “Do all these numbers dull the poetry of human life? Do they turn the fertile jungle of experience into something cold and gray, like lunar soil?”

And here’s why baseball matters: Because it entered the data analytics era two decades ahead of everybody else. The sport has spent 20 years negotiating, compromising, learning. It models for us the pitfalls and possibilities of statistics.

Baseball shows us how early adopters will grab low-hanging fruit.

It shows us how jealous rivals will blunder along, uncomprehending, in their wake.

It shows us how rich institutions will learn to leverage their resources, perhaps exacerbating inequality.

It shows us how old-school experts will prove wrong about a lot, and right about a lot.

It shows us how the best organizations achieve a synthesis of organic wisdom and statistical analysis, an alloy stronger than either approach alone.

In short, why does baseball matter? Because it shows us what the data analytics era will become.

8 thoughts on “Why Baseball Statistics Matter”

  1. Looking at the numbers, it looks like the number of immaculate innings has increased in recent years. It has happened 25 times in the last 5 years, and 65 times in the previous 125. We have a statistic that suggests we are witnessing a fundamental change in the approach to pitching (or hitting?). Or it could be a result of more detailed accounting of pitch counts (i.e., these innings were not well documented before the sabermetric era).

    Which do you think is more impressive: an “immaculate inning” or a 3-pitch inning?

    Many of the famous benchmarks follow closely what one would expect if they were generated by random chance.
    For example, the number of no-hitters is close to what we would expect if every batter who comes to the plate has a 0.260 chance of getting a hit.
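A rough sketch of the arithmetic behind that no-hitter claim, in Python. It assumes every trip to the plate is an independent trial that ends in either a hit (probability 0.260, the figure from the comment) or an out, so a no-hitter is simply 27 non-hits in a row. Walks, errors, and differences between pitchers are ignored, and the function name is made up for illustration.

```python
def no_hitter_probability(hit_prob: float = 0.260, outs_needed: int = 27) -> float:
    """Chance that one team goes a full game without a hit.

    Simplifying assumption: each batter independently gets a hit with
    probability hit_prob and is otherwise out, so the game is hitless
    only if all 27 batters in a row fail to get a hit.
    """
    return (1.0 - hit_prob) ** outs_needed


# Probability of a no-hitter in a single team-game under this model.
p = no_hitter_probability()

# Average number of team-games between no-hitters under the same model.
games_per_no_hitter = 1.0 / p

print(f"P(no-hitter) is about {p:.5f}")
print(f"Roughly one no-hitter every {games_per_no_hitter:.0f} team-games")
```

Under these assumptions the chance comes out to roughly 0.0003 per team-game, or about one no-hitter every few thousand team-games. Multiplying that rate by the number of team-games actually played gives the expected count to compare against the record books.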

  2. In 1958 William Phillips published a paper that described an empirical relationship between wage inflation and unemployment. This relationship is called the “Phillips curve.”

    It was a hugely influential discovery in the field of economics and, more particularly, central banking. It is also wrong. Well, Phillips wasn’t wrong; his result was interpreted incorrectly. Central banks reasoned that it should be possible to lower unemployment by targeting a higher rate of inflation. It didn’t work. Inflation and unemployment rose simultaneously. In an effort to save the Phillips curve methodology, economists said that the Phillips relation holds in the short term but not in the long term. Central banks still use it, but the empirical data since Phillips suggest that there is not even a short-run correlation between unemployment and inflation.

    You could also say that the central banks were caught by the fallacy that “correlation does not imply causation.” But there are some good reasons why there should be a causal relationship. I would instead chalk it up to Goodhart’s law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
    Goodhart’s law has some unfortunate implications for data science.

    How does this relate to baseball? In the post-Moneyball era, OBP was the statistic to target. In the time since, it seems that the value of OBP has been overestimated.
