We tend to think of crowds as stupid. “The idiocy of the masses.” “The thoughtlessness of the mob.” “The fact that everyone here is attending a Third Eye Blind concert.”
But it’s not always true.
I’m a fan of James Surowiecki’s 2004 book The Wisdom of Crowds. He opens with the anecdote of the 1906 Plymouth county fair, where 800 people estimated the weight of an ox. Their guesses were all over the place – some too high, some too low. They averaged out to 1,207 pounds.
The ox’s actual weight? 1,198 pounds.
What makes for wise crowds? Not everyone needs to be knowledgeable. In fact, it’s okay if nobody knows much at all. What’s crucial is that every person’s choice is independent of every other person’s. Calamity strikes – and by “calamity” I mean “stock market crashes” – when everyone in the crowd is herding around the same strategy.
By that logic, Twitter is the last place you’d look for wise crowds. What with retweets, favorites, blue check marks, and visible follower counts, Twitter users are quite sensitive to status. Herding is the name of the game.
And yet, check this out:
Not bad, right?
This flies against my prejudices about Twitter users. Also, against longstanding results in psychology. Researchers have found, again and again, that people are terrible at acting randomly.
Ask people to generate a sequence of 100 random coin flips, and they fail. Their patterns look totally nonrandom. Folks act as if a “heads” on this flip slightly raises the odds of “tails” on the next one. That’s a fallacy. (To be precise, the gambler’s fallacy.)
Evolutionary psychologists suggest that we evolved to seek patterns wherever they lurk. If that means falling for spurious patterns, then so be it. Perhaps that’s why we are so fond of the false patterns of astrology, and why we’re all convinced that our music shuffle algorithms are operating according to some secret logic, even though they aren’t.
In short, we don’t have mental dice. Our thinking is decidedly nonrandom. But again, that raises the question: how can we explain results like this?
To be fair, Twitter won’t show you the results of a poll until after you vote (or the poll closes). You are thus forced to give an independent answer. Still, I would never have trusted the wisdom of crowds in cases like these. With everyone reading the same question, I’d expect one of two results:
- Twitter users systematically think of themselves as “special,” and thus will overselect the “unlikely” option.
- Twitter users will default towards the likelier option, and thus will overselect it.
It seems strange, bordering on miraculous, that people spontaneously pursue these two strategies in just about the right proportions! It’s as if we can design and then shuffle mental decks of cards, and then report the results.
While I’m at it, here’s a related result from an old throwaway account.
The outcome (50.4% to 49.6%) is spookily close to 50/50. If, instead of having 345 strangers vote, you flipped 345 coins, you’d get a result this close to even only about 20% of the time.
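For anyone who wants to check that coin-flip figure, here is a minimal sketch of the arithmetic. It assumes that 50.4% to 49.6% out of 345 votes means a 174 to 171 split (the only split that rounds to those percentages), and it asks how often 345 fair coins land at least that close to even. The function name `prob_within` is just an illustrative helper.

```python
from math import comb

def prob_within(n: int, lo: int, hi: int) -> float:
    """Chance that n fair coin flips give between lo and hi heads (inclusive)."""
    favorable = sum(comb(n, k) for k in range(lo, hi + 1))
    return favorable / 2 ** n

n = 345
# Assumption: the poll percentages correspond to a 174-171 split.
winner = round(0.504 * n)  # 174
loser = n - winner         # 171

p = prob_within(n, loser, winner)
print(f"{winner}-{loser} split or closer: {p:.1%}")
```

Under these assumptions the exact answer lands in the high teens, the same ballpark as the rough 20% quoted above.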
Yet, in peculiar contrast, consider the failure of the crowds to achieve this far easier task:
Here, the information is public. You just need to check the current ratio. If it’s too low, retweet. If it’s too high, favorite. And yet, as of this writing, the tweet has 405 retweets and 153 favorites, for a dismal pi approximation of about 2.65.
Forget pi; that’s not even a good approximation of e!
Somehow, making the information public – which should in theory allow for perfect coordination – led to a worse outcome. This is perhaps a parable for how Twitter works in general. There really is a surprising amount of wisdom on that social network – as long as we’re forced to be independent.
But give us the chance to herd, and we will herd ourselves off the cliff.
I feel like the last example failed due to other factors.
1. people’s reluctance to share/retweet vs. a simple like.
2. people’s inability to generalize mathematical concepts. I’m interested in math and spend quite a bit of time learning and then using it, yet I stare at the list of “examples” and I don’t find them very helpful at all. I’d know what to do if one of the numbers was exactly 7 but the other one wasn’t exactly 22, but I wouldn’t know what to do if both of them were correct. I’m honestly surprised that the numbers didn’t get to 22/7 and then just stop.
I think it’s a bit telling that the recent results are immediately after the example 355/113 but far from 103993/33102.
I can tell that both 417 and 153 need to be added to, but I couldn’t tell you without whipping out a calculator which number is in more need of an addition. Couple that with my reluctance to retweet, and by default I would probably “err” on the side of liking it and moving on with my day.
I’d be interested to try producing a pi ratio with a poll.
I’d also try producing pi by giving more examples or guidelines than just a string of numbers.
The other problem with the last example lies in how Twitter surfaces the like/retweet stats to the user. Rather than go to the database each time the tweet appears in a user’s feed, it caches the calculations and shows THOSE to the users. Eventually the cache will “time out” and the numbers will have to be recalculated and recached. The issue is compounded by the fact that they actually have hundreds of these caches running on different servers globally, each with their own values, and each user is most likely to get the values from a server geographically near to them.
So while you may see 20/7 and so add a like, I might see 25/7, so I’d retweet. Neither of us really knows how this truly affects the numbers. But having SOME insight sways our reaction.
Conversely, if 500 people see 20/7, they’d ALL add a like, severely skewing the numbers to 520/7.
In fact, you can remove the human factor by simulating this with multiple computers where each computer gets its own snapshot of the stats and reacts accordingly. The result will be similar.
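A toy version of that simulation might look like the sketch below. It is heavily simplified and rests on assumptions that are mine, not Twitter's: there is a single shared cache that refreshes only every `refresh_every` votes, every voter perfectly follows the post's rule (retweets divided by likes too low means retweet, too high means like), and the starting counts are 20 and 7. The function `simulate` and its parameters are hypothetical names for illustration.

```python
import math

def simulate(voters: int, refresh_every: int, start=(20, 7)) -> float:
    """Final retweet/like ratio when every voter reacts to a cached snapshot
    that is re-read from the true counts only every `refresh_every` votes."""
    retweets, likes = start
    snapshot = (retweets, likes)
    for i in range(voters):
        if i % refresh_every == 0:
            snapshot = (retweets, likes)  # cache expires, true counts re-read
        seen_retweets, seen_likes = snapshot
        if seen_retweets / seen_likes < math.pi:
            retweets += 1  # ratio looks too low, so retweet
        else:
            likes += 1     # ratio looks too high, so like
    return retweets / likes

fresh = simulate(voters=2000, refresh_every=1)    # everyone sees live counts
stale = simulate(voters=2000, refresh_every=500)  # 500 voters share a snapshot

print(f"live counts:  {fresh:.4f} (off by {abs(fresh - math.pi):.4f})")
print(f"stale counts: {stale:.4f} (off by {abs(stale - math.pi):.4f})")
```

With live counts the ratio hugs pi, because each vote corrects the previous one. With stale counts, whole batches of voters push in the same direction before anyone sees the effect, so the ratio overshoots back and forth and ends up noticeably further away.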
So the problem is complex because
1. The technology being used isn’t real-time; instead it’s “eventually consistent.” This means that the correct data will eventually get into the database, but while it’s changing, the values you get can’t be considered exact.
2. The lack of randomization because the users have an insight (albeit marginally incorrect) into the data.
Here’s a really good video by Tom Scott specifically regarding the Twitter stats and how you can watch them change in weird ways: https://www.youtube.com/watch?v=RY_2gElt3SA.
That’s fascinating! I had no idea about the latency issue. That’s almost perfectly designed to make “hit this retweet-to-like ratio” impossible.
The roulette wheel would be nice to repeat with various colours to delve into colour bias.
Ben, are you familiar with Condorcet’s Jury Theorem? It’s one of my favorites.
Essentially, it speaks to the probability that a group of people will arrive at the correct decision using majority rule (by assumption, a correct decision exists). If each individual voter independently has a probability greater than .5 of choosing correctly, adding voters increases the probability that majority rule will arrive at the correct decision. If the individual probability is less than .5, then adding voters makes arriving at the correct decision less and less likely.
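To see the theorem in action, here is a small sketch that computes the majority's chance of being right directly from the binomial distribution. It assumes independent voters who all share the same probability `p` of being correct, and an odd number of voters so there are no ties. The function name `majority_correct` is just an illustrative choice.

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Chance that a strict majority of n independent voters is correct,
    when each voter is correct with probability p (n assumed odd)."""
    needed = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )

for p in (0.6, 0.4):
    chances = [round(majority_correct(n, p), 3) for n in (1, 11, 101)]
    print(f"p = {p}: majority correct with 1, 11, 101 voters -> {chances}")
```

For p = 0.6 the numbers climb toward 1 as the jury grows; for p = 0.4 they sink toward 0, which is the mirror image the theorem describes.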
I didn’t know that one, and I like it! In one sense, it’s a simple consequence of the binomial distribution; in another sense, it’s a kind of profound model for collective decision-making.