The Professor with a Billion Students


This September in Germany, between talks at the Heidelberg Laureate Forum, I managed to catch a few minutes with Cornell professor John Hopcroft.

He’s a guy with bigger things on his mind.

“I’m at a stage in my life,” he says, “where I’d like to do something which makes the world better for a large number of people.”

Skimming Hopcroft’s C.V., you start to wonder: Um… hasn’t he done that already?


Born to a janitor and a bookkeeper, he grew up to become a foundational figure in computer science. Exhibit A: His textbooks on automata, algorithms, and discrete math have been adopted across the world. (His most recent one, on data science, is free online.) Exhibit B: He has a distinguished research record, highlighted in 1986 with a Turing Award, the closest thing to a Nobel for computer science. And finally, Exhibit C: During a decorated teaching career, he was twice named Cornell’s “most inspiring” professor.

With all this, you’ve got to figure he’s done at least a little good for a few people, right?

Well, Hopcroft has a larger number in mind: 1.3 billion.


Hopcroft has become an advisor to Li Keqiang, the Premier of China. He describes this as “the opportunity of a lifetime”: to transform Chinese education for the better.

“They have one quarter of the world’s talent,” Hopcroft says, “but their university educational system is really very poor.”

What makes Hopcroft—working-class Seattle-ite turned Ivy League professor—think he can leave his mark on a country as vast, distant, and internally diverse as China? Isn’t this like a swimmer trying to steer an aircraft carrier?


“A couple of things are going in my favor,” he says. First, he is apolitical. “I don’t have any special agenda to push in China,” Hopcroft explains. “I’m pushing education.”

The second is subtler, and carries echoes of Hopcroft’s engineering background.

“I understand the scale of the problem,” Hopcroft says.


Sometimes the Noise is Signals, Too

a dispatch from the fourth annual Heidelberg Laureate Forum

Early in his talk, computer scientist John Hopcroft noted a funny fact about clustering algorithms: they work better on synthetic data than on real data. But this is more than an odd tidbit about software.

It’s an insight into the nature of our world.

When we invent our own synthetic data, we try to mimic real data by mixing true information with random distraction, combining “signal” with “noise.” But in real data, the divide isn’t so clear. What often looks like noise turns out to be the deep structure we haven’t grasped yet.


The noise is just signals you can’t yet hear.


Hopcroft’s insight: data doesn’t just have one structure. It has many. If I scanned notebooks from a hundred people, and made a database of all the individual letters, I could sort them lots of ways. Alphabetically. Capital/lowercase. Size. Darkness. Handwriting. Each of these is a different layer of structure.
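
To make the notebook example concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the four letter records, the attribute names (`char`, `writer`, `size`), and the `group_by` helper are assumptions of mine, not anything from Hopcroft's talk. It only shows that one dataset can be partitioned in several equally valid ways.

```python
from collections import defaultdict

# Hypothetical records: a few scanned letters with invented attributes.
letters = [
    {"char": "a", "writer": "Ann", "size": 12},
    {"char": "B", "writer": "Ann", "size": 18},
    {"char": "b", "writer": "Raj", "size": 11},
    {"char": "A", "writer": "Raj", "size": 19},
]

def group_by(records, key):
    """Partition records into groups, keyed by key(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec["char"])
    return dict(groups)

# Three different "layers" of structure in the same four letters.
by_letter = group_by(letters, lambda r: r["char"].lower())
by_case = group_by(letters, lambda r: r["char"].isupper())
by_writer = group_by(letters, lambda r: r["writer"])

print(by_letter)
print(by_case)
print(by_writer)
```

Each grouping is a correct answer to "how do these letters cluster?" Which one counts as signal and which as noise depends on the question you are asking.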

And to understand data, and the world, you’ve got to reckon with all those layers.
