From Harpy to LLMs: The Evolution of AI Learning

In 1971, the U.S. government tasked researchers with a seemingly straightforward goal: build a machine that could recognize 1,000 words with 90% accuracy. Five years later, the Carnegie Mellon team delivered Harpy. It was a triumph of 1970s engineering, but it was also a monument to a philosophy that AI researchers would eventually spend decades trying to dismantle.

The Architecture of Painstaking Logic

Harpy didn’t “learn” language in the way we conceive of it today. It was a rigid, rule-based knowledge graph. The system relied on a vocabulary broken down into 98 basic phonetic units, or “phones.” These were mapped into a massive graph containing over 14,000 nodes.

To navigate this, Harpy used a formal grammar defined by human linguists. It couldn’t handle arbitrary speech; it only understood valid pathways through its pre-defined structure. If you said “tell me about China,” the system chopped your audio waveform into blocks, compared the frequency content against its phone nodes, and traversed the graph. To avoid the trap of “greedy” searching—where the system picks the best local match and gets stuck—engineers implemented beam search, allowing the system to track multiple potential paths simultaneously and prune the weak ones.
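
The difference between greedy and beam search is easier to see on a toy network. The sketch below is not Harpy's implementation: the six-node graph, the phone labels, and the per-frame log scores are all made up for illustration. It only shows the general technique, which keeps the best `beam_width` partial paths after each audio frame and discards the rest.

```python
from typing import Dict, List, Tuple

# Hypothetical network: each node is a phone, edges list the legal successors.
SUCCESSORS: Dict[str, List[str]] = {
    "START": ["d", "t"],
    "t": ["eh"], "eh": ["l"], "l": [],   # path spelling "tell"
    "d": ["ih"], "ih": ["g"], "g": [],   # path spelling "dig"
}

# Hypothetical acoustic match scores (log scale, higher is better), one dict per frame.
# Frame 0 slightly favors "d", but the later frames strongly favor the "tell" path.
FRAME_SCORES: List[Dict[str, float]] = [
    {"t": -1.0, "d": -0.5},
    {"eh": -0.2, "ih": -3.0},
    {"l": -0.1, "g": -2.0},
]

def beam_search(successors, frame_scores, beam_width, start="START") -> Tuple[float, List[str]]:
    """Return (total_score, path) for the best surviving path."""
    beam = [(0.0, [start])]
    for scores in frame_scores:
        candidates = []
        for total, path in beam:
            for nxt in successors[path[-1]]:
                if nxt in scores:
                    candidates.append((total + scores[nxt], path + [nxt]))
        if not candidates:
            raise ValueError("no legal path matches the observed frames")
        # Prune: keep only the beam_width highest-scoring partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]

greedy_score, greedy_path = beam_search(SUCCESSORS, FRAME_SCORES, beam_width=1)
beam_score, beam_path = beam_search(SUCCESSORS, FRAME_SCORES, beam_width=2)
print(greedy_path)  # ['START', 'd', 'ih', 'g']
print(beam_path)    # ['START', 't', 'eh', 'l']
```

With a beam width of 1 the search is greedy: it commits to "d" on the first frame and cannot recover. With a width of 2 the weaker-looking "t" hypothesis survives long enough for the later frames to confirm it.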

It was a system of extreme manual labor. Linguists had to account for “junctures”—the subtle ways sounds bleed into each other, like the extra “Y” sound between vowels or the dropped “T” in “about China.” Every rule was hand-coded. It worked, but it was brittle. Scaling it to 20,000 words wasn’t just a compute problem; it was a management nightmare.

The Bitter Lesson: From Rules to Probabilities

By the late 1980s, the field abandoned Harpy’s manual knowledge graphs in favor of Hidden Markov Models (HMMs). The transition was controversial. HMMs replaced hand-coded linguistic rules with probabilistic edges learned directly from data.
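
To make "probabilistic edges learned directly from data" concrete, here is a minimal sketch that estimates transition probabilities by counting. It assumes the phone sequences are already labeled, which is a simplification: in a real HMM the states are hidden, and training typically uses an iterative procedure such as Baum-Welch. The phone labels and the four example sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def estimate_transitions(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood transition probabilities from labeled sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    # Normalize each row of counts so the outgoing probabilities sum to 1.
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }

# Hypothetical transcriptions of the boundary in "about China":
# three speakers drop the "t", one pronounces it.
observed = [
    ["aw", "ch"],
    ["aw", "ch"],
    ["aw", "t", "ch"],
    ["aw", "ch"],
]

transitions = estimate_transitions(observed)
print(transitions["aw"])  # {'ch': 0.75, 't': 0.25}
```

Nobody has to write a rule for the dropped "T" here. The edge weight for skipping it falls out of how often speakers in the data actually skip it.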

Watch on YouTube: https://youtube.com/watch?v=2hcsmtkSzIw

This shift validated what computer scientist Richard Sutton would later codify, in a 2019 essay, as “The Bitter Lesson.” Sutton’s thesis is simple: general methods that leverage massive computation and data ultimately outperform human-designed heuristics. Building human knowledge into a system provides a short-term boost, but it eventually becomes an anchor, preventing the model from discovering patterns that human designers didn’t, or couldn’t, anticipate.

The Paradox of Modern Scaling

For years, we viewed Large Language Models (LLMs) as the ultimate vindication of the Bitter Lesson. By training on a large share of the public internet, we seemed to have bypassed the need for human-encoded rules.

But a strange irony has emerged. In recent discussions, Sutton has suggested that LLMs might actually be a negative example of his own lesson. Because LLMs are trained on human-generated text, they are massive mirrors of human knowledge. If we train models to imitate human output, we are in effect building a new, more sophisticated version of Harpy, one that is still constrained by the limits of human thought.

The current frontier, exemplified by DeepMind’s AlphaGo and AlphaProof, suggests the next evolution isn’t just more data, but more experience. By moving from supervised learning (imitation) to reinforcement learning (interaction), these systems discover strategies—like Go moves that seem “alien”—that no human expert would have thought to teach them.
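
The contrast between imitation and interaction can be shown with a deliberately tiny example. This is a sketch under heavy assumptions, not how AlphaGo or AlphaProof work: the three actions, their fixed rewards, and the expert demonstrations are all hypothetical, and the interactive learner is a simple bandit agent using greedy selection with optimistic initial value estimates.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical environment: fixed rewards that neither learner is told in advance.
REWARDS: Dict[str, float] = {"standard_move": 0.6, "safe_move": 0.5, "unusual_move": 0.9}

# Hypothetical expert games: the experts never play "unusual_move".
DEMONSTRATIONS: List[str] = ["standard_move", "standard_move", "safe_move", "standard_move"]

def imitate(demos: List[str]) -> str:
    """Supervised imitation: copy the action the experts chose most often."""
    return Counter(demos).most_common(1)[0][0]

def learn_by_interaction(actions: List[str], reward_fn: Callable[[str], float],
                         steps: int = 20, optimistic_init: float = 1.0) -> str:
    """Greedy bandit agent; optimistic starting estimates make it try every action."""
    estimates = {a: optimistic_init for a in actions}
    pulls = {a: 0 for a in actions}
    for _ in range(steps):
        action = max(actions, key=lambda a: estimates[a])
        reward = reward_fn(action)
        pulls[action] += 1
        # Incremental update; the first real reward replaces the optimistic guess.
        estimates[action] += (reward - estimates[action]) / pulls[action]
    return max(actions, key=lambda a: estimates[a])

imitated = imitate(DEMONSTRATIONS)
discovered = learn_by_interaction(list(REWARDS), REWARDS.get)
print(imitated)    # standard_move
print(discovered)  # unusual_move
```

The imitator can never do better than the best move in its demonstrations. The interactive learner finds the higher-reward action because it is scored by the environment rather than by agreement with the experts.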

The engineering challenge of the 1970s was how to force a computer to follow our rules. The challenge of the 2020s is figuring out how to stop forcing our systems to be like us and instead build ones that can discover what we haven’t yet found. We are currently in the “imitation” phase of the AI era; the real test will be whether we can build architectures that learn from the world itself, rather than just the text we’ve left behind.
