Andrej Karpathy argues we are not building animals. We are summoning ghosts. Large language models are not embodied, biological minds that learned to survive in a physical world. They are vast statistical systems: impressive, but without the grounded, persistent, evolving cognition of a living mind. That difference shapes how we think about progress, safety, and what comes next.
The Premise
Animals evolved over billions of years to navigate a physical world. LLMs were trained on text. The “squirrel test” (if you can make a squirrel, you are most of the way to AGI) suggests that embodied, adaptive intelligence is the hard part. And we may have skipped it entirely.
A working demo is not a product. In safety-critical domains like self-driving or production software, the cost of failure is high. Getting from 90% reliability to 99.999% is a “march of nines”: four more nines, each cutting the failure rate by another factor of ten, and each taking roughly as much work as the one before.
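The arithmetic behind the march of nines can be sketched in a few lines of Python. This is a minimal illustration, not a model from the talk: the function names, the `effort_per_nine` unit, and the million-attempt figure are all hypothetical, and the only assumption taken from the text is that each nine costs about the same amount of work.

```python
def failure_rate(nines: int) -> float:
    """Failure probability at a given number of nines (1 nine = 90% reliable)."""
    return 10.0 ** -nines


def expected_failures(nines: int, attempts: int) -> float:
    """Expected number of failures over a given number of attempts."""
    return attempts * failure_rate(nines)


def march(start_nines: int, end_nines: int, effort_per_nine: float = 1.0):
    """List (nines, reliability, cumulative effort) for each step of the march.

    effort_per_nine is a hypothetical unit; the assumption is only that
    every additional nine costs the same amount of work.
    """
    rows = []
    for n in range(start_nines, end_nines + 1):
        reliability = 1.0 - failure_rate(n)
        cumulative_effort = (n - start_nines) * effort_per_nine
        rows.append((n, reliability, cumulative_effort))
    return rows


if __name__ == "__main__":
    attempts = 1_000_000  # illustrative volume, not a figure from the talk
    for nines, reliability, effort in march(1, 5):
        failures = expected_failures(nines, attempts)
        print(
            f"{nines} nine(s): {reliability:.5%} reliable, "
            f"effort so far {effort:.1f}, "
            f"about {failures:,.0f} failures per {attempts:,} attempts"
        )
```

Under these assumptions the effort grows linearly while the failure rate falls tenfold per step, which is why a demo at 90% can sit several units of work away from a product.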
The Mechanics
Why does the gap between demo and product stay so wide? Why are LLMs still ghosts despite their surface-level brilliance?
March of the nines: Every extra nine of reliability takes roughly the same amount of work as the one before it, so the last stretch is never cheap. A self-driving demo from 1986 or a perfect Waymo ride in 2014 did not mean the problem was solved. The same applies to AI writing production code. One catastrophic bug every seven years is still too many when you are generating code at machine speed.
No culture, no self-play: Biological intelligence thrives on culture—accumulated knowledge passed between generations—and on competition or self-play. LLMs currently have neither. They do not write books for each other, learn from shared scratchpads, or compete in environments that force rapid adaptation.
The niche problem: Intelligence arises when an environment rewards marginal increases in cognition and provides the tools to scale it. Humans had hands, fire, and social structures. Birds and dolphins hit physical limits. LLMs sit in a strange niche where they have vast memory but no embodied feedback loop.
The Execution
What does a healthier future look like? Karpathy’s answer is education—not AI tutoring slop, but rigorous, human-centered knowledge ramps.
Eureka / Starfleet Academy: Build elite, up-to-date institutions for technical knowledge. Start with state-of-the-art courses like LLM101N that prioritize “eurekas per second”—moments of genuine understanding.
Ramps, not cliffs: Good education is a technical problem of building ramps to knowledge. Material should never be too easy or too hard. The goal is to make the learner the only bottleneck.
AI as a collaborator, not a replacement: Current LLMs are useful for building course materials faster, but they cannot yet replace the creative design of knowledge. Use AI for the boring stuff, and keep humans in the loop for designing the intellectual scaffolding.
Post-AGI education as gym culture: Just as people go to the gym for fun and self-betterment even though machines do the heavy lifting, future education will be about human flourishing. If learning becomes trivial and enjoyable, people will pursue it for the same reasons they pursue a six-pack—attractiveness, health, and intrinsic satisfaction.