What is Grokking? The Geometry of LLM Intelligence

The industry has a habit of treating large language models as black boxes, invoking “emergence” whenever a model performs a task we didn’t explicitly code for. But beneath the veneer of human-like text generation lies a cold, mathematical reality. The phenomenon of “grokking,” in which a model shifts abruptly from memorizing its training data to generalizing with a learned algorithm, is some of the most compelling evidence we have that these systems are actually learning, not just regurgitating.

The Anatomy of Grokking

In 2021, researchers at OpenAI stumbled onto a bizarre training dynamic: small transformers trained on modular arithmetic would first memorize the training set, with accuracy on held-out examples stuck near chance, and then, thousands of optimization steps after training accuracy had saturated, suddenly “grok” the underlying rule and generalize.

Mechanistic interpretability has since explained much of this. By reverse-engineering a single-layer transformer trained on modular addition, researchers found that the model doesn’t just store lookup tables. Instead, its embedding maps each input into a frequency domain: every input value is represented by sines and cosines at a handful of frequencies, which amounts to a trigonometric representation of the problem.

Using the angle-addition identities, the model’s later layers combine the sines and cosines of the two inputs into the sine and cosine of their sum, which is enough to solve modular addition. To the model, the task is just a series of geometric rotations around a circle. The “plateau” in training wasn’t a stall; it was a period of intense internal reorganization where the model was building the necessary trigonometric infrastructure to replace its brittle, memorized shortcuts with a robust, generalized solution.
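
To make the rotation picture concrete, here is a minimal NumPy sketch of the trigonometric approach to modular addition described above. It is an illustration under stated assumptions, not a dump of any trained model's weights: the function name `mod_add_via_rotations`, the modulus 113, and the three frequencies are all chosen here purely for demonstration.

```python
import numpy as np

def mod_add_via_rotations(a, b, p, freqs):
    """Compute (a + b) mod p using only sines, cosines, and angle-addition.

    `freqs` is a hypothetical set of integer frequencies (each between 1
    and p - 1); a trained model would pick its own.
    """
    candidates = np.arange(p)          # every possible answer c
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Step 1: "embed" each input as a point on the unit circle.
        cos_a, sin_a = np.cos(w * a), np.sin(w * a)
        cos_b, sin_b = np.cos(w * b), np.sin(w * b)
        # Step 2: angle-addition identities give cos/sin of w * (a + b)
        # without ever adding a and b directly.
        cos_ab = cos_a * cos_b - sin_a * sin_b
        sin_ab = sin_a * cos_b + cos_a * sin_b
        # Step 3: score each candidate c with cos(w * (a + b - c)),
        # which reaches 1 exactly when c == (a + b) mod p (p prime).
        logits += cos_ab * np.cos(w * candidates) + sin_ab * np.sin(w * candidates)
    return int(np.argmax(logits))

print(mod_add_via_rotations(100, 50, 113, freqs=(3, 17, 29)))  # prints 37
```

Because every frequency votes for the same answer, the correct residue gets the maximum score from all of them at once, while wrong answers get scores that partly cancel. That constructive interference is why a handful of frequencies is enough in this sketch.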

Beyond Toy Models: The Haiku Manifold

While grokking provides a clean, laboratory-grade look at how models learn, the question remains: does this translate to the massive, multi-layered architectures we use in production?

Recent work by Anthropic on Claude 3.5 Haiku suggests that the answer is yes, albeit at a higher level of complexity. The team identified a six-dimensional manifold within Haiku’s activations that tracks character counts and line lengths. This isn’t just a random cluster of neurons; it is a structured, geometric representation of the model’s “state” as it writes.

When the model decides to insert a line break, it isn’t guessing. It is navigating this manifold, using “QK twists” (helix-like rotations in high-dimensional space) to calculate the distance to the end of a line. This suggests that even in full-scale models, the “intelligence” we observe is grounded in the formation of specific, interpretable geometric structures.
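
The idea of rotating one representation until it lines up with another can be sketched in a few lines. What follows is a toy analogy only, not Anthropic's measured circuit: the periods, the function names `embed_count`, `twist`, and `boundary_score`, and the use of pure sinusoids are all assumptions made here for illustration. A count is placed on a curve in six dimensions, a block-diagonal rotation shifts it forward by a fixed offset, and a dot product checks whether the shifted count matches the line width.

```python
import numpy as np

# Hypothetical periods; three (cos, sin) pairs give a six-dimensional embedding.
PERIODS = (150.0, 40.0, 11.0)

def embed_count(n):
    """Place a character count on a helix-like curve in six dimensions."""
    feats = []
    for period in PERIODS:
        angle = 2 * np.pi * n / period
        feats += [np.cos(angle), np.sin(angle)]
    return np.array(feats)

def twist(vec, offset):
    """Rotate each (cos, sin) pair so the encoded count advances by `offset`."""
    out = np.empty_like(vec)
    for i, period in enumerate(PERIODS):
        theta = 2 * np.pi * offset / period
        c, s = np.cos(theta), np.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = c * x - s * y
        out[2 * i + 1] = s * x + c * y
    return out

def boundary_score(chars_so_far, line_width, lookahead):
    """Dot product that peaks when chars_so_far + lookahead == line_width."""
    query = twist(embed_count(chars_so_far), lookahead)
    key = embed_count(line_width)
    return float(query @ key)

# With an 80-character line and a lookahead of 5, the score over counts
# 0..100 is highest at a count of 75.
scores = [boundary_score(n, 80, 5) for n in range(101)]
print(int(np.argmax(scores)))  # prints 75
```

In this toy, the rotation plays the role the article attributes to the “QK twist”: it moves the current count along the curve so that a plain dot product can act as a distance-to-boundary detector.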

The Ghost in the Manifold

The gap between a single-layer modular arithmetic model and a frontier LLM is vast, but the trajectory of interpretability research is clear. We are moving from observing input-output behavior to mapping the internal geometry of thought.

The danger, however, is in our tendency to anthropomorphize these systems. When we see a model “reason,” we want to see a human-like spark. What we actually find are sines, cosines, and six-dimensional manifolds. These models are not building a version of human intelligence; they are building an alien, mathematical proxy for it.

As we continue to peel back the layers of these “ghosts,” we should expect the results to be less like human cognition and more like an exercise in high-dimensional engineering. We aren’t summoning intelligence; we are building complex, non-linear calculators that have learned to mimic the structure of our own logic. The challenge for the next decade isn’t just making these models bigger—it’s proving we can actually read the code they’ve written for themselves.

Sources

https://www.youtube.com/watch?v=D8GOeCFFby4&t=779s
