AI Agent Context Engineering: Beyond the Context Trap

Arise’s AI agent Alex was eating itself alive.

The team built Alex using Alex—a recursive strategy that seemed clever until the context window exploded. Every time Alex ran on the company’s trace and span data, the spans grew. Too much data. Context limit hit. Alex failed. The span retained that failure data. The team added more context to fix it. Alex failed again. The loop tightened.

“We knew that we needed to come up with some kind of strategy,” said Salian, head of product at Arise, during a recent talk. “The system analyzing the data was constrained by the data. That was a major problem.”

This isn’t unique to Arise. Every team building AI agents hits this wall eventually. The context grows until the model chokes, and naive fixes make it worse.

Content hosted by YouTube

Content is not loaded until you have given consent.

Manage preferences

Watch on YouTube: https://youtube.com/watch?v=esY99nYXxR4

The obvious solutions that failed

Arise’s first instinct: simple truncation. Take the first 100 characters, drop the rest.

It worked until it didn’t. Simple queries passed. But follow-ups broke. Ask Alex what the most common inputs were, then ask about input B specifically—it had no idea what you meant. The agent forgot everything between messages. Over-truncation destroyed reasoning.

Then came summarization. The LLM is good at summarizing, right? Compress everything into fewer tokens and send that.

Also failed. Too inconsistent. No control over what mattered. The model decided on its own what to keep and what to discard. Results became unreliable.

The solution: smart truncation + memory

The winning approach combined three elements:

Head + tail: Keep the first 100 characters and the last 100 characters
Memory store: Compress and store the middle section separately
Agent control: Alex can retrieve from memory when needed

Duplicate messages get collapsed. Tool calls—often the longest part of context—keep only the latest result. The system prompt never resets.

“We haven’t had to touch this in a few months,” Salian noted. “We found this combination really successful.”

The key insight: context decides what the model sees, memory decides what survives. They’re separate problems requiring separate solutions.

The sub-agent breakthrough

Even with smart truncation, long sessions broke things. Users don’t restart chats. They keep asking follow-ups across pages. Conversations grew to 20+ turns, and Alex started forgetting critical details late in the session.

Arise’s fix: distribute the work. Not all context belongs in the main agent.

For search tasks involving hundreds of spans, they offloaded heavy data operations to sub-agents. The main conversation stays light. Sub-agents handle the heavy lifting, then pass results back. This pattern—breaking context across multiple agents—became a “game changer,” according to Salian.

What still breaks

Despite progress, three problems persist:

Huge context still wins. Very large prompts or inputs hit provider limits. The pattern keeps returning to sub-agents—breaking problems into smaller context chunks.

Long-term memory doesn’t exist. Alex has context-window memory, not persistent memory. Start a new chat, and it forgets everything from previous sessions. Users want to reference issues discussed days ago. Arise is actively building this.

Context selection is still heuristic. The 100-character head/tail approach works, but there’s no principled budget or clear metrics for context quality. They use evals to measure whether context was right after the fact—not during selection.

The bigger shift

The lesson Salian emphasized: agents don’t fail because of prompts. They fail because of context.

“Agents don’t fail because of prompts, they fail because of context. In the early days, prompts were everything. But ourselves and all our users are really focused now on context engineering.”

The stack changed. Prompt optimization once ruled. Now context engineering determines success or failure. The best context strategy lets agents remember what they need and forget what they don’t.

Arise learned this the hard way. The recursive loop that nearly broke Alex became the foundation of their entire approach. Context management is iterative, evaluation-driven, and nowhere near solved.

Sources

https://www.youtube.com/watch?v=esY99nYXxR4