Solving Agent Laziness: MCP, Skills, and Behavioral Engineering

The debate over whether Model Context Protocol (MCP) or “skills” are the superior architecture has largely quieted. The industry has reached a pragmatic consensus: they are distinct tools for distinct jobs. MCP provides the interface for agents to interact with external systems, while skills provide the behavioral guardrails.

The real engineering friction has shifted. The question is no longer which one to choose, but how to integrate both. Pedro, an AI tooling engineer at Supabase, recently detailed the practical challenges of deploying these systems, highlighting a reality that many developers are now running into: agents are smart, but they are also profoundly lazy.

The Laziness Problem

The primary hurdle in agentic development is the model’s tendency to default to stale training data rather than fetching fresh, accurate information. Even when provided with the necessary tools to query documentation or external APIs, agents often exhibit a stubborn resistance to doing so.

Supabase’s internal testing confirmed this behavior. When tasked with creating views on tables with Row Level Security (RLS) enabled, agents frequently left out the security_invoker = true option unless they were given explicit, persistent guidance. By default, a Postgres view checks access to its underlying tables with the view owner’s privileges rather than the caller’s, so without that option the RLS policies are not applied for the querying user and the view can expose data that should have been restricted.
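
To make the failure concrete, here is a minimal sketch of a check that flags CREATE VIEW statements written without security_invoker enabled. It is an illustration, not Supabase’s tooling: the function name and the sample view are hypothetical, and the regular expression is a simplification that does not handle column lists, comments, or the bare security_invoker form with no value.

```python
import re

# Simplified pattern: CREATE [OR REPLACE] VIEW name [WITH (options)] AS ...
_VIEW_RE = re.compile(
    r"create\s+(?:or\s+replace\s+)?view\s+(?P<name>[\w.\"]+)\s*"
    r"(?:with\s*\((?P<opts>[^)]*)\))?\s*as\b",
    re.IGNORECASE,
)
_INVOKER_RE = re.compile(r"security_invoker\s*=\s*(?:true|on)\b", re.IGNORECASE)


def views_missing_security_invoker(sql: str) -> list[str]:
    """Return the names of views in `sql` created without security_invoker enabled."""
    missing = []
    for match in _VIEW_RE.finditer(sql):
        options = match.group("opts") or ""
        if not _INVOKER_RE.search(options):
            missing.append(match.group("name"))
    return missing


# Hypothetical statements of the kind an agent might generate.
unsafe = "CREATE VIEW public.order_totals AS SELECT * FROM orders;"
safe = (
    "CREATE VIEW public.order_totals WITH (security_invoker = true) "
    "AS SELECT * FROM orders;"
)

print(views_missing_security_invoker(unsafe))
print(views_missing_security_invoker(safe))
```

A check like this only catches the omission after the fact; the point of the talk is that the agent should be steered toward the safe form in the first place.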

The solution wasn’t just “more context.” It was a shift in how that context is structured.

Strategic Placement: Why Reference Files Fail

Developers often offload instructions to bundled reference files to keep the main skill.md file clean. This is a mistake. Pedro’s experiments revealed that agents are highly selective about what they load into their context window.

If a task requires information from multiple reference files, the agent will almost certainly fail to load them all. Even a single reference file is frequently ignored if the agent deems it optional.

The engineering takeaway is binary:

Critical Guidance: If the agent cannot afford to miss a piece of information—such as a security checklist or a core product constraint—it must reside in the skill.md file.
Secondary Data: Only use reference files for auxiliary information that is non-essential to the agent’s core logic.
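
One way to apply this rule mechanically is to scan a skill bundle for critical phrases that appear only in reference files. The sketch below assumes the bundle has already been read into strings; the function name, file paths, and phrases are hypothetical, and a plain substring match stands in for whatever matching a real check would need.

```python
def misplaced_guidance(
    skill_text: str,
    reference_texts: dict[str, str],
    critical_phrases: list[str],
) -> dict[str, list[str]]:
    """Map each critical phrase missing from the main skill file to the
    reference files that mention it (an empty list means it appears nowhere)."""
    skill_lower = skill_text.lower()
    misplaced: dict[str, list[str]] = {}
    for phrase in critical_phrases:
        needle = phrase.lower()
        if needle in skill_lower:
            continue  # already in the file the agent is sure to load
        misplaced[phrase] = sorted(
            name for name, text in reference_texts.items() if needle in text.lower()
        )
    return misplaced


# Hypothetical bundle: the security rule lives only in a reference file.
skill = "Always run the advisor before generating migrations."
references = {"references/views.md": "Create views with security_invoker = true."}

report = misplaced_guidance(skill, references, ["security_invoker", "advisor"])
print(report)
```

Any phrase that shows up in the report is guidance the agent may never see, so it is a candidate for moving into skill.md.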

Security and Workflow Constraints

Being “opinionated” is a feature, not a bug. Developers must enforce specific workflows to ensure reliability. For Supabase, this meant forcing agents to apply DDL changes directly to a development database, run an advisor to catch security or performance pitfalls, and generate migration files only after the schema is validated.

By hardcoding this workflow into the skill, developers move from hoping the agent makes the right choice to ensuring it follows a proven path. This is the difference between an agent that “can” do a task and one that is “production-ready.”
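
The ordering described above (DDL on a development database, then the advisor, then migration files) can be sketched as a small guard that refuses out-of-order steps. This is an illustration under stated assumptions: the class and step names are hypothetical, and in practice the constraint is written as instructions in the skill rather than enforced by code like this.

```python
from enum import IntEnum


class Step(IntEnum):
    APPLY_DDL_TO_DEV = 1
    RUN_ADVISOR = 2
    GENERATE_MIGRATION = 3


class WorkflowError(RuntimeError):
    """Raised when a step is attempted out of order or reports problems."""


class SchemaChangeWorkflow:
    def __init__(self) -> None:
        self.completed: list[Step] = []

    def record(self, step: Step, ok: bool = True) -> None:
        """Accept `step` only if it is the next one in the fixed sequence."""
        done = len(self.completed)
        if done == len(Step):
            raise WorkflowError("workflow already finished")
        expected = Step(done + 1)
        if step != expected:
            raise WorkflowError(f"expected {expected.name}, got {step.name}")
        if not ok:
            raise WorkflowError(f"{step.name} reported problems; resolve them first")
        self.completed.append(step)


workflow = SchemaChangeWorkflow()
workflow.record(Step.APPLY_DDL_TO_DEV)
try:
    # Skipping the advisor is rejected.
    workflow.record(Step.GENERATE_MIGRATION)
except WorkflowError as error:
    print(error)
workflow.record(Step.RUN_ADVISOR)
workflow.record(Step.GENERATE_MIGRATION)
```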

Testing the Invisible

Perhaps the most significant shift in this space is the move toward evaluating documentation and behavioral prompts with the same rigor as code. Using evaluation frameworks like Braintrust, teams are now running CI-style tests on agent behavior.

Supabase’s testing across multiple models (Claude Sonnet and Opus models and GPT-4o variants) showed that combining MCP with structured skills consistently outperformed the baseline. The “test completeness score” improved not because the models got smarter, but because the guidance became inescapable.
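
The article does not define how the completeness score is computed, so the sketch below assumes the simplest reading: the fraction of named checks that an agent’s output passes. It does not use Braintrust’s API; the check names and the two output strings are hypothetical stand-ins, not real model outputs.

```python
from typing import Callable

Check = Callable[[str], bool]


def completeness_score(output: str, checks: dict[str, Check]) -> float:
    """Fraction of checks the output passes (assumed definition, see above)."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks.values() if check(output))
    return passed / len(checks)


# Hypothetical behavioral checks for the view-creation task.
CHECKS: dict[str, Check] = {
    "creates_view": lambda out: "create view" in out.lower(),
    "uses_security_invoker": lambda out: "security_invoker = true" in out.lower(),
}

# Illustrative strings standing in for agent output with and without guidance.
baseline_output = "CREATE VIEW v AS SELECT * FROM orders;"
guided_output = (
    "CREATE VIEW v WITH (security_invoker = true) AS SELECT * FROM orders;"
)

print(completeness_score(baseline_output, CHECKS))
print(completeness_score(guided_output, CHECKS))
```

In a CI-style setup, a threshold on this score would fail the build whenever a change to the skill makes the agent skip a required step.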

The Path Forward

The industry is currently suffering from a lack of standardized distribution for these skills. While we have MCP as an open standard, skills remain fragmented, often tied to specific IDEs or vendor-specific registries.

The future of agentic development won’t be defined by the models themselves, but by the quality of the “instructional layer” we wrap around them. We are moving away from the era of “prompt engineering” and into an era of “behavioral engineering,” where the most successful products will be those that treat their documentation as a high-stakes, testable interface for the machines that consume it.

Sources

https://www.youtube.com/watch?v=JT3OzDKrucU