For the past year, AI developers (researchers, engineering teams, or indie developers) have been racing toward multi-agent systems (MAS). The idea makes sense: just like human teams divide work, why not split AI tasks into specialized agents like “planning agent,” “execution agent,” and “validation agent”?
But in early 2026, a research paper from UBC and insights from Anthropic’s engineering team point to a different conclusion: Simplifying agent architecture with a rich “Agent Skills” library can actually boost AI performance.

This post collects my learning notes on the paper “When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail” and a presentation by Anthropic’s engineering team. I’ll walk you through what a “skilled agent system” looks like, why this new approach works, the actual performance gains, the limitations, and practical design recommendations from the researchers.
Anthropic’s Recommended Architecture: Single Agent + Skills Library#
Anthropic’s engineering team proposes a three-layer architecture. Think of it as an ecosystem built around one universal agent:

- Bottom layer: AI model (computational core)
- Middle layer: Universal Agent Runtime (the OS coordinating work)
- Top layer: Skills Library (specialized knowledge for specific tasks)
When you give the AI a task, the universal agent first searches the Skills catalog for relevant skills, connects to external systems through MCP (like databases or APIs), and executes the task following the workflow defined in the Skill. The AI model powers every action—both the agent’s coordination and the skill’s execution rely on it.
This is a single-agent system (SAS). Instead of spreading complex tasks across multiple separate AI agents, one universal agent handles management and coordination, delegating execution to Skills.
In this three-layer architecture, the top-layer Skills Library replaces what used to be a bunch of specialized “mini-agents.” Instead of hiring 10 people who each do one thing, you train one genius (the universal agent) and give them 10 expert manuals (Skills).
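To make the three layers concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: `Skill`, `UniversalAgent`, and `stub_model` are names I made up, the model is a stub instead of a real LLM call, and skill selection is a crude word-overlap heuristic, whereas a real agent lets the model itself decide.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Skill:
    """Top layer: one entry in the Skills library (hypothetical structure)."""
    name: str
    description: str   # short summary the agent uses to pick a skill
    instructions: str  # full workflow, used only after the skill is picked


def stub_model(prompt: str) -> str:
    """Bottom layer stand-in. A real system would call an LLM here."""
    return f"<answer based on {len(prompt)} chars of prompt>"


class UniversalAgent:
    """Middle layer: a single runtime that coordinates model and skills."""

    def __init__(self, model: Callable[[str], str], skills: list[Skill]):
        self.model = model
        self.skills = skills

    def select_skill(self, task: str) -> Optional[Skill]:
        # Toy heuristic: count words shared by the task and each description.
        # In a real agent the model makes this choice.
        task_words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self.skills:
            score = len(task_words & set(skill.description.lower().split()))
            if score > best_score:
                best, best_score = skill, score
        return best

    def run(self, task: str) -> str:
        skill = self.select_skill(task)
        context = skill.instructions if skill else ""
        # One model call in one shared context: no hand-off between agents.
        return self.model(f"{context}\n\nTask: {task}")


agent = UniversalAgent(stub_model, [
    Skill("valuation", "calculate irr and valuation metrics", "Step 1: ..."),
    Skill("report-format", "format the quarterly report layout", "Use template ..."),
])
chosen = agent.select_skill("calculate irr for project A")
print(chosen.name)  # valuation
```

The point of the sketch is the shape, not the heuristic: there is one agent object and one context, and the "specialists" are just data that the agent pulls in when needed.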

Let’s use a real scenario: enterprise financial analysis and reporting.
- Old approach: Multi-Agent System (MAS)
  - Design “data retrieval agent” (connects to database) + “financial analysis agent” (calculates metrics) + “compliance reporting agent” (formats output)
  - Agents need to exchange complex spreadsheet data and context between each other
- New approach: Single Agent + Skills
  - MCP server connects to the database, providing data access
  - Valuation methodology Skill teaches the AI how to calculate metrics like IRR
  - Report formatting Skill defines the company’s standard output format
  - One universal AI agent completes all steps in unified context. No cross-agent communication is needed
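As an illustration of what the valuation Skill in this scenario might bundle, here is a small deterministic IRR script. This is my own sketch, not something taken from the paper or from Anthropic: the function names and the bisection approach are assumptions, and it only handles cash-flow series whose NPV changes sign once on the search interval.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value; cash_flows[0] is the flow at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], low: float = -0.99, high: float = 10.0,
        tol: float = 1e-9) -> float:
    """Internal rate of return via bisection on NPV(rate) = 0."""
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on the search interval")
    for _ in range(200):
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid              # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2


# Invest 1000 today, receive 1100 in one year: the IRR is 10%.
print(round(irr([-1000, 1100]), 4))  # 0.1
```

With a script like this inside the Skill, the agent does not have to do the arithmetic "in its head": it runs the code and reads the result within the same context.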
I know what you’re thinking: this “three-layer architecture” just turns 10 agents into 1 agent with 10 Skills. Sounds like six of one, half a dozen of the other, right? The AI’s task complexity is similar, but this change brings huge engineering advantages that boost execution efficiency. So let’s talk about those technical benefits.
Terminology reference: MCP / Skill / MAS / SAS
- Multi-Agent System (MAS): Tasks split across multiple specialized AI agents that need to communicate and coordinate with each other
- Single-Agent System (SAS): One universal AI agent handles all tasks by switching between different “skills” for specialized work
- Agent Skill: A file structure encapsulating specialized knowledge and workflows. Like giving the AI an “instruction manual” that teaches it how to execute specific tasks (see this Skills tutorial for details)
- MCP (Model Context Protocol): Standardized interface connecting AI to external systems (like Google Drive, databases), handling “connectivity”
(Further reading: Plain English explanation of MCP)
Why Simplified Agents Work Better#
1. Progressive Disclosure#
Traditional multi-agent systems or MCP tools dump all tool definitions into the context window at once when a conversation starts. This can eat up tens of thousands of tokens and cause context rot: the model gets overwhelmed by too much information and its answers get worse.
The Skills mechanism uses progressive disclosure. Normally it only loads a short description of each Skill (name and brief intro, about 100 tokens per Skill). Only when the AI judges a specific Skill relevant to the task does it dynamically load the full instructions or code. This lets the AI (theoretically) carry unlimited specialized capabilities without the “heavy backpack” making it forgetful or less capable. This is the main reason multiple Skills are more resource-efficient than multiple Agents.
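Here is a minimal sketch of that loading pattern. It is not Anthropic’s implementation: `LazySkill`, `SkillCatalog`, and `estimate_tokens` are hypothetical names, the skill bodies are in-memory strings standing in for files on disk, and the "4 characters per token" estimate is only a rough assumption.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class LazySkill:
    name: str
    description: str      # always visible to the agent
    body: str             # stands in for the full instructions on disk
    loaded: bool = False  # becomes True once the body enters the context


class SkillCatalog:
    def __init__(self, skills: list[LazySkill]):
        self.skills = {s.name: s for s in skills}

    def startup_context(self) -> str:
        # Only names and descriptions are placed in the context up front.
        return "\n".join(f"{s.name}: {s.description}"
                         for s in self.skills.values())

    def load(self, name: str) -> str:
        # Called only after the agent decides this skill is relevant.
        skill = self.skills[name]
        skill.loaded = True
        return skill.body


catalog = SkillCatalog([
    LazySkill("irr-valuation", "Calculate IRR and NPV from cash flows.",
              "detailed steps " * 500),
    LazySkill("report-format", "Apply the company report template.",
              "template rules " * 500),
])

upfront = estimate_tokens(catalog.startup_context())
everything = sum(estimate_tokens(s.body) for s in catalog.skills.values())
print(f"up front: ~{upfront} tokens, all bodies: ~{everything} tokens")

context = catalog.startup_context() + "\n\n" + catalog.load("irr-valuation")
```

The context starts small no matter how long the skill bodies are, and grows only by the one body the task actually needs.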
(Further reading: What are Skills and progressive disclosure?)
2. Saves Tokens and Time#
The performance boost from Skills’ progressive disclosure is backed by research data.
According to UBC’s research, after rewriting multi-agent systems as “single agent + Skills” architecture, across tasks like math reasoning, code generation, and Q&A:
- Average 54% reduction in token consumption
- 50% lower latency
- API calls reduced from 3-4 per task to just 1
- And here’s the thing: these efficiency gains don’t come at the cost of accuracy, which stays the same or even improves
(See detailed data in the next section)
3. Code as the Universal Interface to the Digital World#

Remember how I mentioned Skills can contain code scripts in addition to prompts? Skills let AI agents communicate with diverse tools through code, which better solves agent stability and extensibility issues. This explains how a “single” universal agent can flexibly use skills without being rigid.
Anthropic emphasized in their talk: instead of designing complex UIs or scaffolding for each task, let the AI interact with external systems by “writing code.” Because LLMs are trained on tons of code, they’re really good at writing programs! Letting AI write code to call tools is more reliable and powerful than forcing it to follow strict JSON schemas.
Instead of designing specialized tool interfaces for every operation (search, compile, format, etc.), give the agent bash access. It can write code, execute it, check the output, and self-correct when errors occur. This lets the agent directly call existing Unix tools (like grep or ffmpeg) or third-party SDKs without developers hand-wrapping complex tool interfaces.
So Anthropic thinks developers should focus on designing deterministic tools (code scripts), while the model decides “when” and “how” to combine those tools. This lets agents operate computers like engineers, not like automated programs that can only press fixed buttons.
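The "write code, execute it, check the output, self-correct" loop can be sketched roughly like this. It is an illustration under my own assumptions: `run_command`, `run_with_retries`, and `scripted_proposals` are hypothetical helpers, and the "model" is a scripted function that first proposes a buggy command and then a fixed one. I run small Python one-liners through `subprocess` so the example works without depending on specific Unix tools being installed.

```python
import subprocess
import sys
from typing import Callable, Optional


def run_command(args: list[str], timeout: float = 10.0) -> tuple[bool, str]:
    """Run a command; return (succeeded, stdout on success or stderr on failure)."""
    proc = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    ok = proc.returncode == 0
    return ok, (proc.stdout if ok else proc.stderr).strip()


def run_with_retries(propose: Callable[[Optional[str]], list[str]],
                     max_attempts: int = 3) -> str:
    """Ask for a command, run it, and feed any error text back for a retry."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        args = propose(feedback)
        ok, output = run_command(args)
        if ok:
            return output
        feedback = output  # the error message goes back to the "model"
    raise RuntimeError(f"gave up after {max_attempts} attempts: {feedback}")


def scripted_proposals(feedback: Optional[str]) -> list[str]:
    # Stand-in for the model: the first attempt has a bug, and the second
    # attempt (made after seeing the error) is corrected.
    if feedback is None:
        return [sys.executable, "-c", "print(1 / 0)"]
    return [sys.executable, "-c", "print(6 * 7)"]


result = run_with_retries(scripted_proposals)
print(result)  # 42
```

In a real agent, `propose` would be a model call that sees the task plus the previous error, and the commands could be anything available in the shell, such as grep or ffmpeg.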
Will Skills replace MCP?

Skills won’t replace MCP. They’re complementary, not substitutes.
- MCP handles “connectivity”: It’s the “universal adapter” between AI and the external world (Google Drive, Slack, databases), providing AI with context access
- Skills handle “specialized knowledge”: They’re the “instruction manuals” teaching the AI how to use that context, what workflow to follow, what format to output
MCP is like the hardware store shelves that let you grab tools; Skills are the experienced craftsman’s wisdom teaching you how to assemble things correctly.
Right now in the AI dev community, Skills replace “some” MCP functionality. When your MCP server has tons of tools, you can wrap them as Skills to optimize token efficiency using progressive disclosure. But for simple, predictable single tool calls, traditional function calling is still faster and more reliable.
(Further reading: Will MCP + AI Agent Make Phones Disappear in Five Years?)
Comparison: Multi-Agent vs Single Agent#
In 2024-2025, multi-agent systems (MAS) became mainstream in AI development. The logic makes sense: human teams divide work to handle complex tasks, so designing multiple specialized AI agents (like “UI designer agent” + “backend engineer agent” + “test engineer agent”) should work similarly, right?
But in practice, people gradually noticed multi-agent systems exposed serious performance problems:
- Token consumption explosion: Agents need to repeatedly pass task descriptions, intermediate results, and context between each other, leading to massive redundant computation
- Cumulative latency: Each step requires a new API call, with wait times constantly adding up
- High communication costs: Agents exchange complex data structures using natural language, which is error-prone and inefficient
These MAS performance problems can all be significantly improved by switching to single agent (SAS) + Skills architecture. According to research data, after moving from multi-agent to single agent:
- Token consumption: Over 50% savings
- Latency: Nearly 50% reduction
- API call count: Reduced from multiple to just one
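To see where the redundancy comes from, here is a toy accounting model. The numbers are made up purely for illustration and are not taken from the paper; the model also assumes (my simplification) that every agent in a pipeline receives the task description plus all earlier agents’ outputs, while the single agent receives the task plus one loaded Skill.

```python
def mas_input_tokens(task_tokens: int, output_tokens: int, n_agents: int) -> int:
    """Total input tokens when each agent re-reads the task and all prior outputs."""
    return sum(task_tokens + i * output_tokens for i in range(n_agents))


def sas_input_tokens(task_tokens: int, skill_tokens: int) -> int:
    """Total input tokens for one call: the task plus the loaded Skill."""
    return task_tokens + skill_tokens


# Hypothetical sizes, chosen only to show the mechanism.
mas = mas_input_tokens(task_tokens=2000, output_tokens=1000, n_agents=3)
sas = sas_input_tokens(task_tokens=2000, skill_tokens=1500)

print(mas)                       # 9000
print(sas)                       # 3500
print(round(1 - sas / mas, 2))   # 0.61
```

The task description is paid for three times in the pipeline and once in the single-agent call, and each extra hand-off also adds another API round trip, which is where the latency accumulates.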

The New Architecture Isn’t Perfect: Limitations of Simplified Agent + Skills#
AI Gets “Choice Paralysis” Too: Skills Can’t Scale Infinitely!#
Research found that single AI agents have cognitive limits when choosing skills. When the Skills library stays below a certain threshold, AI selection accuracy remains stable. But once it exceeds the threshold (experiments show around 50-100 skills), skill selection accuracy takes a cliff dive.
Figure 2 from the research paper provides real experimental data. The chart shows actual measurements for the GPT-4o-mini model: as the skill count on the x-axis increases, the selection accuracy on the y-axis drops. As a rough rule of thumb (the exact threshold depends on the model), once you give an agent such as Claude Code more than about 50 Skills, it becomes much more likely to pick the wrong one!

Skill quantity is not the only cause of the accuracy collapse. Semantic confusability is another: it happens when the description fields of multiple skills are too similar or too broad.
If I asked you to search Amazon or eBay for “laptop,” you’d face 100 laptops that all look basically the same, and you’d fall into “choice paralysis,” right?
AI does too! If you give the AI multiple Skills with similar content and ask it to decide which Skill to pick, it’ll also fall into “choice paralysis” and grab the wrong tool. Imagine if your agent had “finance skill” + “money management skill” + “wealth building skill” with vague descriptions sitting in front of it… well, picking the wrong tool is understandable.
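One cheap way to catch this before it bites is to compare descriptions pairwise and flag the ones that overlap heavily. The sketch below uses Jaccard similarity on word sets, which is my own simple stand-in and not the metric used in the paper; the function names, the example descriptions, and the 0.5 threshold are all assumptions.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two descriptions, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def confusable_pairs(descriptions: dict[str, str],
                     threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return skill pairs whose descriptions overlap at or above the threshold."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(descriptions.items(), 2):
        score = jaccard(d1, d2)
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged


skills = {
    "finance": "help the user manage money",
    "money-management": "help the user manage money better",
    "moving-average": "calculate 7-day moving average of a time series",
}
print(confusable_pairs(skills))
# [('finance', 'money-management', 0.83)]
```

Word overlap misses paraphrases, so treat a check like this as a first filter; pairs that get flagged are candidates for merging or for rewriting with more specific wording.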
When NOT to Use Single Agent (SAS)#
Not all multi-agent systems can be simplified to single agent + Skills architecture. Some situations still work better with multiple agents (MAS). The key is whether you need “multiple independent contexts”:
- Need multiple agents to independently attempt multiple solutions: For example, you need to research several solutions simultaneously, then pick the best one
- Adversarial goals: Like debates or opponent analysis, requiring conflicting opinions from different perspectives
- Private state information: Agents need hidden states that can’t be fully shared, like data permission isolation between different company departments
Expert Recommendations: How to Successfully Use Simplified Agent + Skills Architecture#
To prevent single agents from collapsing as skill count increases, the research paper offers multiple AI design principles:
- Monitor skills library size: Continuously track Skill count to avoid exceeding the threshold (remember around 50 skills), or your selection accuracy will tank
- Avoid semantic confusion: In other words, write clear Skill descriptions! Carefully review descriptions for overlap before adding new skills, prioritize merging similar skills rather than mindlessly stacking new ones, and avoid generic terms. For instance, change “process data” to more specific “calculate 7-day moving average”
- Use multi-tier skill structure: When you genuinely need many skills, build a “category → specific skill” two-stage selection mechanism (Hierarchical Routing), ensuring each decision has fewer options than the threshold
- Choose models based on task complexity: More capable models (like Claude Opus 4.5) have higher skill capacity thresholds and better resistance to confusion. When you really need lots of skills at hand, choose a more powerful AI model
- Consider alternative architectures: Do you really, really need tons of Skills simultaneously? Then go back to considering multi-agent systems (MAS)!
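The “category → specific skill” idea from the list above can be sketched as follows. This is a toy version under my own assumptions: `LIBRARY`, `pick`, and `route` are hypothetical, the chooser is a word-overlap heuristic standing in for the model, and the limit of 50 options per decision simply reuses the rough threshold mentioned earlier.

```python
def pick(query: str, options: dict[str, str]) -> str:
    """Stand-in for the model: choose the option sharing most words with the query."""
    words = set(query.lower().split())
    return max(options,
               key=lambda name: len(words & set(options[name].lower().split())))


# Hypothetical two-level library: category -> skills.
LIBRARY = {
    "finance": {
        "description": "finance valuation irr cash flow metrics",
        "skills": {
            "irr": "calculate irr from cash flows",
            "moving-average": "calculate 7-day moving average of prices",
        },
    },
    "reporting": {
        "description": "report formatting template layout compliance",
        "skills": {
            "quarterly-report": "format the quarterly report with the company template",
            "compliance-memo": "write a compliance memo",
        },
    },
}


def route(query: str, library: dict, max_options: int = 50) -> tuple[str, str]:
    """Two-stage selection: first a category, then a skill inside it."""
    categories = {name: c["description"] for name, c in library.items()}
    if len(categories) > max_options:
        raise ValueError("too many categories for a single decision")
    category = pick(query, categories)
    skills = library[category]["skills"]
    if len(skills) > max_options:
        raise ValueError(f"category {category!r} has too many skills")
    return category, pick(query, skills)


print(route("calculate irr for these cash flows", LIBRARY))
# ('finance', 'irr')
```

Each decision only ever sees one level of the tree, so a library with many skills in total can still keep every individual choice below the threshold.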
Honestly, looking at these recommendations, they boil down to one sentence: try not to let one agent carry more than 50 Skills!

(Screenshot from the movie Shaolin Soccer)
Conclusion#
Neither multi-agent systems (MAS) nor single-agent systems (SAS) + Skills is an absolute winner. MAS remains irreplaceable in scenarios that require independent contexts, like private state handling or adversarial goals, while the SAS + Skills architecture shows impressive efficiency gains in most linear workflows.
Ultimately, you need to experiment yourself to know which architecture fits your application best.
But regardless of which architecture you choose, understanding what Agent Skills are and how to design high-quality Skills is required coursework for AI developers in 2026. If you’re not familiar with Skills yet, I recommend starting with this Skills tutorial.
References:
- Research paper: When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail (Xiaoxiao Li, UBC, 2026 Jan.)
- Anthropic talk: Don’t Build Agents, Build Skills Instead
- Anthropic official post: Extending Claude’s capabilities with skills and MCP servers
- Anthropic official post: Equipping agents for the real world with Agent Skills
Hope this post helped you. I’ll keep sharing Claude Code news and trends regularly.

