The Model Context Protocol (MCP) has become the open standard for connecting AI agents to external data, but scalability has been a serious problem. Traditional integration required loading all tool definitions upfront. When you hit hundreds or thousands of tools, you get severe context bloat. Claude Code recently rolled out MCP Tool Search, marking a major shift from “static preloading” to “dynamic discovery.” This change aims to let AI agents seamlessly handle thousands of tools without stuffing tokens before every conversation even starts.
Why MCP Was (Once) a Token-Consuming Monster#
Before we get into solutions, we need to understand what MCP is and why it used to cause such a massive token problem.
MCP: The Universal Adapter for AI#
MCP (Model Context Protocol) is an open standard launched by Anthropic in November 2024. Think of it as a “USB-C port” for AI applications—a universal connection method that lets AI models safely and standardly access external real-time data and tools. If you’re not familiar with MCP yet, I recommend reading this article to get the basics.

(Image source: Norah Sakal)
The “Dictionary” Problem: Why Token Consumption Was So Bad#
Here’s a helpful analogy: Traditional MCP clients load all tool definitions (schemas) into context at startup, just like asking an AI to “memorize the entire dictionary” before starting work. Each MCP server comes with detailed tool definitions, parameters, and descriptions, and all this info takes up a ton of context space before the conversation even begins.
Sure, loading tool definitions is important for using MCP, but here’s the thing: not every task needs ALL those tools!
Real-world data (from Anthropic) shows how bad this problem gets:
- Connecting 5 common servers (like GitHub, Slack, etc., totaling 58 tools) can burn through 55,000 tokens before you even say a word
- In extreme cases, unoptimized tool definitions can eat up 134,000 tokens
- That’s 15-50% of your effective context window
Beyond space consumption, this creates serious side effects:
- Higher costs and latency: Every conversation has to transmit massive tool definitions
- Worse tool selection accuracy: When tool count exceeds 30-50, Claude’s ability to accurately choose the right tool drops significantly (Context Rot)
- Semantic confusion: Similarly named tools (like send-user vs send-channel) easily get mixed up in huge lists
The Solution: How Tool Search Tool Solves MCP’s “Token Explosion”#
Anthropic launched several features to improve MCP execution efficiency in November 2025 (Advanced Tool Use), with Tool Search Tool being the core solution to the “dictionary problem.”
Core Concept: From “Memorizing the Dictionary” to “Looking It Up”#
Tool Search Tool’s core idea is simple: don’t make the AI memorize all tool definitions upfront—let it search dynamically when needed. Developers just mark tools with defer_loading: true, and those tools won’t be preloaded; instead, they go into a “to be discovered” directory.
When Claude needs a specific function (like sending a Slack message), it first searches for the relevant tool definition, and only then does the system load that tool’s detailed schema into context. It’s like going from “memorizing the dictionary” to “looking it up,” massively reducing the initial burden.

(Made by Wilson and Nano Banana Pro)
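To make the defer_loading idea concrete, here is a minimal sketch of how a tools array for the Messages API might be assembled. The type string tool_search_tool_regex_20251119 and the exact placement of the defer_loading flag are taken from Anthropic’s advanced tool use announcement; treat the specific names as illustrative and verify them against the current API docs.

```python
# Sketch: marking most tools as deferred so only a tool-search tool and
# a few core tools are preloaded. Field names are illustrative -- check
# Anthropic's advanced tool use docs for the exact strings.

def build_tools(all_tools, preload_names):
    """Mark everything except a few core tools as deferred."""
    tools = [
        # The tool-search tool itself is always loaded.
        {"type": "tool_search_tool_regex_20251119", "name": "tool_search"},
    ]
    for tool in all_tools:
        entry = dict(tool)
        # Deferred tools are only loaded once a search matches them.
        entry["defer_loading"] = tool["name"] not in preload_names
        tools.append(entry)
    return tools


all_tools = [
    {"name": "read_file",
     "description": "Read a file from disk",
     "input_schema": {"type": "object"}},
    {"name": "send_slack_message",
     "description": "Send a message to a Slack channel",
     "input_schema": {"type": "object"}},
]

tools = build_tools(all_tools, preload_names={"read_file"})
# read_file stays preloaded; send_slack_message waits to be discovered.
```

Only the search tool and the non-deferred schemas count against the initial context; everything else sits in the “to be discovered” directory until a search pulls it in.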
Two Search Variants: Regex and BM25#
Tool Search isn’t a single technique. It offers two main search variants that make “dynamic discovery” work:
Regex (Regular Expression)
Lets the AI construct Python re.search() patterns to precisely retrieve tool names. For example, the pattern get_.*_data finds all “get data” related tools at once (e.g., get_user_data). This works great when developers clearly know the tool naming conventions, allowing efficient batch searches for similar functionality.
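Since the pattern syntax is based on Python’s own re module, the variant is easy to emulate locally with the article’s example pattern (the tool names below are made up for illustration):

```python
import re

tool_names = ["get_user_data", "get_order_data",
              "send_slack_message", "GET_WEATHER_DATA"]

# Batch-match all "get data" tools with the article's example pattern.
matches = [n for n in tool_names if re.search(r"get_.*_data", n)]
# -> ['get_user_data', 'get_order_data']

# (?i) makes the search case-insensitive (see the usage tips later on).
ci_matches = [n for n in tool_names if re.search(r"(?i)get_.*_data", n)]
# -> ['get_user_data', 'get_order_data', 'GET_WEATHER_DATA']
```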
BM25 (Natural Language Search)
Lets the AI use intuitive keywords (like “weather”) to find corresponding tools. This is a natural language query-based algorithm, especially important for environments with huge numbers of irregularly named tools. It lowers the AI’s search barrier, letting the model use everyday language to find tools.
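To show what keyword-based ranking over tool descriptions looks like, here is a toy BM25 scorer. It is a stand-in for whatever index Anthropic actually runs, written from the textbook BM25 formula; the tool descriptions are invented for the example.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank documents (tool descriptions) against a keyword query
    with a minimal BM25 -- a toy stand-in for the real search index."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)

    def idf(term):
        # Inverse document frequency: rare terms weigh more.
        df = sum(term in t for t in tokenized)
        return math.log((n - df + 0.5) / (df + 0.5) + 1)

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = sum(
            idf(term) * tf[term] * (k1 + 1)
            / (tf[term] + k1 * (1 - b + b * len(tokens) / avgdl))
            for term in query.lower().split()
        )
        scores.append(score)
    # Return document indices, best match first.
    return sorted(range(n), key=lambda i: -scores[i])

descriptions = [
    "Send a message to a Slack channel",
    "Fetch the current weather forecast for a city",
    "Search customer order records and return detailed information",
]
ranking = bm25_rank("weather forecast", descriptions)
# The weather tool (index 1) ranks first for this query.
```

The point of the sketch: the query “weather forecast” never has to match a tool *name*, only its description, which is exactly why well-written descriptions matter so much (see the usage tips below).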
Impressive Performance#
Tool Search Tool’s real-world results are significant:
- Initial loading: Reduced from 55,000 tokens to about 500 tokens
- Savings ratio: Up to 95%
- Scale support: Single directory can hold up to 10,000 tools
Advanced: Other techniques Anthropic launched simultaneously (click to expand ▼)
Programmatic Tool Calling (PTC):
- Lets the model write Python code to chain tools together, rather than having the model intervene for each call
- Intermediate data stays in the sandbox for processing, only final results return to the model
- Complex task token consumption reduced by about 37%
- Reference: Code execution with MCP
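The PTC idea can be sketched in a few lines: instead of each bulky tool result round-tripping through the model, the model writes a script that runs in a sandbox, and only a small summary comes back. The functions below are hypothetical stand-ins for real MCP tools.

```python
# Toy sketch of Programmatic Tool Calling. fetch_orders is a
# hypothetical stand-in for an MCP tool that returns bulky data.

def fetch_orders(region):
    # Pretend this is a real tool call returning 1,000 order rows.
    return [{"id": i, "total": 100 + i} for i in range(1000)]

def summarize(orders):
    # Intermediate processing happens here, inside the sandbox.
    return {"count": len(orders),
            "revenue": sum(o["total"] for o in orders)}

# Thousands of rows stay inside the sandbox; only this small dict
# would be sent back into the model's context.
result = summarize(fetch_orders("emea"))
```

One script, one small result in context, instead of 1,000 rows of intermediate JSON — that is where the ~37% token reduction on complex tasks comes from.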
Tool Use Examples:
- Provide concrete JSON examples to guide the model in understanding complex parameters (like date formats or ID conventions)
- Accuracy improved from 72% to 90%
- Note: Currently incompatible with Tool Search Tool, need to choose one
- Reference: Introducing advanced tool use on the Claude Developer Platform
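A tool-use example is just concrete sample input attached to the tool definition. The sketch below uses input_examples as the field name, following Anthropic’s announcement; the tool itself and its ID/date conventions are invented, so verify the field name against the current API reference.

```python
# Sketch: a tool definition carrying concrete input examples so the
# model can infer conventions (date format, ID shape) that a bare
# JSON Schema doesn't communicate. Field name per Anthropic's
# announcement; the tool itself is hypothetical.
tool = {
    "name": "get_order",
    "description": "Search customer order records and return detailed information",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string"},
            "placed_after": {"type": "string"},
        },
        "required": ["order_id"],
    },
    "input_examples": [
        # Shows the ID convention and ISO date format at a glance.
        {"order_id": "ORD-2025-00142", "placed_after": "2025-11-01"},
        # Optional parameters can simply be omitted.
        {"order_id": "ORD-2025-00007"},
    ],
}
```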
Implementation Timeline: Can Claude Code Users Use It Now?#
Yes, we can!
Anthropic engineer Thariq officially announced on January 15, 2026 that Claude Code would gradually roll out MCP Tool Search functionality, responding to the development community’s strong demand for “MCP server lazy (dynamic) loading.”

(Made by Wilson and Nano Banana Pro)
Auto-enabled by Default: Major Update in v2.1.7#
In v2.1.7, Tool Search mode became auto-enabled by default. Claude Code implemented a smart trigger mechanism:
- 10% threshold auto-trigger: When the system detects that MCP tool descriptions occupy more than 10% of the Context Window, it automatically switches to search mode loading
- Smart balance: Below the threshold, it maintains preloading for faster response times
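The trigger logic itself is just a ratio check. A sketch with hypothetical numbers (the real accounting inside Claude Code is not public):

```python
# Sketch of the 10% auto-trigger. Token counts and window size here
# are illustrative, not Claude Code internals.

def should_defer(tool_def_tokens, context_window, threshold=0.10):
    """Switch to search-mode loading when tool definitions would
    occupy more than `threshold` of the context window."""
    return tool_def_tokens / context_window > threshold

# 55,000 tokens of tool schemas in a 200k window is 27.5% -> defer.
heavy = should_defer(55_000, 200_000)   # True
# 5,000 tokens is 2.5% -> keep preloading for faster responses.
light = should_defer(5_000, 200_000)    # False
```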
In the updated v2.1.21 version, Anthropic further optimized the user experience:
- The Tool Search process now appears as a brief notification instead of being displayed directly in the conversation flow
- Keeps the interface clean, letting users focus on the task itself
Usage Tips: How to Get the Most Out of It#
Tool Search Tool is powerful, but you need some configuration strategies to maximize its benefits.
1. Preload Some Core Tools Anyway#
There are definitely some MCP tools you use all the time, so often that the AI shouldn’t even have to “consider” whether to load them; they should simply always be available.
That’s why the Claude development team recommends setting 3-5 most commonly used tools to defer_loading: false (preload). This makes sure the most common tasks (like file reading, basic queries) don’t need extra search reasoning steps, striking the best balance between maintaining system response speed and saving tokens.
2. Writing Precise Descriptions#
Tool Search needs to match names and descriptions, so developers should include clear semantic keywords in descriptions that match how users describe tasks:
- ✅ Good description: “Search customer order records and return detailed information”
- ❌ Bad description: “Execute query”
For the Regex variant, you can use (?i)slack for case-insensitive searches, increasing matching flexibility.
3. Custom Trigger Threshold (auto:N)#
Advanced users can use auto:N syntax to adjust the auto-enable percentage threshold:
- For example, auto:20 means search mode only activates once tool definitions occupy 20% of the context window
- Makes tool management fit individual project needs better
4. Cache Optimization Reminder#
Tool Search is specifically designed to not break Prompt Caching:
- System instructions and 3-5 core tools can be fixed in cache
- Dynamically added tools automatically expand and get reused in subsequent conversations without invalidating the cache
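The cache-friendly pattern boils down to keeping the prefix stable. Here is a sketch of a request shaped that way; cache_control usage follows Anthropic’s prompt caching docs, while the model name, system text, and tools are placeholders.

```python
# Sketch: stable system prompt + preloaded core tools form the cached
# prefix; tools discovered by Tool Search get appended after it, so the
# prefix stays byte-identical across turns. cache_control usage per
# Anthropic's prompt caching docs; everything else is a placeholder.

core_tools = [
    {"name": "read_file",
     "description": "Read a file from disk",
     "input_schema": {"type": "object"}},
]

request = {
    "model": "claude-sonnet-4-5",
    "system": [
        {"type": "text",
         "text": "You are a coding assistant.",
         # Cache breakpoint: everything up to here is reused each turn.
         "cache_control": {"type": "ephemeral"}},
    ],
    # Stable, preloaded tools are part of the cacheable prefix ...
    "tools": core_tools,
    "messages": [{"role": "user", "content": "List my open PRs"}],
}
# ... while search-discovered tool schemas are appended after the
# prefix, extending the context without invalidating the cached part.
```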
Non-Claude Code Users Have a Solution Too: mcp-cli#
For non-Claude Code users, there’s an open-source project called mcp-cli developed by Philipp Schmid, a senior AI engineer at Google, that implements the Tool Search solution.

Compatible Tools#
mcp-cli works with the following AI tools:
- Gemini CLI (Google)
- Custom-built AI Agents (with Bash execution capabilities)
- Any coding agent that supports command-line tool invocation
How It Works: Exactly Like Claude Code#
mcp-cli’s core logic is exactly the same as Claude Code Tool Search: dynamic discovery + on-demand loading. It recreates the Tool Search mechanism through a CLI interface, providing a three-step workflow:
- Discover: mcp-cli info lists all available servers and tools
- Inspect: mcp-cli info <server> <tool> gets the specific tool’s JSON Schema and parameter definitions
- Execute: mcp-cli call actually invokes the tool
For more details, check out Philipp Schmid’s introduction article.
(Equally) Impressive Performance#
According to the author, mcp-cli’s real-world results are even better than Claude Code’s official implementation:
- 6 MCP servers (60 tools): Reduced from 47,000 tokens to 400 tokens
- Savings ratio: Up to 99%
This proves that “dynamic discovery” doesn’t have to rely on model built-in functionality. Through a well-designed CLI bridge, even non-Anthropic AI Agents can enjoy the benefits of reduced token consumption.
Conclusion#
Context management has become a core capability in AI development. Through lazy loading and dynamic discovery, AI agents’ capability ceiling will no longer be limited by token counts. This improvement lets AI agents move toward scalable applications. Whether it’s Claude Code’s built-in Tool Search or mcp-cli’s open-source implementation, both give AI developers a clear direction: future AI agents will be able to handle thousands of tools, and developers won’t need to worry about context bloat anymore.