
Everyone building production agents has started paying attention to the MCP Tax. The numbers are hard to ignore.
GitHub's own engineering team published the measurement: a 40-tool GitHub MCP server adds 10-15 KB of schema per turn. Connect three enterprise services — GitHub, Slack, a data warehouse — and you've consumed 30,000+ tokens of metadata before a single reasoning step. Run those workflows at scale and schema overhead alone costs real money every day.
The standard response to this is correct as far as it goes: prune unused tools, compress descriptions, implement lazy loading. GitHub cut agentic workflow token costs by 62% on production CI workflows using exactly this approach. Valid work. Expected outcome.
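The problem is that this framing treats all schema tokens as the same kind of waste. They aren't.
There are three distinct token overhead categories inside a typical MCP tool definition: verbose descriptions, definitions for tools the agent never invokes, and credential parameters.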
The third category is not a verbosity problem. It's an architectural mistake that happens to carry a token cost. The standard MCP Tax optimization playbook doesn't touch it — and that's the gap this post addresses.
Consider a send_email tool definition that embeds credentials the way many agent pipelines default to doing it:
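As a minimal sketch of such a definition: only the parameter names `oauth_token` and `client_id` (and `to`, `subject`, `body`) come from this post. The descriptions and the JSON Schema layout are illustrative assumptions, not a listing from any specific pipeline.

```python
import json

# Hypothetical send_email definition with credentials embedded as parameters.
# Descriptions and overall layout are illustrative assumptions.
SEND_EMAIL_WITH_CREDENTIALS = {
    "name": "send_email",
    "description": "Send an email on behalf of the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient email address."},
            "subject": {"type": "string", "description": "Subject line."},
            "body": {"type": "string", "description": "Plain-text message body."},
            # The two fields below are the problem: the LLM is asked to carry secrets.
            "oauth_token": {
                "type": "string",
                "description": "OAuth 2.0 access token for the user's mailbox.",
            },
            "client_id": {
                "type": "string",
                "description": "OAuth client ID of the calling application.",
            },
        },
        "required": ["to", "subject", "body", "oauth_token", "client_id"],
    },
}

# This serialized form is what gets injected into the LLM context on every call.
print(json.dumps(SEND_EMAIL_WITH_CREDENTIALS, indent=2))
```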
Compare it to what the schema should contain:
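A minimal sketch of the credential-free version, assuming the same JSON Schema layout. The three parameter names are the ones this post uses for send_email; the descriptions are illustrative.

```python
# Hypothetical send_email definition containing only what the LLM needs
# to decide whether and how to call the tool.
SEND_EMAIL = {
    "name": "send_email",
    "description": "Send an email on behalf of the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient email address."},
            "subject": {"type": "string", "description": "Subject line."},
            "body": {"type": "string", "description": "Plain-text message body."},
        },
        "required": ["to", "subject", "body"],
    },
}

# No secret-bearing field is present for the LLM to echo, log, or leak.
print(sorted(SEND_EMAIL["parameters"]["properties"]))
```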
The two credential parameters (oauth_token, client_id) add approximately 80-120 tokens per schema definition. Across five connected tools — Gmail, Slack, GitHub, Salesforce, Notion — that's 400-600 tokens of auth overhead injected into every LLM call, including calls where the agent never invokes a single one of those tools.
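The arithmetic behind that range is simple enough to check directly. The sketch below uses the 80-120 tokens-per-tool estimate from the text; the daily call volume is a hypothetical figure chosen only to show how the overhead scales.

```python
def auth_overhead_tokens(
    tool_count: int, low_per_tool: int = 80, high_per_tool: int = 120
) -> tuple[int, int]:
    """Return the (low, high) estimate of credential-parameter tokens per LLM call."""
    return tool_count * low_per_tool, tool_count * high_per_tool


low, high = auth_overhead_tokens(tool_count=5)
print(f"per call: {low}-{high} tokens")

# Hypothetical volume, not a measured number.
calls_per_day = 50_000
print(f"per day:  {low * calls_per_day:,}-{high * calls_per_day:,} tokens")
```

At that assumed volume, the credential parameters alone would account for 20 to 30 million context tokens per day.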
The token count is not the headline. The headline is what those tokens are: an oauth_token field in an LLM context window is an OWASP LLM02 (Sensitive Information Disclosure) exposure surface. It doesn't matter if the LLM never reads it directly. Once a credential string enters LLM context, it can surface via prompt injection, be embedded in a reasoning trace, get logged via stdout (which agent frameworks feed back into context), or leak through a downstream tool's output.
An empirical study examining credential leakage in LLM agent skills found that debug logging — print and console.log statements that agent frameworks capture into LLM context — accounts for 73.5% of credential exposure cases. Leaked credentials were immediately actionable in 89.6% of affected cases during normal execution, no elevated privileges required.
The MCP specification community raised exactly this issue: in the standard flow, raw OAuth access tokens presented to the MCP server are exposed to the agent logic and the underlying LLM, directly aligning with OWASP LLM02. The discussion frames a gateway model as the mitigation — one that terminates the user's raw OAuth token and never exposes it to the LLM or backend servers.
Two damages, one root cause. You fix both or neither.
The design principle: a tool schema should contain exactly what the LLM needs to decide whether and how to call the tool. Credentials are not part of that decision. They are an execution concern, resolved after the LLM makes its choice.
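What belongs in a tool schema: the tool name, its description, and the semantic parameters the LLM fills in (for send_email: to, subject, body).
What does not belong: credential parameters such as oauth_token, client_id, or a github_token field.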
The test is simple: would this parameter value change if the same task were run by a different user? If yes, it's a credential. Keep it out of the schema.
Applied to three common enterprise tools:
None of the removed parameters affect the LLM's tool selection logic. The LLM picks github_create_issue because the task calls for creating a GitHub issue — not because a token was present. Removing credential parameters makes schemas leaner without changing the reasoning signal.
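The per-user test can be applied mechanically as a filter over existing schemas. This is a sketch under stated assumptions: the denylist is hand-maintained, `api_key` is a hypothetical addition to the names used in this post, and the `github_create_issue` fields other than `github_token` are illustrative.

```python
import copy

# Assumption: a hand-maintained set of parameter names whose values differ per user.
CREDENTIAL_PARAMS = {"oauth_token", "client_id", "github_token", "api_key"}


def strip_credential_params(schema: dict) -> dict:
    """Return a copy of a tool schema with credential parameters removed."""
    cleaned = copy.deepcopy(schema)  # leave the caller's schema untouched
    params = cleaned["parameters"]
    params["properties"] = {
        name: spec
        for name, spec in params["properties"].items()
        if name not in CREDENTIAL_PARAMS
    }
    params["required"] = [
        name for name in params.get("required", []) if name not in CREDENTIAL_PARAMS
    ]
    return cleaned


# Illustrative schema; only the tool name and github_token appear in the text.
github_create_issue = {
    "name": "github_create_issue",
    "description": "Create an issue in a GitHub repository.",
    "parameters": {
        "type": "object",
        "properties": {
            "repo": {"type": "string"},
            "title": {"type": "string"},
            "body": {"type": "string"},
            "github_token": {"type": "string"},
        },
        "required": ["repo", "title", "github_token"],
    },
}

cleaned = strip_credential_params(github_create_issue)
print(sorted(cleaned["parameters"]["properties"]))
```

The name and description, which carry the selection signal, pass through unchanged.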
For the auth-side architecture that governs what replaces those parameters at execution time, see Tool Calling Authentication for AI Agents.
If credentials aren't in the schema, how does the tool call authenticate?
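The execution layer resolves credentials from the vault using the connected_account_id, not from any parameter the LLM passed. The LLM emits a tool call containing only semantic arguments; the execution layer looks up the token for that connected account in the vault and attaches it to the outbound API request.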
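A minimal sketch of that resolution path, assuming an in-memory dictionary in place of a real vault. `VAULT`, `build_request`, `run_tool_call`, the account ID, and the URL are hypothetical names for illustration and are not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict  # semantic arguments only, exactly as the LLM produced them


# Hypothetical vault: connected_account_id -> access token.
VAULT = {"acct_gmail_user_123": "example-access-token"}


def build_request(call: ToolCall, connected_account_id: str) -> dict:
    """Resolve the credential at execution time and attach it to the outbound request."""
    if "oauth_token" in call.arguments:
        raise ValueError("credentials must not arrive via LLM-supplied arguments")
    token = VAULT[connected_account_id]  # the only place the token is read
    return {
        "method": "POST",
        "url": "https://mail.example.com/send",  # placeholder endpoint
        "headers": {"Authorization": f"Bearer {token}"},
        "json": dict(call.arguments),
    }


def run_tool_call(call: ToolCall, connected_account_id: str) -> dict:
    """Execute a tool call and return only what should flow back into LLM context."""
    request = build_request(call, connected_account_id)
    # A real executor would send `request` over HTTPS here.
    return {"status": "sent", "to": request["json"]["to"]}


call = ToolCall("send_email", {"to": "a@example.com", "subject": "Hi", "body": "Hello"})
print(run_tool_call(call, "acct_gmail_user_123"))
```

The token appears in the outbound request headers and nowhere in the value returned to the model.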
The LLM never sees the token. The schema never carries it. The vault is the only place it lives.
Three properties this guarantees:
One clarification on scope: vault-backed injection eliminates the credential-parameter subcategory of schema overhead. It does not address description verbosity or unused tool definitions — those are separate problems requiring tool filtering, lazy loading, and description compression. Credential cleanup is the most security-critical optimization; description compression is the highest-volume one. Both are necessary; they're not the same fix.
For the full architecture behind token vault design in agent workflows, see Token Vault: Why It's Critical for AI Agent Workflows.
Three steps: one-time connection setup per service, per-user authorization, per-call execution with vault resolution.
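The three steps can be sketched with a toy in-memory vault. `InMemoryVault` and its method names are hypothetical stand-ins that only show the lifecycle; they are not the Scalekit SDK, whose actual calls are covered in the quickstart.

```python
class InMemoryVault:
    """Toy stand-in for a token vault, for illustration only."""

    def __init__(self) -> None:
        self._connections: dict[str, dict] = {}
        self._tokens: dict[tuple[str, str], str] = {}

    def create_connection(self, service: str, client_id: str) -> None:
        # Step 1: one-time connection setup per service.
        self._connections[service] = {"client_id": client_id}

    def authorize_user(self, user_id: str, service: str, access_token: str) -> None:
        # Step 2: per-user authorization; the token is stored keyed by user and service.
        if service not in self._connections:
            raise KeyError(f"no connection configured for {service}")
        self._tokens[(user_id, service)] = access_token

    def execute(self, user_id: str, service: str, tool_name: str, tool_input: dict) -> dict:
        # Step 3: per-call execution; the credential is resolved here, not by the LLM.
        token = self._tokens.get((user_id, service))
        if token is None:
            raise PermissionError(f"{user_id} has not authorized {service}")
        headers = {"Authorization": f"Bearer {token}"}
        # A real executor would call the service API with `headers` here.
        return {"tool": tool_name, "user": user_id, "status": "ok", "input": tool_input}


vault = InMemoryVault()
vault.create_connection("gmail", client_id="example-client-id")
vault.authorize_user("user_123", "gmail", access_token="token-for-user-123")

result = vault.execute(
    "user_123", "gmail", "send_email",
    {"to": "a@example.com", "subject": "Hi", "body": "Hello"},
)
print(result)
```

Because tokens are keyed by user and service, a call made for a user who has not authorized the service fails instead of falling back to someone else's credential.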
What execute_tool does that a home-built pipeline typically doesn't:
The schema your LLM sees has three parameters: to, subject, body. No oauth_token. No client_id. The vault resolved them; the agent never knew they existed.
Tenant isolation is enforced at the vault level: the credential store for user_123 is isolated from user_456. One user's token cannot leak to another's execute_tool call, regardless of how those calls are batched or scheduled.
Get started: AgentKit Quickstart
Framework-specific integration examples: LangChain | Anthropic
Removing credential parameters does not degrade tool selection. Selection is driven by the tool name, description, and semantic parameters. Credential parameters carry zero selection signal: the LLM picks github_create_issue because the task calls for creating a GitHub issue, not because a github_token field was present. Removing them makes the schema surface smaller and the selection signal cleaner.
Neither LangChain nor the Anthropic SDK requires credentials in the tool schema. The schema describes the tool interface; authentication is handled inside the execution function that gets called when the LLM invokes the tool. The execute_tool call inside that function is where Scalekit resolves credentials. See the LangChain integration example and the Anthropic integration example.
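To make the split between interface and execution concrete, here is a sketch using the tool-definition and tool-result shapes of the Anthropic Messages API, written as plain dictionaries with no SDK calls. `lookup_token` and `handle_tool_use` are hypothetical helpers; in a Scalekit setup the lookup would happen inside execute_tool.

```python
# Tool definition as it would be passed to the model: interface only.
SEND_EMAIL_TOOL = {
    "name": "send_email",
    "description": "Send an email on behalf of the current user.",
    "input_schema": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}


def lookup_token(user_id: str) -> str:
    """Hypothetical vault lookup; a real system would query a secret store."""
    return {"user_123": "token-for-user-123"}[user_id]


def handle_tool_use(block: dict, user_id: str) -> dict:
    """Run a tool_use block for a given user and build the tool_result to send back."""
    if block["name"] != "send_email":
        raise ValueError(f"unknown tool: {block['name']}")
    token = lookup_token(user_id)  # resolved here, outside the LLM context
    # A real handler would call the mail API with `token` at this point.
    del token
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": f"Email sent to {block['input']['to']}",
    }


tool_use = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "send_email",
    "input": {"to": "a@example.com", "subject": "Hi", "body": "Hello"},
}
print(handle_tool_use(tool_use, user_id="user_123"))
```

The user identity comes from the application's session, not from the model, so the schema needs no field for it.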
This is not only a multi-tenant concern. Even single-tenant agents benefit: credentials out of schema means credentials out of LLM context, which removes this OWASP LLM02 exposure path regardless of tenant count. The token efficiency benefit compounds with scale; the security benefit applies from day one.
Execution-time credential injection eliminates the auth-parameter subcategory of schema overhead. It does not address description verbosity or unused tool definitions. Those require separate techniques: tool filtering, lazy loading, description compression. Credential cleanup is the most security-critical fix; description compression typically delivers higher token volume reduction. A complete token efficiency strategy requires both.