Jun 2, 2026

Token-Efficient Tool Calling: Hidden Cost of Auth Overhead in Agent Context Windows

TL;DR

Every tool schema injected into an LLM context window carries a token cost. For a 40-tool MCP server, GitHub's own engineering team measured 10-15 KB of schema per turn before the agent does anything useful.
Within that overhead, credential parameters are the worst subcategory. They're not just token-wasteful: they put OAuth tokens, API keys, and secrets inside LLM context, which maps directly to OWASP LLM02 (Sensitive Information Disclosure) and LLM07 (System Prompt Leakage). An empirical study of agent skills found that leaked credentials were exploitable in 89.6% of affected cases during normal execution, with no elevated privileges required.
The fix is architectural, not cosmetic: tool schemas must carry zero credential metadata; credentials are resolved at execution time via vault-backed injection, never at schema-definition time.
Clean schema design eliminates the entire auth-parameter overhead category; execution-time injection means the agent calls a tool, gets a result, and the credential never touches LLM context.
Scalekit's execute_tool implements vault-backed execution-time injection, so the schema your agent sees contains tool shape only: no auth surface, no token waste.

Everyone building production agents has started paying attention to the MCP Tax. The numbers are hard to ignore.

GitHub's own engineering team published the measurement: a 40-tool GitHub MCP server adds 10-15 KB of schema per turn. Connect three enterprise services — GitHub, Slack, a data warehouse — and you've consumed 30,000+ tokens of metadata before a single reasoning step. Run those workflows at scale and schema overhead alone costs real money every day.

The standard response to this is correct as far as it goes: prune unused tools, compress descriptions, implement lazy loading. GitHub cut agentic workflow token costs by 62% on production CI workflows using exactly this approach. Valid work. Expected outcome.

The problem is that this framing treats all schema tokens as the same kind of waste. They aren't.

There are three distinct token overhead categories inside a typical MCP tool definition:

| Schema token category | Origin | Compressible? | Security risk? |
|---|---|---|---|
| Tool descriptions | Your description field | Yes; trim without losing semantic signal | No |
| Parameter type and enum definitions | JSON schema typing | Yes; strip non-essential enums | No |
| Credential parameters (api_key, bearer_token, access_token) | Auth design mistake | No; must be removed entirely | Yes (OWASP LLM02, LLM07) |

The third category is not a verbosity problem. It's an architectural mistake that happens to carry a token cost. The standard MCP Tax optimization playbook doesn't touch it — and that's the gap this post addresses.

What Credential Parameters in a Schema Actually Cost

Consider a send_email tool definition that embeds credentials the way many agent pipelines default to doing it:

```json
{
  "name": "send_email",
  "description": "Send an email via Gmail",
  "parameters": {
    "to": { "type": "string" },
    "subject": { "type": "string" },
    "body": { "type": "string" },
    "oauth_token": { "type": "string", "description": "Gmail OAuth access token" },
    "client_id": { "type": "string", "description": "OAuth client ID" }
  }
}
```

Compare it to what the schema should contain:

```json
{
  "name": "send_email",
  "description": "Send an email via Gmail",
  "parameters": {
    "to": { "type": "string" },
    "subject": { "type": "string" },
    "body": { "type": "string" }
  }
}
```

The two credential parameters (oauth_token, client_id) add approximately 80-120 tokens per schema definition. Across five connected tools — Gmail, Slack, GitHub, Salesforce, Notion — that's 400-600 tokens of auth overhead injected into every LLM call, including calls where the agent never invokes a single one of those tools.
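To see how the per-call figure compounds, here is the same arithmetic as a small Python helper. The 80-120 tokens per schema and the five tools come from the paragraph above. The 1,000-call workload is a hypothetical number chosen only for illustration.

```python
# Figures from the text: 80-120 credential-parameter tokens per tool schema.
TOKENS_PER_SCHEMA_LOW = 80
TOKENS_PER_SCHEMA_HIGH = 120
CONNECTED_TOOLS = ["gmail", "slack", "github", "salesforce", "notion"]


def auth_overhead(tokens_per_schema: int, n_tools: int, n_calls: int = 1) -> int:
    """Credential-parameter tokens injected across n_calls LLM calls.

    Every call re-sends every schema, whether or not any tool is invoked.
    """
    return tokens_per_schema * n_tools * n_calls


n = len(CONNECTED_TOOLS)
per_call = (auth_overhead(TOKENS_PER_SCHEMA_LOW, n),
            auth_overhead(TOKENS_PER_SCHEMA_HIGH, n))
print(per_call)  # (400, 600)

# Hypothetical workload of 1,000 LLM calls.
per_workload = (auth_overhead(TOKENS_PER_SCHEMA_LOW, n, 1000),
                auth_overhead(TOKENS_PER_SCHEMA_HIGH, n, 1000))
print(per_workload)  # (400000, 600000)
```

The overhead scales linearly with both tool count and call count, which is why it never goes away on its own.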

That's not the headline number. The headline number is what those tokens are: an oauth_token field in an LLM context window is an OWASP LLM02 exposure surface. It doesn't matter whether the model ever acts on it. Once a credential string enters LLM context, it can surface via prompt injection, be embedded in a reasoning trace, get logged via stdout (which agent frameworks feed back into context), or leak through a downstream tool's output.

An empirical study examining credential leakage in LLM agent skills found that debug logging — print and console.log statements that agent frameworks capture into LLM context — accounts for 73.5% of credential exposure cases. Leaked credentials were immediately actionable in 89.6% of affected cases during normal execution, no elevated privileges required.
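The logging vector is easy to reproduce. The sketch below uses a hypothetical harness, run_tool_and_capture, that mimics how some agent frameworks fold a tool's stdout into the observation sent back to the model. The tool, the harness, and the fake token are illustrative and do not reflect any specific framework's behavior.

```python
import contextlib
import io


def send_email_tool(to: str, body: str, oauth_token: str) -> dict:
    # A stray debug print like this is the leak vector the study describes.
    print(f"DEBUG calling Gmail with token={oauth_token}")
    return {"status": "sent", "to": to}


def run_tool_and_capture(fn, **kwargs) -> str:
    """Hypothetical harness: run a tool and return everything it produced,
    stdout included, as the text that gets appended to LLM context."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        result = fn(**kwargs)
    return f"stdout:\n{buf.getvalue()}result: {result}"


context_chunk = run_tool_and_capture(
    send_email_tool, to="a@example.com", body="hi", oauth_token="ya29.FAKE-TOKEN"
)
print("ya29.FAKE-TOKEN" in context_chunk)  # True: the secret is now in context
```

Because the token was a parameter, it was available to be printed in the first place. A tool that never receives the credential cannot log it.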

The MCP specification community raised exactly this issue: in the standard flow, raw OAuth access tokens presented to the MCP server are exposed to the agent logic and the underlying LLM, directly aligning with OWASP LLM02. The discussion frames a gateway model as the mitigation — one that terminates the user's raw OAuth token and never exposes it to the LLM or backend servers.

Two damages, one root cause. You fix both or neither.

Schema Design That Contains Zero Credential Metadata

The design principle: a tool schema should contain exactly what the LLM needs to decide whether and how to call the tool. Credentials are not part of that decision. They are an execution concern, resolved after the LLM makes its choice.

What belongs in a tool schema:

Tool name and description (enough for the LLM to select it correctly)
Semantic parameters (the data the tool operates on; user-provided and task-specific)
Return type hints (optional; helps the LLM interpret outputs)

What does not belong:

api_key, bearer_token, access_token, client_secret, oauth_token
Any parameter that is identity-bound rather than task-bound
Any parameter whose value is static across calls for the same user or org

The test is simple: would this parameter value change if the same task were run by a different user? If yes, it's a credential. Keep it out of the schema.
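A name-based lint can catch the obvious cases before a schema ships. Here is a minimal sketch. The denylist uses the names listed above plus client_id from the send_email example. The regex variants are an assumption you would tune for your own tool catalogue.

```python
import re

# Names from the list above, plus client_id from the send_email example.
CREDENTIAL_NAMES = {
    "api_key", "bearer_token", "access_token",
    "client_secret", "oauth_token", "client_id",
}
# Assumed catch-all patterns for common variants (e.g. github_token).
CREDENTIAL_PATTERN = re.compile(r"token|secret|api_?key|password|credential",
                                re.IGNORECASE)


def find_credential_params(schema: dict) -> list[str]:
    """Return parameter names in a tool schema that look like credentials."""
    params = schema.get("parameters", {})
    return sorted(
        name for name in params
        if name.lower() in CREDENTIAL_NAMES or CREDENTIAL_PATTERN.search(name)
    )


send_email_bad = {
    "name": "send_email",
    "parameters": {
        "to": {"type": "string"}, "subject": {"type": "string"},
        "body": {"type": "string"}, "oauth_token": {"type": "string"},
        "client_id": {"type": "string"},
    },
}
send_email_clean = {
    "name": "send_email",
    "parameters": {"to": {"type": "string"}, "subject": {"type": "string"},
                   "body": {"type": "string"}},
}
print(find_credential_params(send_email_bad))    # ['client_id', 'oauth_token']
print(find_credential_params(send_email_clean))  # []
```

A name check is only a first pass. It would not flag an identity-bound parameter such as installation_id, so the "different user" test above still needs a human reviewer.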

Applied to three common enterprise tools:

| Tool | Remove (credential) | Keep (semantic) |
|---|---|---|
| github_create_issue | github_token, installation_id | repo, title, body, labels |
| slack_send_message | bot_token, oauth_access_token | channel, text, thread_ts |
| salesforce_create_record | access_token, instance_url | object_type, fields |

None of the removed parameters affect the LLM's tool selection logic. The LLM picks github_create_issue because the task calls for creating a GitHub issue — not because a token was present. Removing credential parameters makes schemas leaner without changing the reasoning signal.
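Applying the table is mechanical. The sketch below strips each tool's credential parameters and leaves the name and description, the fields that drive selection, untouched. The per-tool sets come from the table. The example github_create_issue schema, including its description text, is hypothetical.

```python
import copy

# Per-tool credential parameters, taken from the table above.
CREDENTIAL_PARAMS = {
    "github_create_issue": {"github_token", "installation_id"},
    "slack_send_message": {"bot_token", "oauth_access_token"},
    "salesforce_create_record": {"access_token", "instance_url"},
}


def strip_credentials(schema: dict) -> dict:
    """Return a copy of the schema with that tool's credential params removed."""
    clean = copy.deepcopy(schema)
    drop = CREDENTIAL_PARAMS.get(clean["name"], set())
    clean["parameters"] = {
        k: v for k, v in clean["parameters"].items() if k not in drop
    }
    return clean


# Hypothetical schema for illustration.
issue_schema = {
    "name": "github_create_issue",
    "description": "Create an issue in a GitHub repository",
    "parameters": {
        "repo": {"type": "string"},
        "title": {"type": "string"},
        "body": {"type": "string"},
        "labels": {"type": "array", "items": {"type": "string"}},
        "github_token": {"type": "string"},
        "installation_id": {"type": "string"},
    },
}
clean = strip_credentials(issue_schema)
print(sorted(clean["parameters"]))  # ['body', 'labels', 'repo', 'title']
```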

For the auth-side architecture that governs what replaces those parameters at execution time, see Tool Calling Authentication for AI Agents.

Vault-Backed Credential Injection at Execution Time

If credentials aren't in the schema, how does the tool call authenticate?

The answer is: the execution layer resolves credentials from the vault using the connected_account_id, not from any parameter the LLM passed. The resolution path looks like this:

```text
LLM sees:         execute_tool("send_email", { to, subject, body })
Execution layer:  resolves gmail_oauth_token from vault for connected_account_id
API call:         POST /gmail/v1/users/me/messages/send
                  Authorization: Bearer {resolved_token}
LLM receives:     { message_id, thread_id, label_ids }
```

The LLM never sees the token. The schema never carries it. The vault is the only place it lives.
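The resolution path above can be sketched in a few lines. This is a minimal illustration and not Scalekit's implementation. Vault, execute_tool, and the injected transport callable are all hypothetical. The transport stands in for a real HTTP client, and the response fields follow the simplified names in the flow above, not Gmail's raw response format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Vault:
    """Hypothetical in-memory vault: tokens keyed by connected_account_id."""
    _tokens: dict[str, str]

    def resolve(self, connected_account_id: str) -> str:
        # KeyError for unknown accounts: never falls back to another account.
        return self._tokens[connected_account_id]


# (method, path, headers, body) -> response; stands in for an HTTP client.
Transport = Callable[[str, str, dict, dict], dict]


def execute_tool(vault: Vault, transport: Transport, tool_name: str,
                 connected_account_id: str, params: dict) -> dict:
    """Resolve the credential at call time; return only semantic output."""
    if tool_name != "send_email":
        raise ValueError(f"unknown tool: {tool_name}")
    token = vault.resolve(connected_account_id)  # never part of params
    headers = {"Authorization": f"Bearer {token}"}
    response = transport("POST", "/gmail/v1/users/me/messages/send",
                         headers, params)
    # Only these fields go back to the LLM; the token stays in this frame.
    return {k: response[k] for k in ("message_id", "thread_id", "label_ids")}


seen = {}


def fake_transport(method, path, headers, body):
    seen["auth"] = headers["Authorization"]
    return {"message_id": "m1", "thread_id": "t1", "label_ids": ["SENT"]}


vault = Vault({"ca_user_123": "tok-123", "ca_user_456": "tok-456"})
result = execute_tool(vault, fake_transport, "send_email", "ca_user_123",
                      {"to": "a@example.com", "subject": "Hi", "body": "Hello"})
print(result)        # {'message_id': 'm1', 'thread_id': 't1', 'label_ids': ['SENT']}
print(seen["auth"])  # Bearer tok-123
```

Keying the vault by connected_account_id is also what gives per-user isolation: a call for ca_user_123 can only ever resolve tok-123.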

Three properties this guarantees:

Schema integrity: Every LLM call carries the same schema shape regardless of which user the agent is acting for. No per-user token injection into context; no schema variation across tenants.
Context isolation: The credential never lands in LLM context, which rules out prompt-injection exfiltration of the token itself and eliminates the stdout-logging exposure vector.
Token lifecycle transparency: Refresh, rotation, and revocation are handled at the execution layer. The schema doesn't change when a token expires; the vault resolves a fresh one.
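The lifecycle property is what lets the schema stay fixed while tokens churn. Here is a minimal sketch of proactive refresh. It assumes a hypothetical refresh_fn standing in for a provider-specific OAuth refresh call, and a five-minute skew window. Both are illustrative choices, not any provider's requirement. The clock is injectable so the behavior can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class StoredToken:
    access_token: str
    expires_at: float  # Unix timestamp


class RefreshingVault:
    """Hypothetical vault that refreshes a token shortly before it expires."""

    def __init__(self, refresh_fn: Callable[[str], StoredToken],
                 skew_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time):
        self._tokens: dict[str, StoredToken] = {}
        self._refresh_fn = refresh_fn
        self._skew = skew_seconds
        self._clock = clock

    def store(self, account_id: str, token: StoredToken) -> None:
        self._tokens[account_id] = token

    def resolve(self, account_id: str) -> str:
        token = self._tokens[account_id]
        # Refresh if expiry falls inside the skew window, so a call that
        # starts now never runs with a token about to go stale.
        if token.expires_at - self._clock() <= self._skew:
            token = self._refresh_fn(account_id)
            self._tokens[account_id] = token
        return token.access_token


now = [1_000.0]
refreshes = []


def fake_refresh(account_id: str) -> StoredToken:
    refreshes.append(account_id)
    return StoredToken(f"fresh-{len(refreshes)}", now[0] + 3600)


vault = RefreshingVault(fake_refresh, skew_seconds=300, clock=lambda: now[0])
vault.store("ca_1", StoredToken("old", expires_at=now[0] + 3600))
first = vault.resolve("ca_1")   # 3600 s left: outside the window
now[0] += 3400                  # 200 s left: inside the window
second = vault.resolve("ca_1")
print(first, second)  # old fresh-1
```

Nothing the LLM sees changes across the refresh. Only the value the vault hands to the execution layer does.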

One clarification on scope: vault-backed injection eliminates the credential-parameter subcategory of schema overhead. It does not address description verbosity or unused tool definitions — those are separate problems requiring tool filtering, lazy loading, and description compression. Credential cleanup is the most security-critical optimization; description compression is the highest-volume one. Both are necessary; they're not the same fix.

For the full architecture behind token vault design in agent workflows, see Token Vault: Why It's Critical for AI Agent Workflows.

How Scalekit Implements Execution-Time Injection

Three steps: one-time connection setup per service, per-user authorization, per-call execution with vault resolution.

```python
# Assumes `scalekit` is an already-initialized Scalekit client (setup not shown).

# Step 1: Create a connected account once per user
connected_account = scalekit.create_connected_account(
    user_id="user_123",
    connector="gmail",
)

# Step 2: Tool schema with zero auth parameters
tools = [
    {
        "name": "send_email",
        "description": "Send an email via Gmail",
        "parameters": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
    }
]

# Step 3: execute_tool resolves credentials at call time, not schema time
result = scalekit.execute_tool(
    tool_name="send_email",
    connected_account_id=connected_account.id,
    params={"to": "...", "subject": "...", "body": "..."},
)
```

What execute_tool does that a home-built pipeline typically doesn't:

Fetches the user's OAuth token for Gmail from the vault; proactively refreshed before expiry so the token is never stale mid-call (Slack rotates on every use; Google expires in one hour; GitHub PATs have configurable expiry — each handled per-provider without custom logic in your agent)
Executes the authenticated API call against Gmail
Returns structured output to the agent
Logs connection_id, tool_name, and connected_account_id to the audit trail; the connection_id is the immutable link back to the original authorization event
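The audit fields named in the last bullet can be recorded without the credential ever appearing in the log. Here is a minimal sketch. The AuditEntry shape and the example IDs are hypothetical and are not Scalekit's actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record: identifies the call, never the credential."""
    connection_id: str         # link back to the original authorization event
    connected_account_id: str  # whose vault entry was used
    tool_name: str
    timestamp: str             # ISO 8601, UTC


def audit_entry(connection_id: str, connected_account_id: str,
                tool_name: str) -> AuditEntry:
    return AuditEntry(
        connection_id=connection_id,
        connected_account_id=connected_account_id,
        tool_name=tool_name,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


entry = audit_entry("conn_abc", "ca_user_123", "send_email")
print(json.dumps(asdict(entry)))
```

Because the record type has no field for a token, the audit log cannot become a second leak surface by accident.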

The schema your LLM sees has three parameters: to, subject, body. No oauth_token. No client_id. The vault resolved them; the agent never knew they existed.

Tenant isolation is enforced at the vault level: the credential store for user_123 is isolated from user_456. One user's token cannot leak to another's execute_tool call, regardless of how those calls are batched or scheduled.

Get started: AgentKit Quickstart

Framework-specific integration examples: LangChain | Anthropic

FAQs

Does removing credential parameters from schemas change how the LLM selects tools?

No. Tool selection is driven by the tool name, description, and semantic parameters. Credential parameters carry zero selection signal — the LLM picks github_create_issue because the task calls for creating a GitHub issue, not because a github_token field was present. Removing them makes the schema surface smaller and the selection signal cleaner.

What if I'm using LangChain or the Anthropic SDK, which wrap tool definitions?

Neither framework requires credentials in the tool schema. The schema describes the tool interface; authentication is handled inside the execution function that gets called when the LLM invokes the tool. The execute_tool call inside that function is where Scalekit resolves credentials. See the LangChain integration example and the Anthropic integration example.
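Here is a minimal sketch of that split for the Anthropic Messages API. The tool definition uses the API's documented name, description, and input_schema shape. It carries no credential fields. The handler takes connected_account_id from server-side session state rather than from the model. The tool_use block is represented as a plain dict to keep the example self-contained; the SDK returns objects with .name and .input attributes. The execute_tool callable is passed in and mirrors the call shape shown earlier. Adapt it if your SDK's signature differs.

```python
# Tool definition in the Anthropic Messages API shape. No credential fields.
SEND_EMAIL_TOOL = {
    "name": "send_email",
    "description": "Send an email via Gmail",
    "input_schema": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "subject": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "subject", "body"],
    },
}


def handle_tool_use(tool_use: dict, connected_account_id: str,
                    execute_tool) -> dict:
    """Run one tool_use block from the model's response.

    connected_account_id comes from the server-side session, never from
    the model; execute_tool is whatever resolves credentials at call time.
    """
    return execute_tool(
        tool_name=tool_use["name"],
        connected_account_id=connected_account_id,
        params=tool_use["input"],
    )


calls = []


def stub_execute_tool(tool_name, connected_account_id, params):
    calls.append((tool_name, connected_account_id, params))
    return {"message_id": "m1"}


block = {"type": "tool_use", "id": "toolu_01", "name": "send_email",
         "input": {"to": "a@example.com", "subject": "Hi", "body": "Hello"}}
result = handle_tool_use(block, "ca_user_123", stub_execute_tool)
print(result)  # {'message_id': 'm1'}
```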

Is this only relevant for multi-tenant B2B agents?

No. Even single-tenant agents benefit: credentials out of schema means credentials out of LLM context, eliminating OWASP LLM02 exposure regardless of tenant count. The token efficiency benefit compounds with scale; the security benefit is absolute from day one.

How does this interact with the broader MCP Tax problem?

Execution-time credential injection eliminates the auth-parameter subcategory of schema overhead. It does not address description verbosity or unused tool definitions. Those require separate techniques: tool filtering, lazy loading, description compression. Credential cleanup is the most security-critical fix; description compression typically delivers higher token volume reduction. A complete token efficiency strategy requires both.
