The Tool Calling Auth Stack: What You Need at Each Stage of Agent Maturity

TL;DR

  • The tool-calling auth stack has 5 layers. They are sequentially dependent: skipping a layer means every layer above it cannot be properly implemented. Most teams discover this one production incident per layer.
  • The credential ownership decision — agent code owns the token vs. vault owns the token — is the only architecturally irrecoverable choice in this stack. Every other layer can be retrofitted at cost. This one cannot.
  • The second customer exposes missing tenant isolation. The first enterprise customer exposes missing delegation and audit trail. The SOC 2 review exposes all five.
  • Proactive token refresh is not "good hygiene." It is a hard requirement. Slack, Google, and HubSpot each have undocumented refresh behaviors that produce different silent failure modes per provider. You find them one production outage at a time.
  • Only 21.9% of teams treat AI agents as independent, identity-bearing entities. The other 78.1% retrofit auth architecture at enterprise contract time.

Every tool-calling agent needs the same 5 auth components. They are not optional modules. They are a dependency chain: you cannot properly implement layer N without having correctly built layer N-1.

In a demo, none of this is visible. The demo works with a single access_token hardcoded in an environment variable, a single user, a single provider, no refresh logic, no tenant boundaries, no audit trail.

In production — specifically the moment a second customer organization appears — every skipped layer becomes a live failure.

The table below shows the full auth stack your tool-calling agent converges on. This post maps each layer: what it requires, what breaks when it's absent, and which pressure event exposes the gap.

| Layer | What it governs | Pressure event that exposes the gap |
| --- | --- | --- |
| Credential Ownership | Where access_token and refresh_token physically live | First production incident involving token exposure |
| Token Storage | Encryption, isolation, log exposure | First security review or penetration test |
| Tenant Isolation | Per-user, per-org credential scoping | Second customer |
| Scope Enforcement | What actions each credential can take | First enterprise customer or SOC 2 Type II |
| Token Lifecycle | Refresh, rotation, revocation | First long-running background agent in production |
| Audit Trail | Attribution, delegation chain, compliance evidence | Enterprise procurement security questionnaire |

The layers are ordered by dependency, not by importance. Credential ownership (Layer 0) is the foundational decision, and the five layers built on it are all mandatory in production. The ordering describes which ones you can temporarily defer and which ones, if deferred at the wrong point, create structural rework.

Layer 0: Credential Ownership — The Decision That Determines Everything Else

Before the first line of auth code, there is an architectural choice that most agent engineers treat as a developer experience preference. It is not.

When you choose how to connect your agent to external tools, three integration patterns are available:

| Integration Pattern | Where access_token lives | Who owns refresh | Blast radius on compromise |
| --- | --- | --- | --- |
| Direct function calling | Agent process memory (or env var; same blast radius) | Agent developer | All tenants sharing the credential |
| MCP server | MCP server process memory | MCP server developer | All clients connected to that MCP server |
| Vault-backed connector layer | Encrypted vault; never in agent runtime | Vault infrastructure | Single ConnectedAccount; cryptographically isolated |

The direct function calling pattern feels like the correct choice at prototype stage. Low setup cost. Full control. Fits any LLM SDK. All true. The problem is what this choice makes mandatory for everything that follows.

When the agent process owns the access_token and refresh_token, every auth layer in the stack becomes your engineering team's problem to build and maintain:

  • Tenant isolation is now a custom database schema with row-level access control (or a custom vault implementation)
  • Token refresh is now a custom scheduler with distributed locking logic
  • Scope enforcement is now custom middleware before every tool call
  • Audit attribution is now manual instrumentation in every tool execution path
  • Revocation handling is now a polling loop or a custom webhook listener per provider

None of these is a documentation problem. Each is a full engineering surface that scales linearly with the number of providers your agent integrates with.

The MCP server pattern moves credential ownership from the agent process to the MCP server process. The credential problem relocates; it does not disappear. Additionally, MCP introduces a second auth boundary — agent to MCP server — that is frequently left completely unauthenticated in production. The credential ownership post documents the precise failure mode: three concurrent agent threads, one shared refresh_token, Slack token rotation enabled. At 2:47am, all three threads detect an expired access_token simultaneously. All three call the OAuth token endpoint with the same refresh_token. The first thread succeeds; threads two and three receive invalid_grant. They fail silently. The investigation takes four days.

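That failure is not a bug. It is the expected behavior of any architecture where several agent threads share one refresh_token and the provider rotates it. RFC 6749 §6 lets the authorization server issue a new refresh token on every refresh; the client must then discard the old one, and the server may revoke it. With rotation enabled, each refresh token is effectively single-use.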
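The race is easy to reproduce without any real provider. The following is a minimal sketch, assuming a toy token endpoint that rotates refresh tokens on every use; `RotatingTokenEndpoint` and `InvalidGrant` are hypothetical names for illustration, not part of any SDK, and the three workers run sequentially to keep the outcome deterministic.

```python
import secrets


class InvalidGrant(Exception):
    """Raised when a refresh token has already been used or revoked."""


class RotatingTokenEndpoint:
    """Toy stand-in for a provider token endpoint with rotation enabled."""

    def __init__(self):
        self._valid_refresh_tokens = set()

    def issue_initial(self):
        token = secrets.token_urlsafe(16)
        self._valid_refresh_tokens.add(token)
        return token

    def refresh(self, refresh_token):
        if refresh_token not in self._valid_refresh_tokens:
            raise InvalidGrant("invalid_grant")
        # Rotation: the presented refresh token is invalidated and replaced.
        self._valid_refresh_tokens.discard(refresh_token)
        return {
            "access_token": secrets.token_urlsafe(16),
            "refresh_token": self.issue_initial(),
        }


endpoint = RotatingTokenEndpoint()
shared_refresh_token = endpoint.issue_initial()

outcomes = []
for worker in ("thread-1", "thread-2", "thread-3"):
    try:
        # Every worker still holds the same, now stale, copy of the token.
        endpoint.refresh(shared_refresh_token)
        outcomes.append((worker, "ok"))
    except InvalidGrant:
        outcomes.append((worker, "invalid_grant"))

print(outcomes)
```

Only the first caller gets a new token pair. The other two hold a refresh token the endpoint no longer recognizes, and unless their error path is instrumented, nothing surfaces.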

The vault-backed pattern resolves this structurally. The access_token and refresh_token never exist in agent runtime memory or LLM context. The agent holds an opaque connected_account_id. The vault owns refresh, rotation, revocation detection, and audit attribution as infrastructure primitives.

```python
import os

import requests

GITHUB_API = "https://api.github.com"

# Pattern 1: agent owns the token
# (repo and payload are supplied by the calling code)
access_token = os.getenv("GITHUB_TOKEN")  # one token; all tenants; no refresh; no audit
headers = {"Authorization": f"Bearer {access_token}"}
response = requests.post(f"{GITHUB_API}/repos/{repo}/issues", headers=headers, json=payload)

# Pattern 3: vault owns the token (call from inside an async function)
result = await scalekit.agents.execute_tool(
    connected_account_id=user.connected_account_id,  # opaque reference; no raw token in scope
    tool_name="github_create_issue",
    tool_input={"repo": repo, "title": title, "body": body},
)
```

The execute_tool call above does not contain a credential. The agent never receives one. This is the only pattern where all five subsequent layers can be implemented without custom infrastructure.

Layer 1: Token Storage — Encryption, Isolation, and Why "Secure" Is Not a Boolean

Once credential ownership is resolved, the next question is where tokens are stored and what "secure storage" actually means at the implementation level.

Three distinct failure modes appear in production systems that describe their token storage as "secure":

1. Application-layer encryption without cryptographic tenant isolation

Tokens are encrypted at rest in a shared database table, and the encryption key is a single application secret. A misconfigured query (SELECT * FROM connected_accounts WHERE user_id = ? with the AND org_id = ? clause missing) returns tokens from a different tenant's ConnectedAccount. The token is encrypted, but it is the wrong tenant's encrypted token, and the agent decrypts it with the shared key and uses it without error. This does not surface as a bug. It is a data leak that produces correct-looking behavior.

Correct storage requires separate key material per org, not application-layer encryption with a shared key.
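One common way to get separate key material per org is to derive each org's key from a master key with HKDF (RFC 5869). The sketch below is illustrative only and makes several assumptions: the helper names (`hkdf_sha256`, `org_key`, `seal`, `verify`), the salt label, and the all-zero demo master key are invented for this example. The Python standard library has no AEAD cipher, so an HMAC tag stands in for encryption here; a real vault would pass the derived key to something like AES-GCM.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes) -> bytes:
    """HKDF with SHA-256, producing a single 32-byte output block."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()      # extract step
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()  # expand step, block 1


def org_key(master_key: bytes, org_id: str) -> bytes:
    """Derive key material that is unique to one org."""
    return hkdf_sha256(master_key, salt=b"token-vault-v1", info=org_id.encode())


def seal(key: bytes, blob: bytes) -> bytes:
    """Bind a stored blob to a key (stand-in for AEAD encryption)."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def verify(key: bytes, blob: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(seal(key, blob), tag)


master_key = b"\x00" * 32  # demo value only; never hardcode a real key
acme_key = org_key(master_key, "org_acme")
globex_key = org_key(master_key, "org_globex")

record = b"stored-token-blob"
tag = seal(acme_key, record)

print(verify(acme_key, record, tag))    # the owning org's key verifies
print(verify(globex_key, record, tag))  # another org's key does not
```

With this shape, a query that returns the wrong tenant's row yields a blob the caller's key cannot open, so the bug fails loudly instead of leaking data.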

2. Raw tokens appearing in structured logs

The Authorization: Bearer eyJhbGci... header in an outbound API request appears verbatim in access logs if the HTTP client logs request headers. This is the most common and least-discussed token exposure vector in agent systems. The token is encrypted at rest in the database. It is plaintext in the log aggregator.

Correct storage requires that access_token values never appear in any log path: application logs, HTTP client debug logs, LLM completion logs, or structured telemetry.
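As a rough illustration of closing the log path, the sketch below attaches a redaction filter to a logging handler. It uses only the Python standard library; the `RedactBearerTokens` and `ListHandler` classes and the regular expression are assumptions made for this example, and a real deployment would also need to cover telemetry and any third-party client's debug output.

```python
import logging
import re

# Matches "Bearer <token>" and keeps only the scheme prefix.
BEARER_RE = re.compile(r"(Bearer\s+)[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE)


class RedactBearerTokens(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so tokens passed as %-args are caught too.
        record.msg = BEARER_RE.sub(r"\1[REDACTED]", record.getMessage())
        record.args = ()
        return True


class ListHandler(logging.Handler):
    """Collects rendered log lines in memory for the demo."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(record.getMessage())


logger = logging.getLogger("http.client.demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False

handler = ListHandler()
handler.addFilter(RedactBearerTokens())  # on the handler, so every record it emits is filtered
logger.addHandler(handler)

logger.debug(
    "POST /repos/acme/api/issues headers=%s",
    {"Authorization": "Bearer eyJhbGciOiJIUzI1NiJ9.payload.sig"},
)
print(handler.lines[0])
```

Attaching the filter to the handler rather than to a single logger matters: a logger-level filter only sees records logged directly on that logger, while a handler-level filter sees everything routed to that output.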

3. Tokens in LLM context

An agent that receives a raw access_token as part of its tool call input — even temporarily, even for a single turn — exposes that credential to the full prompt injection attack surface. Not just to adversarial extraction; to any input that instructs the agent to include the token in a response, a summary, or a tool call to a second system.

Correct storage requires that credentials never enter the LLM context at any point in the tool execution path. The agent calls a tool by name with business-logic parameters; credential injection happens inside the connector layer, after the LLM has produced its output and before the API call is made.
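The separation described above can be sketched as two data paths: one the model can see and one it cannot. Everything in this example is hypothetical (the `ToolCall` type, the in-memory `_VAULT` dict, and the helper names); it only illustrates where credential injection sits relative to the LLM's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """What the LLM is allowed to produce: a tool name and business parameters."""
    tool_name: str
    tool_input: dict


# Hypothetical in-memory stand-in for a vault keyed by an opaque account reference.
_VAULT = {"ca_123": "example-token-not-real"}


def build_request(call: ToolCall, connected_account_id: str) -> dict:
    """Runs inside the connector layer, after the model has produced its output."""
    token = _VAULT[connected_account_id]
    return {
        "method": "POST",
        "url": f"https://api.github.com/repos/{call.tool_input['repo']}/issues",
        "headers": {"Authorization": f"Bearer {token}"},
        "json": {"title": call.tool_input["title"]},
    }


def result_for_llm(status_code: int) -> dict:
    """Only business-level results go back into the model's context."""
    return {"ok": 200 <= status_code < 300, "status": status_code}


call = ToolCall("github_create_issue", {"repo": "acme/api", "title": "Login fails"})
request = build_request(call, "ca_123")

# Everything the model ever sees: its own tool call and the sanitized result.
llm_visible = [repr(call), repr(result_for_llm(201))]
print(llm_visible)
```

The token exists only inside `build_request` and the outbound request. A prompt injection that asks the model to repeat its credentials has nothing to repeat.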

Only 37% of organizations enforce purpose binding for AI agent credentials. Storage misconfiguration is the primary mechanism through which agent credentials reach unintended destinations.

| Storage approach | Encryption model | Tenant isolation method | LLM context exposure risk |
| --- | --- | --- | --- |
| Env vars / .env file | None | None (shared across all) | High (visible in process environment) |
| Application DB, shared encryption key | At rest, application layer | Row-level access control | Medium (query filter bug = cross-tenant) |
| Application DB, per-tenant key material | At rest, per-tenant | Cryptographic | Low (query bug returns encrypted blob unusable without correct key) |
| Dedicated vault, opaque references | At rest, per-tenant, separate key rotation | Cryptographic + reference-only exposure | None (raw token never in agent runtime) |

Layer 2: Tenant Isolation — The Second Customer Is the Stress Test

The transition from one customer to two is not an incremental change. It is the first time your agent runs against two different organizations' data simultaneously, on two different sets of credentials, where a routing error exposes one organization's data to the other.

The specific failure is structural, not careless. A fintech team wires an AI triage agent across four engineering Slack channels, one per team. Three months into production with a dozen customer orgs, the backend team starts finding issues landing in the frontend repo. Root cause: a shared GitHub access_token with no tenant boundary. Every channel, every customer org, used the same identity to act. (Full pattern analysis here.)

At the tool call level, multi-tenant isolation requires a precise distinction between two concepts that most teams initially conflate:

Connection: The developer's OAuth app credentials for a provider. Configured once, in the Scalekit dashboard, per provider. Contains the client_id and client_secret for your OAuth app registration with GitHub, Salesforce, Slack, etc. One Connection per provider, shared across all your customers.

ConnectedAccount: A specific end user's authorization grant. Created when a user completes the OAuth flow for a provider. Contains that user's access_token, refresh_token, authorized scopes, and token expiry — all encrypted and isolated to that user's org. One ConnectedAccount per user per provider.

The agent must resolve every tool call against a ConnectedAccount, never against a Connection. The Connection is infrastructure. The ConnectedAccount is identity.

```python
# Wrong: resolves against Connection (developer's app credentials, shared across all customers)
result = await execute_github_tool(
    tool_name="github_create_issue",
    tool_input={"repo": repo, "title": title},
    # implicit: uses the developer's GitHub OAuth app token
)

# Correct: resolves against ConnectedAccount (this specific user's authorization grant)
result = await scalekit.agents.execute_tool(
    connected_account_id=user.scalekit_connected_account_id,  # bound to this user's org
    tool_name="github_create_issue",
    tool_input={"repo": repo, "title": title},
)
```

The user.scalekit_connected_account_id is the tenant isolation boundary. It is the identifier that binds every tool call to a specific user's authorization event in a specific customer org. Without it, every tool call is potentially cross-tenant.

What tenant isolation requires in the connection config:

```python
# Creating a ConnectedAccount: this is the per-user OAuth authorization event
auth_url = scalekit.agentkit.get_authorization_url(
    connection_id="conn_github",          # developer's Connection, configured once
    user_id=current_user.id,              # your application's user identifier
    org_id=current_user.organization_id,  # org boundary; cryptographically scopes the stored token
    redirect_uri="https://yourapp.com/oauth/callback",
)
# User completes the OAuth flow; Scalekit stores the token isolated to (org_id, user_id)
# Returns: connected_account_id, the opaque reference your agent uses on every call
```

Layer 3: Scope Enforcement — Reading Is Not Writing; Background Is Not User-Delegated

Per-tenant isolation guarantees that the right credential is used for the right organization. Scope enforcement guarantees that the right action is authorized for that credential.

These are different problems. A ConnectedAccount with full admin scopes, correctly isolated per tenant, can still allow an agent to perform actions the authorizing user never intended to authorize.

Two distinct categories of agent tool calls require different OAuth flows:

User-delegated actions (Authorization Code Flow, RFC 6749)

The agent acts on behalf of a specific user. The user completed the OAuth flow; the agent inherits exactly that user's permissions — no more, no less. If the user can't write to a repository, the agent can't write to a repository. If the user's Jira access doesn't include PROJ-PRIVATE, the agent's Jira queries return exactly what that user's Jira session would return.

This is the correct model for any action that creates, modifies, or accesses data attributable to a specific user: creating issues, sending emails, updating CRM records, posting Slack messages.

Org-level background actions (Client Credentials Flow, RFC 6749)

The agent runs on org-level credentials. No user in the loop. Correct for syncing records, updating dashboards, running scheduled reports — actions where the operation is systemic and the org is the principal, not a specific employee.

The structural problem: many teams use a service account (Client Credentials or a shared OAuth token with broad scopes) for all agent actions, including actions that should be user-delegated. The consequence is not just an audit failure; it is an active security gap.

Consider a get_repository_secrets tool that a senior developer's GitHub permissions allow. An agent running on that developer's over-scoped credential can query secrets the task never required, not because the LLM was instructed to, but because "gather context" and "read available data" are reasonable agent behaviors that the absence of scope enforcement turns into privilege escalation. (IAM and agent access control deep dive here.)

Scope enforcement at the tool call boundary requires:

  • Scope bound at ConnectedAccount creation time. The OAuth authorization request specifies exactly the scopes the agent requires: github:repo:read, not github:admin. The token stored in the vault carries only those scopes.
  • Scope enforced before the API call, not after. A ConnectedAccount authorized with read-only GitHub scopes cannot execute github_create_issue regardless of what the LLM decides to try. The scope check happens inside the connector layer; the API call is never made.
  • Scope mismatch returns a structured error to the agent. The LLM receives: "This action requires write scope. The connected account is authorized for read only." It does not receive a 403 from GitHub with no explanation.

| Action type | Required OAuth flow | Token scope | Blast radius if misconfigured |
| --- | --- | --- | --- |
| User creates a Jira issue | Authorization Code | write:issue for that user's workspace | One user's workspace |
| Agent syncs Salesforce records to dashboard | Client Credentials | Org-level CRM read access | All records in the org |
| Agent posts Slack summary on user's behalf | Authorization Code | chat:write for that user's Slack workspace | That user's Slack workspace |
| Agent runs nightly GitHub PR summary | Client Credentials | Org-level repo:read | All repos the service account can access |
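The three enforcement requirements listed before the table can be sketched as a check that runs ahead of the provider call. This is a minimal illustration under stated assumptions: the `REQUIRED_SCOPE` mapping, the simplified "read" and "write" scope names, and the `execute_tool` signature are hypothetical and do not describe any particular SDK.

```python
from dataclasses import dataclass

# Hypothetical mapping from tool name to the scope that tool needs.
REQUIRED_SCOPE = {
    "github_list_issues": "read",
    "github_create_issue": "write",
}


@dataclass(frozen=True)
class ConnectedAccount:
    id: str
    scopes: frozenset  # bound once, at authorization time


def check_scope(account, tool_name):
    """Return None when the call is allowed, else a structured error for the LLM."""
    required = REQUIRED_SCOPE[tool_name]
    if required in account.scopes:
        return None
    granted = ", ".join(sorted(account.scopes))
    return {
        "error": "insufficient_scope",
        "required_scope": required,
        "message": (
            f"This action requires {required} scope. "
            f"The connected account is authorized for {granted} only."
        ),
    }


def execute_tool(account, tool_name, tool_input, send):
    error = check_scope(account, tool_name)
    if error is not None:
        return error  # the provider API is never called
    return send(tool_name, tool_input)


sent = []


def fake_send(tool_name, tool_input):
    """Stand-in for the outbound provider request."""
    sent.append(tool_name)
    return {"ok": True}


reader = ConnectedAccount(id="ca_123", scopes=frozenset({"read"}))
denied = execute_tool(reader, "github_create_issue", {"repo": "acme/api", "title": "x"}, fake_send)
allowed = execute_tool(reader, "github_list_issues", {"repo": "acme/api"}, fake_send)
print(denied["message"])
```

Because the check returns before `send` is invoked, the write attempt never reaches the provider, and the model gets an explanation it can act on instead of a bare 403.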

Layer 4: Token Lifecycle — Refresh, Rotation, Revocation, and the Silent Failures

Layers 0 through 3 can all be implemented correctly while this layer destroys your agent's reliability in production. Token lifecycle failures are the most operationally invisible failure class in agent systems because they often do not throw exceptions. They produce incorrect behavior silently, at 2am, on a background workflow that nobody is watching.

Three distinct lifecycle events produce three distinct failure modes:

Expiry

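access_token lifetimes vary by provider: 1 hour for Google, 6 hours for HubSpot, configurable for Slack. Providers change these values, so read expires_in from each token response rather than hardcoding a TTL. The agent runs continuously. At some point, a token expires mid-workflow.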

Reactive refresh — waiting for a 401 response, then calling the refresh endpoint — is the approach most teams implement first. It creates a race condition: multiple concurrent agent threads detect the 401 simultaneously, all attempt refresh simultaneously. The first succeeds; the others may receive invalid_grant depending on the provider's token rotation behavior.

Proactive refresh — checking expires_in before each tool call, refreshing if within a configurable window — prevents this. But proactive refresh requires distributed locking: if two threads check simultaneously and both determine a refresh is needed, the race condition reappears. The locking logic must be implemented once, correctly, per deployment environment.
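The check-lock-recheck shape can be sketched in a few lines. This example makes a significant simplification: `threading.Lock` stands in for a distributed lock, so it only protects threads inside one process; a multi-process deployment would need a lock held in shared infrastructure. `TokenManager`, `fake_refresh`, and the five-minute window are assumptions made for illustration.

```python
import threading
import time

REFRESH_WINDOW_SECONDS = 300  # refresh when under 5 minutes of validity remain


class TokenManager:
    def __init__(self, refresh_fn, access_token, expires_at, clock=time.time):
        self._refresh_fn = refresh_fn    # calls the provider's token endpoint
        self._access_token = access_token
        self._expires_at = expires_at    # absolute epoch seconds
        self._clock = clock
        self._lock = threading.Lock()    # in-process stand-in for a distributed lock

    def _needs_refresh(self):
        return self._expires_at - self._clock() < REFRESH_WINDOW_SECONDS

    def get_token(self):
        if self._needs_refresh():
            with self._lock:
                # Re-check under the lock: another thread may have refreshed already.
                if self._needs_refresh():
                    self._access_token, self._expires_at = self._refresh_fn()
        return self._access_token


refresh_calls = []


def fake_refresh():
    refresh_calls.append(1)
    time.sleep(0.05)  # simulate network latency so the threads overlap
    return f"token-{len(refresh_calls)}", time.time() + 3600


manager = TokenManager(fake_refresh, "stale-token", expires_at=time.time() + 10)

results = []
threads = [
    threading.Thread(target=lambda: results.append(manager.get_token()))
    for _ in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(refresh_calls), results)
```

The second `_needs_refresh()` call inside the lock is the part that is easy to omit. Without it, each waiting thread refreshes again as soon as it acquires the lock, which recreates the rotation race one call at a time.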

Rotation

Slack, with token rotation enabled, invalidates the refresh_token on each use. RFC 6749 §6 permits this behavior. It is also the specific failure mode documented in the credential ownership post: three concurrent threads, one refresh_token, invalid_grant on threads two and three. Silent failure. Four-day investigation.

Provider-specific rotation behavior is not consistently documented. You encounter each variant at production scale, not in the OAuth spec.

| Provider | Access token TTL | Refresh token rotation | Silent failure risk |
| --- | --- | --- | --- |
| Google Workspace | 1 hour | No rotation | Low (revocation is explicit; expiry is predictable) |
| Slack (rotation enabled) | Configurable | Yes; each refresh token is single-use | High (concurrent refresh produces invalid_grant; no exception) |
| Microsoft 365 | 1 hour | Rolling window; not invalidated immediately | Medium (stale refresh tokens work until the window closes) |
| HubSpot | 6 hours | No rotation | Medium (expiry window is predictable; rotation not the issue) |
| GitHub PAT | Configurable; frequently set to "never" | N/A | Low (doesn't expire until explicitly configured to do so) |
| Salesforce | Configurable via org policy | Optional rotation | Medium (org policy varies; what works in sandbox may not in production) |

Revocation

A user disconnects their account. An IdP admin revokes an OAuth grant. An employee's Okta account is disabled and the cascade reaches the OAuth grant.

Without event-driven revocation detection, the agent discovers this when the next tool call returns 401 on a token that was valid one minute ago. At that point, the agent is mid-task. The task fails. The agent must surface an error to a workflow that assumed the credential was valid.

With agent webhooks on connected_account.disconnected, the revocation event fires before the next task starts. The orchestration layer can pause the workflow, notify the user for re-authorization, and resume — rather than discovering the problem inside a running task.

Revocation detection via webhooks requires the infrastructure to exist before the first revocation event. You cannot retroactively add event-driven revocation detection to a polling-based architecture without replacing the polling architecture.
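A handler for the disconnect event might look roughly like the sketch below. The event name connected_account.disconnected comes from the text above; the payload shape (`type`, `data.connected_account_id`), the `workflows` registry, and the notification tuple are assumptions made for illustration. A production handler would also verify the webhook signature before trusting the payload.

```python
from enum import Enum


class WorkflowState(Enum):
    RUNNING = "running"
    PAUSED_NEEDS_REAUTH = "paused_needs_reauth"


# Hypothetical orchestration state: workflow id -> bound account and current state.
workflows = {
    "wf_1": {"connected_account_id": "ca_123", "state": WorkflowState.RUNNING},
    "wf_2": {"connected_account_id": "ca_456", "state": WorkflowState.RUNNING},
}
notifications = []


def handle_webhook(event: dict) -> int:
    """Pause every running workflow bound to a disconnected account.

    Returns the number of workflows paused by this delivery.
    """
    if event.get("type") != "connected_account.disconnected":
        return 0
    account_id = event["data"]["connected_account_id"]
    paused = 0
    for workflow_id, workflow in workflows.items():
        if (
            workflow["connected_account_id"] == account_id
            and workflow["state"] is WorkflowState.RUNNING
        ):
            workflow["state"] = WorkflowState.PAUSED_NEEDS_REAUTH
            notifications.append((workflow_id, "reauthorization_required"))
            paused += 1
    return paused


event = {
    "type": "connected_account.disconnected",
    "data": {"connected_account_id": "ca_123"},
}
first_delivery = handle_webhook(event)
second_delivery = handle_webhook(event)  # webhooks are often redelivered
print(first_delivery, second_delivery, notifications)
```

Checking the current state before pausing makes the handler idempotent, so a redelivered event does not notify the user twice.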

Layer 5: Audit Trail and Delegation Chain — The Enterprise Gate

The audit trail is the last layer, but it is not independent. It is structurally dependent on every layer beneath it. You cannot have a meaningful audit trail if connected_account_id doesn't exist (Layer 2 gap), if scopes weren't bound at authorization time (Layer 3 gap), or if refresh events aren't captured (Layer 4 gap).

This matters because the audit trail is how enterprise security teams evaluate your agent. The SOC 2 CC6.1 requirement is not "do you have logs." It is: "show me evidence that the credential used to access this specific record was authorized by the specific user whose data was accessed, and that the credential was valid at the time of access."

A shared service account credential cannot satisfy this requirement. A user_id + timestamp log cannot satisfy this requirement. Only a log entry that contains all of the following can:

Required fields in every agent action audit entry:

| Field | What it proves | Which layer provides it |
| --- | --- | --- |
| connection_id | The immutable link between this action and the original OAuth app registration | Layer 2 |
| connected_account_id | Which user's delegation was exercised | Layer 2 |
| scope_used | What specific permission was exercised (not just which tool was called) | Layer 3 |
| token_issued_at | When the credential was issued | Layer 4 |
| token_expires_at | That the credential was valid at action time | Layer 4 |
| tool_call_id | Correlation ID for this specific execution | Layer 5 |
| tool_name | What action was taken | Layer 5 |
| org_id | Which tenant's data was accessed | Layer 2 |
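One way to make these fields hard to omit is to model the entry as a typed record. The sketch below is an assumption-laden illustration: the `AuditEntry` class and `missing_fields` helper are hypothetical, and `occurred_at` is an extra field added here (it is not in the table) so the validity check has an action timestamp to compare against.

```python
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditEntry:
    connection_id: str
    connected_account_id: str
    org_id: str
    scope_used: str
    token_issued_at: datetime
    token_expires_at: datetime
    tool_call_id: str
    tool_name: str
    occurred_at: datetime  # assumption: action timestamp, not listed in the table

    def credential_valid_at_action_time(self) -> bool:
        return self.token_issued_at <= self.occurred_at < self.token_expires_at


def missing_fields(record: dict) -> list:
    """Names of required audit fields that are absent or empty in a raw log record."""
    return [f.name for f in fields(AuditEntry) if not record.get(f.name)]


issued = datetime(2026, 1, 15, 9, 0, tzinfo=timezone.utc)
entry = AuditEntry(
    connection_id="conn_github",
    connected_account_id="ca_123",
    org_id="org_acme",
    scope_used="write:issue",
    token_issued_at=issued,
    token_expires_at=issued + timedelta(hours=1),
    tool_call_id="tc_001",
    tool_name="github_create_issue",
    occurred_at=issued + timedelta(minutes=30),
)

# A typical "user_id + timestamp" log line, for comparison.
legacy_record = {"user_id": "u_42", "tool_name": "github_create_issue", "occurred_at": issued}

print(entry.credential_valid_at_action_time())
print(missing_fields(legacy_record))
```

Running `missing_fields` over a sample of existing logs gives a quick estimate of how far a current log schema is from what an auditor will ask for.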

The compliance standard mapping:

| Standard | Requirement | What a missing layer looks like during audit |
| --- | --- | --- |
| SOC 2 CC6.1 | Evidence credentials were valid at time of each action | Missing token_issued_at / token_expires_at: cannot prove credential wasn't expired or revoked |
| GDPR Article 6 | Data access linked to a specific lawful basis and identity | Missing connected_account_id: cannot distinguish agent access from service account access |
| HIPAA | Every data access linked to a specific authorized identity | Shared service account: access attributed to application, not to an authorized individual |
| ISO 27001 A.9 | Least-privilege enforcement logged per action | Missing scope_used: cannot demonstrate minimum necessary access |

The four event categories every agent audit trail must capture:

  1. Authorization events: connected_account.created with scopes and connection_id; connected_account.reauthorized with scope delta
  2. Token lifecycle events: token.refreshed, token.rotated, token.revoked with timestamps and triggering mechanism
  3. Tool execution events: tool.called with tool_call_id, connected_account_id, tool_name, tool_input (sanitized); tool.completed or tool.failed with result metadata
  4. Scope change events: any delta in authorized scopes between re-authorizations — required for GDPR Article 6 compliance signaling

The dependency statement is precise: a gap at Layer 2 means connected_account_id doesn't exist in the audit log. The log entry attributes the action to an application, not to a user. The SOC 2 auditor's question — "show me the user who authorized this action" — has no answer.

A gap at Layer 3 means scope_used is absent or is the full scope set rather than the specific scope exercised. The ISO 27001 question — "demonstrate least privilege" — cannot be answered with "the user authorized these 12 scopes and the agent used all of them."

A gap at Layer 4 means refresh events aren't recorded. The GDPR Article 6 question — "was consent still valid at the time of this access" — cannot be answered for actions that occurred after a token refresh, because the refresh itself isn't attributed to a re-authorization event.

For a complete treatment of audit trail structure and compliance mapping, see: Audit Trails for Agent Auth in B2B SaaS.

What Scalekit Implements Across All 5 Layers

The total implementation surface for all 5 layers — built correctly, with proper distributed locking, per-provider refresh edge cases, cryptographic tenant isolation, and structured audit logging — is conservatively a year of careful engineering. That estimate comes from the build vs. buy analysis, which also documents the deal-velocity cost: a $400K enterprise deal stalled because the security questionnaire asked about revocation controls and scope governance that the homegrown system wasn't built to answer.

Scalekit's AgentKit implements all 5 layers as infrastructure primitives. Each one maps directly to a component:

| Auth Stack Layer | Scalekit Component | What It Implements |
| --- | --- | --- |
| Layer 0: Credential Ownership | Token Vault | AES-256 encrypted per-tenant vault; access_token and refresh_token never enter agent runtime; agent holds opaque connected_account_id |
| Layer 1: Token Storage | Token Vault | Cryptographic per-org isolation with separate key material; auto-redaction from structured log paths; no raw token in LLM context at any point |
| Layer 2: Tenant Isolation | Connections + Connected Accounts | Connection = developer's OAuth app config (one per provider); ConnectedAccount = user's auth grant (one per user per provider); execute_tool resolves credential from connected_account_id |
| Layer 3: Scope Enforcement | Connector scopes | Scope bound to ConnectedAccount at authorization time; scope check enforced inside connector before API call; read-only ConnectedAccount cannot execute write tools regardless of LLM output |
| Layer 4: Token Lifecycle | Token Vault + Agent Webhooks | Proactive refresh scheduled against expires_in; distributed lock prevents concurrent refresh; connected_account.disconnected fires on revocation before the next tool call; provider-specific rotation behavior handled per connector |
| Layer 5: Audit Trail | Auth Logs | Structured, immutable entries with connection_id, connected_account_id, tool_call_id, scope_used, token_issued_at, token_expires_at per action; SIEM-exportable; full delegation chain recorded |

The single execute_tool call abstracts every layer:

```python
result = await scalekit.agents.execute_tool(
    connected_account_id=user.connected_account_id,  # opaque reference; no raw token
    tool_name="salesforce_create_record",
    tool_input={
        "object_type": "Lead",
        "fields": {"FirstName": "Alex", "Company": "Acme", "Status": "New"},
    },
)
```

What this call does before the Salesforce API receives the request:

  • Resolves the access_token from the vault by connected_account_id
  • Checks whether the token is within the proactive refresh window
  • If yes: acquires a distributed lock, refreshes, releases the lock, proceeds with the new token
  • Verifies the ConnectedAccount has salesforce:lead:write scope
  • Injects the credential into the outbound request headers (never exposed to the agent)
  • Emits a structured tool.called audit log entry with full attribution

After the call:

  • Emits tool.completed or tool.failed with tool_call_id correlation

The architecture you build on Scalekit's free tier — 10,000 Connected Accounts at $0/month — is the same architecture that handles enterprise customer security reviews. No rewrites when you add your 50th customer or your first SOC 2 audit. Start at app.scalekit.com.

FAQs

If I'm using an MCP server, do I still have the credential ownership problem?

Yes. The MCP server is now the entity that owns the access_token and refresh_token. All 5 layers still apply; they've relocated to the MCP server process. The agent-to-MCP boundary is a second auth surface that's frequently left unauthenticated in production: 492+ MCP servers were found exposed without authentication as of Q1 2026. The outbound boundary (MCP server to downstream API) is almost always Pattern 1 credential ownership, with the MCP server developer rather than the agent developer now owning the problem. See the MCP auth overview for the correct auth topology.

What's the precise failure mode when I go from 1 to 2 enterprise customers without tenant isolation?

A user_id lookup without an org_id filter. If connected_accounts is queried by user_id alone, and two customers happen to have users with the same internal ID (common with incremental integer PKs or UUID collisions in imported data), the query returns the wrong tenant's tokens. The agent decrypts and uses them correctly. Correct-looking behavior. Wrong tenant's data. This is not detectable by the agent and not detectable by standard application monitoring. It surfaces when a customer reports seeing another customer's records — after the fact.
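The failure is small enough to reproduce with an in-memory SQLite database. This is a minimal sketch: the connected_accounts table name comes from the text above, while the column layout, sample rows, and function names are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE connected_accounts ("
    " org_id TEXT NOT NULL,"
    " user_id INTEGER NOT NULL,"
    " provider TEXT NOT NULL,"
    " encrypted_token TEXT NOT NULL,"
    " PRIMARY KEY (org_id, user_id, provider))"
)
db.executemany(
    "INSERT INTO connected_accounts VALUES (?, ?, ?, ?)",
    [
        ("org_acme", 1, "github", "enc:acme-user-1"),
        ("org_globex", 1, "github", "enc:globex-user-1"),  # same integer user_id
    ],
)


def tokens_by_user_only(user_id):
    # Buggy: no org_id predicate, so rows from every tenant match.
    rows = db.execute(
        "SELECT encrypted_token FROM connected_accounts "
        "WHERE user_id = ? AND provider = 'github'",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


def token_for(org_id, user_id):
    # Correct: the org boundary is part of every lookup.
    row = db.execute(
        "SELECT encrypted_token FROM connected_accounts "
        "WHERE org_id = ? AND user_id = ? AND provider = 'github'",
        (org_id, user_id),
    ).fetchone()
    return row[0] if row else None


print(sorted(tokens_by_user_only(1)))  # both tenants' tokens come back
print(token_for("org_globex", 1))      # only the requested tenant's token
```

Code that takes the first row of the buggy query will work in every test that has a single tenant, which is why the defect tends to surface only after the second customer is onboarded.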

My agent runs background org-level jobs on Client Credentials for all actions. When does this become a compliance problem?

The moment any action must be attributable to a specific user. SOC 2 CC6.1, GDPR Article 6, and HIPAA all require linking data access to a specific authorized identity. A service account satisfies none of these requirements. Background sync, scheduled reports, and org-level dashboard updates are legitimate Client Credentials use cases. Any action touching a specific user's data — reading their email, updating their CRM record, posting on their behalf — requires a ConnectedAccount scoped to that user. The rule: if the action would require the user's explicit consent in a human-driven workflow, it requires their delegated authorization in an agent workflow.

Can I retrofit vault-backed credential ownership after building direct function calling into my agent?

Technically yes. The migration cost is high and the timeline is almost always underestimated. Token storage, refresh logic, scope enforcement, audit instrumentation, and tenant isolation are all built on top of the credential location decision. Migrating requires: refactoring every tool call to use opaque ConnectedAccount references (touching every integration point in the agent), replacing your custom refresh scheduler with vault-managed proactive refresh, rebuilding your audit log schema to include connection_id and connected_account_id (retroactive audit records cannot be reconstructed), and migrating existing stored tokens into the vault with per-tenant key material. Most teams do this once, under time pressure, before a significant enterprise deal closes. The build vs. buy post documents the full cost structure.

How does Scalekit handle providers with non-standard OAuth behavior — Salesforce instance URLs, HubSpot portal IDs, Slack workspace-scoped tokens?

Each connector in AgentKit has provider-specific behavior encoded: Salesforce instance URL resolution per ConnectedAccount (each customer's Salesforce org lives at a different subdomain), HubSpot portal-scoped token binding, Slack workspace-specific token rotation behavior with distributed refresh locking. These are the implementation details you discover one integration at a time when rolling your own — each one a production incident. Scalekit's 150+ prebuilt connectors encode the provider-specific edge cases that don't appear in the OAuth spec.
