
Managing OAuth Token Refresh for Long-Running Agents

TL;DR

  • Long-running agents are a structurally different execution class from short-lived API clients. OAuth access token TTLs — 30 minutes for HubSpot, 1 hour for Google, configurable on Salesforce but typically set between 15 minutes and 2 hours — were not designed around an agent that runs for 6 to 24 hours across multiple connected services simultaneously.
  • When a token expires mid-task, the agent does not always throw an exception. The most common failure modes are silent: empty tool results the LLM interprets as "no data," partial writes that corrupt downstream state, or hallucinated recovery paths that look like progress. Agents don't crash — they drift.
  • Managing one token's TTL is a solved problem. Managing three or more independent TTL clocks inside a single stateful agent run — with no shared refresh schedule and concurrent async tool calls — is an orchestration problem, not an auth configuration problem.
  • Refresh tokens carry their own end-of-life clock on a different time axis. A Google invalid_grant or Salesforce INVALID_SESSION_ID from a silently expired or user-revoked refresh token is not recoverable by retry logic. It requires explicit re-authorization — and when your agent is running on behalf of a customer, you find out when they file a support ticket.
  • The correct architecture decouples credential lifecycle from agent logic entirely. The agent passes a connected_account_id; everything below that boundary — proactive TTL-aware refresh, rotation conflict resolution, re-auth event emission, per-tenant credential isolation — belongs in an auth layer the agent never touches.
  • Scalekit AgentKit implements this. Your agent calls execute_tool. It never sees a token.

Your agent needs to pull updated deals from HubSpot, enrich them against Salesforce account records, and push a consolidated report to Google Sheets before the team logs on. You implement three OAuth flows, store the tokens in your database, add a refresh_if_expired() wrapper around each tool call, and ship it. Staging passes. The 30-minute integration test passes.

Three weeks in, a customer flags it: the report shows deal data from hour 1, but the Salesforce enrichment from hour 4 is missing. No errors in the logs. No failed jobs. The run was reported as completed.

What happened at hour 4 is not a bug in your pipeline logic. It is a structural property of how OAuth access tokens behave inside a persistent, multi-service agent runtime, and it surfaces in production, not in testing, because your integration test never ran long enough to hit a token boundary.

"Token expiry in a long-running agent is not an error condition. It is a scheduled event that the agent runtime should never have to manage."

The Assumption OAuth Makes That Agents Break

OAuth 2.0's access token model was built around a specific client profile: interactive, short-lived, browser-mediated. The user authenticates, the client holds a token for the duration of a session, the session ends. Token lifetime and client lifetime are approximately aligned. Expiry is rarely encountered because the user has already left.

Agents break that alignment completely. Your agent is persistent; its tokens are not. There is no user sitting at a browser who will re-authenticate when the token clock runs out. The agent just keeps running — with credentials that expired somewhere between tool call 14 and tool call 15.

The TTL constraints are set by the OAuth provider, not by the developer:

| Provider | Access Token TTL | Refreshes in a 6 hr Run | Refreshes in a 12 hr Run |
|---|---|---|---|
| HubSpot | 30 min | 12 | 24 |
| Google Workspace | 60 min | 6 | 12 |
| Salesforce | 15 min–2 hr (org-configurable) | 3–24 | 6–48 |
| Slack | No expiry by default; 12 hr with token rotation enabled | 0 or 1 | 0 or 1 |

HubSpot's 30-minute TTL is fixed. Google's 1-hour TTL is similarly non-negotiable. Salesforce's session timeout is set by an org admin; the Salesforce security recommendation is 15 minutes, but production orgs are commonly configured higher. Slack tokens do not expire at all unless token rotation is explicitly enabled, in which case the TTL is 12 hours.

A 6-hour overnight agent run touching HubSpot, Google Workspace, and Salesforce is not operating with a fixed set of credentials. It is operating on a schedule of credential renewals that most agent frameworks have no concept of.
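The refresh columns in the TTL table come from simple division: how many full access-token lifetimes fit into the run. Here is a minimal sketch of that arithmetic. It assumes one refresh per TTL boundary and uses the TTLs from the table, with Salesforce shown at both ends of its 15 min to 2 hr range. The dictionary keys are illustrative labels, not provider identifiers.

```python
# Approximate refresh counts: how many access-token lifetimes fit in a run.
# TTLs in minutes, taken from the table; the keys are illustrative labels.
TTL_MINUTES = {
    "hubspot": 30,
    "google_workspace": 60,
    "salesforce_min": 15,
    "salesforce_max": 120,
    "slack_rotation": 12 * 60,
}


def refreshes_in_run(run_hours: float, ttl_minutes: int) -> int:
    """Number of full TTL windows in the run (one refresh per boundary)."""
    return int(run_hours * 60) // ttl_minutes


def refresh_schedule(run_hours: float) -> dict[str, int]:
    """Refresh count per provider for a run of the given length."""
    return {name: refreshes_in_run(run_hours, ttl) for name, ttl in TTL_MINUTES.items()}
```

Real schedulers refresh somewhat before each boundary, so actual counts can come out higher than these floor values.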

The TTL numbers are not the problem. What happens inside the agent when those numbers run out is.

What Token Expiry Actually Looks Like in an Agent

A 401 in a traditional API client is clean. The HTTP status is unambiguous, the error propagates up the stack, it gets logged. You know what broke and why.

In an agent, that clarity is the exception. Whether and how a token expiry surfaces depends on three variables: how the tool wrapper handles non-200 responses, how the LLM interprets what comes back from the tool call, and what the agent framework's error recovery policy is. Most combinations of these three variables do not produce a clean exception.

Agents don't crash — they drift. They make poor decisions quietly, and monitoring says everything is fine.

| Failure Mode | What Happens at the Agent Layer | Detection Difficulty |
|---|---|---|
| Silent skip | Tool returns null or {}. The LLM treats it as "no data found" and continues. The run completes. The output is incomplete. | Very high; no exception, no alert |
| Hallucinated recovery | The LLM receives error text in the tool result ("401 Unauthorized") and generates a plausible next action that bypasses the failed tool. The output looks coherent. | Extremely high; only visible in tool call audit trace |
| Partial write | Token expires mid-subtask after some write operations complete. Downstream tool calls execute against incomplete state. The failure surfaces later in a different system, attributed to the wrong cause. | High; cross-system correlation required |
| Hard stop | Framework surfaces the 401 as an uncaught exception. The run terminates with a stack trace. | Low, but the least common pattern with modern tool wrappers |

The silent skip is the most dangerous. The agent reports the run as completed. The job appears in your history as successful. Data is missing from the output, and nobody knows until a downstream process fails or a customer notices.

Here is what it looks like. Your agent calls a HubSpot search tool. The HubSpot client returns 401 Unauthorized. The tool wrapper catches Exception broadly and returns an empty dict. The ReAct loop receives {}, and the LLM infers "no contacts matched the filter":

```python
# Tool node in a LangGraph ReAct agent
from hubspot import HubSpot
from hubspot.crm.contacts import PublicObjectSearchRequest
from langchain_core.tools import tool

# stored_token: an access token loaded from your database at agent startup
hubspot_client = HubSpot(access_token=stored_token)


@tool
def search_hubspot_contacts(query: str) -> dict:
    """Search HubSpot contacts by free-text query."""
    try:
        search_request = PublicObjectSearchRequest(
            query=query,
            properties=["email", "company", "dealstage"],
        )
        response = hubspot_client.crm.contacts.search_api.do_search(
            public_object_search_request=search_request
        )
        return {"results": [r.to_dict() for r in response.results]}
    except Exception:
        # 401, 404, network timeout -- all land here identically
        return {}  # LLM sees "no results found"

# LLM's next reasoning step after receiving {}:
# Observation: {}
# Thought: No HubSpot contacts matched the search criteria.
# Action: Proceed to next step with empty contact list.
```

The return {} on a 401 is not a careless mistake. It is a defensive pattern that makes sense for handling legitimate "no results" cases. But it makes token expiry indistinguishable from an empty result set — and the LLM cannot tell the difference.
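One way to stop token expiry from masquerading as an empty result is to have the tool return a distinguishable error payload. This minimal sketch assumes the SDK raises an exception that carries an HTTP `status` attribute, as OpenAPI-generated clients such as HubSpot's typically do. `FakeApiException` and `safe_tool_call` are hypothetical names, included so the example runs on its own.

```python
# Distinguish "no data" from "could not authenticate" in a tool result.
from typing import Callable


class FakeApiException(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, status: int, reason: str = ""):
        super().__init__(f"{status} {reason}")
        self.status = status


def safe_tool_call(call: Callable[[], list]) -> dict:
    """Run a tool body and return a payload the LLM cannot misread as 'empty'."""
    try:
        return {"ok": True, "results": call()}
    except Exception as exc:
        status = getattr(exc, "status", None)
        if status == 401:
            # Explicit auth failure: visible to the LLM and to your monitoring.
            return {"ok": False, "error": "auth_failed", "status": 401}
        # Any other failure is still an error, never an empty result set.
        return {"ok": False, "error": "tool_error", "detail": str(exc)}
```

This fixes detection, not recovery. The LLM now sees `auth_failed` instead of `{}`, but the token still has to be refreshed somewhere.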

A single expired token produces one of these failure modes. Now consider what happens when HubSpot, Google Workspace, and Salesforce tokens are all expiring at different points across the same overnight run.

Three Clocks, No Shared Schedule

An agent running for 6 hours against HubSpot, Google Workspace, and Salesforce does not have one token expiry problem. It has an intersection problem.

The three TTL clocks run independently, with no shared reference point. At every point in the run, the HubSpot token alone is never more than 30 minutes from its boundary. The refresh events fall unevenly across the timeline, and you cannot predict them without tracking each token's individual issue time.
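Tracking each token's own issue time is enough to tell which clock runs out next. This minimal sketch assumes you record `issued_at` and `expires_in` whenever a token is obtained, taking `expires_in` from the provider's token response or from a configured value where the provider does not return one. `TokenClock` and `next_to_expire` are illustrative names, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenClock:
    provider: str
    issued_at: float  # epoch seconds when the access token was issued
    expires_in: int   # lifetime in seconds

    @property
    def expires_at(self) -> float:
        return self.issued_at + self.expires_in

    def seconds_left(self, now: float) -> float:
        return self.expires_at - now


def next_to_expire(clocks: list[TokenClock], now: float) -> TokenClock:
    """Return the token whose boundary arrives soonest."""
    return min(clocks, key=lambda c: c.seconds_left(now))
```

After each refresh you replace that provider's `TokenClock` with a new one, so the answer to "what expires next" keeps shifting between providers over the run.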

Each refresh is a network round-trip. And that round-trip creates a concurrency hazard inside async agent loops.

```python
# The check-then-act pattern: a race condition in async contexts
import asyncio

from hubspot import HubSpot
from hubspot.crm.contacts import PublicObjectSearchRequest


async def call_hubspot_search(query: str, token_store: dict) -> dict:
    # is_expired() and refresh_token() are your own helpers
    if is_expired(token_store["hubspot"]["expires_at"]):
        # Two concurrent tool calls both reach this line at the same time.
        # Both see the token as expired.
        # Both fire refresh_token() against HubSpot's OAuth endpoint.
        # If the provider rotates refresh tokens, the first exchange
        # invalidates the old refresh_token.
        # The second caller fires against an already-consumed refresh_token
        # and gets invalid_grant.
        # token_store is now in a corrupt state.
        token_store["hubspot"] = await refresh_token(
            "hubspot", token_store["hubspot"]["refresh_token"]
        )

    client = HubSpot(access_token=token_store["hubspot"]["access_token"])
    search_request = PublicObjectSearchRequest(query=query)
    # The HubSpot SDK is synchronous; run the call off the event loop
    response = await asyncio.to_thread(
        client.crm.contacts.search_api.do_search,
        public_object_search_request=search_request,
    )
    return {"results": [r.to_dict() for r in response.results]}
```

Two concurrent tool calls both detect expiry at the same moment, and both attempt a refresh. If the provider rotates refresh tokens, the first exchange succeeds and invalidates the old refresh_token. The second fires against a token that no longer exists and receives invalid_grant, with no indication in the error that a concurrent refresh caused it. Your token store is now in a corrupt state, and the next tool call against HubSpot fails regardless of whether the token appears valid in memory.
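The race goes away if refreshes for the same account are serialized and the expiry check is repeated after the lock is acquired. Callers that queued behind an in-flight refresh then reuse its result instead of firing their own. This is a minimal asyncio sketch: `FakeProvider`, `TokenManager`, and `demo_single_flight` are hypothetical names, and `FakeProvider` simulates an OAuth server that rotates refresh tokens and rejects reuse. An `asyncio.Lock` only serializes within one process, so a multi-worker deployment would need a distributed lock instead.

```python
import asyncio
import itertools
import time


class InvalidGrant(Exception):
    pass


class FakeProvider:
    """Hypothetical OAuth server with refresh token rotation."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.valid_refresh = "rt-0"
        self.exchanges = 0

    async def refresh(self, refresh_token: str) -> dict:
        await asyncio.sleep(0.01)  # simulate the network round-trip
        if refresh_token != self.valid_refresh:
            raise InvalidGrant("refresh_token already consumed")
        self.exchanges += 1
        n = next(self._ids)
        self.valid_refresh = f"rt-{n}"  # rotation: the old refresh token is now dead
        return {"access_token": f"at-{n}", "refresh_token": self.valid_refresh,
                "expires_at": time.time() + 1800}


class TokenManager:
    def __init__(self, provider: FakeProvider, token: dict):
        self._provider = provider
        self._token = token
        self._lock = asyncio.Lock()

    async def access_token(self) -> str:
        if self._token["expires_at"] > time.time():
            return self._token["access_token"]
        async with self._lock:
            # Re-check: another caller may have refreshed while we waited.
            if self._token["expires_at"] <= time.time():
                self._token = await self._provider.refresh(self._token["refresh_token"])
            return self._token["access_token"]


async def demo_single_flight(concurrency: int = 5) -> tuple[set, int]:
    provider = FakeProvider()
    expired = {"access_token": "at-0", "refresh_token": "rt-0", "expires_at": 0}
    manager = TokenManager(provider, expired)
    tokens = await asyncio.gather(*(manager.access_token() for _ in range(concurrency)))
    return set(tokens), provider.exchanges
```

Removing the lock and the re-check reproduces the failure described above. Every caller presents `rt-0`, the first one rotates it, and the rest get `InvalidGrant`.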

The multi-tenant variant is worse. If your agent is running for hundreds of customer accounts and those sessions started within a similar deployment window, their HubSpot tokens hit the 30-minute TTL boundary at the same time. Every session fires a refresh against HubSpot's OAuth endpoint simultaneously. This is a scheduled, deterministic load spike — the direct consequence of how the agent was deployed, not an edge case.
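Avoiding that spike comes down to where in each token's lifetime a session schedules its refresh. This minimal sketch assumes each session refreshes a fixed margin ahead of expiry, pulled earlier by a random per-session offset. `refresh_at`, `margin`, and `jitter` are hypothetical names, and the default values are arbitrary.

```python
import random
from typing import Optional


def refresh_at(issued_at: float, expires_in: float, margin: float = 300.0,
               jitter: float = 240.0, rng: Optional[random.Random] = None) -> float:
    """Return the epoch second at which to refresh.

    The refresh lands `margin` seconds before expiry, pulled earlier by a
    random offset in [0, jitter], so sessions issued together spread out.
    """
    rng = rng or random.Random()
    return issued_at + expires_in - margin - rng.uniform(0, jitter)
```

Take HubSpot's 1,800-second TTL with these defaults. Sessions whose tokens were issued at the same moment refresh somewhere between 1,260 and 1,500 seconds in, spread over a four-minute window instead of arriving as a single burst.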

"The thundering herd is not an edge case. It is the predictable consequence of deploying agents at scale without a coordinated refresh layer."

All of this assumes the refresh token itself is still valid. It may not be.

When Refresh Tokens Die Quietly

Access token expiry is high-frequency and operational. Your refresh scheduler fires on the expires_in value, the exchange runs, the agent continues. It is noisy enough that you build the infrastructure for it.

Refresh token end-of-life is different. It is infrequent, provider-specific, nearly impossible to replicate in staging, and almost always discovered when a customer's agent stops working and they ask why.

Every agent doing delegated OAuth holds two token types with completely different failure semantics:

| Token Type | Typical Lifetime | On Expiry | Recovery Path |
|---|---|---|---|
| Access token | 15 min to 2 hr | 401 on next API call | Automated: exchange refresh_token for new access_token |
| Refresh token | Provider-specific; days to indefinite; revocable at any time | invalid_grant or INVALID_SESSION_ID | Not automated: requires user to complete a new OAuth consent flow |

The critical distinction: a failed access token refresh is a recoverable infrastructure event. A failed refresh token exchange is not. The agent cannot self-heal. The user must re-authorize.
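In code, that distinction is a branch on the token endpoint's response. This minimal sketch assumes the standard OAuth 2.0 error body (`{"error": ...}`) from RFC 6749. `RefreshOutcome` and `classify_refresh_failure` are hypothetical names, and which transient failures are worth retrying is a judgment call.

```python
from enum import Enum
from typing import Optional


class RefreshOutcome(Enum):
    RETRY = "retry"              # transient: no response, 5xx, 429
    REAUTH_REQUIRED = "reauth"   # user must complete a new consent flow
    ESCALATE = "escalate"        # anything else: alert a human, don't loop


def classify_refresh_failure(status: Optional[int], body: Optional[dict]) -> RefreshOutcome:
    """status=None means the request never got an HTTP response."""
    if status is None or status >= 500 or status == 429:
        return RefreshOutcome.RETRY
    error = (body or {}).get("error")
    if error == "invalid_grant":
        # Expired, revoked, or already-rotated refresh token: retrying cannot fix it.
        return RefreshOutcome.REAUTH_REQUIRED
    return RefreshOutcome.ESCALATE
```

An invalid_grant caused by a concurrent rotation is the one case where re-reading your token store before escalating can save a re-authorization.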

Google's token endpoint returns the same invalid_grant error across three distinct scenarios, each with a different resolution path:

{ "error": "invalid_grant", "error_description": "Token has been expired or revoked." }

The three scenarios behind this single error:

  • App whose OAuth consent screen is in Testing publishing status: refresh tokens expire after 7 days, with no warning or notification.
  • Refresh token rotation conflict: your agent's token store holds a refresh_token that was already consumed by a prior exchange — the rotation succeeded, but the new token failed to persist due to a network error between issuance and storage. The stored token is stale.
  • User or Workspace admin revocation: the user disconnected the app from their Google Account, or a Workspace admin revoked the OAuth grant across the org.

On Salesforce, API calls start failing with INVALID_SESSION_ID, and the refresh exchange itself returns invalid_grant, when the org's session policy expires the grant, when the connected app's IP restrictions change, or when the consenting user's Salesforce profile loses "API Enabled" (which happens routinely during role changes and offboarding):

{ "error": "invalid_grant", "error_description": "expired access/refresh token" }

The rotation conflict deserves attention.

Refresh token rotation was introduced as a security countermeasure in RFC 6819. RFC 9700, the January 2025 OAuth Security BCP, now requires it for public clients whose refresh tokens are not sender-constrained. Under rotation, each use of a refresh token issues a new one and invalidates the previous one. If a network failure occurs between the OAuth provider issuing the new refresh_token and your system storing it, the agent holds an invalidated token. The next refresh attempt fires invalid_grant with no indication that the rotation conflict caused it.
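A token store can at least guarantee that a rotated refresh token is never overwritten by a stale one. This minimal sketch uses SQLite from the standard library and assumes a `version` column per account. The new token is written in a single conditional `UPDATE` that only succeeds if nobody else rotated first. That narrows the window described above but cannot close it: if the provider's response is lost in transit, the new token never reaches your process at all. The table and function names are hypothetical.

```python
import sqlite3


def init_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE tokens (account_id TEXT PRIMARY KEY, "
        "refresh_token TEXT NOT NULL, version INTEGER NOT NULL)"
    )


def load(conn: sqlite3.Connection, account_id: str) -> tuple[str, int]:
    row = conn.execute(
        "SELECT refresh_token, version FROM tokens WHERE account_id = ?", (account_id,)
    ).fetchone()
    return row[0], row[1]


def save_rotated(conn: sqlite3.Connection, account_id: str,
                 new_refresh_token: str, expected_version: int) -> bool:
    """Compare-and-swap: write only if the row is still at expected_version.

    Returns False if another worker already stored a newer token.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE tokens SET refresh_token = ?, version = version + 1 "
            "WHERE account_id = ? AND version = ?",
            (new_refresh_token, account_id, expected_version),
        )
    return cur.rowcount == 1
```

When `save_rotated` returns `False`, re-read the store before calling the provider again. The token you hold is stale, but the one in the database may be current.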

For teams shipping agents to customers rather than internal tooling, the user-revocation scenario is the one that produces the most damage. A customer connects their Salesforce org to your agent in Q1. In Q3, the employee who completed the OAuth consent flow leaves the company. Standard offboarding revokes their connected app access. Your refresh_token for that customer's Salesforce org is now permanently invalid. No webhook fires from Salesforce. No error propagates to your system. The next time the agent calls a Salesforce tool, it produces a silent skip — and the customer files a support ticket.

"Access token expiry is noisy. Refresh token expiry is silent, infrequent, and almost always discovered by a customer, not a monitoring system."

Handling all of this correctly requires a fundamental architectural decision: the agent should not be aware that credentials exist.

Scalekit Manages Token Refresh for Long-Running Agents, So You Don't Have To

The standard progression for how agent teams handle token management follows a predictable path.

Stage 1: Credentials in environment variables or a startup config dict, injected at agent launch. Works for a single-tenant internal run with a short lifespan. Breaks the first time a token expires mid-run.

Stage 2: A TokenManager class that wraps each service client, checks TTL before each call, and refreshes on expiry. Better structure — but the agent's tool code is now credential-aware. Every tool function knows about token storage. Concurrent refresh races are unhandled. Multi-tenancy is an afterthought bolted onto a design that was never built for it.

Stage 3: An auth layer external to the agent entirely. The agent holds a connected_account_id. Every tool call resolves credentials on demand from a vault. TTL tracking, proactive refresh, rotation conflict resolution, and re-auth signaling happen below the agent's awareness. The agent calls a tool. It gets a result. It never sees a token.
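The Stage 3 boundary can be expressed as an interface. Tools receive a `connected_account_id` and an executor that attaches credentials, never a token store. This is a minimal sketch with hypothetical names (`AuthenticatedExecutor`, `FakeExecutor`, `search_contacts`). It is not Scalekit's API; in a real system the executor would live in a separate layer that also owns refresh.

```python
from typing import Any, Protocol


class AuthenticatedExecutor(Protocol):
    def call(self, connected_account_id: str, provider: str,
             method: str, path: str, body: dict[str, Any]) -> dict[str, Any]:
        """Attach valid credentials, send the request, return parsed JSON."""
        ...


class FakeExecutor:
    """Toy executor: records requests; the token never leaves this object."""

    def __init__(self, tokens: dict[tuple[str, str], str]):
        self._tokens = tokens
        self.sent: list[dict[str, Any]] = []

    def call(self, connected_account_id, provider, method, path, body):
        token = self._tokens[(connected_account_id, provider)]
        self.sent.append({"method": method, "path": path,
                          "headers": {"Authorization": f"Bearer {token}"},
                          "body": body})
        return {"results": []}


def search_contacts(executor: AuthenticatedExecutor,
                    connected_account_id: str, query: str) -> dict:
    # The tool describes the request; it never reads, stores, or refreshes a token.
    return executor.call(connected_account_id, "hubspot", "POST",
                         "/crm/v3/objects/contacts/search", {"query": query})
```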

"The agent should not know that tokens exist. That is not abstraction for convenience; it is abstraction for correctness."

Scalekit is that third stage.

How Scalekit plugs into the agent's tool loop

Scalekit operates at the tool call boundary. Three components wire together once:

  • Connections — configured once per connector (HubSpot, Google Workspace, Salesforce, Slack, and 100+ others). Holds the OAuth app credentials Scalekit needs to execute token exchanges on your behalf. One connection serves all users of that connector.
  • Connected Accounts — created per user, per connector, after the user completes an OAuth consent flow. Scalekit stores the access_token, refresh_token, expiry metadata, and provider-specific rotation state in the token vault. The connected_account_id is the only identifier the agent ever handles.
  • Tool schemas — Scalekit provides ready-to-use tool definitions (hubspot_search_contacts, salesforce_get_account, google_sheets_append_row, and 1,000+ others) that the LLM receives as its available tool list. The schemas define inputs and outputs; Scalekit handles authenticated execution behind each one.

When the agent calls execute_tool, Scalekit fetches the valid token from the vault for that connected_account_id, executes the authenticated API call, and returns structured output. The agent never sees a token:

```python
# Agent tool loop with Scalekit AgentKit
import scalekit


async def run_step(connected_account_id: str):
    # LLM receives Scalekit-provided tool schemas as its available tool list
    tools = await scalekit.get_tools(connected_account_id=connected_account_id)

    # The agent calls execute_tool -- no token, no auth header, no provider SDK
    result = await scalekit.execute_tool(
        connected_account_id=connected_account_id,
        tool_name="hubspot_search_contacts",
        tool_input={"query": "company:Acme dealstage:closedwon"},
    )
    # result contains structured output;
    # no credential was ever present in the agent's execution context
    return result
```

Compare this to the Stage 2 version — and what breaks at hour 4:

```python
# Stage 2: agent owns its credentials and inherits all the failure modes
import asyncio

from hubspot import HubSpot
from hubspot.crm.contacts import PublicObjectSearchRequest


async def search_hubspot_contacts(query: str, token_store: dict) -> dict:
    # is_expired() and refresh_token() are your own helpers
    if is_expired(token_store["hubspot"]["expires_at"]):
        # Race condition: two concurrent tool calls both reach this check.
        # Both see expired. Both call refresh. With refresh token rotation,
        # the second exchange fires against an already-consumed refresh_token.
        # Result: invalid_grant. token_store corrupt. Next call returns {}.
        token_store["hubspot"] = await refresh_token(
            "hubspot", token_store["hubspot"]["refresh_token"]
        )
    if not token_store["hubspot"].get("access_token"):
        return {}  # LLM sees "no results"

    client = HubSpot(access_token=token_store["hubspot"]["access_token"])
    search_request = PublicObjectSearchRequest(query=query)
    # The HubSpot SDK is synchronous; run the call off the event loop
    response = await asyncio.to_thread(
        client.crm.contacts.search_api.do_search,
        public_object_search_request=search_request,
    )
    return {"results": [r.to_dict() for r in response.results]}
```

How Scalekit handles each failure mode across the full run

| Failure Mode | AgentKit Behavior |
|---|---|
| Silent skip | Proactive TTL refresh fires before expiry; execute_tool never fires against a stale token |
| Hallucinated recovery | Token is always valid at call time; LLM never receives auth error text in a tool result |
| Partial write | Refresh is serialized per connected_account_id; no tool call proceeds during an in-flight refresh |
| Concurrent refresh race | Distributed lock per connected_account_id per provider; only one refresh executes; waiting calls reuse the result |
| Rotation conflict | Rotation is handled atomically in the vault; a stale token is never persisted if a network failure interrupts the issuance-and-store sequence |
| Revoked refresh token | invalid_grant and INVALID_SESSION_ID surface as structured webhook events, not unhandled exceptions; the affected task pauses, re-auth triggers to the right channel |

Re-auth without losing agent state

When a refresh token is permanently invalidated (user revocation, offboarding, org policy change), Scalekit's Agent Webhooks fire as soon as the refresh exchange fails with that terminal error. That is not on the next tool call, and not after the agent has already produced a corrupted output. The webhook reaches your system before the agent takes another action:

```json
{
  "event": "agent.auth.reauth_required",
  "connected_account_id": "conn_01HXYZ",
  "provider": "salesforce",
  "tenant_id": "org_acme",
  "error_code": "invalid_grant",
  "reauth_url": "https://your-app.com/auth/salesforce/reconnect?token=...",
  "occurred_at": "2026-05-25T04:22:11Z"
}
```

Your system catches this event, pauses the affected task for that tenant, and routes reauth_url to the appropriate channel — email, in-app notification, or webhook to the customer's own system. The rest of the agent run continues unaffected: other tenants, other providers, other tasks. Once the user completes reauth, Scalekit stores the new refresh token and emits an auth success event. The paused task resumes from its last checkpoint. No restart. No re-execution of completed steps.
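On your side, handling that event takes a small amount of routing code. This minimal sketch assumes the payload shape shown above. `TaskRegistry`, `handle_webhook`, and `notify` are hypothetical names. The success event name is also hypothetical, because the section describes an auth success event without giving its type string.

```python
from dataclasses import dataclass, field
from typing import Callable

REAUTH_REQUIRED = "agent.auth.reauth_required"
REAUTH_SUCCEEDED = "agent.auth.reauth_succeeded"  # hypothetical event name


@dataclass
class TaskRegistry:
    # Paused work keyed by connected_account_id; other accounts keep running.
    paused: set[str] = field(default_factory=set)
    resumed: list[str] = field(default_factory=list)


def handle_webhook(event: dict, tasks: TaskRegistry,
                   notify: Callable[[str, str], None]) -> None:
    account = event["connected_account_id"]
    if event["event"] == REAUTH_REQUIRED:
        tasks.paused.add(account)
        # Route the reconnect link to the customer (email, in-app, their webhook).
        notify(event["tenant_id"], event["reauth_url"])
    elif event["event"] == REAUTH_SUCCEEDED and account in tasks.paused:
        tasks.paused.discard(account)
        tasks.resumed.append(account)  # resume from your last checkpoint here
```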

Multi-tenant token isolation

Each connected_account_id maps to exactly one user's OAuth grant per connected service. Token storage is partitioned by connected_account_id; there is no shared credential pool. Refresh schedules are independent per account. Proactive refresh fires relative to each account's individual token issue time with jitter — so hundreds of agent sessions deployed in the same window do not synchronize their HubSpot refresh spikes against the OAuth endpoint.

The agent's tool call code does not change between single-tenant and multi-tenant deployments. The connected_account_id carries the tenant context. Scalekit enforces isolation.

What your agent owns vs. what Scalekit owns

Your Agent
Scalekit AgentKit
LLM orchestration and task logic
Token vault: acquisition, storage, encryption at rest
Tool call sequencing
Proactive TTL refresh per provider per connected_account_id
Re-auth notification routing
Atomic refresh token rotation with conflict resolution
Agent state checkpointing
invalid_grant detection and structured webhook emission
Business logic after re-auth
Staggered refresh scheduling; thundering herd mitigation
Per-tenant credential isolation; zero cross-tenant token access
Provider-specific TTL, rotation, and error semantics per connector

The last row is the one that compounds. HubSpot's fixed 30-minute TTL and non-expiring refresh tokens behave differently from Salesforce's org-configurable session timeout and INVALID_SESSION_ID semantics, which behave differently from Google's invalid_grant rotation conflicts. Every provider is a separate integration surface. AgentKit encapsulates all of it behind one interface — connected_account_id in, structured tool result out — and that interface does not change whether the agent runs for 30 minutes or 24 hours.


FAQs

My agent framework has built-in tool retry logic. Why doesn't that handle token refresh?

Tool retry re-invokes the tool with the same credentials it had before. If those credentials expired, the retry fires against an expired token and produces the same failure — a silent skip, another invalid_grant, or a {} response the LLM interprets as empty results. Retry and token refresh are orthogonal mechanisms. The retry handler does not hold or rotate OAuth credentials; it just calls the function again. Conflating them is one of the most common production bugs in agent systems, and it is invisible in staging because tokens are always fresh when the test runs.
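Making retry useful for auth failures means putting a credential refresh between the attempts. This is a minimal sketch: `AuthExpired`, `call_with_auth_retry`, and the callables are hypothetical. Allowing a single retry is an arbitrary choice; a second 401 right after a successful refresh more likely points at revocation than expiry.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthExpired(Exception):
    """Raised by a tool when the provider answers 401."""


def call_with_auth_retry(tool: Callable[[str], T],
                         get_token: Callable[[], str],
                         force_refresh: Callable[[], str]) -> T:
    try:
        return tool(get_token())
    except AuthExpired:
        # A plain retry would resend the same expired token.
        # Refresh first, then retry exactly once with the new credential.
        return tool(force_refresh())
```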

Can I sidestep this by running the agent in short windows and restarting with fresh tokens?

You can, with constraints that are often harder to satisfy than handling token refresh correctly. Every task must be idempotent: a partial write from a terminated run must be detectable and undoable before the resumed run re-executes it. The agent's full execution state must be serializable and restorable at an arbitrary checkpoint. For write-heavy agents — CRM updates, ticket creation, calendar scheduling — idempotency is a non-trivial requirement. LangGraph's persistence layer checkpoints state at each step, which helps — but the resumed run still needs valid credentials on restart, which brings you back to the same problem.
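The idempotency requirement usually comes down to deriving a stable key for each write, so a resumed run can skip work the terminated run already did. This minimal sketch uses an in-memory ledger as a stand-in for durable storage. `IdempotentWriter` is a hypothetical name. For this to hold across restarts, the key must be persisted atomically with the write, or checked against the target system.

```python
import hashlib
import json
from typing import Any, Callable


class IdempotentWriter:
    def __init__(self):
        self._done: dict[str, Any] = {}  # stand-in for a durable table

    @staticmethod
    def key(step: str, payload: dict) -> str:
        # Same step and same payload give the same key, across runs.
        raw = json.dumps({"step": step, "payload": payload}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def write(self, step: str, payload: dict, do_write: Callable[[dict], Any]) -> Any:
        k = self.key(step, payload)
        if k in self._done:
            return self._done[k]  # already applied by an earlier run: skip
        result = do_write(payload)
        self._done[k] = result
        return result
```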

What is the correct behavior when a token expires while a tool call is already in flight?

In a check-then-act implementation, two concurrent tool calls that both detect expiry at the same moment will both attempt a refresh. Refresh token rotation was introduced in RFC 6819 Section 5.2.2.3, and RFC 9700 requires it for public clients that do not sender-constrain refresh tokens. If the provider uses rotation, the second refresh attempt fires against a now-consumed refresh token and returns invalid_grant. The correct behavior is proactive: refresh fires before the TTL boundary, under a distributed lock per connected_account_id per provider, so no tool call ever races a refresh. If a reactive refresh is still needed, the lock serializes it: one refresh executes, and waiting tool calls reuse the result.
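Proactive refresh can run as a background task per account that sleeps until shortly before the boundary, so tool calls always read a token that is already fresh. This is a minimal asyncio sketch with toy timings in seconds. `refresh_loop`, `skew`, and `demo_proactive` are hypothetical names. A production version would also take the per-account lock described in this answer and handle refresh failures.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def refresh_loop(token: dict, refresh: Callable[[], Awaitable[dict]],
                       skew: float, stop: asyncio.Event) -> None:
    """Keep `token` fresh by refreshing `skew` seconds before it expires."""
    while not stop.is_set():
        delay = max(0.0, token["expires_at"] - skew - time.monotonic())
        try:
            await asyncio.wait_for(stop.wait(), timeout=delay)
            return  # stop was set while sleeping
        except asyncio.TimeoutError:
            pass  # refresh point reached
        token.update(await refresh())


async def demo_proactive(run_for: float = 0.25, ttl: float = 0.1, skew: float = 0.03) -> int:
    count = 0

    async def fake_refresh() -> dict:
        nonlocal count
        count += 1
        return {"access_token": f"at-{count}", "expires_at": time.monotonic() + ttl}

    token = {"access_token": "at-0", "expires_at": time.monotonic() + ttl}
    stop = asyncio.Event()
    task = asyncio.create_task(refresh_loop(token, fake_refresh, skew, stop))
    await asyncio.sleep(run_for)
    stop.set()
    await task
    return count
```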

How does per-tenant token isolation work across hundreds of concurrent agent sessions?

Each connected_account_id maps to one user's OAuth grant per connected service. Token storage is partitioned by connected_account_id; there is no shared credential pool. A refresh event for one tenant's HubSpot token has no effect on any other tenant's HubSpot token — separate storage partitions, separate refresh schedules, separate webhook event streams. Proactive refresh fires relative to each account's individual token issue time with jitter, so agents deployed in batch do not generate a synchronized refresh spike against the OAuth provider.

When a refresh token is permanently revoked, how do we re-authorize without losing agent state?

Scalekit's Agent Webhooks emit an agent.auth.reauth_required event immediately when a refresh exchange fails with a terminal error. Your system catches the event, pauses the affected agent task — not the full run — and routes the reauth_url to the user through whatever channel is appropriate. Once the user completes reauthorization, Scalekit stores the new refresh token and emits an auth success event. The paused task resumes from its last checkpoint. No restart, no re-execution of completed steps, no other tenants or providers affected.
