
Your agent needs to pull updated deals from HubSpot, enrich them against Salesforce account records, and push a consolidated report to Google Sheets before the team logs on. You implement three OAuth flows, store the tokens in your database, add a refresh_if_expired() wrapper around each tool call, and ship it. Staging passes. The 30-minute integration test passes.
Three weeks in, a customer flags it: the report shows deal data from hour 1, but the Salesforce enrichment from hour 4 is missing. No errors in the logs. No failed jobs. The run was reported as completed.
What happened at hour 4 is not a bug in your pipeline logic. It is a structural property of how OAuth access tokens behave inside a persistent, multi-service agent runtime, and it surfaces in production, not in testing, because your integration test never ran long enough to hit a token boundary.
"Token expiry in a long-running agent is not an error condition. It is a scheduled event that the agent runtime should never have to manage."
OAuth 2.0's access token model was built around a specific client profile: interactive, short-lived, browser-mediated. The user authenticates, the client holds a token for the duration of a session, the session ends. Token lifetime and client lifetime are approximately aligned. Expiry is rarely encountered because the user has already left.
Agents break that alignment completely. Your agent is persistent; its tokens are not. There is no user sitting at a browser who will re-authenticate when the token clock runs out. The agent just keeps running — with credentials that expired somewhere between tool call 14 and tool call 15.
The TTL constraints are set by the OAuth provider, not by the developer:
HubSpot's 30-minute TTL is fixed. Google's 1-hour TTL is similarly non-negotiable. Salesforce's session timeout is set by an org admin; the Salesforce security recommendation is 15 minutes, but production orgs are commonly configured higher. Slack tokens do not expire at all unless token rotation is explicitly enabled, in which case the TTL is 12 hours.
A 6-hour overnight agent run touching HubSpot, Google, and Salesforce is not operating with a fixed set of credentials. It is operating on a schedule of credential renewals that most agent frameworks have no concept of.
The TTL numbers are not the problem. What happens inside the agent when those numbers run out is.
A 401 in a traditional API client is clean. The HTTP status is unambiguous, the error propagates up the stack, it gets logged. You know what broke and why.
In an agent, that clarity is the exception. Whether and how a token expiry surfaces depends on three variables: how the tool wrapper handles non-200 responses, how the LLM interprets what comes back from the tool call, and what the agent framework's error recovery policy is. Most combinations of these three variables do not produce a clean exception.
Agents don't crash — they drift. They make poor decisions quietly, and monitoring says everything is fine.
The silent skip is the most dangerous. The agent reports the run as completed. The job appears in your history as successful. Data is missing from the output, and nobody knows until a downstream process fails or a customer notices.
Here is what it looks like. Your agent calls a HubSpot search tool. The HubSpot client returns 401 Unauthorized. The tool wrapper catches Exception broadly and returns an empty dict. The ReAct loop receives {}, and the LLM infers "no contacts matched the filter".
The return {} on a 401 is not a careless mistake. It is a defensive pattern that makes sense for handling legitimate "no results" cases. But it makes token expiry indistinguishable from an empty result set — and the LLM cannot tell the difference.
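A minimal sketch of that silent skip, next to one way to avoid it. The names FakeHubSpotClient, Unauthorized, and AuthExpired are hypothetical stand-ins for illustration, not HubSpot SDK identifiers.

```python
class Unauthorized(Exception):
    """Hypothetical error a client raises on HTTP 401."""


class AuthExpired(Exception):
    """Hypothetical signal that credentials, not the query, are the problem."""


class FakeHubSpotClient:
    """Stand-in client whose access token has already expired."""

    def search_contacts(self, query: str) -> dict:
        raise Unauthorized("401 Unauthorized")


def search_tool_silent(client, query: str) -> dict:
    # Broad catch: a 401 and a genuine empty result both come back as {}.
    try:
        return client.search_contacts(query)
    except Exception:
        return {}


def search_tool_explicit(client, query: str) -> dict:
    # Auth failures are re-raised as a distinct type so the runtime can act
    # on them; they never reach the LLM disguised as "no results".
    try:
        return client.search_contacts(query)
    except Unauthorized as exc:
        raise AuthExpired("hubspot credentials rejected") from exc
```

With the first wrapper the LLM only ever sees {}. With the second, the expiry surfaces as an exception the runtime can handle before the model has to interpret anything.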
A single expired token produces one of these failure modes. Now consider what happens when HubSpot, Google Workspace, and Salesforce tokens are all expiring at different points across the same overnight run.
An agent running for 6 hours against HubSpot, Google Workspace, and Salesforce does not have one token expiry problem. It has an intersection problem.
The three TTL clocks run independently. There is no shared reference point. With a 30-minute TTL in the mix, the run is never more than half an hour from some token's boundary. The refresh events distribute unevenly across the timeline and cannot be predicted in advance without tracking each token's individual issue time.
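To make the intersection concrete, here is a rough sketch that lists the expiry boundaries for a 6-hour run. The HubSpot and Google TTLs come from the figures above. The 120-minute Salesforce value is an assumed org setting chosen only for illustration, and the sketch assumes every token is issued at minute 0 and replaced exactly at each boundary.

```python
from datetime import timedelta

RUN = timedelta(hours=6)

# TTLs in minutes. The Salesforce value is an assumption for this sketch,
# not a Salesforce default.
TTL_MINUTES = {"hubspot": 30, "google": 60, "salesforce": 120}


def expiry_minutes(ttl: int, run: timedelta = RUN) -> list[int]:
    """Minutes into the run at which a token issued at t=0 expires,
    assuming each replacement is issued exactly at the previous boundary."""
    total = int(run.total_seconds() // 60)
    return list(range(ttl, total, ttl))


schedule = {name: expiry_minutes(ttl) for name, ttl in TTL_MINUTES.items()}
for name, marks in schedule.items():
    print(f"{name}: {len(marks)} expiries at minutes {marks}")
```

Under these assumptions the run crosses 11 HubSpot boundaries, 5 Google boundaries, and 2 Salesforce boundaries. In practice the marks drift, because each new token's clock starts when its own refresh completes.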
Each refresh is a network round-trip. And that round-trip creates a concurrency hazard inside async agent loops.
Two concurrent tool calls both detect expiry at the same moment. Both attempt a refresh. If the provider rotates refresh tokens, the first exchange succeeds and invalidates the old refresh_token. The second fires against a token that no longer exists and receives invalid_grant, with no indication in the error that a concurrent refresh caused it. Your token store can now be left in a corrupt state, and the next tool call against that provider fails regardless of whether the token appears valid in memory.
The multi-tenant variant is worse. If your agent is running for hundreds of customer accounts and those sessions started within a similar deployment window, their HubSpot tokens hit the 30-minute TTL boundary at the same time. Every session fires a refresh against HubSpot's OAuth endpoint simultaneously. This is a scheduled, deterministic load spike — the direct consequence of how the agent was deployed, not an edge case.
"The thundering herd is not an edge case. It is the predictable consequence of deploying agents at scale without a coordinated refresh layer."
All of this assumes the refresh token itself is still valid. It may not be.
Access token expiry is high-frequency and operational. Your refresh scheduler fires on the expires_in value, the exchange runs, the agent continues. It is noisy enough that you build the infrastructure for it.
Refresh token end-of-life is different. It is infrequent, provider-specific, nearly impossible to replicate in staging, and almost always discovered when a customer's agent stops working and they ask why.
Every agent doing delegated OAuth holds two token types with completely different failure semantics:
The critical distinction: an expired access token is a recoverable infrastructure event, because the refresh token can mint a new one. An invalid refresh token is not recoverable. The agent cannot self-heal. The user must re-authorize.
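One way to encode that distinction is to classify every failed refresh exchange as either retryable or terminal. The sketch below is an assumption-laden illustration: RefreshOutcome and classify_refresh_failure are hypothetical names, and the choice to treat unknown 4xx responses as terminal is a design decision of this sketch, not provider guidance.

```python
from enum import Enum
from typing import Optional


class RefreshOutcome(Enum):
    RETRY = "retry"              # transient: try the exchange again later
    REAUTH = "reauth_required"   # terminal: only the user can fix it


# OAuth 2.0 error code (RFC 6749, section 5.2) meaning the grant itself is gone.
TERMINAL_OAUTH_ERRORS = {"invalid_grant"}


def classify_refresh_failure(status: int, oauth_error: Optional[str]) -> RefreshOutcome:
    """Decide whether a failed refresh exchange is recoverable."""
    if oauth_error in TERMINAL_OAUTH_ERRORS:
        return RefreshOutcome.REAUTH
    if status >= 500 or status == 429:
        return RefreshOutcome.RETRY
    # Assumption for this sketch: any other 4xx is treated as terminal
    # rather than retried forever.
    return RefreshOutcome.REAUTH
```

A terminal outcome should stop the affected task and ask the user to re-authorize. A retryable one can go back on the refresh schedule with backoff.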
Google returns the same invalid_grant error string across three distinct scenarios, each with a different resolution path:
The three scenarios behind this single error:
Salesforce INVALID_SESSION_ID fires when the org's session policy expires the grant, when the connected app's IP restriction changes, or when the consenting user's Salesforce profile loses "API Enabled" — which happens routinely during role changes and offboarding:
Refresh token rotation, described as a security countermeasure in RFC 6819 (Section 5.2.2.3) and required by RFC 9700 (the January 2025 OAuth Security BCP) for public clients whose refresh tokens are not sender-constrained, specifies that when a refresh token is used, a new one is issued and the previous one is invalidated. If a network failure occurs between the OAuth provider issuing the new refresh_token and your system storing it, the agent holds an invalidated token. The next refresh attempt returns invalid_grant with no indication that a rotation conflict caused it.
For teams shipping agents to customers rather than internal tooling, the user-revocation scenario is the one that produces the most damage. A customer connects their Salesforce org to your agent in Q1. In Q3, the employee who completed the OAuth consent flow leaves the company. Standard offboarding revokes their connected app access. Your refresh_token for that customer's Salesforce org is now permanently invalid. No webhook fires from Salesforce. No error propagates to your system. The next time the agent calls a Salesforce tool, it produces a silent skip — and the customer files a support ticket.
"Access token expiry is noisy. Refresh token expiry is silent, infrequent, and almost always discovered by a customer, not a monitoring system."
Handling all of this correctly requires a fundamental architectural decision: the agent should not be aware that credentials exist.
The standard progression for how agent teams handle token management follows a predictable path.
Stage 1: Credentials in environment variables or a startup config dict, injected at agent launch. Works for a single-tenant internal run with a short lifespan. Breaks the first time a token expires mid-run.
Stage 2: A TokenManager class that wraps each service client, checks TTL before each call, and refreshes on expiry. Better structure — but the agent's tool code is now credential-aware. Every tool function knows about token storage. Concurrent refresh races are unhandled. Multi-tenancy is an afterthought bolted onto a design that was never built for it.
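A Stage 2 design might look roughly like the sketch below. The Token dataclass, the refresh_fn callable, and fetch_deals are hypothetical. The point is the shape: a check-then-act refresh with no lock, and a tool function that has to know tokens exist.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    access_token: str
    expires_at: float  # epoch seconds


class TokenManager:
    """Stage 2 sketch: check TTL before each call, refresh on expiry."""

    def __init__(self, token: Token, refresh_fn: Callable[[], Token]):
        self._token = token
        self._refresh_fn = refresh_fn  # hypothetical callable returning a new Token
        self.refresh_count = 0

    def get(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # Check-then-act with no lock: two concurrent callers can both
        # see an expired token here and both call refresh_fn.
        if now >= self._token.expires_at:
            self._token = self._refresh_fn()
            self.refresh_count += 1
        return self._token.access_token


def fetch_deals(tokens: TokenManager) -> dict:
    # The tool itself is credential-aware: it asks for a token on every call.
    return {"authorization": f"Bearer {tokens.get()}"}
```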
Stage 3: An auth layer external to the agent entirely. The agent holds a connected_account_id. Every tool call resolves credentials on demand from a vault. TTL tracking, proactive refresh, rotation conflict resolution, and re-auth signaling happen below the agent's awareness. The agent calls a tool. It gets a result. It never sees a token.
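The Stage 3 shape can be sketched in a few lines. CredentialVault, ToolGateway, and list_deals are hypothetical names invented for this illustration and are not taken from any SDK. The sketch only shows where the token lives relative to the agent.

```python
from typing import Callable, Dict


class CredentialVault:
    """Hypothetical vault: maps connected_account_id to a currently valid token."""

    def __init__(self, tokens: Dict[str, str]):
        self._tokens = tokens

    def valid_token(self, connected_account_id: str) -> str:
        # A real auth layer would refresh proactively here; this sketch only looks up.
        return self._tokens[connected_account_id]


class ToolGateway:
    """Hypothetical boundary that injects credentials below the agent."""

    def __init__(self, vault: CredentialVault, tools: Dict[str, Callable[..., dict]]):
        self._vault = vault
        self._tools = tools

    def execute_tool(self, name: str, connected_account_id: str, **params) -> dict:
        token = self._vault.valid_token(connected_account_id)
        return self._tools[name](token=token, **params)


def list_deals(token: str, limit: int) -> dict:
    # Stand-in for an authenticated API call; the token never leaves this layer.
    return {"deals": [f"deal-{i}" for i in range(limit)]}


gateway = ToolGateway(CredentialVault({"acct_123": "secret"}), {"list_deals": list_deals})
result = gateway.execute_tool("list_deals", connected_account_id="acct_123", limit=2)
```

The agent-side call passes an account identifier and parameters, and receives structured output. Nothing in that call path exposes the token to the agent or the model.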
"The agent should not know that tokens exist. That is not abstraction for convenience; it is abstraction for correctness."
Scalekit is that third stage.
Scalekit operates at the tool call boundary. Three components wire together once:
When the agent calls execute_tool, Scalekit fetches the valid token from the vault for that connected_account_id, executes the authenticated API call, and returns structured output. The agent never sees a token:
Compare this to the Stage 2 version — and what breaks at hour 4:
When a refresh token is permanently invalidated — user revocation, offboarding, org policy change — Scalekit's Agent Webhooks fire immediately. Not on the next tool call. Not after the agent has already produced a corrupted output. The webhook reaches your system before the agent takes another action:
Your system catches this event, pauses the affected task for that tenant, and routes reauth_url to the appropriate channel — email, in-app notification, or webhook to the customer's own system. The rest of the agent run continues unaffected: other tenants, other providers, other tasks. Once the user completes reauth, Scalekit stores the new refresh token and emits an auth success event. The paused task resumes from its last checkpoint. No restart. No re-execution of completed steps.
Each connected_account_id maps to exactly one user's OAuth grant per connected service. Token storage is partitioned by connected_account_id; there is no shared credential pool. Refresh schedules are independent per account. Proactive refresh fires relative to each account's individual token issue time with jitter — so hundreds of agent sessions deployed in the same window do not synchronize their HubSpot refresh spikes against the OAuth endpoint.
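Jittered proactive refresh can be illustrated with a small scheduling function. This is a sketch under assumptions: next_refresh_at is a hypothetical helper, and the 0.8 lead fraction and 0.1 jitter fraction are illustrative values, not provider or vendor guidance.

```python
import random
from typing import Optional


def next_refresh_at(issued_at: float, expires_in: float,
                    lead_fraction: float = 0.8, jitter_fraction: float = 0.1,
                    rng: Optional[random.Random] = None) -> float:
    """Pick a refresh time before expiry, spread by random jitter.

    Times are in seconds. With the defaults, the result falls between
    70% and 90% of the token lifetime after issue.
    """
    rng = rng or random.Random()
    base = issued_at + expires_in * lead_fraction
    jitter = rng.uniform(-jitter_fraction, jitter_fraction) * expires_in
    # Guard: never schedule at or past the expiry itself.
    return min(base + jitter, issued_at + expires_in)
```

For a 30-minute token, sessions that started in the same second end up refreshing anywhere in a window of roughly six minutes rather than all at once.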
The agent's tool call code does not change between single-tenant and multi-tenant deployments. The connected_account_id carries the tenant context. Scalekit enforces isolation.
The last row is the one that compounds. HubSpot's fixed 30-minute TTL and non-expiring refresh tokens behave differently from Salesforce's org-configurable session timeout and INVALID_SESSION_ID semantics, which behave differently from Google's invalid_grant rotation conflicts. Every provider is a separate integration surface. AgentKit encapsulates all of it behind one interface — connected_account_id in, structured tool result out — and that interface does not change whether the agent runs for 30 minutes or 24 hours.
Tool retry re-invokes the tool with the same credentials it had before. If those credentials expired, the retry fires against an expired token and produces the same failure — a silent skip, another invalid_grant, or a {} response the LLM interprets as empty results. Retry and token refresh are orthogonal mechanisms. The retry handler does not hold or rotate OAuth credentials; it just calls the function again. Conflating them is one of the most common production bugs in agent systems, and it is invisible in staging because tokens are always fresh when the test runs.
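The difference is easy to see in a toy example. Everything here is a hypothetical stand-in: call_api simulates a provider that rejects any token other than "fresh", and refresh is whatever callable produces a new token.

```python
from typing import Callable


class ExpiredToken(Exception):
    pass


def call_api(token: str) -> str:
    # Stand-in for a provider call that rejects the stale token.
    if token != "fresh":
        raise ExpiredToken("401")
    return "ok"


def retry_only(token: str, attempts: int = 3) -> str:
    # Retries with the same credential: every attempt fails the same way.
    last_error: Exception = ExpiredToken("no attempts made")
    for _ in range(attempts):
        try:
            return call_api(token)
        except ExpiredToken as exc:
            last_error = exc
    raise last_error


def refresh_then_retry(token: str, refresh: Callable[[], str]) -> str:
    # Refresh is a separate step; the second call uses the new credential.
    try:
        return call_api(token)
    except ExpiredToken:
        return call_api(refresh())
```

Calling retry_only with a stale token exhausts its attempts and raises. The refresh_then_retry variant succeeds because the credential changed between attempts, not because the call was repeated.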
You can restart or resume the run after a credential failure instead, but only under constraints that are often harder to satisfy than handling token refresh correctly. Every task must be idempotent: a partial write from a terminated run must be detectable and undoable before the resumed run re-executes it. The agent's full execution state must be serializable and restorable at an arbitrary checkpoint. For write-heavy agents (CRM updates, ticket creation, calendar scheduling), idempotency is a non-trivial requirement. LangGraph's persistence layer checkpoints state at each step, which helps, but the resumed run still needs valid credentials on restart, which brings you back to the same problem.
In a check-then-act implementation, two concurrent tool calls that both detect expiry at the same moment will both attempt a refresh. If the provider uses refresh token rotation (described in RFC 6819 Section 5.2.2.3 and required by RFC 9700 for public clients whose refresh tokens are not sender-constrained), the second refresh attempt fires against a now-consumed refresh token and receives invalid_grant. The correct behavior is proactive: refresh fires before the TTL boundary, under a distributed lock per connected_account_id per provider, so no tool call ever races a refresh. If a reactive refresh is still needed, the lock serializes it: one refresh executes, and waiting tool calls reuse the result.
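The serialization step can be sketched in-process with asyncio. This is a simplified illustration, not a production design: SingleFlightRefresher and fake_refresh are hypothetical, and asyncio.Lock only protects a single process, so a multi-worker deployment would need a distributed lock instead.

```python
import asyncio
import time


class SingleFlightRefresher:
    """Serializes refreshes for one account/provider pair (in-process sketch)."""

    def __init__(self, refresh_fn):
        self._refresh_fn = refresh_fn  # hypothetical coroutine: () -> (token, expires_at)
        self._lock = asyncio.Lock()
        self._token = None
        self._expires_at = 0.0
        self.refresh_calls = 0

    async def get_token(self) -> str:
        if time.monotonic() < self._expires_at:
            return self._token
        async with self._lock:
            # Re-check inside the lock: a waiting caller reuses the result
            # of the refresh that ran while it was blocked.
            if time.monotonic() >= self._expires_at:
                self._token, self._expires_at = await self._refresh_fn()
                self.refresh_calls += 1
            return self._token


async def fake_refresh():
    await asyncio.sleep(0.01)  # simulated network round-trip
    return "tok", time.monotonic() + 60


async def demo():
    refresher = SingleFlightRefresher(fake_refresh)
    tokens = await asyncio.gather(*(refresher.get_token() for _ in range(5)))
    return tokens, refresher.refresh_calls
```

Running asyncio.run(demo()) issues five concurrent requests for a token but only one refresh exchange. The other four callers wait on the lock and reuse the result.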
Each connected_account_id maps to one user's OAuth grant per connected service. Token storage is partitioned by connected_account_id; there is no shared credential pool. A refresh event for one tenant's HubSpot token has no effect on any other tenant's HubSpot token — separate storage partitions, separate refresh schedules, separate webhook event streams. Proactive refresh fires relative to each account's individual token issue time with jitter, so agents deployed in batch do not generate a synchronized refresh spike against the OAuth provider.
Scalekit's Agent Webhooks emit an agent.auth.reauth_required event immediately when a refresh exchange fails with a terminal error. Your system catches the event, pauses the affected agent task — not the full run — and routes the reauth_url to the user through whatever channel is appropriate. Once the user completes reauthorization, Scalekit stores the new refresh token and emits an auth success event. The paused task resumes from its last checkpoint. No restart, no re-execution of completed steps, no other tenants or providers affected.
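On the receiving side, handling that event could look something like the sketch below. Only the event name and the reauth_url field come from the description above. The payload keys "type" and "connected_account_id" and the TaskRegistry class are assumptions made for illustration, so check the actual webhook schema before relying on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskRegistry:
    """Hypothetical in-memory view of which accounts have paused tasks."""
    paused: Dict[str, str] = field(default_factory=dict)   # account id -> reauth URL
    notifications: List[str] = field(default_factory=list)

    def handle_event(self, event: dict) -> bool:
        """Pause only the affected account's task; return True if handled."""
        # Assumed payload shape, not a documented schema.
        if event.get("type") != "agent.auth.reauth_required":
            return False
        account = event["connected_account_id"]
        self.paused[account] = event["reauth_url"]
        self.notifications.append(f"reauth needed for {account}: {event['reauth_url']}")
        return True

    def can_run(self, connected_account_id: str) -> bool:
        return connected_account_id not in self.paused
```

Keying the pause on the connected account keeps the blast radius to one tenant: every other account continues to pass the can_run check.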