May 26, 2026

Multi-Step Tool Calling: Managing Auth State Across a Chain of Tool Calls

Team Scalekit

TL;DR

Multi-step agent tool chains expose three auth failure modes that single-step calls never surface: identity drift, scope window collapse, and temporal auth failure under partial write.
Agent execution graphs are non-deterministic; the LLM branches to tools whose scopes were never requested at chain initialization, and no auth library prevents this automatically.
Tokens valid at step one can expire or be revoked before step three; writes already committed to external systems at steps one and two are irreversible without compensation logic the agent must own.
A consistent `identifier` plus `connection_name` anchoring all steps to a single connected account resolves identity drift; it does not resolve scope or temporal failures.
Scalekit's connected account model enforces credential consistency across the full chain. Partial failure detection and compensation remain the agent developer's design responsibility.

An agent writing a Salesforce record, creating a Jira ticket from that record's data, and then notifying a Slack channel about the ticket: each step follows from the last. The chain is sequential. The auth tokens were issued once, at the start. What could go wrong?

Three months into production, you find out.

The Salesforce write was executed under the requesting user's delegated OAuth grant. The Jira ticket was created under the service account your team configured to avoid per-user Jira OAuth setup. The Slack message went out under a third credential set tied to the workspace integration you built six months ago. Three writes. Three different `sub` claims. One logical agent action. Your audit trail shows three unrelated principals acting independently. Your SOC 2 auditor asks: "Under whose authorization did the agent modify this Salesforce record?" You open your logs and cannot answer the question.

That is the identity drift problem. It is not the only one.

The Execution Graph Is Not a Transaction

When developers see a multi-step tool chain, the mental model that arrives first is a transaction: A, then B, then C; if C fails, roll back B and A. It is wrong in two ways that matter to agent auth design.

First, there is no rollback. A Salesforce record created, a Jira ticket opened, a Slack message sent: these are durable writes to external systems. They do not participate in any transaction boundary. The agent's execution environment has no commit or abort signal. When step three fails, step one and step two are already in production. Calling this a "rollback problem" misidentifies it. The correct term is compensation: explicitly reversing prior writes by calling the systems that received them, using the same authorization context that created them, knowing that some systems have no compensation API at all.
Second, the execution path is not deterministic. A three-step chain described in a system prompt is an advisory sequence, not a guarantee. Tools are a new kind of software reflecting a contract between deterministic systems and non-deterministic agents. The same input, the same tool definitions, the same system prompt can produce different execution paths across invocations. The LLM may decide, mid-chain, that the Salesforce record requires a calendar event before the Jira ticket. Your token scope did not include `calendar.write`. The chain now has a scope problem that was never planned for, and the auth system won't prevent the agent from attempting that call.

| Property | DB Transaction | Agent Tool Chain |
|---|---|---|
| Atomicity | All or nothing | Each step commits independently |
| Isolation | Changes invisible until commit | Each write is immediately durable and visible |
| Rollback | Native | No native rollback; compensation only |
| Execution path | Deterministic | LLM-driven; non-deterministic |
| Auth scope | Single connection, single principal | Per-tool, per-step; potentially different principal |
| Failure recovery | Abort and retry entire transaction | Retry individual steps; compensate prior writes |

The implication: auth failures mid-chain are not exceptions to handle gracefully. They are expected runtime events that require explicit design before the chain runs, not after it breaks.

Three Failure Modes That Only Appear in Multi-Step Chains

Single-step tool calls have one failure mode: the credential is invalid. Multi-step chains have three, and each requires a different response.

1. Identity Drift

Identity drift occurs when different steps in the same chain resolve credentials against different authorization principals, silently, without the agent knowing.

The naive implementation resolves credentials independently at each step:

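```python
# Fragile: each tool may resolve credentials from a different source.
# call_tool stands in for per-connector helpers that each look up credentials
# on their own (delegated OAuth, service account, workspace integration).
def run_chain(user_id: str, request_id: str):
    crm_result = call_tool("salesforce_update", user_id=user_id)
    ticket_result = call_tool("jira_create", user_id=user_id)
    slack_result = call_tool("slack_notify", user_id=user_id)
```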

This looks correct. It passes `user_id` consistently. But each connector may resolve credentials differently behind the scenes. If Salesforce uses delegated OAuth tied to the requesting user, Jira uses a service account your team configured organization-wide, and Slack resolves to a workspace integration with its own credential set, then all three calls succeed, all three return 200, and you have three different `sub` claims across what the LLM treated as a single agent action.

The audit trail is incoherent. The blast radius of any credential compromise is impossible to scope correctly. The compliance answer to "who authorized this?" is three answers instead of one.

Here is what makes identity drift particularly dangerous: it does not cause failures. The chain completes. Everything logs as successful. The incoherence only surfaces in a security review, an audit, or an incident investigation.
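Because identity drift never raises an error, the practical way to catch it is after the fact: check whether any single chain execution resolved to more than one principal. A minimal sketch, assuming your own logs record a chain-level execution ID and the resolved `sub` claim for every tool call; the `AuditRecord` schema and the `find_identity_drift` helper are hypothetical, not part of Scalekit or any other library:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One tool call as it might appear in your own logs (hypothetical schema)."""
    execution_id: str  # chain-level correlation ID
    tool_name: str
    sub: str  # principal the credential resolved to for this call


def find_identity_drift(records: list[AuditRecord]) -> dict[str, set[str]]:
    """Return {execution_id: {sub, ...}} for every chain that used more than one principal."""
    principals_by_chain: dict[str, set[str]] = defaultdict(set)
    for record in records:
        principals_by_chain[record.execution_id].add(record.sub)
    # A healthy chain maps to exactly one principal; anything larger is drift.
    return {
        execution_id: subs
        for execution_id, subs in principals_by_chain.items()
        if len(subs) > 1
    }
```

Run over the Salesforce/Jira/Slack scenario above, the chain would be flagged with three principals, while a chain whose steps all resolved to the same user would not appear in the result.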

2. Scope Window Collapse

OAuth consent is collected once, at `connected_account` creation. The scopes available to every tool call in the chain are fixed at that moment. The LLM does not know what scopes were granted; it only knows what tools are defined.

When the LLM branches to a tool whose required scope was not in the original grant, the call fails with a 403. The chain halts mid-execution. Prior writes are already durable. You cannot request additional scopes interactively, because headless background agents have no user to redirect through a consent flow.

79% of enterprises say they have adopted AI agents, yet a large share cite security and trust as barriers to moving them into production. Scope window collapse is one of the failure modes behind that hesitation. The agent works in staging, where execution paths are short and controlled. In production, the LLM finds a more efficient path through the tools that requires a scope nobody requested when the connected account was set up three months ago.

The instinct is to request all scopes upfront: grant everything the agent might ever need. This is operationally incorrect. Over-scoped tokens fail enterprise security reviews. They violate the least-privilege principle that governs every serious production deployment. A token with `crm.read`, `crm.write`, `calendar.write`, `email.send`, `files.delete` is a credential exposure waiting to happen, and no security team will approve it.

The correct approach is decision-graph-aware scoping: enumerate every tool the agent can call given its system prompt and tool definitions, compute the union of required scopes across all reachable execution paths, and request exactly that set. This is derivable statically from the tool spec. It is not a wildcard grant. It is the minimal set that covers all paths the LLM can actually take.
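The union computation itself is small once every tool definition declares its scopes. A sketch under that assumption; the `TOOL_SCOPES` catalog, its scope strings, and `required_scopes` are hypothetical illustrations, not a Scalekit structure. The important input is the full set of tools the agent is given, not the planned path:

```python
# Hypothetical catalog: each tool offered to the LLM and the OAuth scopes it needs.
# Tool names and scope strings are illustrative only.
TOOL_SCOPES: dict[str, frozenset[str]] = {
    "salesforce_get_opportunity": frozenset({"crm.read"}),
    "salesforce_update_opportunity": frozenset({"crm.read", "crm.write"}),
    "jira_create_issue": frozenset({"jira.write"}),
    "jira_delete_issue": frozenset({"jira.write"}),
    "calendar_create_event": frozenset({"calendar.write"}),
}


def required_scopes(reachable_tools: set[str]) -> set[str]:
    """Union of scopes across every tool the LLM can reach, not just the planned steps."""
    unknown = reachable_tools - set(TOOL_SCOPES)
    if unknown:
        # A tool without a scope declaration is a spec gap; fail loudly instead of guessing.
        raise KeyError(f"No scope declaration for: {sorted(unknown)}")
    union: set[str] = set()
    for tool in reachable_tools:
        union |= TOOL_SCOPES[tool]
    return union
```

Because `calendar_create_event` sits in the catalog, `calendar.write` lands in the grant whenever that tool is offered to the model, which is exactly the branch a grant sized to the planned three steps would miss.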

3. Temporal Auth Failure Under Partial Write

Steps one and two executed successfully. Their writes are committed to external systems. Between step two and step three, the access token expired, or the user explicitly revoked the OAuth grant. Step three fails with a 401 or 403.

The reason matters: token expiry is recoverable with a refreshed token if the underlying refresh token is still valid. Revocation is not recoverable automatically; the user or an admin terminated the authorization intentionally, and any automated retry without explicit re-consent would be unauthorized access.

The partial write state is the hard problem. Retrying step three is correct for token expiry. It does not roll back step one and step two. If step three was supposed to create a record that references the data created at step one and step two, the system is now in an inconsistent intermediate state across two external systems that have no transaction relationship with each other.

This is not an auth problem; it is a compensation design problem that auth failures expose.

Identity Continuity: Anchoring the Chain to a Single Connected Account

The fix for identity drift is architectural, not operational.

Every tool call in a chain must resolve credentials against the same authorization principal. That principal is established once, at chain initialization, and propagated to every subsequent step regardless of which tool the LLM decides to call. In Scalekit's model, a connected account is a specific user's authorization grant for a specific connector. It maps to exactly one `sub` claim. Every `execute_tool` call that uses the same `identifier` plus `connection_name` pair routes through the same connected account; the vault injects the correct scoped credential and the audit trail stays coherent.

```python
import os

import scalekit.client

scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions


# Wrong: resolution happens independently per tool call;
# each connector may pull credentials from a different source.
# call_tool stands in for per-connector helpers; it is not a Scalekit API.
def run_chain_fragile(user_id: str, request_id: str):
    crm_result = call_tool("salesforce_update", user_id=user_id)
    ticket_result = call_tool("jira_create", user_id=user_id)
    slack_result = call_tool("slack_notify", user_id=user_id)


# Correct: the same identifier + connection_name pair anchors every step
# to a single connected account and a single sub claim
def run_chain_stable(identifier: str, request_id: str):
    crm_result = actions.execute_tool(
        tool_name="salesforce_get_opportunity",
        identifier=identifier,
        connection_name=os.getenv("SALESFORCE_CONNECTION_NAME"),
        tool_input={"record_id": request_id},
    )
    ticket_result = actions.execute_tool(
        tool_name="jira_create_issue",
        identifier=identifier,
        connection_name=os.getenv("JIRA_CONNECTION_NAME"),
        tool_input={
            "summary": crm_result.data["name"],
            "description": crm_result.data["notes"],
        },
    )
    slack_result = actions.execute_tool(
        tool_name="slack_send_message",
        identifier=identifier,
        connection_name=os.getenv("SLACK_CONNECTION_NAME"),
        tool_input={
            "channel": "#approvals",
            "text": f"Ticket created: {ticket_result.data['key']}",
        },
    )
```

The difference is not cosmetic. The fragile implementation allows each connector to resolve credentials independently. The stable implementation routes every call through the same user's connected account for each connector. Scalekit injects the correct scoped credential per connector from the vault; the agent never sees a token. What the agent controls is whether it provides a consistent identity anchor per step.

One `identifier` per chain execution. If the use case genuinely requires multiple users' authorization within a single chain (a co-approval workflow, for example), model this as two explicit chain contexts with a handoff between them, not as an `identifier` swapped mid-execution.
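One way to make the one-identifier rule hard to break is to freeze the identity anchor into a context object created at chain entry, with a handoff producing a new, linked context instead of a swapped field. A minimal sketch; `ChainContext`, `call_kwargs`, and `hand_off` are hypothetical names, and the keyword arguments they produce mirror the `identifier` and `connection_name` parameters of the `execute_tool` calls shown above:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def _new_execution_id() -> str:
    return f"chain-{uuid.uuid4().hex[:12]}"


@dataclass(frozen=True)
class ChainContext:
    """Identity anchor for one chain execution (hypothetical helper).

    frozen=True makes any attempt to reassign `identifier` raise
    dataclasses.FrozenInstanceError, so the anchor cannot drift mid-chain.
    """

    identifier: str
    execution_id: str = field(default_factory=_new_execution_id)
    parent_execution_id: Optional[str] = None  # set only on explicit handoffs

    def call_kwargs(self, connection_name: str) -> dict:
        """Credential-routing arguments for one step; only connection_name varies."""
        return {"identifier": self.identifier, "connection_name": connection_name}


def hand_off(ctx: ChainContext, next_identifier: str) -> ChainContext:
    """Co-approval: open a second, explicit context linked to the first."""
    if next_identifier == ctx.identifier:
        raise ValueError("A handoff must change principal; keep using the current context")
    return ChainContext(identifier=next_identifier, parent_execution_id=ctx.execution_id)
```

Each step would then pass `**ctx.call_kwargs(connection_name)` into its tool call, so no individual step can supply its own identifier, and `parent_execution_id` keeps both halves of a co-approval correlated in your logs.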

Scope Design for Non-Deterministic Execution Paths

The scope set available to a chain is fixed at `connected_account` creation. The LLM's execution path is not fixed at chain initialization. These two facts create the scope window problem.

The response is not to expand the scope set to cover every conceivable tool the agent might ever call. That path leads to over-scoped tokens that fail enterprise security review. The response is to compute the required scope set from the agent's decision graph before the connected account is created, not after the first 403 surfaces in production.

Three scope design strategies and their tradeoffs:

| Strategy | Description | Production Risk |
|---|---|---|
| Exact-match scoping | Request only scopes for the planned execution path | Fails immediately on any LLM branch; 403 mid-chain with durable prior writes |
| Wildcard grant | Request all scopes for every tool in the catalog | Over-scoped tokens; fails enterprise security review; violates least-privilege |
| Decision-graph-aware scoping | Enumerate all reachable tool calls from the agent's system prompt and tool definitions; request the minimal scope union | Correct; requires explicit tool graph definition before deployment |

Decision-graph-aware scoping works because the set of tools an agent can call is knowable statically. The LLM does not invent tool names; it selects from the tool definitions provided in the request. Every tool definition carries its required scopes. The reachable tool set from any given system prompt and tool list is finite and enumerable. The scope union across that set is the minimum viable grant.
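Once the minimal union is known, both failure directions in the strategy table can be checked mechanically: scopes a narrow grant would miss, and scopes a broad grant adds beyond need. A small sketch using plain scope strings; `GrantReview` and `review_grant` are hypothetical helpers, and how you obtain the granted scope set depends on your OAuth provider:

```python
from typing import NamedTuple


class GrantReview(NamedTuple):
    missing: frozenset[str]  # needed by a reachable tool but not granted: mid-chain 403 risk
    excess: frozenset[str]   # granted but needed by no reachable tool: least-privilege finding


def review_grant(granted: set[str], required: set[str]) -> GrantReview:
    """Compare a proposed or existing grant against the minimal scope union."""
    return GrantReview(
        missing=frozenset(required - granted),
        excess=frozenset(granted - required),
    )
```

A clean review has both sets empty; a non-empty `missing` set argues for re-consent before deployment, and a non-empty `excess` set is what a security reviewer will flag.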

In Scalekit, `list_scoped_tools` returns the tool definitions the connected account is currently authorized to call. Running it before chain start confirms whether the scope set covers every reachable tool; a missing tool name is an early signal to trigger re-consent before any writes are attempted:

```python
import os

from google.protobuf.json_format import MessageToDict

# Assumes `actions = scalekit_client.actions` from the client setup above.
SALESFORCE_CONN = os.getenv("SALESFORCE_CONNECTION_NAME")
JIRA_CONN = os.getenv("JIRA_CONNECTION_NAME")
SLACK_CONN = os.getenv("SLACK_CONNECTION_NAME")

# Every tool the chain may call, mapped to the connection that serves it
REQUIRED_TOOLS = {
    "salesforce_get_opportunity": SALESFORCE_CONN,
    "jira_create_issue": JIRA_CONN,
    "slack_send_message": SLACK_CONN,
}

# Enumerate tools the connected account can actually call
# before the chain starts; surface scope gaps before any write is attempted
scoped_response, _ = actions.tools.list_scoped_tools(
    identifier="user_123",
    filter={"connection_names": [SALESFORCE_CONN, JIRA_CONN, SLACK_CONN]},
)

available_tool_names = set()
for scoped_tool in scoped_response.tools:
    definition = MessageToDict(scoped_tool.tool).get("definition", {})
    available_tool_names.add(definition.get("name"))

missing = set(REQUIRED_TOOLS) - available_tool_names
if missing:
    # Surface re-consent before the chain starts, not after a mid-chain 403.
    # Re-consent is per connector: one link for each connection with a gap.
    auth_links = {}
    for tool_name in sorted(missing):
        conn = REQUIRED_TOOLS[tool_name]
        if conn not in auth_links:
            auth_links[conn] = actions.get_authorization_link(
                connection_name=conn,
                identifier="user_123",
            ).link
    raise RuntimeError(
        f"Connected account lacks scope for: {sorted(missing)}. "
        f"Re-consent required: {auth_links}"
    )
```

The re-consent edge case: when a new tool is added to the agent post-deployment, existing connected accounts lack that tool's required scope. The pre-flight check above surfaces this before the chain runs. Running it reactively on a 403 mid-chain is too late; writes at prior steps are already committed.

Compensation Semantics When Auth Fails Mid-Chain

Token expiry mid-chain is not the hard problem. Scalekit handles proactive refresh automatically; the agent does not write refresh logic. The hard problem is what the agent does when auth fails after durable writes have already been committed to external systems.

The mental model most engineers reach for is "retry the failed step." This is correct for transient expiry. It is incorrect as a general response to mid-chain auth failure, because it ignores the state of the systems that already received writes.

The compensation decision tree every chain needs:

Step N auth failure
- Is the failure a 401 (token expired)?
  - Scalekit handles proactive refresh automatically. If the call still fails after refresh, the refresh token itself is invalid (revocation or extended inactivity). Treat as revocation; do not retry automatically.
- Is the failure a 403 (insufficient scope)?
  - Scope was not in the grant at ^{connected_account} creation. Cannot auto-recover; requires explicit re-consent. Log completed steps; trigger ^{get_authorization_link}; block retry until re-consent is confirmed.
- Is the failure a 403 (revoked grant)?
  - User or admin explicitly terminated the OAuth grant. Cannot auto-recover; requires explicit re-authorization. Execute compensation for prior writes if system is inconsistent. Alert operator; do not retry without re-consent.
- For all non-transient failures: Are steps 1..N-1 in a consistent state without step N?
  - Yes: log partial completion; surface to operator.
  - No: execute compensation for writes that leave the system inconsistent; use the same ^identifier and ^{connection_name} as the original writes.
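The tree above reduces to a small pure function, which makes it testable apart from any connector. A sketch under stated assumptions: the 403 `reason` strings are hypothetical (RFC 6750 defines `insufficient_scope` as the bearer-token error code for this case, but whether a given connector surfaces it, and how a revoked grant is reported, varies), and `decide`, `Decision`, and `Recovery` are illustrative names:

```python
from enum import Enum
from typing import NamedTuple, Optional


class Recovery(Enum):
    TREAT_AS_REVOKED = "401 after platform refresh: re-authorize, never auto-retry"
    RE_CONSENT = "403 insufficient scope: re-consent, block retry until confirmed"
    RE_AUTHORIZE = "403 revoked grant: re-authorize, never auto-retry"
    NOT_AUTH = "not an auth failure: handle outside this tree"


class Decision(NamedTuple):
    recovery: Recovery
    compensate: bool       # undo prior writes before stopping
    notify_operator: bool  # surface the partial state to a human


def decide(status_code: int, reason: Optional[str], prior_steps_consistent: bool) -> Decision:
    """Map a failure at step N to the recovery path in the tree."""
    if status_code == 401:
        # Refresh is proactive, so a 401 that reaches the agent means refresh already failed.
        recovery = Recovery.TREAT_AS_REVOKED
    elif status_code == 403 and reason == "insufficient_scope":
        recovery = Recovery.RE_CONSENT
    elif status_code == 403:
        # Assumption: any other 403 is handled as a revoked grant.
        recovery = Recovery.RE_AUTHORIZE
    else:
        return Decision(Recovery.NOT_AUTH, compensate=False, notify_operator=False)
    # All auth branches are non-transient: always tell the operator, and
    # compensate only when steps 1..N-1 are inconsistent without step N.
    return Decision(recovery, compensate=not prior_steps_consistent, notify_operator=True)
```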

Two principles govern compensation execution. First, compensation calls must use the same `identifier` and `connection_name` as the original writes. Compensating a write made under a different identity will fail authorization at the application layer. Second, if the compensation tool is unavailable (no delete endpoint, no cancel API), the agent must alert the operator and block further execution; silent partial state is the failure mode that produces compliance incidents.
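Both principles can be enforced where writes are recorded: store the identity alongside each write so the undo cannot run under a different one, and treat a missing undo as a hard stop. A minimal, framework-agnostic sketch; `CompletedWrite`, `CompensationLog`, and `OperatorAlert` are hypothetical, and each `undo` callable stands in for whatever compensating tool call (such as a delete) the connector offers:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# An undo callable receives the identifier and connection_name of the original write.
Undo = Callable[[str, str], None]


@dataclass(frozen=True)
class CompletedWrite:
    step: str
    identifier: str       # identity that made the write
    connection_name: str  # connected account the write went through
    undo: Optional[Undo]  # None: the target system offers no compensation API


class OperatorAlert(Exception):
    """Raised to block further execution when partial state cannot be undone."""


@dataclass
class CompensationLog:
    writes: list = field(default_factory=list)

    def record(self, write: CompletedWrite) -> None:
        self.writes.append(write)

    def compensate(self) -> list:
        """Undo writes newest-first with their original identity; stop at the first gap."""
        undone = []
        for write in reversed(self.writes):
            if write.undo is None:
                raise OperatorAlert(
                    f"No compensation for '{write.step}'; "
                    f"already compensated: {undone}; remaining state needs a human"
                )
            write.undo(write.identifier, write.connection_name)
            undone.append(write.step)
        return undone
```

Stopping at the first write that cannot be undone is deliberate: walking newest-first, every remaining write is one the un-undoable write may reference, so reversing them would leave a dangling reference in a system you cannot fix.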

Here is a production-grade three-step chain with Scalekit, implementing pre-flight scope checking and compensation on auth failure:

```python
import os

import scalekit.client
from google.protobuf.json_format import MessageToDict

scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions

SALESFORCE_CONN = os.getenv("SALESFORCE_CONNECTION_NAME")
JIRA_CONN = os.getenv("JIRA_CONNECTION_NAME")
SLACK_CONN = os.getenv("SLACK_CONNECTION_NAME")

# Every tool the chain may call, mapped to the connection that serves it
REQUIRED_TOOLS = {
    "salesforce_get_opportunity": SALESFORCE_CONN,
    "jira_create_issue": JIRA_CONN,
    "slack_send_message": SLACK_CONN,
    "jira_delete_issue": JIRA_CONN,  # compensation tool for step 2
}


def preflight_scope_check(identifier: str):
    """Verify all required tools are in scope before any write is attempted."""
    scoped_response, _ = actions.tools.list_scoped_tools(
        identifier=identifier,
        filter={"connection_names": [SALESFORCE_CONN, JIRA_CONN, SLACK_CONN]},
    )
    available = {
        MessageToDict(t.tool).get("definition", {}).get("name")
        for t in scoped_response.tools
    }
    missing = set(REQUIRED_TOOLS) - available
    if missing:
        # Re-consent is per connector: one link per connection with a gap
        links = {
            conn: actions.get_authorization_link(
                connection_name=conn,
                identifier=identifier,
            ).link
            for conn in {REQUIRED_TOOLS[tool] for tool in missing}
        }
        raise PermissionError(
            f"Scope gap detected before chain start: {sorted(missing)}. "
            f"Re-consent URLs: {links}"
        )


def run_procurement_chain(identifier: str, record_id: str):
    execution_id = f"proc-{record_id}"
    completed_steps = []
    current_conn = None  # connection of the step in flight; targets re-consent on 403

    # Pre-flight: surface scope gaps before any durable write
    preflight_scope_check(identifier)

    try:
        # Step 1: Read opportunity from Salesforce (read; safe to retry)
        current_conn = SALESFORCE_CONN
        crm_result = actions.execute_tool(
            tool_name="salesforce_get_opportunity",
            identifier=identifier,
            connection_name=SALESFORCE_CONN,
            tool_input={"record_id": record_id},
        )
        completed_steps.append(("salesforce_read", crm_result))

        # Step 2: Create Jira issue (write; durable on success)
        current_conn = JIRA_CONN
        ticket_result = actions.execute_tool(
            tool_name="jira_create_issue",
            identifier=identifier,
            connection_name=JIRA_CONN,
            tool_input={
                "summary": crm_result.data["name"],
                "description": crm_result.data["notes"],
                "priority": "High",
            },
        )
        completed_steps.append(("jira_create", ticket_result))

        # Step 3: Notify approver on Slack (write; auth context may have
        # shifted between step 1 and now if execution took significant time)
        current_conn = SLACK_CONN
        slack_result = actions.execute_tool(
            tool_name="slack_send_message",
            identifier=identifier,
            connection_name=SLACK_CONN,
            tool_input={
                "channel": "#approvals",
                "text": f"Review required: {ticket_result.data['key']} -- {crm_result.data['name']}",
            },
        )
        completed_steps.append(("slack_notify", slack_result))

    except Exception as e:
        # Assumes the SDK error exposes the HTTP status as `status_code`;
        # adapt this to the exception type your SDK version actually raises.
        status_code = getattr(e, "status_code", None)
        done = [name for name, _ in completed_steps]
        if status_code == 401:
            notify_operator(
                chain_id=execution_id,
                message=(
                    "Token refresh failed mid-chain. Refresh token invalid or revoked. "
                    f"Completed steps: {done}. Re-authorization required."
                ),
            )
            compensate_partial_writes(completed_steps, identifier)
        elif status_code == 403:
            # Re-consent for the connector whose call failed, not a fixed one
            link = actions.get_authorization_link(
                connection_name=current_conn,
                identifier=identifier,
            )
            notify_operator(
                chain_id=execution_id,
                message=(
                    f"Auth failure (403) on {current_conn} after steps: {done}. "
                    f"Re-consent URL: {link.link}"
                ),
            )
            compensate_partial_writes(completed_steps, identifier)
        else:
            raise


def compensate_partial_writes(completed_steps: list, identifier: str):
    """
    Compensate in reverse order using the same identifier that made the writes.
    Compensation calls must use the same connected account as the original writes.
    """
    for step_name, result in reversed(completed_steps):
        if step_name == "jira_create":
            actions.execute_tool(
                tool_name="jira_delete_issue",
                identifier=identifier,
                connection_name=JIRA_CONN,
                tool_input={"issue_key": result.data["key"]},
            )


def notify_operator(chain_id: str, message: str):
    # Route to your alerting system (PagerDuty, Slack ops channel, etc.)
    print(f"[OPERATOR ALERT] Chain {chain_id}: {message}")
```

Three things to observe. The `identifier` never changes between steps; Scalekit maps it to the same connected account for each connector. Compensation uses the same `identifier` as the original writes; using a different identity to undo a write will fail at the application layer regardless of whether the token is valid. The error handlers distinguish 401 (token lifecycle failure) from 403 (authorization failure) because the correct operator message and recovery path differ.

What Scalekit Handles vs. What You Own

A consistent `identifier` plus `connection_name` pair in Scalekit enforces one thing precisely: all tool calls route through the same user's connected account, resolving the same vault entry, producing the same `sub` claim on every call. Token storage, refresh orchestration, proactive rotation, per-connector provider edge cases, and the immutable audit trail linking each call to its authorization event: Scalekit handles all of these.

| Concern | Scalekit | Agent Developer |
|---|---|---|
| Credential resolution per tool call | Vault lookup, decryption, injection | Pass the same `identifier` + `connection_name` through every call |
| Token refresh during chain execution | Proactive refresh before expiry; per-connector provider logic | Handle 401 as a signal that the refresh token is invalid; surface to operator |
| Identity consistency across tools | Routes to the same connected account when `identifier` is stable | Initialize `identifier` once at chain entry; never change it mid-chain |
| Scope enforcement | Enforces scopes granted at consent; returns 403 on violation | Run `list_scoped_tools` pre-flight; surface gaps before writes begin |
| Audit trail per tool call | Immutable log per `execute_tool` call linked to the authorization event | Add a chain-level `execution_id` to correlate steps in your own logs |
| Auth URL generation | `get_authorization_link` returns a re-consent URL | Detect 403; route the user through re-consent before retry |
| Compensation logic | None; domain-specific | Define per-tool compensation actions; execute in reverse order on failure |
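Correlating steps with a chain-level `execution_id` needs nothing beyond the standard library. A sketch using `logging.LoggerAdapter`, whose default behavior attaches its `extra` dict to every record so a formatter can reference `%(execution_id)s`; `configure_chain_logging`, `chain_logger`, `log_step`, and the logger name are hypothetical:

```python
import logging

CHAIN_FORMAT = "%(levelname)s [%(execution_id)s] %(message)s"


def configure_chain_logging(name: str = "agent.chain", stream=None) -> logging.Logger:
    """Give the chain logger its own handler so the execution_id format applies only there."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        handler.setFormatter(logging.Formatter(CHAIN_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines through root handlers
    return logger


def chain_logger(execution_id: str, name: str = "agent.chain") -> logging.LoggerAdapter:
    """Every record logged through the adapter carries the chain's execution_id."""
    return logging.LoggerAdapter(logging.getLogger(name), {"execution_id": execution_id})


def log_step(log: logging.LoggerAdapter, step: int, tool_name: str,
             identifier: str, connection_name: str, outcome: str) -> None:
    # One line per tool call, joinable with the platform's per-call audit entries.
    log.info("step=%d tool=%s identifier=%s connection=%s outcome=%s",
             step, tool_name, identifier, connection_name, outcome)
```

Keeping the `%(execution_id)s` format on a dedicated handler matters: attached to the root logger, it would produce logging errors for records from other libraries that carry no `execution_id`.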

The boundary matters precisely because of what auth cannot know. Scalekit knows whether a token is valid, which principal it belongs to, and whether the scope set covers the requested operation. It does not know whether a Jira ticket created at step two is in a consistent state without the Slack notification step three was supposed to send. That is domain logic. Treating auth as a complete solution to partial failure is the design error that produces incidents.

FAQs

Can I use different identifier values for different tools in the same chain?

Technically, yes. In practice, no. Different identifiers in one chain mean different users' authorization grants are mixed in a single execution context. The audit trail shows multiple principals authoring what is logically one agent action. If the use case genuinely requires multi-user authorization within one chain (a co-approval workflow is the most common legitimate case), model it as two explicit chain contexts with a handoff between them, not as an `identifier` swapped mid-execution.

What if the LLM calls a tool the connected account has no scope for?

The `execute_tool` call returns a 403. Run `list_scoped_tools` pre-flight to detect this before the chain starts and any writes are committed. If it surfaces mid-chain despite the pre-flight check (a new tool added post-deployment, for example), halt the chain, log completed steps, call `get_authorization_link` to generate a re-consent URL, and surface it to the user or operator. Do not retry; scope failures require user interaction.

Does Scalekit handle token refresh automatically during a running chain?

Yes. Scalekit refreshes tokens proactively before expiry, not reactively on 401. In normal operation the agent never sees a token expiry error mid-chain. A 401 surfacing to the agent means the refresh itself failed, which is consistent with the refresh token being revoked or expired due to extended inactivity. At that point re-authorization (a new consent flow) is required; automated retry will not succeed.

Token refresh succeeded but the call still returned 403. What happened?

A 403 after successful refresh is a scope failure, not a token lifecycle failure. The refreshed token carries the same scope set as the expired one; the scope problem predates the expiry. Call `list_scoped_tools` for the connected account and check whether the failing tool is among those it is authorized to call. The resolution is re-consent with the missing scope added, not a retry.

What happens to prior writes if re-consent fails because the user is unavailable?

This is the operational gap most chains do not plan for. The agent cannot proceed. Prior writes are durable. The options are: execute compensation for the prior writes immediately and surface a "re-authorization required" notification for clean retry; or hold the chain in a suspended state and resume from the failed step once the user completes re-consent. The worst choice is silently leaving partial state with no operator notification; that is the failure mode that produces compliance incidents and week-long debugging sessions.
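For the suspend-and-resume option, the minimum a chain must persist is which step failed, the identity it ran under, and the outputs later steps depend on. A sketch assuming JSON-serializable step outputs and a local file as the store; `ChainCheckpoint`, `resume`, and the step callables are hypothetical, and in practice each step would wrap a tool call:

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable

# A step takes the chain identifier and prior outputs and returns its own output.
Step = tuple[str, Callable[[str, dict], dict]]


@dataclass
class ChainCheckpoint:
    execution_id: str
    identifier: str  # identity the completed writes were made under
    next_step: int  # index of the first step that has not succeeded
    outputs: dict = field(default_factory=dict)  # step name -> JSON-serializable output
    status: str = "suspended_awaiting_reconsent"

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path: Path) -> "ChainCheckpoint":
        return cls(**json.loads(path.read_text()))


def resume(cp: ChainCheckpoint, steps: list[Step]) -> ChainCheckpoint:
    """Run only the steps after the last success, under the stored identifier.

    If a step raises, next_step still points at it, so the caller can save again.
    """
    for index in range(cp.next_step, len(steps)):
        name, run = steps[index]
        cp.outputs[name] = run(cp.identifier, cp.outputs)
        cp.next_step = index + 1  # advance only after the step succeeds
    cp.status = "completed"
    return cp
```

Resuming under the stored `identifier` keeps the re-run steps anchored to the same connected account as the writes made before suspension, and completed writes are never repeated.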
