
According to Gartner, 40% of enterprise applications will feature task-specific AI agents by end of 2026, up from less than 5% in 2025. LangGraph now sees 27,100 monthly searches; CrewAI follows with 14,800. LangGraph crossed 34.5 million monthly downloads. These numbers describe a category that has moved from experiment to infrastructure.
The frameworks driving that adoption have each found a clear orchestration model that production teams trust:
The orchestration problem — how an LLM selects tools, how state flows between steps, how agents hand off to each other — is solved. Each framework above is a defensible production choice for the reasoning layer. The deep implementation specifics for each are covered in their own posts: LangChain, CrewAI, Mastra, Anthropic SDK.
What none of them have solved — and what none of them are trying to solve — is the layer underneath.
Every framework above handles the same set of things:
Every framework above explicitly does not handle:
This is not a gap born from negligence; it is intentional scope design. Frameworks are orchestration primitives. They are not integration platforms. The credential layer is a separate problem that frameworks deliberately pass back to the application.
How that boundary surfaces differs per framework. The boundary itself does not.
LangChain / LangGraph: The @tool decorator has no concept of user identity. Credentials reach the tool function only through a closure over a token you already fetched, or through RunnableConfig["configurable"] that carries whatever you manually put there. LangSmith's @auth.authenticate and @auth.on handlers govern access to LangGraph's own resources — threads, assistants, run history. They have no relationship to the OAuth credentials your tools need to reach Salesforce or Gmail.
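The two delivery paths described above can be sketched in plain Python. This is a minimal illustration rather than LangChain code: `USER_TOKENS`, `make_search_tool`, and `search_contacts_with_config` are hypothetical names, and the `config` argument is an ordinary dict standing in for `RunnableConfig`. In a real graph the functions would be wrapped with `@tool`.

```python
from typing import Any, Callable, Dict

# Hypothetical per-user token table; a real deployment would use an encrypted store.
USER_TOKENS: Dict[str, str] = {"user_alice": "tok-alice", "user_bob": "tok-bob"}


def make_search_tool(token: str) -> Callable[[str], Dict[str, Any]]:
    """Pattern 1: the tool closes over a token fetched before the tool was built."""

    def search_contacts(query: str) -> Dict[str, Any]:
        # The token is frozen at construction time; the tool cannot tell whose it is.
        return {"query": query, "authorized_as": token}

    return search_contacts


def search_contacts_with_config(query: str, config: Dict[str, Any]) -> Dict[str, Any]:
    """Pattern 2: the tool reads identity from config["configurable"] at call time."""
    user_id = config["configurable"]["user_id"]  # KeyError if the caller never set it
    token = USER_TOKENS[user_id]
    return {"query": query, "authorized_as": token}
```

In both patterns the lookup happens in code the application owns; nothing in the tool wrapper checks that the token belongs to the user the run is for.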
CrewAI: Tools are instantiated once at Crew() construction and reused across every kickoff(). Under single-tenant, single-execution conditions this is invisible. Under concurrent multi-tenant load, the same BaseTool instance carries state from Customer A's execution into Customer B's. The shared tool object becomes a shared identity surface. The orchestration layer looks fine throughout; the instability lives entirely beneath it.
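The shared-instance hazard can be reproduced without CrewAI at all. The sketch below uses a hypothetical `SharedCrmTool` class as a stand-in for any tool object that is built once and reused, and a `threading.Barrier` to force the overlap that concurrent load would otherwise produce by chance.

```python
import threading
from typing import Dict, Optional


class SharedCrmTool:
    """Hypothetical stand-in for a tool object built once and reused by every run."""

    def __init__(self) -> None:
        self.token: Optional[str] = None  # per-run state kept on a shared instance

    def run(self, token: str, both_started: threading.Barrier) -> Optional[str]:
        self.token = token    # each run stores its own credential on the instance
        both_started.wait()   # force the overlap: both runs have written by now
        return self.token     # both runs read whichever write landed last


def run_two_customers(tool: SharedCrmTool) -> Dict[str, Optional[str]]:
    """Run the same tool instance for two customers at the same time."""
    results: Dict[str, Optional[str]] = {}
    barrier = threading.Barrier(2)

    def worker(customer: str, token: str) -> None:
        results[customer] = tool.run(token, barrier)

    threads = [
        threading.Thread(target=worker, args=("customer_a", "tok-a")),
        threading.Thread(target=worker, args=("customer_b", "tok-b")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both customers come back holding the same token, so one of them has acted with the other's credential. Which one depends on thread scheduling.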
Mastra: requestContext is the right primitive for carrying runtime identity — it travels through every execute() call without being serialized into the LLM prompt, which is exactly correct. But requestContext carries whatever you put into it. @mastra/auth-okta verifies the inbound JWT and sets userId in context; it answers "who is calling your Mastra server." It says nothing about what that user has connected downstream — whether their HubSpot token is live, their Linear token was revoked last week, or their Salesforce instance URL is na47.salesforce.com and not login.salesforce.com.
Anthropic SDK: Tool definitions are pure JSON schemas passed to anthropic.messages.create(). The SDK handles reasoning and tool orchestration. Authentication with any external service — the OAuth flow, the token exchange, the refresh cycle, the user-scoped credential — is completely outside the SDK boundary by design.
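A minimal sketch of where that boundary falls, without calling the API: the tool definition uses the name, description, and JSON Schema shape the Messages API accepts, and has no field for credentials. `CREDENTIALS` and `handle_tool_use` are hypothetical application code that runs after the model returns a tool-use request.

```python
from typing import Any, Dict, Tuple

# Tool definition: name, description, and a JSON Schema for the input. Nothing else.
GET_DEALS_TOOL: Dict[str, Any] = {
    "name": "get_open_deals",
    "description": "List open CRM deals for an account.",
    "input_schema": {
        "type": "object",
        "properties": {"account": {"type": "string"}},
        "required": ["account"],
    },
}

# Hypothetical application-owned credential table keyed by (user, connector).
CREDENTIALS: Dict[Tuple[str, str], str] = {
    ("user_alice", "hubspot"): "tok-alice-hubspot",
}


def handle_tool_use(name: str, tool_input: Dict[str, Any], user_id: str) -> Dict[str, Any]:
    """Execute a tool request; the user identity comes from the app, not the model."""
    if name != GET_DEALS_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    token = CREDENTIALS[(user_id, "hubspot")]  # resolved entirely outside the SDK
    return {"account": tool_input["account"], "authorized_as": token}
```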
OpenAI Agents SDK: The run context, which tools receive wrapped in a RunContextWrapper, carries state across agent handoffs. There is no credential layer built into the SDK; every external service your agent touches requires you to have obtained, stored, and refreshed the appropriate token before it reaches the tool function.
Google ADK: AuthConfig and AuthCredential exist as primitives. ADK defines the structure for credential types — OAuth2Auth, APIKey, ServiceAccountCredentials. What it does not provide is storage, refresh orchestration, per-user routing, or tenant isolation. The OAuth exchange logic, the token table, and the refresh handler are all yours.
The gaps are structural. The only variable is which surface in each framework exposes them:
1. No credential storage. Tokens go wherever you put them: environment variables, a user_tokens table you own, hardcoded in a closure. No standard pattern exists inside any framework.
2. No token refresh. Gmail access tokens expire in 60 minutes; Salesforce access tokens follow the org's session timeout, 2 hours by default; Dropbox has a non-standard refresh behavior that silently reuses tokens past expiry under certain conditions. A mid-chain 401 surfaces as empty data in LangGraph's ToolNode: not an exception, just missing data passed into the next step. In CrewAI's BaseTool._run(), a stale credential returns silently wrong results. In Mastra's execute(), an unhandled rejection from an expired token breaks the step with no typed error the workflow can branch on. The failure mode varies; the root cause is the same.
3. No per-user routing. @tool, BaseTool._run(), and Mastra's execute() all receive no user context unless explicitly injected. The default — if you haven't built injection — is whatever credential the function has access to at runtime. In practice, that is a shared service account or bot token. Every customer's agent posts as the same Slack identity. Every CRM lookup uses the same Salesforce org.
4. No per-tenant isolation. LangGraph's @auth system governs LangGraph resources. It does not enforce that Customer A's tools cannot reach Customer B's connected accounts. None of the six frameworks enforce tenant isolation at the credential layer; they have no concept of a credential-to-tenant mapping. That enforcement must live in infrastructure, not in orchestration logic.
5. No connector catalog. Every enterprise system your agent needs to reach — Salesforce, Slack, Gmail, Linear, Notion, GitHub — is a custom tool your team builds, including the OAuth implementation, the token storage, and the refresh logic. Build it for one connector, and you have a working demo. Build it for four connectors across 100 customers, and these patterns all converge on the same production failures.
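Gap 3 is the easiest of these to show in code. The sketch below uses Python's `contextvars` as a request-scoped identity carrier; `SERVICE_TOKEN`, `USER_TOKENS`, and both resolver functions are hypothetical. The first resolver reproduces the shared-identity default described above, and the second shows one possible fail-closed alternative.

```python
import contextvars
from typing import Dict

SERVICE_TOKEN = "shared-bot-token"  # hypothetical shared credential
USER_TOKENS: Dict[str, str] = {"user_alice": "tok-alice"}  # hypothetical per-user table

# Request-scoped identity; stays None unless the application sets it.
current_user = contextvars.ContextVar("current_user", default=None)


def resolve_token_with_fallback() -> str:
    """The failure mode: a missing user silently becomes the shared identity."""
    user_id = current_user.get()
    if user_id is None:
        return SERVICE_TOKEN
    return USER_TOKENS.get(user_id, SERVICE_TOKEN)


def resolve_token_strict() -> str:
    """Fail closed: a tool call without a known user never runs."""
    user_id = current_user.get()
    if user_id is None or user_id not in USER_TOKENS:
        raise PermissionError("no user-scoped credential for this call")
    return USER_TOKENS[user_id]
```

With the fallback resolver, forgetting to inject identity produces a working call under the wrong identity. With the strict resolver the same mistake produces an error, which is far easier to notice.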
Where each gap surfaces per framework:
For the full implementation breakdown per framework: LangChain's three credential patterns and their failure modes, CrewAI's shared instance problem under concurrent load, Mastra's requestContext gap, Anthropic SDK's auth boundary.
Before naming anything that solves this, it is worth being precise about the spec. What does "solving the credential layer" actually mean in a multi-tenant B2B SaaS context?
Per-connector OAuth implementation, not generic OAuth. Salesforce has a per-org instance URL that cannot be hardcoded; every org resolves to a different endpoint and the token exchange must account for it. Slack scopes must be pre-declared in the App Dashboard before the flow runs; a missing scope returns an access error that looks nothing like a configuration issue. Gmail tokens follow Google's standard 1-hour expiry; Dropbox has non-standard refresh behavior. Each provider is its own integration surface.
Token storage at the right layer. Each record holds an access token, a refresh token, an expiry timestamp, service-specific metadata (Salesforce instance URL, Slack workspace ID, etc.), and a user-to-token mapping per connector. For 100 customers across 4 connectors, that is at least 400 credential records (one per customer per connector, and more once several users per customer connect), each requiring encryption at rest, per-tenant isolation, and indexed retrieval on every tool call.
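As a rough sketch of what one such record might look like, assuming a simple in-memory mapping keyed by tenant, user, and connector. The field names and the `build_store` helper are illustrative, and tokens are held in plaintext only to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


@dataclass
class CredentialRecord:
    tenant_id: str
    user_id: str
    connector: str
    access_token: str   # encrypt at rest in a real store; plaintext only for the sketch
    refresh_token: str
    expires_at: datetime
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. Salesforce instance URL


StoreKey = Tuple[str, str, str]  # (tenant_id, user_id, connector)


def build_store(customers: int, connectors: Tuple[str, ...]) -> Dict[StoreKey, CredentialRecord]:
    """One connected user per customer per connector: the smallest realistic case."""
    expires = datetime.now(timezone.utc) + timedelta(hours=1)
    store: Dict[StoreKey, CredentialRecord] = {}
    for n in range(customers):
        for connector in connectors:
            key = (f"tenant_{n}", f"user_{n}", connector)
            store[key] = CredentialRecord(*key, f"at-{n}", f"rt-{n}", expires)
    return store
```

Calling `build_store(100, ("salesforce", "slack", "gmail", "linear"))` yields 400 records, which is the floor; every additional connected user per customer adds up to four more.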
Proactive refresh, not reactive. Waiting for a 401 to trigger refresh creates race conditions: multiple agent threads hitting the same expired token simultaneously will each attempt refresh, each unaware the others are doing the same. The result is duplicate refresh requests, potential token invalidation (some providers invalidate the old refresh token on first use), and cascading retries. Refresh must be coordinated and proactive — triggered by expires_in, not by API failure.
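One way to coordinate refresh inside a single process is a per-credential lock with a second freshness check after acquiring it. The sketch below is an assumption-laden illustration: `TokenManager`, `REFRESH_MARGIN_SECONDS`, and the `refresh_fn` callback (which would perform the provider's token exchange) are hypothetical, and a multi-process deployment would need a distributed lock instead of `threading.Lock`.

```python
import threading
import time
from typing import Callable, Dict, Tuple

REFRESH_MARGIN_SECONDS = 300  # assumed safety margin: refresh 5 minutes before expiry


class TokenManager:
    def __init__(self, refresh_fn: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.time) -> None:
        self._refresh_fn = refresh_fn  # returns (new_access_token, expires_in_seconds)
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}  # key -> (token, expires_at)
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def put(self, key: str, access_token: str, expires_in: float) -> None:
        self._tokens[key] = (access_token, self._clock() + expires_in)

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _is_fresh(self, key: str) -> bool:
        _, expires_at = self._tokens[key]
        return expires_at - self._clock() > REFRESH_MARGIN_SECONDS

    def get(self, key: str) -> str:
        if self._is_fresh(key):
            return self._tokens[key][0]
        with self._lock_for(key):
            # Re-check under the lock: another thread may already have refreshed.
            if not self._is_fresh(key):
                token, expires_in = self._refresh_fn(key)
                self.put(key, token, expires_in)
            return self._tokens[key][0]
```

The decision to refresh is driven by the stored expiry, never by a failed API call, and only the first thread through the lock performs the exchange. That matters for providers that invalidate the old refresh token on first use.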
Tenant isolation enforced at the infrastructure layer. Not in tool logic. Not in orchestration logic. The credential store must guarantee that a lookup for user_alice's HubSpot token cannot return user_bob's token regardless of what the agent requests. This is an architectural primitive, not an application-level check.
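The shape of such an interface can be sketched as follows, with the caveat that Python objects are not a security boundary and real enforcement belongs in the storage service itself. `CredentialVault` and `SessionCredentials` are hypothetical names. The point is that the handle given to tools has identity fixed at session time, so the agent can only choose a connector.

```python
from typing import Dict, Tuple

RecordKey = Tuple[str, str, str]  # (tenant_id, user_id, connector)


class SessionCredentials:
    """The only handle tools receive. Identity is fixed; the caller picks a connector."""

    def __init__(self, records: Dict[RecordKey, str], tenant_id: str, user_id: str) -> None:
        self._records = records
        self._tenant_id = tenant_id
        self._user_id = user_id

    def token_for(self, connector: str) -> str:
        key = (self._tenant_id, self._user_id, connector)
        if key not in self._records:
            raise LookupError(f"no {connector} connection for this user")
        return self._records[key]


class CredentialVault:
    """Hypothetical store; only the session layer should hold a reference to it."""

    def __init__(self) -> None:
        self._records: Dict[RecordKey, str] = {}

    def put(self, tenant_id: str, user_id: str, connector: str, token: str) -> None:
        self._records[(tenant_id, user_id, connector)] = token

    def for_session(self, tenant_id: str, user_id: str) -> SessionCredentials:
        # Called after authentication, with identity taken from the verified session.
        return SessionCredentials(self._records, tenant_id, user_id)
```

Because `token_for` takes no user or tenant argument, a prompt-injected request for another user's token has no parameter to travel through.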
Revocation tied to identity lifecycle events. An employee's Slack token does not expire when their Okta account is disabled. The credential layer must respond to SCIM deprovisioning events, not just OAuth expiry. See the full breakdown of what proper offboarding requires when agents are in the picture.
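A minimal sketch of the handler side, assuming the identity provider's deprovisioning event has already been received and mapped to a tenant and user. `on_user_deprovisioned` and the `revoke_at_provider` callback are hypothetical; the callback stands in for a call to each provider's own revocation endpoint.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (tenant_id, user_id, connector)


def on_user_deprovisioned(
    store: Dict[Key, str],
    tenant_id: str,
    user_id: str,
    revoke_at_provider: Callable[[str, str], None],
) -> List[str]:
    """Remove every connector credential held for a deactivated user."""
    doomed = [k for k in store if k[0] == tenant_id and k[1] == user_id]
    # Delete locally first, so the agent loses access even if a provider call fails.
    removed = {key[2]: store.pop(key) for key in doomed}
    for connector, token in removed.items():
        revoke_at_provider(connector, token)
    return sorted(removed)
```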
Audit trails at the tool call level. To evidence the logical access controls in SOC 2 CC6.1, every agent action should be traceable to the originating authorization event. The audit log entry for a Salesforce write must carry the connection_id linking back to when that user originally authorized access, with what scopes, for what duration.
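As an illustration of what such an entry could contain, the sketch below serializes one tool call as a JSON log line. The `audit_entry` function and every field name other than `connection_id` are assumptions, not a prescribed schema.

```python
import json
from datetime import datetime, timezone
from typing import List


def audit_entry(tenant_id: str, user_id: str, connector: str, tool: str,
                connection_id: str, scopes: List[str]) -> str:
    """Serialize one tool call as a JSON log line (field names are illustrative)."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "tenant_id": tenant_id,
            "user_id": user_id,
            "connector": connector,
            "tool": tool,
            "connection_id": connection_id,  # links back to the original authorization
            "scopes": sorted(scopes),
        },
        sort_keys=True,
    )
```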
Scalekit AgentKit is the connector identity layer built to this spec. The call interface from the framework side is:
Scalekit resolves user_alice's HubSpot credential, validates it against the scopes she originally authorized, refreshes proactively if needed, executes against HubSpot's API, and returns the result. user_alice's token never appears in application code. The agent stays stateless. The audit trail is automatic. The connector catalog covers 150+ enterprise apps with 3,000+ pre-built tools — CRM (Salesforce, HubSpot, Pipedrive, Attio), communication (Slack, Gmail, Outlook, Teams), project management (Linear, Jira, Notion, Asana), data (Snowflake, BigQuery, Databricks), dev tools (GitHub, GitLab, Vercel), and more.
Full documentation: docs.scalekit.com/agentkit/overview/
The correct framing is not "which framework handles auth best." None of them handle auth; that is a deliberate design decision across the entire category. The correct framing is: pick your framework on orchestration fit, pick your credential infrastructure on the production requirements of your agent. These decisions do not overlap.
Framework selection on orchestration fit:
Credential layer: the injection point is per-framework; the infrastructure requirement is not.
Once your agent calls real enterprise APIs on behalf of real users across multiple tenants, the credential layer is an infrastructure problem. The framework determines which injection point you use to pass the resolved credential from vault to tool. The injection points are:
All six are injection points, not credential infrastructure. The infrastructure — OAuth flows, token vault, refresh coordination, tenant isolation, revocation — is the same requirement regardless of which framework sits above it. Per-framework integration guides for AgentKit: LangChain, Anthropic, Mastra, Google ADK, OpenAI, CrewAI.
No. The injection point in each framework is a single call substitution inside each tool's execute function. The orchestration logic — graph structure, crew definition, workflow steps — is untouched.
For single-tenant deployments that run one execution at a time, shared tool instances in CrewAI work fine. The isolation failure surfaces under concurrent multi-tenant execution, when multiple kickoff() calls hit the same tool instances simultaneously. If you are ever going to run the same crew for more than one customer, isolate the credential layer before you reach production.
LangGraph's @auth system governs access to LangGraph resources — threads, assistants, run history. It controls who can read or write a LangGraph thread. It does not govern what external systems the tools inside those threads can access. These are separate authorization surfaces: one controls the agent platform, the other controls the external connectors the agent calls. The full breakdown is in the LangChain tool calling post.
The three agent action patterns cover this precisely. User-delegated access means the agent acts with a specific user's OAuth grant and their exact scopes. Org-level credentials (Client Credentials flow) means the agent runs with service account access for background jobs where no user is in the loop. MCP tool access uses short-lived scoped tokens per downstream call. The choice between them is a function of the action type, not the framework.
The token vault and per-tenant isolation requirements apply any time credentials must be scoped per customer and managed outside the agent runtime. For internal APIs, M2M authentication via organization-scoped service accounts is the right pattern. See M2M authentication for internal services and external APIs.