Best Agent Development Frameworks (Auth and Other Features Compared)

TL;DR

  • Every major agent framework in 2026 — LangGraph, CrewAI, OpenAI Agents SDK, Anthropic SDK, Google ADK, Mastra — has a clearly defined scope boundary: orchestration in, credential management out.
  • The mechanical difference between how LangChain, CrewAI, and Mastra surface that boundary is real; the structural gap is identical across all of them.
  • "Where secrets live" in your agent is not a framework question; it is an infrastructure question every framework intentionally returns to you.
  • Multi-tenant credential isolation — one set of scoped tokens per user, per connector, per org — is not achievable through any pattern native to any of these frameworks.
  • The decision guide is not "which framework handles auth best." It is: pick your framework on orchestration fit, solve the credential layer separately with purpose-built connector identity infrastructure.

Frameworks Are at Adoption Scale. The Orchestration Problem Is Solved.

According to Gartner, 40% of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5% in 2025. LangGraph now sees 27,100 monthly searches; CrewAI follows with 14,800. LangGraph has crossed 34.5 million monthly downloads. These numbers describe a category that has moved from experiment to infrastructure.

The frameworks driving that adoption have each found a clear orchestration model that production teams trust:

| Framework | Orchestration Model | Language | Optimized For |
|---|---|---|---|
| LangGraph | Directed state graph | Python | Fine-grained production workflows, HITL, time-travel debugging |
| CrewAI | Role-based crews | Python | Multi-agent collaboration, role-driven task decomposition |
| OpenAI Agents SDK | Explicit handoffs | Python / JS | OpenAI-native deployments, clean agent-to-agent transfer |
| Anthropic SDK | Tool-use chain with sub-agents | Python / JS | Claude-native reasoning, safety-constrained workflows |
| Google ADK | Hierarchical agent tree | Python / Go / Java / TS | Vertex AI deployments, multi-language enterprise teams |
| Mastra | Typed workflows + step orchestration | TypeScript | TypeScript-native agents, Vercel and edge deployments |

The orchestration problem — how an LLM selects tools, how state flows between steps, how agents hand off to each other — is solved. Each framework above is a defensible production choice for the reasoning layer. The deep implementation specifics for each are covered in their own posts: LangChain, CrewAI, Mastra, Anthropic SDK.

What none of them have solved — and what none of them are trying to solve — is the layer underneath.

The Scope Boundary Every Framework Draws in the Same Place

Every framework above handles the same set of things:

  • LLM routing and tool schema serialization
  • Reasoning loop control (when to call a tool, when to stop)
  • State passing between steps or agents
  • Multi-agent coordination patterns

Every framework above explicitly does not handle:

  • OAuth flows to external services (Slack, Salesforce, Gmail, Linear, HubSpot)
  • Token storage, encryption at rest, per-tenant isolation
  • Refresh lifecycle management
  • Per-user credential routing
  • Revocation flows

This is not a gap born from negligence; it is intentional scope design. Frameworks are orchestration primitives. They are not integration platforms. The credential layer is a separate problem that frameworks deliberately pass back to the application.

How that boundary surfaces differs per framework. The boundary itself does not.

LangChain / LangGraph: The @tool decorator has no concept of user identity. Credentials reach the tool function only through a closure over a token you already fetched, or through RunnableConfig["configurable"] that carries whatever you manually put there. LangSmith's @auth.authenticate and @auth.on handlers govern access to LangGraph's own resources — threads, assistants, run history. They have no relationship to the OAuth credentials your tools need to reach Salesforce or Gmail.
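A minimal sketch of the two patterns described above, written with the standard library only so it runs without LangChain installed. `config` is a plain dict that only mimics the `RunnableConfig["configurable"]` shape; `TOKEN_STORE`, `search_contacts`, and `make_closure_tool` are hypothetical names, not LangChain APIs. In both cases the tool knows whom it acts for only because the caller supplied that information.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a developer-owned user_tokens table.
TOKEN_STORE: Dict[str, str] = {
    "user_alice": "tok-alice",
    "user_bob": "tok-bob",
}


def search_contacts(query: str, config: Dict[str, Any]) -> Dict[str, str]:
    """Pattern 1: identity arrives only through config["configurable"]."""
    user_id = config.get("configurable", {}).get("user_id")
    if user_id is None:
        # Nothing was injected, so the tool cannot know whose credential to use.
        raise ValueError("no user_id in config['configurable']")
    token = TOKEN_STORE[user_id]
    # A real tool would call the CRM API with `token`; this just reports what it would use.
    return {"query": query, "acting_as": user_id, "token": token}


def make_closure_tool(token: str) -> Callable[[str], Dict[str, str]]:
    """Pattern 2: the token is captured when the tool is built, not when it runs."""
    def search(query: str) -> Dict[str, str]:
        return {"query": query, "token": token}
    return search


print(search_contacts("acme", {"configurable": {"user_id": "user_alice"}}))
print(make_closure_tool("tok-bob")("acme"))
```

Neither function knows anything about refresh, revocation, or tenants; those concerns stay with whatever populated `TOKEN_STORE` or called `make_closure_tool`.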

CrewAI: Tools are instantiated once at Crew() construction and reused across every kickoff(). Under single-tenant, single-execution conditions this is invisible. Under concurrent multi-tenant load, the same BaseTool instance carries state from Customer A's execution into Customer B's. The shared tool object becomes a shared identity surface. The orchestration layer looks fine throughout; the instability lives entirely beneath it.
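The shared-instance hazard can be shown without CrewAI at all. The sketch below writes the interleaving out sequentially instead of using real threads so the outcome is deterministic; under concurrent `kickoff()` calls the same overwrite happens nondeterministically. `CrmTool` and `build_tools_for` are hypothetical and do not use CrewAI's `BaseTool` API.

```python
from typing import List, Optional


class CrmTool:
    """Hypothetical tool whose credential lives on the instance (not CrewAI's BaseTool)."""

    def __init__(self, token: Optional[str] = None) -> None:
        self.token = token

    def run(self, query: str) -> str:
        # Whatever token is on the instance at call time is the identity used.
        return f"{query} via {self.token}"


# Anti-pattern: one instance, re-pointed at each kickoff.
shared = CrmTool()
shared.token = "tok-customer-a"  # kickoff for Customer A configures the tool
shared.token = "tok-customer-b"  # Customer B's kickoff starts before A's tool call runs
leaked = shared.run("list deals for A")
print(leaked)  # list deals for A via tok-customer-b


def build_tools_for(token: str) -> List[CrmTool]:
    """Safer: fresh tool instances per execution, so no state crosses tenants."""
    return [CrmTool(token)]


tools_a = build_tools_for("tok-customer-a")
tools_b = build_tools_for("tok-customer-b")
print(tools_a[0].run("list deals for A"))  # list deals for A via tok-customer-a
```

Per-execution construction removes the shared identity surface, but it still leaves open where `token` comes from, which is the credential-layer question the rest of this post addresses.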

Mastra: requestContext is the right primitive for carrying runtime identity — it travels through every execute() call without being serialized into the LLM prompt, which is exactly correct. But requestContext carries whatever you put into it. @mastra/auth-okta verifies the inbound JWT and sets userId in context; it answers "who is calling your Mastra server." It says nothing about what that user has connected downstream — whether their HubSpot token is live, their Linear token was revoked last week, or their Salesforce instance URL is na47.salesforce.com and not login.salesforce.com.

Anthropic SDK: Tool definitions are pure JSON schemas passed to anthropic.messages.create(). The SDK handles reasoning and tool orchestration. Authentication with any external service — the OAuth flow, the token exchange, the refresh cycle, the user-scoped credential — is completely outside the SDK boundary by design.
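To illustrate where the boundary sits, the sketch below never calls the SDK. It takes a dict shaped like a `tool_use` content block from a Messages API response and returns a dict shaped like a `tool_result` block, with the per-user token bound in a closure the model never sees. `SEARCH_TOOL`, `make_executor`, the tool name, and the `toolu_01` id are illustrative assumptions.

```python
import json
from typing import Any, Callable, Dict

# JSON schema the model sees; it carries no credentials at all.
SEARCH_TOOL: Dict[str, Any] = {
    "name": "crm_search",
    "description": "Search CRM contacts by name.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}


def make_executor(user_token: str) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Bind one user's token in a closure on the application side."""
    def execute(tool_use: Dict[str, Any]) -> Dict[str, Any]:
        if tool_use["name"] != SEARCH_TOOL["name"]:
            raise KeyError(f"unknown tool {tool_use['name']}")
        # A real implementation would send user_token as a bearer token to the CRM API here.
        rows = [f"contact matching {tool_use['input']['query']}"]  # placeholder API response
        return {
            "type": "tool_result",
            "tool_use_id": tool_use["id"],
            "content": json.dumps(rows),
        }
    return execute


# Shape of a tool_use block as it appears in a Messages API response.
block = {"type": "tool_use", "id": "toolu_01", "name": "crm_search", "input": {"query": "acme"}}
print(make_executor("tok-alice")(block))
```

Everything about obtaining, refreshing, or revoking `user_token` happens before `make_executor` is called, which is exactly the part the SDK leaves to you.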

OpenAI Agents SDK: context_wrapper carries state across agent handoffs. There is no credential layer built into the SDK; every external service your agent touches requires you to have obtained, stored, and refreshed the appropriate token before it reaches the tool function.

Google ADK: AuthConfig and AuthCredential exist as primitives. ADK defines the structure for credential types — OAuth2Auth, APIKey, ServiceAccountCredentials. What it does not provide is storage, refresh orchestration, per-user routing, or tenant isolation. The OAuth exchange logic, the token table, and the refresh handler are all yours.

The Five Gaps That Appear in Every Framework When Agents Hit Real Enterprise APIs

The gaps are structural. The only variable is which surface in each framework exposes them:

1. No credential storage. Tokens go wherever you put them: environment variables, a user_tokens table you own, hardcoded in a closure. No standard pattern exists inside any framework.

2. No token refresh. Gmail access tokens expire in 60 minutes; Salesforce tokens in 2 hours; Dropbox has a non-standard refresh behavior that silently reuses tokens past expiry under certain conditions. A mid-chain 401 surfaces as empty data in LangGraph's ToolNode — not an exception, just missing data passed into the next step. In CrewAI's BaseTool._run(), a stale credential returns silently wrong results. In Mastra's execute(), an unhandled rejection from an expired token breaks the step with no typed error the workflow can branch on. The failure mode varies; the root cause is the same.
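A minimal sketch of the difference between swallowing an expired-token failure and surfacing it. `call_api` simulates an upstream API that rejects expired tokens the way a 401 would; `CredentialExpiredError`, `swallowing_tool`, and `surfacing_tool` are hypothetical names, not part of any framework.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


class CredentialExpiredError(Exception):
    """Typed failure an orchestration layer can branch on."""


@dataclass
class Token:
    access_token: str
    expires_at: float  # Unix timestamp


def call_api(token: Token, now: float) -> List[str]:
    """Simulated upstream API: rejects expired tokens the way a 401 would."""
    if now >= token.expires_at:
        raise CredentialExpiredError("HTTP 401: access token expired")
    return ["deal-1", "deal-2"]


def swallowing_tool(token: Token, now: Optional[float] = None) -> List[str]:
    # Anti-pattern: the next step sees "no results", not "auth failed".
    try:
        return call_api(token, time.time() if now is None else now)
    except CredentialExpiredError:
        return []


def surfacing_tool(token: Token, now: Optional[float] = None) -> List[str]:
    # Let the typed error propagate so the workflow can refresh, retry, or ask the user to reconnect.
    return call_api(token, time.time() if now is None else now)


expired = Token("tok-old", expires_at=time.time() - 60)
print(swallowing_tool(expired))  # []
```

Surfacing the error only makes the failure visible; it does not refresh anything, so it complements rather than replaces proactive refresh.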

3. No per-user routing. @tool, BaseTool._run(), and Mastra's execute() all receive no user context unless explicitly injected. The default — if you haven't built injection — is whatever credential the function has access to at runtime. In practice, that is a shared service account or bot token. Every customer's agent posts as the same Slack identity. Every CRM lookup uses the same Salesforce org.
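The silent fallback to a shared identity is easy to write by accident. This sketch contrasts a resolver that quietly falls back to a shared bot token with one that refuses. The token values, `CONNECTIONS` table, and `NotConnectedError` are hypothetical placeholders.

```python
from typing import Dict, Optional, Tuple

SHARED_BOT_TOKEN = "xoxb-shared-placeholder"  # shared identity a tool should never silently fall back to

# Hypothetical per-user connections: (user_id, connector) -> token
CONNECTIONS: Dict[Tuple[str, str], str] = {("user_alice", "slack"): "xoxp-alice-placeholder"}


class NotConnectedError(Exception):
    """Raised when no per-user credential exists; the agent should ask the user to connect."""


def resolve_naive(user_id: Optional[str], connector: str) -> str:
    # Anti-pattern: a missing identity quietly becomes the shared bot identity.
    if user_id is None:
        return SHARED_BOT_TOKEN
    return CONNECTIONS.get((user_id, connector), SHARED_BOT_TOKEN)


def resolve_strict(user_id: Optional[str], connector: str) -> str:
    if user_id is None:
        raise NotConnectedError("no user identity was injected into this tool call")
    try:
        return CONNECTIONS[(user_id, connector)]
    except KeyError:
        raise NotConnectedError(f"{user_id} has not connected {connector}") from None


print(resolve_naive("user_bob", "slack"))    # falls back to the shared bot token
print(resolve_strict("user_alice", "slack"))
```

With `resolve_naive`, Bob's agent posts as the shared bot; with `resolve_strict`, the same call fails loudly and can trigger a connect flow instead.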

4. No per-tenant isolation. LangGraph's @auth system governs LangGraph resources. It does not enforce that Customer A's tools cannot reach Customer B's connected accounts. None of the six frameworks enforce tenant isolation at the credential layer; they have no concept of a credential-to-tenant mapping. That enforcement must live in infrastructure, not in orchestration logic.

5. No connector catalog. Every enterprise system your agent needs to reach — Salesforce, Slack, Gmail, Linear, Notion, GitHub — is a custom tool your team builds, including the OAuth implementation, the token storage, and the refresh logic. Build it for one connector, and you have a working demo. Build it for four connectors across 100 customers, and these patterns all converge on the same production failures.

Where each gap surfaces per framework:

| Gap | LangGraph | CrewAI | Mastra | Anthropic / OpenAI / ADK |
|---|---|---|---|---|
| Credential storage | Developer's user_tokens table | Developer-owned | Developer-owned | Developer-owned |
| Token refresh | Manual, inside each @tool | Manual, inside BaseTool._run() | Manual, inside execute() | Manual per integration |
| Per-user routing | Via RunnableConfig (if built) | Via tool constructor (if built) | Via requestContext (if populated) | Via closure / context (if built) |
| Tenant isolation | Not enforced | Not enforced | Not enforced | Not enforced |
| Connector catalog | None | None | None | None built-in |

For the full implementation breakdown per framework: LangChain's three credential patterns and their failure modes, CrewAI's shared instance problem under concurrent load, Mastra's requestContext gap, Anthropic SDK's auth boundary.

What a Production Connector Identity Layer Actually Requires

Before naming anything that solves this, it is worth being precise about the spec. What does "solving the credential layer" actually mean in a multi-tenant B2B SaaS context?

Per-connector OAuth implementation, not generic OAuth. Salesforce has a per-org instance URL that cannot be hardcoded; every org resolves to a different endpoint and the token exchange must account for it. Slack scopes must be pre-declared in the App Dashboard before the flow runs; a missing scope returns an access error that looks nothing like a configuration issue. Gmail tokens follow Google's standard 1-hour expiry; Dropbox has non-standard refresh behavior. Each provider is its own integration surface.
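One way to keep provider quirks out of tool code is a per-provider profile consulted after the token exchange. This sketch covers only the two quirks named above that can be expressed as data: Salesforce's per-org `instance_url`, which comes back in its token response, and Google's roughly one-hour access-token lifetime. `ProviderProfile`, `PROFILES`, and both helpers are hypothetical; the sketch leaves Salesforce's TTL unset rather than assume one, and it performs no OAuth exchange.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ProviderProfile:
    name: str
    base_url_from_token: bool                  # True when the API host comes back in the token response
    static_base_url: Optional[str] = None
    default_ttl_seconds: Optional[int] = None  # fallback if the response has no expires_in


PROFILES: Dict[str, ProviderProfile] = {
    # Salesforce returns a per-org instance_url in its token response.
    "salesforce": ProviderProfile("salesforce", base_url_from_token=True),
    # Google access tokens are typically issued for about one hour.
    "gmail": ProviderProfile(
        "gmail",
        base_url_from_token=False,
        static_base_url="https://gmail.googleapis.com",
        default_ttl_seconds=3600,
    ),
}


def api_base_url(provider: str, token_response: Dict[str, Any]) -> str:
    profile = PROFILES[provider]
    if profile.base_url_from_token:
        # Never hardcode login.salesforce.com: each org resolves to its own host.
        return token_response["instance_url"]
    if profile.static_base_url is None:
        raise ValueError(f"no base URL configured for {provider}")
    return profile.static_base_url


def ttl_seconds(provider: str, token_response: Dict[str, Any]) -> Optional[int]:
    return token_response.get("expires_in", PROFILES[provider].default_ttl_seconds)


print(api_base_url("salesforce", {"instance_url": "https://na47.salesforce.com"}))
```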

Token storage at the right layer. Access token + refresh token + expiry timestamp + service-specific metadata (Salesforce instance URL, Slack workspace ID, etc.) + user-to-token mapping per connector. For 100 customers across 4 connectors, that is 400+ credential records, each requiring encryption at rest, per-tenant isolation, and indexed retrieval on every tool call.
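The record shape and the arithmetic above can be sketched directly. This assumes one connected user per customer (more users multiply the count) and uses `"<encrypted>"` as a placeholder for ciphertext; `CredentialRecord` and the connector list are illustrative, and the sketch does no actual encryption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class CredentialRecord:
    tenant_id: str
    user_id: str
    connector: str
    access_token: str   # must be encrypted at rest in any real store
    refresh_token: str  # likewise
    expires_at: float   # Unix timestamp
    metadata: Dict[str, Any] = field(default_factory=dict)  # e.g. Salesforce instance URL, Slack workspace ID

    @property
    def key(self) -> Tuple[str, str, str]:
        # Composite key looked up on every tool call.
        return (self.tenant_id, self.user_id, self.connector)


CONNECTORS = ("salesforce", "slack", "gmail", "linear")
index: Dict[Tuple[str, str, str], CredentialRecord] = {}
for n in range(100):  # 100 customers, one connected user each
    for connector in CONNECTORS:
        rec = CredentialRecord(f"tenant-{n}", f"user-{n}", connector,
                               "<encrypted>", "<encrypted>", expires_at=0.0)
        index[rec.key] = rec

print(len(index))  # 400
```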

Proactive refresh, not reactive. Waiting for a 401 to trigger refresh creates race conditions: multiple agent threads hitting the same expired token simultaneously will each attempt refresh, each unaware the others are doing the same. The result is duplicate refresh requests, potential token invalidation (some providers invalidate the old refresh token on first use), and cascading retries. Refresh must be coordinated and proactive — triggered by expires_in, not by API failure.
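A minimal single-process sketch of coordinated, proactive refresh: a per-key lock makes refresh single-flight, so concurrent callers never spend the same refresh token twice, and refresh triggers inside a skew window before expiry instead of after a 401. `RefreshCoordinator` and `fake_refresh` are hypothetical. Holding the lock on every read is a simplification, and multiple workers or hosts would need a distributed lock or a central refresh service, which this does not show.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple

RefreshFn = Callable[[str], Tuple[str, float]]  # key -> (new_access_token, expires_in_seconds)


class RefreshCoordinator:
    """Proactive, single-flight refresh: at most one refresh per key at a time."""

    def __init__(self, refresh_fn: RefreshFn, skew_seconds: float = 300.0) -> None:
        self._refresh_fn = refresh_fn
        self._skew = skew_seconds
        self._tokens: Dict[str, Tuple[str, float]] = {}  # key -> (token, expires_at)
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def seed(self, key: str, token: str, expires_at: float) -> None:
        self._tokens[key] = (token, expires_at)

    def get(self, key: str) -> str:
        # Callers for the same key queue here, so only the first one refreshes;
        # the rest re-read the updated expiry and skip the refresh.
        with self._lock_for(key):
            token, expires_at = self._tokens[key]
            if time.time() >= expires_at - self._skew:  # refresh early, not on a 401
                token, expires_in = self._refresh_fn(key)
                self._tokens[key] = (token, time.time() + expires_in)
            return token


calls: List[str] = []


def fake_refresh(key: str) -> Tuple[str, float]:
    calls.append(key)
    time.sleep(0.05)  # simulated network latency so the threads overlap
    return f"fresh-{len(calls)}", 3600.0


coord = RefreshCoordinator(fake_refresh)
coord.seed("alice:hubspot", "stale", expires_at=time.time() + 10)  # inside the 300 s window

results: List[str] = []
threads = [threading.Thread(target=lambda: results.append(coord.get("alice:hubspot"))) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls), set(results))  # 1 {'fresh-1'}
```

Eight concurrent callers produce exactly one refresh call, which matters for providers that invalidate the old refresh token on first use.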

Tenant isolation enforced at the infrastructure layer. Not in tool logic. Not in orchestration logic. The credential store must guarantee that a lookup for user_alice's HubSpot token cannot return user_bob's token regardless of what the agent requests. This is an architectural primitive, not an application-level check.
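A sketch of what "enforced by the store, not by tool logic" can mean: the lookup is scoped by a `Principal` the platform established (for example from a verified session), so whatever `user_id` the agent asks for, the store only answers inside the caller's own tenant and identity. `Principal`, `ScopedCredentialStore`, and `TenantIsolationError` are hypothetical, and an in-memory dict stands in for a real encrypted store.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Principal:
    """Identity established by the platform, never by the agent's own arguments."""
    tenant_id: str
    user_id: str


class TenantIsolationError(Exception):
    pass


class ScopedCredentialStore:
    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str, str], str] = {}

    def put(self, tenant_id: str, user_id: str, connector: str, token: str) -> None:
        self._rows[(tenant_id, user_id, connector)] = token

    def get(self, principal: Principal, user_id: str, connector: str) -> str:
        # The agent may ask for any user_id; the store answers only for the caller,
        # and only inside the caller's tenant.
        if user_id != principal.user_id:
            raise TenantIsolationError(f"{principal.user_id} may not read {user_id}'s credentials")
        key = (principal.tenant_id, user_id, connector)
        if key not in self._rows:
            raise TenantIsolationError("no credential for this user in this tenant")
        return self._rows[key]


store = ScopedCredentialStore()
store.put("acme", "user_alice", "hubspot", "tok-alice")
store.put("globex", "user_bob", "hubspot", "tok-bob")
print(store.get(Principal("acme", "user_alice"), "user_alice", "hubspot"))  # tok-alice
```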

Revocation tied to identity lifecycle events. An employee's Slack token does not expire when their Okta account is disabled. The credential layer must respond to SCIM deprovisioning events, not just OAuth expiry. See the full breakdown of what proper offboarding requires when agents are in the picture.
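A sketch of wiring SCIM deprovisioning to connector revocation. It recognizes a SCIM 2.0 PatchOp body that sets `active` to false, in both the `path`/`value` form and the `value`-object form that identity providers commonly send, and marks every downstream connection for that user as revoked. A SCIM DELETE on the user would be another trigger, not shown here. `CONNECTIONS`, `deactivates_user`, and `on_scim_patch` are hypothetical, and no provider revocation endpoint is called.

```python
from typing import Any, Dict, List

PATCH_OP = "urn:ietf:params:scim:api:messages:2.0:PatchOp"

# Hypothetical connection table: user_id -> {connector: status}
CONNECTIONS: Dict[str, Dict[str, str]] = {"user_alice": {"slack": "active", "salesforce": "active"}}


def deactivates_user(patch: Dict[str, Any]) -> bool:
    """True if a SCIM PatchOp body sets active=false."""
    if PATCH_OP not in patch.get("schemas", []):
        return False
    for op in patch.get("Operations", []):
        if op.get("op", "").lower() != "replace":
            continue
        if op.get("path") == "active" and op.get("value") is False:
            return True
        value = op.get("value")
        if isinstance(value, dict) and value.get("active") is False:
            return True
    return False


def on_scim_patch(user_id: str, patch: Dict[str, Any]) -> List[str]:
    """Revoke every downstream connection when the IdP deactivates the user."""
    if not deactivates_user(patch):
        return []
    revoked = []
    for connector in CONNECTIONS.get(user_id, {}):
        # A real implementation would also call each provider's token revocation endpoint.
        CONNECTIONS[user_id][connector] = "revoked"
        revoked.append(connector)
    return revoked


event = {"schemas": [PATCH_OP], "Operations": [{"op": "Replace", "path": "active", "value": False}]}
print(on_scim_patch("user_alice", event))
```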

Audit trails at the tool call level. Per SOC 2 CC6.1, every agent action must reference the originating authorization event. The audit log entry for a Salesforce write must carry the connection_id linking back to when that user originally authorized access, with what scopes, for what duration.
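One possible shape for such an entry, as a sketch: each tool call is logged with the `connection_id`, granted scopes, and grant timestamps of the authorization that made it possible. `Connection`, `audit_entry`, the tool name, and the sample values are hypothetical; `api` and `refresh_token` are used only as example Salesforce scope names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class Connection:
    """Hypothetical record of the original authorization event."""
    connection_id: str
    user_id: str
    connector: str
    scopes: Tuple[str, ...]
    authorized_at: str         # ISO 8601 time the user granted access
    expires_at: Optional[str]  # None if the grant has no fixed end


def audit_entry(conn: Connection, tool_name: str, action: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "actor": conn.user_id,
        "tool": tool_name,
        "action": action,
        # These fields tie the call back to the authorization that permitted it.
        "connection_id": conn.connection_id,
        "granted_scopes": list(conn.scopes),
        "authorized_at": conn.authorized_at,
        "grant_expires_at": conn.expires_at,
    }
    return json.dumps(entry, sort_keys=True)


conn = Connection("conn_123", "user_alice", "salesforce", ("api", "refresh_token"),
                  "2026-01-15T09:30:00+00:00", None)
print(audit_entry(conn, "salesforce_update_opportunity", "write"))
```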

Scalekit AgentKit is the connector identity layer built to this spec. The call interface from the framework side is:

    result = scalekit.actions.executeTool(
        identifier="user_alice",
        tool_name="hubspot_search_contacts",
        tool_input={"filterGroups": [...], "properties": [...], "limit": 50},
    )

Scalekit resolves user_alice's HubSpot credential, validates it against the scopes she originally authorized, refreshes proactively if needed, executes against HubSpot's API, and returns the result. user_alice's token never appears in application code. The agent stays stateless. The audit trail is automatic. The connector catalog covers 150+ enterprise apps with 3,000+ pre-built tools — CRM (Salesforce, HubSpot, Pipedrive, Attio), communication (Slack, Gmail, Outlook, Teams), project management (Linear, Jira, Notion, Asana), data (Snowflake, BigQuery, Databricks), dev tools (GitHub, GitLab, Vercel), and more.

Full documentation: docs.scalekit.com/agentkit/overview/

Framework Decision Guide: Two Independent Decisions, Not One

The correct framing is not "which framework handles auth best." None of them handle auth; that is a deliberate design decision across the entire category. The correct framing is: pick your framework on orchestration fit, pick your credential infrastructure on the production requirements of your agent. These decisions do not overlap.

Framework selection on orchestration fit:

| If your agent needs... | Framework |
|---|---|
| Fine-grained state control, HITL checkpoints, time-travel debugging, complex conditional flows | LangGraph |
| Multi-agent role-based collaboration, declarative task decomposition, fast prototyping | CrewAI |
| TypeScript-native, edge and Vercel deployment, strict end-to-end typing | Mastra |
| Claude-native reasoning, safety-constrained workflows, sub-agent patterns | Anthropic SDK |
| OpenAI-native deployments, clean agent-to-agent explicit handoffs | OpenAI Agents SDK |
| Vertex AI enterprise deployment, multi-language team (Java, Go, Python, TypeScript) | Google ADK |

Credential layer: the injection point is per-framework; the infrastructure requirement is not.

Once your agent calls real enterprise APIs on behalf of real users across multiple tenants, the credential layer is an infrastructure problem. The framework determines which injection point you use to pass the resolved credential from vault to tool. The injection points are:

| Framework | Credential injection point |
|---|---|
| LangGraph | RunnableConfig["configurable"] |
| CrewAI | Tool constructor arguments, resolved per-execution at kickoff() |
| Mastra | requestContext |
| Anthropic SDK | Closure or context passed into the tool executor function |
| OpenAI Agents SDK | context_wrapper |
| Google ADK | ToolContext |

All six are injection points, not credential infrastructure. The infrastructure — OAuth flows, token vault, refresh coordination, tenant isolation, revocation — is the same requirement regardless of which framework sits above it. Per-framework integration guides for AgentKit: LangChain, Anthropic, Mastra, Google ADK, OpenAI, CrewAI.

FAQs

Does adding a credential layer mean rewriting the agent?

No. Adding a credential layer means a single call substitution at each framework's injection point, inside each tool's execute function. The orchestration logic (graph structure, crew definition, workflow steps) is untouched.

If the crew runs for only one customer, does the shared instance problem still apply?

For single-tenant, single-concurrent-execution deployments, shared tool instances in CrewAI work fine. The isolation failure surfaces under concurrent multi-tenant execution when multiple kickoff() calls hit the same tool instances simultaneously. If you are ever going to run the same crew for more than one customer, the credential layer must be isolated before you hit production.

Why doesn't LangGraph's built-in auth solve the per-user credential problem?

LangGraph's @auth system governs access to LangGraph resources — threads, assistants, run history. It controls who can read or write a LangGraph thread. It does not govern what external systems the tools inside those threads can access. These are separate authorization surfaces: one controls the agent platform, the other controls the external connectors the agent calls. The full breakdown is in the LangChain tool calling post.

What's the difference between per-user and per-org credential patterns?

The three agent action patterns cover this precisely. User-delegated access means the agent acts with a specific user's OAuth grant and their exact scopes. Org-level credentials (Client Credentials flow) means the agent runs with service account access for background jobs where no user is in the loop. MCP tool access uses short-lived scoped tokens per downstream call. The choice between them is a function of the action type, not the framework.
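For the org-level pattern, the token request itself is plain OAuth 2.0. The sketch below builds, but does not send, a Client Credentials request per RFC 6749 section 4.4, with HTTP Basic client authentication per section 2.3.1. The endpoint URL, client ID, secret, and scope are hypothetical, and the sketch covers only the background-job case, not user-delegated grants or MCP tokens.

```python
import base64
import urllib.parse
import urllib.request


def client_credentials_request(token_url: str, client_id: str, client_secret: str,
                               scope: str) -> urllib.request.Request:
    """Build, but do not send, an OAuth 2.0 Client Credentials token request (RFC 6749 section 4.4)."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": scope}).encode()
    # HTTP Basic client authentication (RFC 6749 section 2.3.1): form-encode, then base64.
    user = urllib.parse.quote_plus(client_id)
    password = urllib.parse.quote_plus(client_secret)
    basic = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = client_credentials_request("https://auth.example.com/oauth/token", "svc-reporting", "s3cret", "reports.read")
print(req.get_method(), req.data)  # POST b'grant_type=client_credentials&scope=reports.read'
```

No user identifier appears anywhere in the request, which is why this pattern fits jobs with no user in the loop and should not be used to act on a specific user's behalf.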

Is this only relevant for external SaaS connectors, or does it apply to internal APIs too?

The token vault and per-tenant isolation requirements apply any time credentials must be scoped per customer and managed outside the agent runtime. For internal APIs, M2M authentication via organization-scoped service accounts is the right pattern. See M2M authentication for internal services and external APIs.
