Jun 26, 2026

When to Use a Virtual MCP Server: The N×M Problem

The math that breaks agents at production scale

A two-agent prototype that touches three tools is easy. You wire each agent to each tool, hardcode a few credentials, and it works. The trouble starts when the prototype becomes a product. Now you have several agents, a dozen tools, and a growing list of customers, each with their own accounts. The number of connections you maintain stops growing in a line and starts growing like a grid.

Why this is a question of when, not whether

That grid is the N×M problem, and a virtual MCP server is one of the main ways teams tame it. The useful question is not whether the pattern works, but at what point it earns its place in your deployment. To answer that well, you have to separate two different N×M problems that are easy to conflate:

The N×M that the Model Context Protocol already solved.
The N×M that it did not, and that you still own.

The N×M problem MCP already solved

One protocol instead of a grid of integrations

Before MCP, connecting M AI applications to N tools meant building and maintaining up to M times N bespoke integrations. The Model Context Protocol collapses that grid by giving both sides one protocol to implement, which turns M×N into M+N. Each tool speaks MCP once, each client speaks MCP once, and any client can then reach any compliant server.
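
The arithmetic behind that collapse is easy to check. The sketch below is only an illustration: the function names and the sample fleet sizes are made up for this example, not taken from any MCP tooling.

```python
def bespoke_integrations(clients: int, tools: int) -> int:
    """Without a shared protocol, every client needs an adapter for every tool."""
    return clients * tools


def protocol_implementations(clients: int, tools: int) -> int:
    """With a shared protocol, each client and each tool implements it once."""
    return clients + tools


if __name__ == "__main__":
    # Hypothetical fleet sizes, chosen only to show how the gap widens.
    for clients, tools in [(2, 3), (5, 12), (10, 40)]:
        print(
            f"{clients} clients x {tools} tools: "
            f"{bespoke_integrations(clients, tools)} bespoke integrations vs "
            f"{protocol_implementations(clients, tools)} protocol implementations"
        )
```

At prototype size the two numbers are close (6 versus 5), which is why the grid only becomes visible once the fleet grows.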

A borrowed and proven idea

This is not a novel trick. MCP borrowed the approach from the Language Server Protocol, which solved the same grid for code editors and programming languages. The same standardization logic explains the common shorthand for MCP as a universal adapter for AI tools. The key point for a deployment decision is narrower than the marketing: what MCP standardized is the wire format and the discovery model, not the operation of your fleet.

The N×M problem MCP did not solve

Standard protocol, non-standard operations

Here is the part that catches teams out. MCP solves N×M at the protocol layer, but a second N×M lives at the operational layer, and that one does not disappear because the wire format is uniform. Every remote MCP server you run still needs its own authentication, its own credentials, its own scopes, and its own hosting. Standardizing how a call is framed does nothing to standardize who is allowed to make it or where the server runs.

Where the cost actually shows up

The operational grid concentrates in a few predictable places:

Per-server authentication, which teams running more than a handful of servers report becomes a maintenance burden within weeks.
Credential sprawl, repeatedly cited as the first operational failure mode once servers multiply.
Hosting and scaling, since each remote server is a service you must keep available.
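
One rough way to see the operational grid is to enumerate it. The sketch below assumes the four per-server concerns named earlier (authentication, credentials, scopes, hosting) and a hypothetical list of server names. It models nothing about real effort, only how the number of things to maintain multiplies.

```python
from typing import List, Tuple

# The per-server concerns named in the text; the tuple itself is illustrative.
OPERATIONAL_CONCERNS = ("authentication", "credentials", "scopes", "hosting")


def operational_work_items(servers: List[str]) -> List[Tuple[str, str]]:
    """Pair every server with every concern it still needs handled."""
    return [(server, concern) for server in servers for concern in OPERATIONAL_CONCERNS]


if __name__ == "__main__":
    # Hypothetical server names, used only for the example.
    fleet = ["issue-tracker", "chat", "crm"]
    items = operational_work_items(fleet)
    print(f"{len(fleet)} servers -> {len(items)} operational items to own")
```

Adding one more server adds a full row of concerns, even though the wire protocol is unchanged.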

Production auth is not a checkbox

The authentication piece deserves emphasis, because it is where most of the operational N×M hides. The MCP authorization framework specifies OAuth 2.1, and current guidance for production remote servers is to use PKCE-protected flows, expose protected resource metadata for discovery, and validate the audience claim on every token. One rule matters more than the rest: a server should never pass a client's token straight through to a downstream API, and should obtain a separately scoped token instead. Doing that correctly once is real work. Doing it once per server, per provider, is the operational N×M in plain sight.
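
Two of those requirements are small enough to sketch with the standard library. The block below shows the S256 PKCE transform as defined in RFC 7636 and a minimal audience check. It assumes the token's signature and expiry have already been verified elsewhere and that the claims arrive as a plain dict; the function names and the example URL are illustrative, not part of any MCP SDK.

```python
import base64
import hashlib
import secrets
from typing import Any, Dict, Tuple


def s256_challenge(verifier: str) -> str:
    """PKCE S256: base64url(SHA-256(verifier)) with padding stripped (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Generate a fresh code verifier and its matching code challenge."""
    verifier = secrets.token_urlsafe(32)  # 43 URL-safe characters
    return verifier, s256_challenge(verifier)


def audience_matches(claims: Dict[str, Any], expected_audience: str) -> bool:
    """Check the 'aud' claim, which may be a single string or a list of strings.

    Assumption: signature and expiry were validated before this is called.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    return False


if __name__ == "__main__":
    verifier, challenge = make_pkce_pair()
    print("verifier length:", len(verifier), "challenge length:", len(challenge))
    # Hypothetical resource identifier for this server.
    print(audience_matches({"aud": "https://mcp.example.com"}, "https://mcp.example.com"))
```

A token whose audience names a different resource is rejected here, which is the same check that makes blind token pass-through unsafe: a token minted for this server was never meant for the downstream API.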

The deployment decision, four ways

The options on the table

Choosing how to deploy is really choosing who absorbs that operational grid. Four broad options come up in practice, and they map loosely onto the common categorization of MCP servers as workstation, managed, and remote deployments:

Local STDIO server: runs as a subprocess on one machine, no network, no auth layer.
Self-hosted remote server: you run it over Streamable HTTP and own uptime, scaling, and security.
Managed or hosted remote server: a provider runs the endpoint, and you connect over a URL.
Scoped virtual MCP server: a managed endpoint that also narrows the tool surface and carries per-user identity.

What each option trades

| Option | You operate a server | Multi-user and concurrency | Auth you own | Best fit |
| --- | --- | --- | --- | --- |
| Local STDIO | Yes, locally | Poor, degrades under concurrent load | Environment variables only | Single developer, local tooling |
| Self-hosted remote | Yes | Yes, if you build for it | The full OAuth 2.1 setup, per server | Control and data-boundary needs |
| Managed remote | No | Yes | Mostly delegated | Speed, minimal infrastructure |
| Scoped vMCP | No | Yes, per-user by design | Delegated, scoped per user | Multi-tenant products needing tight reach |

Reading the matrix

STDIO is the right answer for a tool you run on your own laptop, and the wrong answer the moment more than one user or one machine is involved, because it has no real concurrency story and no natural place to enforce auth or audit. The further down the table you go, the more of the operational N×M someone else absorbs. The scoped vMCP option absorbs the most, which is why it tends to win for multi-tenant products.

The scaling subtlety worth tracking

Stateful today

One protocol detail shapes the self-host decision more than people expect. The latest stable MCP specification, dated 2025-11-25, treats MCP as a stateful protocol, and stateful sessions are awkward to scale horizontally because a session pinned to one process fights with load balancers. If you self-host, that awkwardness is your problem to engineer around.
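
A toy simulation makes the load-balancer conflict concrete. It assumes session state lives only in the memory of the process that opened the session and that the balancer rotates requests round-robin with no session affinity. The class and function names are hypothetical, and nothing here is MCP-specific.

```python
import itertools
from typing import Dict


class Worker:
    """One server process that keeps session state in its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: Dict[str, int] = {}

    def open_session(self, session_id: str) -> None:
        self.sessions[session_id] = 0

    def handle(self, session_id: str) -> bool:
        """Return False when this worker has never seen the session."""
        if session_id not in self.sessions:
            return False
        self.sessions[session_id] += 1
        return True


def round_robin_failures(worker_count: int, requests: int) -> int:
    """Open one session on the first worker, then rotate requests across all workers."""
    workers = [Worker(f"w{i}") for i in range(worker_count)]
    workers[0].open_session("s1")
    rotation = itertools.cycle(workers)
    return sum(0 if next(rotation).handle("s1") else 1 for _ in range(requests))


if __name__ == "__main__":
    print("1 worker, 9 requests:", round_robin_failures(1, 9), "failures")
    print("3 workers, 9 requests:", round_robin_failures(3, 9), "failures")
```

With one worker nothing fails. With three, two out of every three requests land on a process that does not know the session. The usual workarounds are session affinity at the balancer or moving session state into a shared store, and both are work the self-hoster takes on.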

Stateless coming

A stateless rework is published as a release candidate dated 2026-07-28, and it is not the final specification as of mid-2026. The practical guidance is to design against the stateful model today while planning for the stateless one. A managed or virtual endpoint is attractive partly because it absorbs this churn on your behalf, rather than leaving you to re-architect transport and session handling as the spec matures.

When a virtual MCP server is the right call

Signals that the time has come

The pattern earns its place at a fairly predictable threshold. The clearest signals are below:

You serve multiple users or tenants, each with their own credentials.
You are maintaining auth and hosting for several MCP servers and feeling the per-server burden.
You need to expose a tight, per-role slice of a tool surface rather than its full catalog.
Per-user isolation and a consolidated audit trail have become requirements.

Signals to wait

It is equally important to know when this is overhead you do not need yet. If a single connector serves one trusted user, or you have no multi-tenant or audit obligations, the simpler deployments are enough. Adopt the pattern when reach, identity, and operational load actually become the constraint, not before. The earlier sections describe what each option costs; this is simply the point at which the heavier option pays for itself.

Scalekit collapses the operational N×M to N×1

The same scoped-virtual option, made concrete

Scalekit's vMCP implements the fourth option from the matrix above. It is worth being precise about what it does and does not claim. It does not re-solve the protocol N×M, since MCP already did that. It collapses the operational N×M to roughly N×1 at the credential layer, so that every connector inherits the same vault and audit chain rather than a fresh auth stack each time.

How the collapse works

The mechanism is a managed, scoped endpoint plus a per-user identity model:

There is no MCP server for you to deploy, host, or scale; the endpoint is generated and served over Streamable HTTP.
One server definition serves every user in a tenant, with a short-lived session token minted per run and scoped to that user's connected accounts.
Credentials live in a per-tenant encrypted vault, resolved server-side at call time, and never enter the agent runtime, the model context, or your logs.
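
The shape of that mechanism can be sketched in a few dozen lines. To be clear, this is not Scalekit's API or implementation: `TenantVault`, `ScopedEndpoint`, and every method below are hypothetical names, the vault is an unencrypted in-memory dict, and the session token is an opaque random string. The sketch only illustrates the flow described above: one endpoint definition, a short-lived per-user token, and credentials resolved server-side at call time.

```python
import secrets
import time
from typing import Callable, Dict, FrozenSet, Optional, Tuple


class TenantVault:
    """Hypothetical stand-in for a per-tenant vault (no encryption in this sketch)."""

    def __init__(self) -> None:
        self._credentials: Dict[Tuple[str, str], str] = {}

    def store(self, user_id: str, connector: str, credential: str) -> None:
        self._credentials[(user_id, connector)] = credential

    def resolve(self, user_id: str, connector: str) -> str:
        return self._credentials[(user_id, connector)]


class ScopedEndpoint:
    """Hypothetical endpoint: one definition, many users, a narrowed tool list."""

    def __init__(self, vault: TenantVault, allowed_tools: FrozenSet[str],
                 ttl_seconds: float = 300.0) -> None:
        self._vault = vault
        self._allowed_tools = allowed_tools
        self._ttl = ttl_seconds
        self._sessions: Dict[str, Tuple[str, float]] = {}

    def mint_session(self, user_id: str, now: Optional[float] = None) -> str:
        """Mint a short-lived token bound to one user for one run."""
        issued = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._sessions[token] = (user_id, issued + self._ttl)
        return token

    def call_tool(self, token: str, connector: str, tool: str,
                  downstream: Callable[[str, str], str],
                  now: Optional[float] = None) -> str:
        current = time.time() if now is None else now
        session = self._sessions.get(token)
        if session is None or current >= session[1]:
            raise PermissionError("unknown or expired session token")
        if tool not in self._allowed_tools:
            raise PermissionError(f"tool {tool!r} is outside this endpoint's scope")
        user_id, _expires_at = session
        # The credential is looked up here, server-side; the caller only ever
        # holds the session token and receives the downstream result.
        credential = self._vault.resolve(user_id, connector)
        return downstream(tool, credential)
```

In this sketch the agent holds nothing but the session token, so a leaked token expires on its own and cannot reach tools outside the allowed set.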

Why this lines up with production auth guidance

This design follows the same rule that the authorization guidance insists on. Rather than forwarding a client's token to a downstream provider, Scalekit resolves a separately scoped credential for the authorizing user on each call.

For teams with data-boundary requirements, the same model is available either as a Scalekit-hosted endpoint or deployed in your own cloud, so the deployment decision does not force a choice between control and convenience.

Where to start

A short, documented path takes you from the model to a running endpoint.

Start with the virtual MCP overview on Scalekit Docs.
Set up and connect a Virtual MCP server using Scalekit.
Refer to this Claude managed agent example.

More Resources:

Begin with the AgentKit and connectors overview to see the connected-account model.
Follow the MCP authorization quickstart for OAuth 2.1 and scoped tokens.
If you run FastMCP, use the FastMCP integration guide.
To add OAuth 2.1 to an MCP server you already run, see Scalekit MCP Auth.

The deployment decision is who absorbs the grid

What you are really deciding

MCP took the integration grid from M×N down to M+N at the protocol layer, and that is settled. The decision in front of you is about the second grid, the operational one made of auth, credentials, and hosting per server. Every deployment option is a different answer to a single question: who absorbs that work, you or the platform?

How to choose

If you run one tool for one trusted user, keep it local and move on.
If you need control over hosting and data boundaries, self-host and budget for the auth and scaling work.
If you are shipping a multi-tenant product and feeling the per-server burden, a scoped virtual MCP server collapses the operational grid to something close to N×1.
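
Those three rules can be written down as a small decision function. This is a sketch, not a prescription: the field names are invented, the order in which the rules are checked is an assumption of this example, and the "managed remote" fallback is taken from the matrix earlier rather than from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentNeeds:
    """Hypothetical inputs summarizing a team's situation."""

    single_trusted_user: bool
    needs_hosting_control: bool
    multi_tenant: bool


def recommend(needs: DeploymentNeeds) -> str:
    """Apply the rules in an assumed precedence: local, then control, then tenancy."""
    if needs.single_trusted_user and not needs.multi_tenant:
        return "local STDIO"
    if needs.needs_hosting_control:
        return "self-hosted remote"
    if needs.multi_tenant:
        return "scoped virtual MCP server"
    # Fallback from the matrix: delegated hosting with minimal infrastructure.
    return "managed remote"


if __name__ == "__main__":
    print(recommend(DeploymentNeeds(True, False, False)))
    print(recommend(DeploymentNeeds(False, True, False)))
    print(recommend(DeploymentNeeds(False, False, True)))
```

A team that needs both hosting control and multi-tenancy lands on self-hosting under this ordering. Since the scoped option can also be deployed in your own cloud, that precedence is worth revisiting for your own case.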

Decide on identity, reach, and operational load first. The reduction in connections and maintenance follows from getting those three right. For the deeper auth mechanics behind this, the SSO-backed MCP authentication guide and the insight on building a secure MCP server go further than this overview can.
