
We just shipped virtual MCP servers, and here's why it matters.
Connect Gmail to an AI agent today and you get all 30 tools. Send mail, delete mail, create labels, manage filters, all of it, every time, whether your agent needs it or not. Most agents need exactly one of those tools.
This causes real problems. When a model has to pick from 30 tools instead of 3, it picks wrong more often. All those extra tool descriptions sit in the context window burning tokens on every single call, even though they never get used. And every tool the agent can call is a tool that can go wrong, get misused, or get exploited, whether you meant to expose it or not.
You can't fix this by writing a better prompt. Telling the model "just use fetch" doesn't make the other 29 tools disappear from its context. They're still there, still competing for the model's attention, still expanding what could go wrong. The fix has to happen below the prompt, in how the tools get exposed in the first place.
We keep hearing the same story from teams building agent products (this is a pattern we've seen across customer conversations, not a doc spec). A company lets its own customers build custom agents: pick a prompt, pick some tools, ship it. Tools might come from Gmail, Calendar, Slack, or the company's own product.
Here's the catch: those agents run for thousands of different end users, and each user has their own login, their own permissions, their own data. The agent definition (which tools it's allowed to use) needs to be set once. But who it's acting as needs to change every single time it runs.
Most MCP setups don't let you do this cleanly. You get one server URL, and that URL bakes the tool list and the user's identity together into one thing. So if you want one agent definition shared across a thousand users, you either make a thousand separate servers, which is a mess to manage and a security headache, or you give up on keeping each user's access separate, which is worse.
A virtual MCP server splits those two things apart, the way they should have been split from the start.
First, you define the server once: which connections it can use, and which exact tools from each one. A summarizer agent gets fetch. Nothing else. Not send, not delete, not archive. This definition is reusable: set it up once per agent role and you're done. See "What a virtual MCP server actually is" for the full architecture.
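To make that concrete, here's a rough sketch of what a definition like this could look like. It is not the product's actual API: `VirtualServerDefinition`, `UPSTREAM_TOOLS`, and the tool names are made up for illustration. The point is only that the allowlist is applied before anything reaches the model.

```python
from dataclasses import dataclass, field

# Hypothetical upstream catalogs: everything each connection exposes by default.
UPSTREAM_TOOLS = {
    "gmail": ["fetch", "send", "delete", "archive", "create_label"],
    "calendar": ["list_events", "create_event", "delete_event"],
}


@dataclass(frozen=True)
class VirtualServerDefinition:
    """Reusable, identity-free description of what one agent role may call."""

    name: str
    # connection name -> the exact tools allowed from that connection
    allowed_tools: dict[str, frozenset[str]] = field(default_factory=dict)

    def exposed_tools(self, upstream: dict[str, list[str]]) -> list[str]:
        """Return only the allowlisted tools, named 'connection.tool'."""
        exposed = []
        for connection, allowed in self.allowed_tools.items():
            for tool in upstream.get(connection, []):
                if tool in allowed:
                    exposed.append(f"{connection}.{tool}")
        return exposed


# The summarizer role gets fetch and nothing else.
summarizer = VirtualServerDefinition(
    name="summarizer",
    allowed_tools={"gmail": frozenset({"fetch"})},
)

print(summarizer.exposed_tools(UPSTREAM_TOOLS))  # ['gmail.fetch']
```

Notice that nothing in the definition mentions a user. That's deliberate: identity arrives separately, at run time.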
Second, at the moment the agent actually runs, you hand it a short-lived token tied to one specific user. The server uses that token to pull the right credentials behind the scenes. The user's actual tokens never show up in the agent's context or in your logs. And because the credentials are resolved fresh every run, you get clean separation between every user, automatically.
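Here's a minimal sketch of that run-time step, under some loud assumptions: the token is a simple HMAC-signed payload, and `SIGNING_KEY`, `CREDENTIAL_VAULT`, `mint_run_token`, and `resolve_credentials` are all hypothetical names, not how the product actually does it. What it shows is the shape of the idea: the agent only ever holds a short-lived token, and the real credential is looked up server-side.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SIGNING_KEY = b"demo-only-secret"  # assumption: a real key lives in a secret store

# Hypothetical vault: user id -> upstream credential, never sent to the agent.
CREDENTIAL_VAULT = {"user-123": "gmail-oauth-token-for-user-123"}


def mint_run_token(user_id: str, server_name: str,
                   ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Create a short-lived token bound to one user and one virtual server."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": user_id, "srv": server_name, "exp": issued + ttl_seconds}
    ).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def resolve_credentials(token: str, server_name: str,
                        now: Optional[float] = None) -> str:
    """Verify the token, then look up that user's credential server-side."""
    payload_b64, signature_b64 = token.split(".", 1)
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    if claims["srv"] != server_name:
        raise PermissionError("token not valid for this server")
    return CREDENTIAL_VAULT[claims["sub"]]


token = mint_run_token("user-123", "summarizer")
credential = resolve_credentials(token, "summarizer")  # stays on the server
```

The token carries a user id and an expiry, not the credential itself, so nothing sensitive lands in the agent's context even if the token is logged.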
So the definition stays fixed. The identity changes every time. That's the whole idea.
This isn't just a security nicety; it changes how the agent behaves. Going from 40 available tools down to the 5 or 10 an agent role actually needs cuts the context wasted on unused tool descriptions by around 80%. That's the difference between an agent that picks the right tool most of the time and one that picks the right tool basically every time.
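The "around 80%" figure is back-of-the-envelope math, and it's easy to check. The sketch below assumes every tool description costs roughly the same number of tokens, which is a simplification; `context_reduction` is just an illustrative helper.

```python
def context_reduction(tools_before: int, tools_after: int) -> float:
    """Fraction of tool-description context removed, assuming equal-size descriptions."""
    return 1 - tools_after / tools_before


for kept in (5, 10):
    print(f"40 -> {kept} tools: {context_reduction(40, kept):.1%} less")
# 40 -> 5 tools: 87.5% less
# 40 -> 10 tools: 75.0% less
```

So depending on whether a role keeps 5 or 10 tools, the saving lands between 75% and 87.5%, which is where "around 80%" comes from.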
If you're building one assistant for internal use, you might never hit this. But the moment you're running agents for more than one customer, each with their own users and their own data, this stops being optional. We've seen the exact same need come up independently across several teams building "let our customers build their own agents" features, which tells us this isn't a one-off problem. It's a pattern (again, field observation, not a documented spec). We break down exactly when this threshold hits in "The N×M problem."
It also shows up from the other direction. Products are starting to become MCP clients themselves, reaching out to external MCP servers that their own customers configure. The moment your product is the one brokering access on someone else's behalf, you need this same split between what's allowed and who's asking.
Virtual MCP servers are live now. Read the full setup in the docs, define a server for your first agent role, and see the tool list shrink to exactly what it should be. Once it's running, set per-user and per-tenant rate limits before you put it in front of real traffic.