
Six months ago, most of us hadn’t even heard of MCP.
There was no standard playbook, no “right” way to expose tools to LLMs. Just a few GitHub repos, some early experiments, and a lot of shared frustration in Discord threads. But something about it clicked — the idea that AI agents could go beyond chat and actually do things, securely and reliably.
As an auth platform, we knew we had to take MCP seriously as infrastructure. Because when agents start taking actions, everything comes back to trust, auth, and scopes.
So we rolled up our sleeves and brought up an internal MCP server for Scalekit. Along the way, we leaned on every working implementation out there, learned from the community’s growing pains, contributed back fixes, and most importantly, shaped the infrastructure we’d want other teams (and their agents) to rely on.
Before we even touched scopes or auth, we had one big question to answer: “How should responses be streamed back to the agent?”
This wasn’t just about UX or output polish. It impacted everything, from how clients parsed responses, to how rich the tool outputs could be, to how easily agents could consume what we were sending.
It sounds like a small implementation detail, but it shaped how we designed the entire thing. So let me put it the way we thought about it…
Picture this: you're at a restaurant. You tell the waiter, “Keep bringing me dishes as the kitchen prepares them.”
That’s what Server-Sent Events (SSE) does. It delivers each item — one at a time, in a predictable format, without you having to ask again. If the connection drops, it picks up where it left off. Reliable, minimal, and great when you just need simple updates.
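Concretely, SSE is just newline-delimited text events on a long-lived HTTP response. The `id:` field is what makes the "picks up where it left off" behavior work: a reconnecting client sends the last id it saw in a `Last-Event-ID` header, and the server resumes from there. The payloads below are illustrative.

```text
id: 1
data: {"status": "provisioning"}

id: 2
data: {"status": "ready"}
```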
But that’s all you get: one dish at a time, in the order the kitchen sends it, as plain text events.
Now imagine instead of a waiter, you had a delivery van from the kitchen.
It can carry full meals, sides, drinks, instructions, and even partial updates — all streamed to you, however you want to unpack them. That’s HTTP Streamable.
It’s a bit more work to manage, since you have to handle reconnects and unpacking yourself, but the flexibility is worth it when you’re working with LLMs that need structured JSON, multi-step outputs, or large payloads.
We started small — just a couple of internal tools wired up to the MCP server. These weren’t production-critical yet. The goal was to validate:
Can agents call our tools, and can we control what happens when they do?
Each tool had a clear structure: a name, a description written for the model, the scopes it requires, and an execution function.
One of the earliest tools we wired up was list_environments.
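As a sketch, a tool like that can be shaped as a plain TypeScript structure. Only the tool name comes from our actual setup; the field names, scope string, and sample data below are illustrative assumptions, not our production code.

```typescript
// Sketch of a tool definition. Only the tool name list_environments is
// real; scope names and sample data are illustrative assumptions.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

interface ToolDefinition {
  name: string;
  description: string;      // written for the LLM, not just for humans
  requiredScopes: string[]; // enforced before execute() ever runs
  execute: (input: Record<string, unknown>) => Promise<ToolResult>;
}

const listEnvironments: ToolDefinition = {
  name: "list_environments",
  description: "List every environment in the caller's workspace.",
  requiredScopes: ["environments:read"],
  execute: async () => ({
    content: [
      { type: "text", text: JSON.stringify(["dev", "staging", "prod"]) },
    ],
  }),
};
```

Keeping the scope requirement on the definition itself, rather than inside the handler, is what later let us enforce access uniformly before any tool logic ran.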
We used Zod schemas to validate inputs for tools that needed them.
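The real code uses Zod; to keep this sketch dependency-free, the equivalent check is hand-rolled below. The idea is the same either way: agent-supplied input is validated before the tool body runs. The field names are illustrative assumptions.

```typescript
// The real server uses Zod; this dependency-free sketch shows the same
// idea: reject malformed agent input before any tool logic executes.
// Field names are illustrative assumptions.
interface CreateEnvInput {
  name: string;
  region: "us" | "eu";
}

function parseCreateEnvInput(raw: unknown): CreateEnvInput {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("input must be an object");
  }
  const { name, region } = raw as { name?: unknown; region?: unknown };
  if (typeof name !== "string" || name.length === 0) {
    throw new Error("name must be a non-empty string");
  }
  if (region !== "us" && region !== "eu") {
    throw new Error('region must be "us" or "eu"');
  }
  return { name, region };
}
```

With Zod this collapses to a one-line schema, but the failure mode is identical: the tool never sees input it can't handle, and the agent gets a descriptive error it can correct from.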
And we kept registration clean and centralized.
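A minimal sketch of what "centralized and declarative" means in practice: one registry is the single source of truth for everything the server exposes. The helper names and the second field values here are assumptions for illustration.

```typescript
// Sketch of centralized registration: one map is the single source of
// truth for what the server exposes. Helper names are assumptions.
type Handler = (input: Record<string, unknown>) => Promise<unknown>;

interface RegisteredTool {
  description: string;
  requiredScopes: string[];
  handler: Handler;
}

const registry = new Map<string, RegisteredTool>();

function registerTool(name: string, tool: RegisteredTool): void {
  // Failing fast on duplicates keeps the exposed surface unambiguous.
  if (registry.has(name)) throw new Error(`duplicate tool: ${name}`);
  registry.set(name, tool);
}

registerTool("list_environments", {
  description: "List environments in the workspace.",
  requiredScopes: ["environments:read"],
  handler: async () => ["dev", "staging", "prod"],
});
```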
Having all tools registered declaratively in one place made it easy to reason about what we were exposing and how.
Even at this early stage, we knew that eventually every tool would need clear access boundaries. But first, we just wanted the wiring to feel right, both for us as developers and for the agents calling them.
Here's a full demo of how Scalekit's MCP servers work:
With a few tools wired up, it was time to protect them.
Since the MCP server was going to be a public-facing surface, callable by real agents over the internet, we had no business leaving it open. This wasn’t just for demo use; it was a real integration point, and we treated it like one from the start.
From the very beginning, we approached the MCP server with a platform mindset: auth from day one, granular scopes on every tool, and a surface that describes itself to its callers.
This wasn’t about locking things down for the sake of it. It was about clarity — for us, and for anyone calling the tools.
We also knew agents needed to discover and understand what access they had. So we aligned on OAuth 2.1 and the protected resource metadata spec, which let any agent programmatically understand what scopes were required, how to get tokens, and what the rules of the game were.
This decision addressed concerns about accidental overreach.
In the next section, we’ll break down exactly how we implemented this — the four pieces that made our auth layer work.
Learn more - How to Map an Existing API into MCP Tool Definitions
We didn’t have to look far for auth — we used Scalekit’s own OAuth2-based MCP auth. It gave us everything we needed to lock down access while keeping agent interactions smooth.
Every agent or system that talks to the MCP server needs a token. We used the OAuth 2.1 client credentials grant, which is ideal for machine-to-machine use cases — like AI agents acting on behalf of a workspace or org.
These tokens are scoped, short-lived, and issued from the same authorization server we already run for all our Scalekit integrations.
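To make the exchange concrete, here is a sketch of the client credentials request an agent would send. The `grant_type` and form encoding are standard OAuth; the URL, scope names, and helper function are illustrative assumptions, not Scalekit's actual endpoints.

```typescript
// Sketch of the OAuth 2.1 client credentials exchange: the agent trades
// its client ID and secret for a short-lived, scoped access token.
// The URL and scope names are placeholder assumptions.
function buildTokenRequest(
  clientId: string,
  clientSecret: string,
  scopes: string[]
) {
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
    scope: scopes.join(" "), // scopes are space-delimited in OAuth
  });
  return {
    url: "https://auth.example.com/oauth/token", // placeholder
    method: "POST" as const,
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}
```

The response to this request is a short-lived access token whose scopes are pinned at issuance, which is what makes revocation and least-privilege practical for machine-to-machine callers.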
Every incoming request carries a Bearer token in the Authorization header.
The token is a JWT — signed, structured, and verifiable. If the signature or claims are invalid, the request is rejected right away. No tool logic runs until auth is confirmed.
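The first two steps of that pipeline can be sketched as plain functions: pull the token out of the `Authorization: Bearer …` header, then decode the JWT payload. Note that decoding is not verification; the real server checks the signature against the issuer's keys before trusting any claim. The helper names are assumptions.

```typescript
// Sketch: extract the Bearer token and decode the JWT payload.
// Decoding is NOT verification; a real server must verify the signature
// against the issuer's keys before trusting any claim.
function extractBearerToken(authorization: string | undefined): string | null {
  if (!authorization?.startsWith("Bearer ")) return null;
  return authorization.slice("Bearer ".length);
}

function decodeJwtPayload(token: string): Record<string, unknown> {
  // A JWT is three base64url segments: header.payload.signature
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT");
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json);
}
```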
To support dynamic discovery, especially for standards-aware agents, we exposed a protected resource metadata endpoint at a well-known path.
It returns a JSON document that tells clients how this resource is protected and what scopes it supports.
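The shape follows the OAuth protected resource metadata spec; the field names below come from that spec, while the URLs and scope values are placeholders rather than Scalekit's actual configuration.

```json
{
  "resource": "https://mcp.example.com",
  "authorization_servers": ["https://auth.example.com"],
  "bearer_methods_supported": ["header"],
  "scopes_supported": [
    "environments:read",
    "environments:write"
  ]
}
```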
This makes the server self-describing, an important feature when agents are dynamically choosing tools and preparing their auth flows.
Once the token is verified, we inspect the scopes carried in its claims.
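The scope gate itself is small. A sketch of the check, assuming the standard space-delimited `scope` claim; the exact error shape here is an illustrative assumption, though `insufficient_scope` is the standard OAuth bearer error code.

```typescript
// Sketch of the scope gate: the token's space-delimited scope claim is
// checked against the tool's required scopes before the handler runs.
// The error shape is an illustrative assumption.
interface ScopeCheckResult {
  ok: boolean;
  error?: { code: string; message: string };
}

function checkScopes(scopeClaim: string, required: string[]): ScopeCheckResult {
  const granted = new Set(scopeClaim.split(" ").filter(Boolean));
  const missing = required.filter((s) => !granted.has(s));
  if (missing.length === 0) return { ok: true };
  return {
    ok: false,
    error: {
      code: "insufficient_scope", // standard OAuth bearer error code
      message: `missing required scopes: ${missing.join(", ")}`,
    },
  };
}
```

Returning a structured error rather than a bare 403 matters here: an agent can read the message, realize which scope it lacks, and surface a useful explanation instead of silently failing.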
Every tool defines what scopes it needs. If the token doesn’t carry those scopes, the request is rejected — clearly, with a structured error message.
This keeps access granular and explicit — no tool can be called without the exact permission attached.
That’s it. Auth-first, scoped by default, and standards-aligned.
This setup gave us confidence to expose tools to external agents, knowing they’d be properly isolated, traceable, and revocable.
Once auth was wired in and a few tools were scoped, we needed to answer the real question: “Can agents actually use this?”
It’s one thing to build a working endpoint. It’s another to see how an actual agent interacts with your tool — what it discovers, what it fails on, and how it reacts to unclear definitions or missing scopes.
So we tested with real clients: Claude Desktop and ChatGPT.
To connect them, we used mcp-remote — which turned out to be critical.
This little proxy handled a bunch of headaches for us: CORS, header forwarding, and bridging local stdio-based clients to our remote HTTP endpoint.
It became our primary bridge between dev instances and agent clients — and made local debugging feel almost production-grade.
Real client testing surfaced a ton of useful issues early: scope mismatches, unclear parameter definitions, and response formats the models struggled to parse.
We’d tweak a tool, rerun mcp-remote, and immediately see how Claude or ChatGPT reacted. That feedback loop (build, proxy, test) helped us debug fast, ship safely, and shape better defaults.
Testing with live agents gave us real-world validation, but when you're in the middle of development, opening up Claude or ChatGPT just to check if a tool is wired correctly isn’t exactly efficient.
That’s why MCP Inspector quickly became our go-to. We kept it open constantly, like a second terminal.
It gave us a live view of every tool we’d registered, along with its inputs and outputs.
We could plug in sample inputs, see what responses came back, and tweak things without jumping into a full agent session. That tight loop helped us catch all kinds of small mistakes early — missing scopes, unclear param shapes, unexpected return formats.
Inspector became our daily safety net. If you’re building an MCP server, this kind of introspection tool isn’t a nice-to-have. It’ll save you hours of blind debugging, especially once you’ve got more than a handful of tools to manage.
We’re far from done. This first version was about standing something up, proving that a secure, agent-friendly MCP server could be more than a demo. Now we’re pushing further.
Next, we’re building toward something that’s not just agent-compatible but agent-native: a tool surface designed from the ground up to support secure, scoped, LLM-triggered actions.
If you’re thinking of building your own MCP server or even just experimenting, here’s what we’d tell you, dev to dev: implement auth early, define granular scopes for every action, make your server introspectable through metadata, and write tool descriptions for the model, not just for humans.
We’re still building, but we’re excited about where this is going. If the first era of software was humans calling APIs, and the second is agents calling tools, then MCP is where it all gets real.
Want to deploy your own secure MCP server — with scopes, streaming, and OAuth built in? Sign up for a Free account with Scalekit and accelerate your agentic infrastructure. Need help with tool design, auth, or debugging? Book time with our auth experts.