Six months ago, most of us hadn’t even heard of MCP.
There was no standard playbook, no “right” way to expose tools to LLMs. Just a few GitHub repos, some early experiments, and a lot of shared frustration in Discord threads. But something about it clicked — the idea that AI agents could go beyond chat and actually do things, securely and reliably.
As an auth platform, we knew we had to take MCP seriously as infrastructure. Because when agents start taking actions, everything comes back to trust, auth, and scopes.
So we rolled up our sleeves and brought up an internal MCP server for Scalekit. Along the way, we leaned on every working implementation out there, learned from the community’s growing pains, contributed back fixes, and most importantly, shaped the infrastructure we’d want other teams (and their agents) to rely on.
Before we even touched scopes or auth, we had one big question to answer: “How should responses be streamed back to the agent?”
This wasn’t just about UX or output polish. It impacted everything, from how clients parsed responses, to how rich the tool outputs could be, to how easily agents could consume what we were sending.
It sounds like a small implementation detail, but it shaped how we designed the entire thing. So let me put it the way we thought about it…
Picture this: you're at a restaurant. You tell the waiter, “Keep bringing me dishes as the kitchen prepares them.”
That’s what Server-Sent Events (SSE) does. It delivers each item — one at a time, in a predictable format, without you having to ask again. If the connection drops, it picks up where it left off. Reliable, minimal, and great when you just need simple updates.
But that’s all you get: simple, one-way, text-based updates.
Now imagine instead of a waiter, you had a delivery van from the kitchen.
It can carry full meals, sides, drinks, instructions, and even partial updates, all streamed to you however you want to unpack them. That’s Streamable HTTP.
It’s a bit more work to manage, since you have to handle reconnects and unpacking, but the flexibility is worth it when you’re working with LLMs that need structured JSON, multi-step outputs, or large payloads.
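In practice, the choice shows up as a few lines of transport setup. Here’s a rough sketch of the Streamable HTTP side using the official TypeScript SDK, in the stateless per-request style its docs describe (buildServer is a placeholder for however you construct your server; this is a sketch, not our exact code):

```ts
import express from "express";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// Placeholder: construct an McpServer with your tools registered.
function buildServer(): McpServer {
  return new McpServer({ name: "scalekit-mcp", version: "0.1.0" });
}

const app = express();
app.use(express.json());

// Each POST carries one or more JSON-RPC messages; the transport decides
// whether to answer with plain JSON or a stream of chunks.
app.post("/mcp", async (req, res) => {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined, // stateless mode
  });
  res.on("close", () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000);
```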
We started small — just a couple of internal tools wired up to the MCP server. These weren’t production-critical yet. The goal was to validate:
Can agents call our tools, and can we control what happens when they do?
Each tool had a clear structure: a name, a human-readable description, an input schema, and a handler that did the actual work.
Here’s what one of the earliest tools looked like — list_environments:
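Stripped to its essentials, it was roughly this shape (listEnvironments stands in for our internal API client):

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
// Placeholder import; the real call goes through our internal Scalekit client.
import { listEnvironments } from "./scalekit-client.js";

const server = new McpServer({ name: "scalekit-mcp", version: "0.1.0" });

server.tool(
  "list_environments",
  "List the environments in the current Scalekit workspace",
  async () => {
    const environments = await listEnvironments();
    // Tools return a content array; plain text is the lowest common
    // denominator every client can render.
    return {
      content: [
        { type: "text", text: JSON.stringify(environments, null, 2) },
      ],
    };
  }
);
```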
We used Zod schemas to validate inputs for tools that needed them:
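For example, a tool that fetches a single environment might declare its inputs like this (tool and field names here are illustrative):

```ts
import { z } from "zod";

// The Zod shape doubles as runtime validation and as the input schema
// the client sees when it lists tools.
const getEnvironmentInput = {
  environmentId: z.string().describe("ID of the environment to fetch"),
  includeDomains: z
    .boolean()
    .optional()
    .describe("Whether to include custom domains in the response"),
};
```

Passing a shape like this into the tool registration gets you parsed, typed arguments in the handler and a published schema for the client, with no extra glue.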
And we kept registration clean and centralized:
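Concretely, that meant a single entry point that wires every tool onto the server (the per-domain module names here are hypothetical):

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
// Hypothetical per-domain modules, each exporting a register function.
import { registerEnvironmentTools } from "./tools/environments.js";
import { registerOrganizationTools } from "./tools/organizations.js";

export function registerAllTools(server: McpServer) {
  registerEnvironmentTools(server);
  registerOrganizationTools(server);
  // Every new tool gets added here, so the exposed surface lives in one file.
}
```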
Having all tools registered declaratively in one place made it easy to reason about what we were exposing and how.
Even at this early stage, we knew that eventually every tool would need clear access boundaries. But first, we just wanted the wiring to feel right, both for us as developers and for the agents calling them.
Here's a full demo of how Scalekit's MCP servers work:
With a few tools wired up, it was time to protect them.
Since the MCP server was going to be a public-facing surface, callable by real agents over the internet, we had no business leaving it open. This wasn’t just for demo use. It was a real integration point, and we treated it like one from the start.
From the very beginning, we approached the MCP server with a platform mindset: auth required by default, every tool explicitly scoped, and nothing exposed that we wouldn’t treat as a real integration surface.
This wasn’t about locking things down for the sake of it. It was about clarity — for us, and for anyone calling the tools.
We also knew agents needed to discover and understand what access they had, so we aligned on OAuth 2.1 and the protected resource metadata spec: any agent can programmatically work out which scopes are required, how to get tokens, and what the rules of the game are.
This decision addressed concerns about accidental overreach.
In the next section, we’ll break down exactly how we implemented this — the four pieces that made our auth layer work.
We didn’t have to look far for auth — we used Scalekit’s own OAuth2-based MCP auth. It gave us everything we needed to lock down access while keeping agent interactions smooth.
Every agent or system that talks to the MCP server needs a token. We used the OAuth 2.1 client credentials grant, which is ideal for machine-to-machine use cases — like AI agents acting on behalf of a workspace or org.
These tokens are scoped, short-lived, and issued from the same authorization server we already run for all our Scalekit integrations.
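The request itself is plain OAuth: exchange a client ID and secret for a scoped access token. A sketch, with a placeholder token endpoint and scope names:

```ts
// Placeholder endpoint and scopes; the flow is the standard OAuth 2.1
// client credentials grant.
const response = await fetch("https://auth.scalekit.example/oauth/token", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: new URLSearchParams({
    grant_type: "client_credentials",
    client_id: process.env.SCALEKIT_CLIENT_ID!,
    client_secret: process.env.SCALEKIT_CLIENT_SECRET!,
    scope: "environments:read organizations:read",
  }),
});

const { access_token, expires_in } = await response.json();
// access_token goes into the Authorization header of every MCP request;
// expires_in tells the caller when to fetch a fresh one.
```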
Every incoming request carries a Bearer token in the header:
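```
Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IjEifQ...
```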
The token is a JWT — signed, structured, and verifiable. If the signature or claims are invalid, the request is rejected right away. No tool logic runs until auth is confirmed.
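Verification itself is boilerplate JWT handling. Sketched here with the jose library and placeholder issuer and audience values:

```ts
import { createRemoteJWKSet, jwtVerify, type JWTPayload } from "jose";

// Placeholder URLs; the real values come from our authorization server config.
const ISSUER = "https://auth.scalekit.example";
const AUDIENCE = "https://mcp.scalekit.example";
const JWKS = createRemoteJWKSet(new URL(`${ISSUER}/.well-known/jwks.json`));

export async function verifyRequest(authorization?: string): Promise<JWTPayload> {
  const token = authorization?.startsWith("Bearer ")
    ? authorization.slice("Bearer ".length)
    : undefined;
  if (!token) throw new Error("Missing bearer token");

  // Throws if the signature, issuer, audience, or expiry don't check out.
  const { payload } = await jwtVerify(token, JWKS, {
    issuer: ISSUER,
    audience: AUDIENCE,
  });
  return payload; // claims (including scopes) flow into the scope checks below
}
```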
To support dynamic discovery, especially for standards-aware agents, we exposed a protected resource metadata endpoint at the standard well-known path.
It returns a JSON document that tells clients how this resource is protected and what scopes it supports:
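Sketched as an Express route, with placeholder values for the resource, authorization server, and scopes, the shape follows the protected resource metadata spec (RFC 9728):

```ts
import express from "express";

const app = express();

// Well-known path defined by the protected resource metadata spec;
// the values below are placeholders.
app.get("/.well-known/oauth-protected-resource", (_req, res) => {
  res.json({
    resource: "https://mcp.scalekit.example",
    authorization_servers: ["https://auth.scalekit.example"],
    bearer_methods_supported: ["header"],
    scopes_supported: [
      "environments:read",
      "organizations:read",
      "organizations:write",
    ],
  });
});
```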
This makes the server self-describing, an important feature when agents are dynamically choosing tools and preparing their auth flows.
Once the token is verified, we inspect the scopes in the claims:
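In code, that’s a small lookup from tool name to required scopes, compared against the token’s scope claim (scope names here are illustrative):

```ts
import type { JWTPayload } from "jose";

// Illustrative scope names; every tool declares exactly what it needs.
const REQUIRED_SCOPES: Record<string, string[]> = {
  list_environments: ["environments:read"],
  get_environment: ["environments:read"],
  create_organization: ["organizations:write"],
};

export class InsufficientScopeError extends Error {
  constructor(public readonly missing: string[]) {
    super(`Missing required scopes: ${missing.join(", ")}`);
  }
}

export function assertScopes(toolName: string, claims: JWTPayload): void {
  // OAuth access tokens carry scopes as a space-delimited string claim.
  const granted = String(claims.scope ?? "").split(" ").filter(Boolean);
  const required = REQUIRED_SCOPES[toolName] ?? [];
  const missing = required.filter((scope) => !granted.includes(scope));
  if (missing.length > 0) {
    throw new InsufficientScopeError(missing);
  }
}
```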
Every tool defines what scopes it needs. If the token doesn’t carry those scopes, the request is rejected — clearly, with a structured error message.
This keeps access granular and explicit — no tool can be called without the exact permission attached.
That’s it. Auth-first, scoped by default, and standards-aligned.
This setup gave us confidence to expose tools to external agents, knowing they’d be properly isolated, traceable, and revocable.
Once auth was wired in and a few tools were scoped, we needed to answer the real question: “Can agents actually use this?”
It’s one thing to build a working endpoint. It’s another to see how an actual agent interacts with your tool — what it discovers, what it fails on, and how it reacts to unclear definitions or missing scopes.
So we tested with real clients like Claude and ChatGPT.
To connect them, we used mcp-remote — which turned out to be critical.
This little proxy handled a bunch of headaches for us: the OAuth handshake, token handling, and bridging our remote HTTP endpoint to the local stdio transport most desktop clients still expect.
It became our primary bridge between dev instances and agent clients — and made local debugging feel almost production-grade.
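Wiring it into a desktop client like Claude is a single config entry that shells out to mcp-remote with your server’s URL (shown here with a placeholder endpoint):

```json
{
  "mcpServers": {
    "scalekit": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp.scalekit.example/mcp"]
    }
  }
}
```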
Real client testing surfaced a ton of useful issues early: vague tool descriptions, missing scopes, and response shapes agents struggled to parse.
We’d tweak a tool, rerun mcp-remote, and immediately see how Claude or ChatGPT reacted. That build, proxy, test feedback loop helped us debug fast, ship safely, and shape better defaults.
Testing with live agents gave us real-world validation, but when you're in the middle of development, opening up Claude or ChatGPT just to check if a tool is wired correctly isn’t exactly efficient.
That’s why MCP Inspector quickly became our go-to. We kept it open constantly, like a second terminal.
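It ships as a package you can run directly; point the UI it opens at whatever endpoint and transport you’re developing against:

```bash
npx @modelcontextprotocol/inspector
```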
It gave us a live view of every tool we’d registered.
We could plug in sample inputs, see what responses came back, and tweak things without jumping into a full agent session. That tight loop helped us catch all kinds of small mistakes early — missing scopes, unclear param shapes, unexpected return formats.
Inspector became our daily safety net. If you’re building an MCP server, this kind of introspection tool isn’t a nice-to-have. It’ll save you hours of blind debugging, especially once you’ve got more than a handful of tools to manage.
We’re far from done. This first version was about standing something up, proving that a secure, agent-friendly MCP server could be more than a demo. Now we’re pushing further.
Next on our list: building toward something that’s not just agent-compatible but agent-native, a tool surface designed from the ground up to support secure, scoped, LLM-triggered actions.
If you’re thinking of building your own MCP server, or even just experimenting, here’s what we’d tell you, dev to dev: settle the transport question early, treat auth as a first-class design decision rather than an afterthought, scope every tool explicitly, and test against real agents (and an inspector) from day one.
We’re still building, but excited about where this is going. If the first era of software was humans calling APIs, and the second is agents calling tools, then MCP is where it all gets real.