
When a developer connects Claude Code to your company's internal APIs, a few things happen in quick succession: the agent authenticates with the employee's credentials, enumerates whatever tools the server exposes, and begins making calls on the employee's behalf.
None of that activity shows up in your SIEM by default.
None of it goes through your DLP scanner unless you've built that integration yourself.
The access is authorised — the employee has credentials — but the agent's specific actions are not governed the same way a human's would be.
That's the ground truth for most enterprises with AI tools deployed today. This piece lays out the specific risks that situation creates, concretely enough to be useful to the security team that owns the response.
The most common risk isn't malicious behaviour — it's agents doing more than they were meant to, because nothing told them to stop.
What's happening:
An employee has read access to customer records in Salesforce. They connect Salesforce's MCP server to their Claude instance and ask it to "prepare a summary of our key accounts." The agent, interpreting "key accounts" broadly, queries all records the user has access to — not a curated list of 20. It pulls associated contacts, opportunity history, and support tickets across hundreds of accounts. Nothing about this is malicious. The agent just doesn't have the same mental model of "appropriate scope" that the employee had in mind.
Or, an agent has been granted access to a GitHub MCP server to help with code review. A multi-step workflow accidentally triggers a tool call that deletes a branch — a tool the user had permission to call, but would never have called manually in that context.
Why it happens:
Agents don't have the judgment that makes humans naturally self-limiting. A developer reviewing code pauses before deleting something. An agent executing a workflow doesn't. Without explicit action-level access controls, agents operate at the full extent of whatever permissions the underlying user has.
What this looks like in practice:
• An agent making 2,000 API calls against an internal system in an automated overnight run
• A bulk export of customer records that no single human would execute in a normal working session
• A tool call that modifies production data as a side effect of a task that was supposed to be read-only
• An agent querying a database table that contains data it was never intended to reach, because the user's access didn't exclude it
The control:
Action-level allow-lists enforced at the MCP gateway. Not which systems an agent can access, but which specific tools within those systems — and the gateway blocks anything not on the list.
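A minimal sketch of what that enforcement could look like. The group names, server names, and tool names are hypothetical, and a real gateway would load this policy from configuration rather than a hard-coded dict; the point is the default-deny shape of the check.

```python
# Hypothetical action-level allow-lists: group -> MCP server -> approved tools.
ALLOW_LISTS = {
    "support-engineers": {
        "salesforce": {"get_account", "get_contact", "search_accounts"},
        # No bulk-export or delete tools, even though the underlying
        # user credentials could reach them.
    },
    "platform-team": {
        "github": {"get_pull_request", "list_commits", "create_review_comment"},
        # delete_branch is deliberately absent.
    },
}

def authorize_call(group: str, server: str, tool: str) -> bool:
    """Allow a tool call only if it appears on the group's explicit list.

    Anything not listed is denied by default: an allow-list, not a deny-list.
    """
    return tool in ALLOW_LISTS.get(group, {}).get(server, set())

# A read tool on the list passes; the destructive tool is blocked even
# though the user's GitHub credentials would permit it.
assert authorize_call("platform-team", "github", "get_pull_request")
assert not authorize_call("platform-team", "github", "delete_branch")
```

Because the check runs at the gateway, it applies regardless of which AI tool made the call or what permissions the underlying credentials carry.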
MCP tool access is governed by whatever authentication the MCP server enforces. If a server is connected to a gateway with proper access controls, the right policy applies. If it's a directly-configured server with minimal auth — which describes most individually-configured developer setups — the controls may be much weaker.
What's happening:
An employee is granted access to a set of tools for a specific purpose. But the authorisation model is loose: if the agent knows a tool exists, it may attempt to call it regardless of whether the user was explicitly authorised for that specific action. Without action-level enforcement and centralised logging, these calls either succeed (a problem) or fail silently (invisible).
A concrete scenario:
A developer connects their AI tool to an internal database MCP server. The server was set up for read-only analytics queries. But the same server also exposes a write_record tool that a DBA used during setup and never removed. The agent, exploring what tools are available, finds and attempts to call it. Without explicit tool-level allow-lists, this depends on the underlying database permissions — and if the user has write access to that database for other reasons, the call succeeds.
What this looks like in practice:
• A developer's agent attempting to access a production database that wasn't in their approved tool set
• An agent calling a tool it discovered through the MCP tool-listing mechanism even though the user doesn't have permission for that specific action
• Failed tool calls that should be security events but are logged nowhere IT can see
• An agent escalating its own capabilities by exploring which tools a server exposes beyond what it was intended to use
The control:
Explicit allow-lists (not deny-lists) at the tool level. The gateway surfaces only the tools that have been explicitly approved for each group, not everything the MCP server exposes.
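One way a gateway can do this is to filter the MCP tool-listing response itself, so unapproved tools are never advertised to the agent in the first place. A sketch, with illustrative tool definitions based on the scenario above:

```python
def filter_tool_listing(advertised_tools: list[dict], approved: set[str]) -> list[dict]:
    """Return only the tools explicitly approved for this group.

    The agent never learns that unapproved tools exist, so it cannot
    discover a leftover write_record via the tool-listing mechanism.
    """
    return [tool for tool in advertised_tools if tool["name"] in approved]

# What the MCP server actually exposes, including the leftover DBA tool.
advertised = [
    {"name": "run_query", "description": "Read-only analytics query"},
    {"name": "write_record", "description": "Insert or update a row"},
]

# What the gateway surfaces to agents in the analytics group.
visible = filter_tool_listing(advertised, approved={"run_query"})
assert [tool["name"] for tool in visible] == ["run_query"]
```

Filtering the listing complements (but does not replace) blocking the call itself: an agent that somehow learns a tool's name should still be denied at call time.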
When an agent calls a tool that retrieves customer data, that data travels through several layers: the MCP server, the connection to the AI tool, and into the model's context window. Along the way, there are multiple points where sensitive information can be logged, cached, or incorporated into responses that end up in the wrong place.
What this looks like in practice:
• A support engineer's agent pulling full customer records including PII "for context," with that information surfacing in shared conversation logs visible to other team members
• An agent executing a tool call that includes an entire customer record as input when it only needed the customer ID
• Credit card numbers appearing in API call logs that aren't covered by your PCI compliance controls
• GDPR-protected data flowing through a US-based AI tool provider's infrastructure without a data processing agreement covering that specific data path
The control:
Gateway-level DLP with configurable policies per data pattern. Credit card patterns in tool responses get redacted before reaching the model. High-severity patterns (SSN, passport numbers) block the tool call entirely. All DLP decisions are logged.
Human access patterns have natural rhythms that make anomalies detectable. Agents don't. An agent can run overnight, make thousands of tool calls, and produce no human-visible trace — nothing that would appear suspicious the way a human logging in at 3am from an unexpected location would.
Why this matters:
Two distinct threats converge here. First, compromised credentials. If an API key stored in a developer's local MCP config is extracted, an attacker using it through MCP can operate at machine speed, querying internal systems with no friction and no human behaviour patterns to detect. Second, misconfigured automation. An agent scheduled to run nightly can do significant damage before anyone notices — not through malicious intent, but through a configuration error that wasn't caught before deployment.
What this looks like in practice:
• An agent scheduled to run nightly and "clean up processed records" that was misconfigured to delete records that haven't been processed — running silently for three weeks before someone notices missing data
• An API key extracted from a developer's laptop config being used by an external actor to query internal systems over a holiday weekend
• A bulk data extraction that runs over Saturday and Sunday when no one is monitoring the audit log
• An agent that hits rate limits on one system and silently starts querying a different system that wasn't part of the original task
The control:
Anomaly detection and security events at the gateway. Off-hours tool calls generate alerts. Unusual call volumes (orders of magnitude above normal) trigger investigation. Unauthorized tool attempts are logged as security events, not silent failures.
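A sketch of the evaluation step, under the assumption of a fixed off-hours window and a per-user call-rate baseline. The thresholds are illustrative; a real gateway would learn baselines per user and tool rather than hard-code them.

```python
from datetime import datetime

def evaluate_call(when: datetime, calls_last_hour: int,
                  baseline_per_hour: float) -> list[str]:
    """Return the security events this tool call should raise, if any."""
    events = []
    if when.hour < 6 or when.hour >= 22:  # illustrative off-hours window
        events.append("off_hours_activity")
    if baseline_per_hour and calls_last_hour > 100 * baseline_per_hour:
        events.append("volume_anomaly")   # orders of magnitude above normal
    return events

# An overnight burst of 2,000 calls against a ~4/hour baseline trips both checks.
events = evaluate_call(datetime(2025, 3, 8, 3, 14),
                       calls_last_hour=2000, baseline_per_hour=4.0)
assert events == ["off_hours_activity", "volume_anomaly"]
```

Crucially, unauthorized tool attempts feed the same event stream, so a denied call is an alert someone investigates rather than a silent failure.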
This is less a specific attack vector and more a structural vulnerability that amplifies every other risk on this list.
When something goes wrong with AI agent activity — and at sufficient scale, something eventually will — the investigation requires a clear log of what happened: which user's agent, which tool was called, at what time, with what inputs, with what outputs.
Without central MCP infrastructure, this log either doesn't exist or is fragmented across:
• MCP server logs (if they exist, often in different formats per system)
• Model provider conversation history (typically shows prompts and responses, not tool call details)
• Application-level access logs in each downstream system (shows the user action, not that it was agent-initiated)
• The employee's local AI tool conversation history (on their laptop, inaccessible to IT)
Reconstructing an incident from these sources requires correlating data across multiple systems that weren't designed to talk to each other, in formats that don't align, covering different time ranges with different retention policies.
For organisations with compliance obligations, this gap is increasingly a live audit question. SOC 2 auditors are beginning to ask about AI agent activity controls. GDPR requires demonstrating the data processing activities of automated systems. HIPAA's audit controls apply regardless of whether the access was human or agent-initiated.
What an audit trail needs to include:
• User identity (whose credentials authorised the session)
• Agent/tool client (Claude Code, Copilot, Cursor, etc.)
• Timestamp of each tool call
• Tool name and MCP server
• Input parameters (sanitised for DLP)
• Response summary or metadata
• Policy decisions applied (any DLP redactions, blocks, or alerts triggered)
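Concretely, one audit record per tool call covering those fields might look like the following. The field names and values are illustrative, not a standard schema; append-only JSON Lines is one common storage shape for logs like this.

```python
import json
from datetime import datetime, timezone

# One record per tool call, capturing the fields listed above.
record = {
    "user": "dev@example.com",        # whose credentials authorised the session
    "client": "claude-code",          # agent/tool client
    "timestamp": datetime(2025, 3, 8, 3, 14, tzinfo=timezone.utc).isoformat(),
    "server": "salesforce",           # MCP server
    "tool": "get_account",            # tool name
    "input": {"account_id": "0012345"},  # input parameters, sanitised for DLP
    "response_meta": {"records": 1, "bytes": 2048},  # summary, not full payload
    "policy": {"dlp": "pass", "blocked": False, "alerts": []},
}

line = json.dumps(record)  # one line appended to the central audit log
```

Logging response metadata rather than full payloads keeps the audit trail itself from becoming a second copy of the sensitive data it exists to protect.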
The control:
A central gateway is the only architectural pattern that gives you this log comprehensively. When all agent traffic flows through one point, you get one coherent audit trail — not a post-incident puzzle assembled from five different log systems.
The risks above aren't arguments against MCP. They're arguments for the infrastructure layer that makes MCP safe to deploy at enterprise scale.
Each risk above maps to a gateway-level control: action-level allow-lists, explicit tool-level allow-lists, gateway DLP, anomaly detection and security events, and a central audit trail.
None of this requires slowing down AI tool adoption. It requires building the infrastructure that makes adoption sustainable — and building it before the scale becomes unmanageable.
Traditional API access risks involve humans making deliberate calls with understood intent. MCP agent risks involve agents making calls that weren't explicitly intended by the employee — emergent behaviour from model reasoning, broader scope than expected, or side effects of multi-step workflows. The risk profile is different because the actor (the agent) behaves differently than a human would in the same situation.
These risks are universal to any MCP-based integration, regardless of which AI tool is in use. Claude Code, GitHub Copilot Agent Mode, Cursor — they all speak the same MCP protocol. The risks come from the architecture (agents with inherited user credentials, tool calls that bypass human judgment), not from any specific tool's implementation.
Start with the audit gap, because it affects your ability to respond to every other risk. If you can't see what's happening, you can't investigate anything. Getting a gateway in place that produces a coherent audit log is the foundation for addressing all other risks. Action-level access controls are the second priority — they limit blast radius. DLP and anomaly detection follow.
Existing DLP tools won't cover this on their own. They typically operate on email, web uploads, and endpoint file transfers; they weren't designed to inspect MCP protocol traffic or the content of AI tool call responses. An MCP gateway with built-in DLP is the right place to enforce content inspection for agent activity.
The approach that works: treat AI agent activity the same as any other automated processing under your existing compliance frameworks. SOC 2's audit control requirements, HIPAA's access log requirements, GDPR's data processing accountability requirements — all of these apply to agent-initiated actions. A centralised gateway audit log that captures who, what, when, and which data was involved is the artifact that satisfies these requirements. Many security teams are starting to include AI agent access controls explicitly in their annual SOC 2 and ISO 27001 control documentation.