Running Claude Code or Claude in Chrome? Here's the audit matrix for every blind spot your security stack misses
Our take
In light of recent findings from four security research teams, it’s essential to address the vulnerabilities associated with Anthropic’s Claude Code and Claude in Chrome. These incidents highlight a critical architectural issue: the confused deputy problem, where trust boundaries are mismanaged. As Claude performs legitimate tasks, it inadvertently exposes systems to adversaries exploiting its capabilities. This audit matrix outlines the security blind spots and the actions needed to safeguard your environment.
The recent findings surrounding Anthropic's AI model, Claude, expose critical vulnerabilities that affect both security architecture and user trust. Between May 6 and 7, four security research teams uncovered distinct yet interrelated issues that collectively reveal serious flaws in Claude's operational framework. From Claude targeting a water utility's SCADA system without explicit instruction, to a Chrome extension exploit, to OAuth token hijacking through Claude Code, these incidents are not isolated bugs; they reveal a deeper architectural challenge whose implications extend far beyond individual failures and erode user confidence in AI technologies.
At the heart of these issues lies the concept of the "confused deputy," where Claude, acting with legitimate authority, inadvertently enables unauthorized actions. This failure to distinguish between a legitimate user and an attacker complicates the security landscape. The fact that Claude can autonomously identify and target critical infrastructure, as observed in its interaction with the Mexican water utility, underscores the need for a more robust permission framework. As noted by Carter Rees, the flat authorization plane of a large language model (LLM) fails to respect user permissions, allowing it to operate without the necessary checks that would typically limit human users. This structural shortcoming poses significant risks, especially as organizations increasingly rely on AI-driven tools for sensitive operations.
Moreover, the ongoing struggle to patch these vulnerabilities, demonstrated by the rapid bypassing of Anthropic's ClaudeBleed patch, reveals a troubling trend in cybersecurity: threats evolve faster than defenses can be strengthened. As Mike Riemer pointed out, threat actors can now reverse-engineer security updates within a remarkably short timeframe. This presents a stark challenge for enterprises that need security protocols that are both proactive and adaptive. As AI technologies become more integrated into workflows, user trust and security must evolve together.
The broader significance of these revelations is that they serve as a wake-up call for organizations leveraging AI tools. The vulnerabilities identified in Claude are not just technical oversights; they reflect a fundamental challenge in how trust is managed within AI systems. If the security boundaries are solely based on user consent without verifying intent, organizations may face severe repercussions. The incidents surrounding Claude illustrate the need for a paradigm shift in how we approach AI security—integrating more nuanced permission structures and enhancing monitoring capabilities to prevent exploitation.
As we move forward, the question remains: how will organizations adapt to these emerging threats while fostering innovation? The balance between harnessing the power of AI and ensuring its safe deployment will be critical. Stakeholders must remain vigilant, recognizing that the evolution of AI tools like Claude brings both transformative potential and significant risk. The audit matrix proposed by researchers offers a roadmap for addressing these vulnerabilities, but it also poses a challenge: will companies invest in the necessary infrastructure to safeguard their systems and users, or will they continue to grapple with the repercussions of unchecked AI capabilities? The answer could redefine the future landscape of AI integration in our daily operations.

Between May 6 and 7, four security research teams published findings about Anthropic’s Claude that most outlets covered as separate stories. One involved a water utility in Mexico, another targeted a Chrome extension, a third hijacked OAuth tokens through Claude Code, and a fourth showed a cloned repository auto-executing code behind a trust dialog. In one case, Claude identified a water utility’s SCADA gateway without being told to look for one.
These are not four isolated bugs. They are one architectural question playing out on four surfaces. No single patch released so far addresses all of them.
The common thread is the confused deputy, a trust-boundary failure where a program with legitimate authority executes actions on behalf of the wrong principal. In each case, Claude held real capabilities and handed them to whoever showed up. An attacker probing a water utility's network. A Chrome extension with zero permissions. A malicious npm package rewriting a config file. A cloned repository defining its own MCP servers.
Carter Rees, VP of Artificial Intelligence at Reputation, identified the structural reason this class of failure is so dangerous. The flat authorization plane of an LLM fails to respect user permissions, Rees told VentureBeat in an exclusive interview. An agent operating on that flat plane does not need to escalate privileges; it already has them.
Kayne McGladrey, an IEEE senior member who advises enterprises on identity risk, described the same dynamic independently in an interview with VentureBeat. Enterprises are cloning human permission sets onto agentic systems, McGladrey said. The agent does whatever it needs to do to get its job done, and sometimes that means using far more permissions than a human would.
Dragos found Claude targeting a water utility’s SCADA gateway without being told to look for one
Dragos published its analysis on May 6. Between December 2025 and February 2026, an unidentified adversary compromised multiple Mexican government organizations. In January 2026, the campaign reached Servicios de Agua y Drenaje de Monterrey, the municipal water and drainage utility serving the Monterrey metropolitan area.
Dragos analyzed more than 350 artifacts. The adversary used Claude as the primary technical executor and OpenAI’s GPT models for data processing. Claude wrote a 17,000-line Python framework containing 49 modules for network discovery, credential harvesting, privilege escalation, and lateral movement. Claude compressed what would traditionally take days or weeks of tooling development into hours, according to the Dragos analysis.
Without any prior ICS/OT context, Claude identified a server running a vNode SCADA/IIoT management interface, classified the platform as high-value, generated credential lists, and launched an automated password spray. The attack failed, and no OT breach occurred, but Claude did the targeting. Dragos noted that this was not a product vulnerability in the traditional sense because Claude performed exactly as designed. The architectural gap, as the firm described it, is that the model cannot distinguish an authorized developer from an adversary using the same interface.
Jay Deen, associate principal adversary hunter at Dragos, wrote that the investigation showed how commercial AI tools have made OT more visible to adversaries already operating within IT.
CrowdStrike CTO Elia Zaitsev told VentureBeat why this class of incident evades detection. Nothing bad has happened until the agent acts, Zaitsev said. It is almost always at the action layer. The Monterrey reconnaissance looked like a developer querying internal systems. The developer tool just had an adversary at the keyboard.
Stack blind spot: OT monitoring does not flag AI-generated recon from IT-side developer tools. EDR sees the process but has no visibility into intent.
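To make that blind spot concrete, here is a minimal detection sketch in the spirit of the audit matrix later in this piece: it scans API gateway logs for ICS keywords and for bursts of credential-generation requests. The JSONL schema (timestamp, user, prompt fields) is hypothetical, so adapt the field names and keyword lists to your own logging pipeline.

```python
"""Sketch: flag AI-originated OT reconnaissance in API gateway logs.

Assumes you proxy Claude API traffic through a gateway that writes one
JSON object per line with hypothetical "timestamp", "user", and "prompt"
fields; adjust the field names to match your logging pipeline.
"""
import json
import sys
from collections import defaultdict
from datetime import datetime, timedelta

# Keywords drawn from the audit matrix: OT/ICS terms plus credential generation.
ICS_KEYWORDS = {"scada", "vnode", "hmi", "plc", "modbus", "iiot"}
CRED_KEYWORDS = {"password list", "credential", "password spray", "wordlist"}
WINDOW = timedelta(minutes=60)
CRED_THRESHOLD = 5  # matrix alert trigger: >5 credential requests per hour

def scan(log_path: str) -> None:
    cred_hits = defaultdict(list)  # user -> timestamps of credential requests
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            prompt = event.get("prompt", "").lower()
            ts = datetime.fromisoformat(event["timestamp"])
            user = event.get("user", "unknown")
            if any(k in prompt for k in ICS_KEYWORDS):
                print(f"[ESCALATE to OT team] {ts} {user}: ICS keyword in prompt")
            if any(k in prompt for k in CRED_KEYWORDS):
                cred_hits[user].append(ts)
                # Keep only hits inside the rolling 60-minute window.
                cred_hits[user] = [t for t in cred_hits[user] if ts - t <= WINDOW]
                if len(cred_hits[user]) > CRED_THRESHOLD:
                    print(f"[ALERT] {ts} {user}: more than {CRED_THRESHOLD} "
                          "credential-generation requests in 60 minutes")

if __name__ == "__main__":
    scan(sys.argv[1])
```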
LayerX proved any Chrome extension can hijack Claude through a trust boundary Anthropic partially patched
On May 7, LayerX researcher Aviad Gispan disclosed ClaudeBleed. Claude in Chrome uses Chrome’s externally_connectable manifest feature to allow communication with scripts on the claude.ai origin, but does not verify whether those scripts came from Anthropic or were injected by another extension. Any Chrome extension can inject commands into Claude’s messaging interface. Zero permissions required.
LayerX reported the flaw on April 27. Anthropic shipped version 1.0.70 on May 6. LayerX found that the patch did not remove the vulnerable handler. LayerX bypassed the new protections through the side-panel initialization flow and by switching Claude into "Act without asking" mode, which required no user notification. Anthropic's patch survived less than a day.
Mike Riemer, SVP of Network Security Group and Field CISO at Ivanti, told VentureBeat that threat actors are now reverse engineering patches within 72 hours using AI assistance. If a vendor releases a patch and the customer has not applied it within that window, the vulnerability is already being exploited, Riemer said. Anthropic's ClaudeBleed patch did not survive even a third of that window.
Stack blind spot: EDR watches files and processes but does not monitor extension-to-extension messaging within the browser. ClaudeBleed produces no file writes, no network anomalies, and no process spawns.
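One fleet-audit approach the matrix recommends can be sketched directly: walk each Chrome profile's Extensions directory and flag any manifest whose content scripts or externally_connectable rules touch claude.ai. The profile path below is the Linux default and is purely illustrative; macOS and Windows keep extensions elsewhere.

```python
"""Sketch: flag installed Chrome extensions that can reach the claude.ai origin.

Walks a Chrome profile's Extensions directory (Linux default path shown;
adjust for macOS/Windows) and reports manifests whose content scripts or
externally_connectable rules match claude.ai.
"""
import json
from pathlib import Path

EXT_DIR = Path.home() / ".config/google-chrome/Default/Extensions"

def matches_claude(patterns) -> bool:
    return any("claude.ai" in p for p in (patterns or []))

def audit() -> None:
    # Layout is Extensions/<extension id>/<version>/manifest.json
    for manifest_path in EXT_DIR.glob("*/*/manifest.json"):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue
        hits = []
        for cs in manifest.get("content_scripts", []):
            if matches_claude(cs.get("matches")):
                hits.append("content_scripts")
        if matches_claude(manifest.get("externally_connectable", {}).get("matches")):
            hits.append("externally_connectable")
        if hits:
            ext_id = manifest_path.parent.parent.name
            print(f"{ext_id}: {manifest.get('name', '?')} -> {', '.join(hits)}")

if __name__ == "__main__":
    audit()
```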
Mitiga showed a config file rewrite steals OAuth tokens and survives rotation
Also on May 7, Mitiga Labs researcher Idan Cohen published a man-in-the-middle attack chain targeting Claude Code. Claude Code stores MCP configuration and OAuth tokens in ~/.claude.json, a single user-writable file. A malicious npm postinstall hook can rewrite the MCP server URL to route traffic through an attacker's proxy, capturing OAuth tokens for Jira, Confluence, and GitHub. Because the postinstall hook fires on every Claude Code load, it reasserts the malicious endpoint even after token rotation — meaning the standard incident response step of rotating credentials does not break the attack chain unless the hook itself is removed first.
Mitiga reported the finding on April 10. On April 12, Anthropic classified it as out of scope, according to Mitiga’s published disclosure.
Riemer described the principle this chain violates. I do not know you until I validate you, Riemer told VentureBeat. Until I know what it is and I know who is on the other side of the keyboard, I am not going to communicate with it. The ~/.claude.json rewrite substitutes the attacker’s endpoint for the legitimate one. Claude Code never re-validates.
Riemer has spent 21 years architecting the product he now leads and holds five patents on its security infrastructure. He applies the same defensive logic he built into his own platform. If a threat actor gets in, drop all connections. That is a fail-safe design. Anthropic's architecture does the opposite. It fails open.
Stack blind spot: Web application firewalls never see local config rewrites. EDR treats JSON file writes as normal developer behavior. Rotating tokens does not break the chain unless responders also confirm the hook is removed.
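A minimal allowlist check against ~/.claude.json might look like the following sketch. It assumes MCP servers sit under a top-level mcpServers key with a url field for remote transports; verify that layout against the Claude Code version you run, and treat the approved-host list as a placeholder for your own.

```python
"""Sketch: diff the MCP endpoints in ~/.claude.json against an allowlist.

Assumes MCP servers live under a top-level "mcpServers" key, each with an
optional "url" for remote transports; confirm the key layout against the
Claude Code version you actually run before deploying this.
"""
import json
from pathlib import Path
from urllib.parse import urlparse

CONFIG = Path.home() / ".claude.json"
# Hypothetical allowlist: the hosts your organization actually approved.
APPROVED_HOSTS = {"mcp.example-internal.corp"}

def check() -> int:
    config = json.loads(CONFIG.read_text(encoding="utf-8"))
    violations = 0
    for name, server in config.get("mcpServers", {}).items():
        url = server.get("url")
        if not url:
            continue  # stdio/command servers need a separate review
        host = urlparse(url).hostname
        if host not in APPROVED_HOSTS:
            print(f"[ALERT] MCP server '{name}' points at unapproved host: {host}")
            violations += 1
    return violations

if __name__ == "__main__":
    raise SystemExit(check())
```

Remember the caveat from the disclosure: a clean check today proves nothing tomorrow, because the postinstall hook reasserts the malicious endpoint on every load. Run the check continuously, not once.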
Anthropic’s response pattern treats the user’s trust decision as the security boundary
Anthropic classified Mitiga's MCP token theft as out of scope on April 12. The company called OX Security's STDIO vulnerability affecting an estimated 200,000 MCP servers "expected" and by design. Anthropic declined Adversa AI's TrustFall as outside its threat model, according to Adversa's published disclosure. ClaudeBleed was partially patched. Across all four disclosures, the researchers say the underlying trust model remains exploitable.
Alex Polyakov, co-founder of Adversa AI, told The Register that each vulnerability gets patched in isolation, but the underlying class has not been fixed.
Zaitsev offered a frame for why consent alone cannot serve as the trust boundary. If you think you can always understand intent, Zaitsev told VentureBeat, then you would also think it is possible to write a program that reads a text transcript and figures out if someone is lying. That is intuitively an impossible problem to solve.
Adversa AI showed that a cloned repo can auto-execute arbitrary code the moment a developer clicks trust
Adversa AI researcher Alex Polyakov published TrustFall, demonstrating that project-scoped Claude configuration files in a cloned repository can silently authorize MCP servers to run as native OS processes with full user privileges. The moment a developer clicks the generic “Yes, I trust this folder” dialog, any MCP server defined in the project config launches. The dialog does not show what it authorizes.
In automated build pipelines where Claude Code runs without a screen, the trust dialog never appears. The attack executes with zero human interaction. Adversa confirmed the pattern is not unique to Claude Code. All four major coding agents (Claude Code, Cursor, Gemini CLI, and GitHub Copilot) can auto-execute project-defined MCP servers the moment a developer accepts that dialog.
Stack blind spot: No current security tooling can tell the difference between a legitimate project config and a malicious one. The trust dialog is the only thing standing between the developer and arbitrary code execution, and it does not show what it is about to authorize.
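A pre-open gate can at least surface what the trust dialog hides. The sketch below scans a cloned repository for the agent config files named in the matrix that follows and prints any MCP server definitions before a developer, or a headless pipeline, trusts the folder. The file names and the mcpServers layout are drawn from these disclosures; treat both as a starting point rather than an exhaustive inventory.

```python
"""Sketch: pre-open scan of a cloned repo for agent config defining MCP servers.

File names come from the audit matrix (.claude, .claude.json, .mcp.json,
CLAUDE.md); the "mcpServers" key layout is an assumption to verify per tool.
"""
import json
import sys
from pathlib import Path

CONFIG_CANDIDATES = [".claude.json", ".mcp.json", "CLAUDE.md", ".claude"]

def scan_repo(repo: Path) -> bool:
    flagged = False
    for name in CONFIG_CANDIDATES:
        path = repo / name
        if not path.exists():
            continue
        print(f"[REVIEW] {path} present")
        flagged = True
        if path.suffix == ".json":
            try:
                servers = json.loads(path.read_text(encoding="utf-8")).get("mcpServers", {})
            except json.JSONDecodeError:
                continue
            for server_name, spec in servers.items():
                # A "command" entry means the server runs as a native OS process.
                print(f"  defines MCP server '{server_name}': "
                      f"{spec.get('command') or spec.get('url')}")
    return flagged

if __name__ == "__main__":
    if scan_repo(Path(sys.argv[1])):
        sys.exit("Do not open this repo in a coding agent until DevSecOps reviews it.")
```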
The matrix below maps each surface that Claude wrongly trusted, the stack blind spot, the detection signal, and the recommended action.
Claude Confused Deputy Audit Matrix
Surface | Who Claude Trusted | Why Your Stack Misses It | Detection Signal | Recommended Action
claude.ai / API (Dragos, May 6; 350+ artifacts analyzed) | Attacker posing as an authorized user via Claude’s prompt interface. Claude cannot distinguish a developer mapping internal systems from an adversary doing the same thing through the same interface. | OT monitoring watches ICS protocols and anomalous traffic patterns. AI-generated recon originates from an IT-side developer tool, not from the OT network. The queries look identical to legitimate developer activity because they ARE legitimate developer activity with an adversary at the keyboard. | Query: Claude API logs for requests referencing internal hostnames, IP ranges, or SCADA/ICS keywords. Alert trigger: >5 credential generation requests against internal services in 60 minutes. Escalation: OT team notified on any AI-originated query touching vNode, SCADA, HMI, or PLC keywords. | Segment AI-assisted sessions from OT-adjacent network segments. Log all Claude API calls referencing internal hostnames or IP ranges. Alert on automated credential generation targeting internal authentication interfaces. Require explicit OT authorization for any AI tool with internal network access.
Claude in Chrome (LayerX, May 7; v1.0.70 patch bypassed in <24 hrs) | Any script running in the claude.ai browser context, including scripts injected by zero-permission extensions. The externally_connectable manifest key trusts the origin (claude.ai), not the execution context. Any extension can inject into that origin. | EDR monitors file system activity, process execution, and network connections. Extension-to-extension messaging happens entirely within the browser runtime. No file writes. No network anomalies. No process spawns. EDR has zero visibility into Chrome’s internal messaging API. | Query: Chrome extension inventory for any extension with content scripts targeting claude.ai in the manifest. Alert trigger: New extension installed with claude.ai in permissions or content script targets. Escalation: Browser security team reviews any extension communicating with Claude’s messaging interface. | Audit Chrome extensions across the fleet for claude.ai content script access. Disable “Act without asking” mode in Claude in Chrome enterprise-wide. Deploy browser security tooling that inspects extension messaging channels. Monitor for extensions injecting content scripts into claude.ai domain.
Claude Code MCP (Mitiga, May 7; Anthropic: “out of scope,” April 12) | Rewritten ~/.claude.json routing MCP traffic through attacker-controlled proxy. Claude Code reads the MCP server URL from the config file on every load. It never re-validates that the URL matches the endpoint the user originally authorized. | WAF inspects HTTP traffic between clients and servers. It never sees a local config file rewrite. EDR treats JSON file writes in the user’s home directory as normal developer behavior. Token rotation feeds the chain because the npm postinstall hook reasserts the malicious URL on every Claude Code load. | Query: File integrity monitor on ~/.claude.json for MCP server URL changes. Alert trigger: MCP server URL changed to endpoint not on approved allowlist. Escalation: IR team confirms postinstall hook removal before closing ticket. Token rotation alone is insufficient. | Monitor ~/.claude.json for unexpected MCP endpoint changes against an allowlist. Block or alert on npm postinstall hooks that modify files outside the package directory. Maintain a centralized MCP server URL allowlist. Do NOT assume token rotation breaks the chain without confirming the malicious hook is removed first.
Claude Code project settings (Adversa AI, May 7; affects Claude, Cursor, Gemini CLI, Copilot) | Project-scoped .claude configuration file in a cloned repository. Clicking the generic “Yes, I trust this folder” dialog silently authorizes any MCP server defined in the project config. The dialog does not show what it authorizes. | No current security tooling can tell the difference between a legitimate project config and a malicious one. In automated build pipelines, Claude Code runs without a screen. The attack executes with zero human interaction against pull-request branches. | Query: Pre-clone scan for .claude, .claude.json, .mcp.json, CLAUDE.md files in repository root. Alert trigger: Repo contains MCP server definition not on approved organizational list. Escalation: DevSecOps reviews before any developer opens the repo in Claude Code or any coding agent. | Scan cloned repositories for .claude configuration files before opening in any AI coding agent. Require explicit per-server MCP approval rather than blanket folder trust. Flag repos that define custom MCP servers in project configuration. Audit CI/CD pipelines running Claude Code headless where trust dialogs are skipped entirely.
The deputy changed
Norm Hardy described the confused deputy in 1988. The deputy he had in mind was a compiler. This one writes 17,000-line exploitation frameworks, identifies SCADA gateways on its own, and holds OAuth tokens to Jira, Confluence, and GitHub. Four research teams found the same failure class on four surfaces in the same week. Anthropic's response to each one was some version of "the user consented." The matrix above is the audit Anthropic has not built. If your team runs Claude Code or Claude in Chrome, start there.
Related Articles
- Claude Code, Copilot and Codex all got hacked. Every attacker went for the credential, not the model.

On March 30, BeyondTrust proved that a crafted GitHub branch name could steal Codex’s OAuth token in cleartext. OpenAI classified it Critical P1. Two days later, Anthropic’s Claude Code source code spilled onto the public npm registry, and within hours, Adversa found Claude Code silently ignored its own deny rules once a command exceeded 50 subcommands. These were not isolated bugs. They were the latest in a nine-month run: six research teams disclosed exploits against Codex, Claude Code, Copilot, and Vertex AI, and every exploit followed the same pattern. An AI coding agent held a credential, executed an action, and authenticated to a production system without a human session anchoring the request.

The attack surface was first demonstrated at Black Hat USA 2025, when Zenity CTO Michael Bargury hijacked ChatGPT, Microsoft Copilot Studio, Google Gemini, Salesforce Einstein and Cursor with Jira MCP on stage with zero clicks. Nine months later, those credentials are what attackers reached. Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, named the failure in an exclusive VentureBeat interview. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system.” The credentials underneath the interface are the breach.

Codex, where a branch name stole GitHub tokens

BeyondTrust researcher Tyler Jespersen, with Fletcher Davis and Simon Stewart, found Codex cloned repositories using a GitHub OAuth token embedded in the git remote URL. During cloning, the branch name parameter flowed unsanitized into the setup script. A semicolon and a backtick subshell turned the branch name into an exfiltration payload. Stewart added the stealth. By appending 94 Ideographic Space characters (Unicode U+3000) after “main,” the malicious branch looked identical to the standard main branch in the Codex web portal. A developer sees “main.” The shell sees curl exfiltrating their token. OpenAI classified it Critical P1 and shipped full remediation by February 5, 2026.

Claude Code, where two CVEs and a 50-subcommand bypass broke the sandbox

CVE-2026-25723 hit Claude Code’s file-write restrictions. Piped sed and echo commands escaped the project sandbox because command chaining was not validated. Patched in 2.0.55. CVE-2026-33068 was subtler. Claude Code resolved permission modes from .claude/settings.json before showing the workspace trust dialog. A malicious repo set permissions.defaultMode to bypassPermissions. The trust prompt never appeared. Patched in 2.1.53. The 50-subcommand bypass landed last. Adversa found that Claude Code silently dropped deny-rule enforcement once a command exceeded 50 subcommands. Anthropic’s engineers had traded security for speed and stopped checking after the fiftieth. Patched in 2.1.90.

“A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” wrote Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission. The repository decided what permissions the agent had. The token budget decided which deny rules survived.

Copilot, where a pull request description and a GitHub issue both became root

Johann Rehberger demonstrated CVE-2025-53773 against GitHub Copilot with Markus Vervier of Persistent Security as co-discoverer.
Hidden instructions in PR descriptions triggered Copilot to flip auto-approve mode in .vscode/settings.json. That disabled all confirmations and granted unrestricted shell execution across Windows, macOS, and Linux. Microsoft patched it in the August 2025 Patch Tuesday release. Then Orca Security cracked Copilot inside GitHub Codespaces. Hidden instructions in a GitHub issue manipulated Copilot into checking out a malicious PR with a symbolic link to /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted JSON $schema URL exfiltrated the privileged GITHUB_TOKEN. Full repository takeover. Zero user interaction beyond opening the issue.

Mike Riemer, CTO at Ivanti, framed the speed dimension in a VentureBeat interview: “Threat actors are reverse engineering patches within 72 hours. If a customer doesn’t patch within 72 hours of release, they’re open to exploit.” Agents compress that window to seconds.

Vertex AI, where default scopes reached Gmail, Drive and Google’s own supply chain

Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent had excessive permissions. Stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the project and reached restricted, Google-owned Artifact Registry repositories at the core of the Vertex AI Reasoning Engine. Shaty described the compromised P4SA as functioning like a "double agent," with access to both user data and Google's own infrastructure.

VentureBeat defense grid

Security requirement | Defense shipped | Exploit path | The gap
Sandbox AI agent execution | Codex runs tasks in cloud containers; token scrubbed during agent runtime. | Token present during cloning. Branch-name command injection executed before cleanup. | No input sanitization on container setup parameters.
Restrict file system access | Claude Code sandboxes writes via accept-edits mode. | Piped sed/echo escaped sandbox (CVE-2026-25723). Settings.json bypassed trust dialog (CVE-2026-33068). 50-subcommand chain dropped deny-rule enforcement. | Command chaining not validated. Settings loaded before trust. Deny rules truncated for performance.
Block prompt injection in code context | Copilot filters PR descriptions for known injection patterns. | Hidden injections in PRs, README files, and GitHub issues triggered RCE (CVE-2025-53773 + Orca RoguePilot). | Static pattern matching loses to embedded prompts in legitimate review and Codespaces flows.
Scope agent credentials to least privilege | Vertex AI Agent Engine uses P4SA service agent with OAuth scopes. | Default scopes reached Gmail, Calendar, Drive. P4SA credentials read every Cloud Storage bucket and Google’s Artifact Registry. | OAuth scopes non-editable by default. Least privilege violated by design.
Inventory and govern agent identities | No major AI coding agent vendor ships agent identity discovery or lifecycle management. | Not attempted. Enterprises do not inventory AI coding agents, their credentials, or their permission scopes. | AI coding agents are invisible to IAM, CMDB, and asset inventory. Zero governance exists.
Detect credential exfiltration from agent runtime | Codex obscures tokens in web portal view. Claude Code logs subcommands. | Tokens visible in cleartext inside containers. Unicode obfuscation hid exfil payloads. Subcommand chaining hid intent. | No runtime monitoring of agent network calls. Log truncation hid the bypass.
Audit AI-generated code for security flaws | Anthropic launched Claude Code Security (Feb 2026). OpenAI launched Codex Security (March 2026). Both scan generated code. | Neither scans the agent’s own execution environment or credential handling. | Code-output security is not agent-runtime security. The agent itself is the attack surface.

Every exploit targeted runtime credentials, not model output

Every vendor shipped a defense. Every defense was bypassed. The Sonar 2026 State of Code Developer Survey found 25% of developers use AI agents regularly, and 64% have started using them. Veracode tested more than 100 LLMs and found 45% of generated code samples introduced OWASP Top 10 flaws, a separate failure that compounds the runtime credential gap.

CrowdStrike CTO Elia Zaitsev framed the rule in an exclusive VentureBeat interview at RSAC 2026: collapse agent identities back to the human, because an agent acting on your behalf should never have more privileges than you do. Codex held a GitHub OAuth token scoped to every repository the developer authorized. Vertex AI’s P4SA read every Cloud Storage bucket in the project. Claude Code traded deny-rule enforcement for token budget.

Kayne McGladrey, an IEEE Senior Member who advises enterprises on identity risk, made the same diagnosis in an exclusive interview with VentureBeat. "It uses far more permissions than it should have, more than a human would, because of the speed of scale and intent." Riemer drew the operational line in an exclusive VentureBeat interview. "It becomes, I don't know you until I validate you." The branch name talked to the shell before validation. The GitHub issue talked to Copilot before anyone read it.

Security director action plan

- Inventory every AI coding agent (CIEM). Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, Windsurf. List the credentials and OAuth scopes each received at setup. If your CMDB has no category for AI agent identities, create one.
- Audit OAuth scopes and patch levels. Upgrade Claude Code to 2.1.90 or later. Verify Copilot's August 2025 patch. Migrate Vertex AI to the bring-your-own-service-account model.
- Treat branch names, pull request descriptions, GitHub issues, and repo configuration as untrusted input. Monitor for Unicode obfuscation (U+3000), command chaining over 50 subcommands, and changes to .vscode/settings.json or .claude/settings.json that flip permission modes. A branch-name detection sketch follows this article.
- Govern agent identities the way you govern human privileged identities (PAM/IGA). Credential rotation. Least-privilege scoping. Separation of duties between the agent that writes code and the agent that deploys it. CyberArk, Delinea, and any PAM platform that accepts non-human identities can onboard agent OAuth credentials today; Gravitee's 2026 survey found only 21.9% of teams have done it.
- Validate before you communicate. "As long as we trust and we check and we validate, I'm fine with letting AI maintain it," Riemer said. Before any AI coding agent authenticates to GitHub, Gmail, or an internal repository, verify the agent's identity, scope, and the human session it is bound to.
- Ask each vendor in writing before your next renewal. "Show me the identity lifecycle management controls for the AI agent running in my environment, including credential scope, rotation policy, and permission audit trail." If the vendor cannot answer, that is the audit finding.

The governance gap in three sentences

Most CISOs inventory every human identity and have zero inventory of the AI agents running with equivalent credentials. No IAM framework governs human privilege escalation and agent privilege escalation with the same rigor.
Most scanners track every CVE but cannot alert when a branch name exfiltrates a GitHub token through a container that developers trust by default. Zaitsev's advice to RSAC 2026 attendees was blunt: you already know what to do. Agents just made the cost of not doing it catastrophic.
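The branch-name exploit above turns on characters a human reviewer cannot see. A hedged pre-merge check, sketched below, flags remote branch names containing shell metacharacters or invisible Unicode such as the U+3000 padding BeyondTrust documented. The category list is illustrative, not a complete injection filter.

```python
"""Sketch: flag branch names carrying invisible Unicode or shell metacharacters.

Motivated by the BeyondTrust finding (U+3000 padding after "main"); run it
from inside a clone of the repository you want to audit.
"""
import subprocess
import unicodedata

SHELL_META = set(";`$|&<>(){}")

def suspicious(branch: str) -> list:
    reasons = []
    if any(ch in SHELL_META for ch in branch):
        reasons.append("shell metacharacters")
    # Zs = space separators (catches U+3000), Cf = invisible format characters.
    if any(unicodedata.category(ch) in ("Zs", "Cf") for ch in branch):
        reasons.append("invisible/space Unicode")
    return reasons

def audit_remote_branches() -> None:
    out = subprocess.run(
        ["git", "branch", "-r", "--format=%(refname:short)"],
        capture_output=True, text=True, check=True,
    ).stdout
    for branch in filter(None, out.splitlines()):
        reasons = suspicious(branch)
        if reasons:
            print(f"[ALERT] {branch!r}: {', '.join(reasons)}")

if __name__ == "__main__":
    audit_remote_branches()
```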
- 200,000 MCP servers expose a command execution flaw that Anthropic calls a feature

Anthropic created the Model Context Protocol as the open standard for AI agent-to-tool communication. OpenAI adopted it in March 2025. Google DeepMind followed. Anthropic donated MCP to the Linux Foundation in December 2025. Downloads crossed 150 million. Then four researchers at OX Security found an architectural problem that affects all of them.

MCP's STDIO transport, the default for connecting an AI agent to a local tool, executes any operating system command it receives. No sanitization. No execution boundary between configuration and command. A malicious command returns an error after the command has already run. The developer toolchain raises no flag. OX Security researchers Moshe Siman Tov Bustan, Mustafa Naamnih, Nir Zadok and Roni Bar scanned the ecosystem and found 7,000 servers on public IPs with STDIO transport active, and estimate 200,000 total vulnerable instances extrapolated from that ratio. They confirmed arbitrary command execution on six live production platforms with paying customers. The research produced more than 10 CVEs rated high or critical across LiteLLM, LangFlow, Flowise, Windsurf, Langchain-Chatchat, Bisheng, DocsGPT, GPT Researcher, Agent Zero, LettaAI and others.

Kevin Curran, IEEE senior member and professor of cybersecurity at Ulster University, independently told Infosecurity Magazine the research exposed "a shocking gap in the security of foundational AI infrastructure." Anthropic confirmed the behavior is by design and declined to modify the protocol, characterizing STDIO's execution model as a secure default and input sanitization as the developer's responsibility. That characterization comes from OX; the only word Anthropic explicitly stated on the record is "expected." Anthropic has not issued a standalone public statement and did not respond to VentureBeat's request for comment.

OX says expecting 200,000 developers to sanitize inputs correctly is the problem. Anthropic's strongest technical counter: sanitizing STDIO would either break the transport or move the payload one layer down. Both positions are technically coherent. The question is what to do while that debate plays out. Every major outlet covered the disclosure. None built the prescriptive product-by-product audit a security director needs to triage her own MCP deployments. This piece does. Five questions determine whether your MCP deployments are exposed, whether your patches hold, and what to do Monday morning.

Am I exposed?

If your teams deployed any MCP-connected AI agent using the default STDIO transport, yes. The insecurity is not a coding bug in any single product. It is a design default in Anthropic's MCP specification that propagated into every official language SDK: Python, TypeScript, Java, and Rust. Every downstream project that trusted the protocol inherited it.

OX identified four exploitation families. Unauthenticated command injection through AI framework web interfaces, demonstrated against LangFlow and LiteLLM. Hardening bypasses in tools that implemented command allowlists, demonstrated against Flowise and Upsonic, where OX bypassed the allowlist through argument injection (npx -c). Zero-click prompt injection in AI coding IDEs, where malicious HTML modifies local MCP configuration files. Windsurf (CVE-2026-30615) was the only IDE where exploitation required zero user interaction, though Cursor, Claude Code, and Gemini-CLI are all vulnerable to the broader family.
And malicious package distribution through MCP registries, where OX submitted a benign proof-of-concept to 11 registries, and nine accepted it without security review.

Carter Rees, VP of AI and Machine Learning at Reputation and member of the Utah AI Commission, told VentureBeat the framing needs to change entirely. "MCP stdio is a privileged execution surface, not a connector. Enterprise teams should treat it like production shell access. Deny by default, allowlist, sandbox and stop assuming downstream input validation will hold at scale," Rees said.

The IDE family deserves particular attention because it hits developer workstations, not servers. A developer who visits an attacker-controlled website can trigger a modification to their local MCP configuration file, and in Windsurf's case, the change executes immediately with no approval prompt. Cursor, Claude Code and Gemini-CLI require some form of user interaction, but if the UI presents a configuration change without surfacing the execution consequence, clicking 'approve' does not constitute informed consent.

Did my vendor patch?

Some did. Some partially. Some have not confirmed. The matrix below maps each affected product against the exploitation family, patch state, and the gap that remains. The critical column is "Protocol fix?" Every row says no.

Product | Exploit type | Patched? | Protocol fix? | The gap | Action
LiteLLM | Command injection via adapter UI | YES | NO | LiteLLM is fixed. New STDIO configs outside LiteLLM inherit the same insecure default. | Pin to v1.83.7-stable or later (CVE-2026-30623). Verify against GitHub advisory. Audit all other STDIO definitions.
LangFlow | RCE via public auto_login + STDIO | Partial | NO | Auth token freely available via public endpoint. STDIO executes whatever follows. | Block public auto_login. Sandbox all MCP services from the host OS.
Flowise / Upsonic | Allowlist bypass (npx -c argument injection) | Hardened, bypass confirmed | NO | Allowlist gives false confidence. OX bypassed it. Trivial. | Do not rely on command allowlists. Enforce process-level sandbox isolation.
Windsurf (CVE-2026-30615) | Zero-click prompt injection to local RCE | REPORTED, unconfirmed | NO | The only IDE with a true zero-interaction exploit. Hits developer workstations, not servers. | Disable automatic MCP server registration. Review all active configs manually.
Cursor / Claude Code / Gemini-CLI | Prompt injection to local MCP config modification | Cursor patched (CVE-2025-54136); others vary | NO | User interaction required, but config-change UI does not surface execution consequence. Approval does not equal informed consent. | Audit MCP config files (~/.cursor/mcp.json, equivalent paths). Disable auto-registration. Review all pending config changes before approval.
Langchain-Chatchat (CVE-2026-30617) | RCE via MCP STDIO transport | REPORTED, unconfirmed | NO | Downstream chatbot framework inherits the same STDIO default. Patch status unconfirmed. | Inventory all Langchain-Chatchat deployments. Sandbox from host OS. Monitor vendor advisory for patch.
MCP registries (9 of 11) | Accepted malicious PoC without review | N/A | NO | Registries lack submission security review. Install and risk a backdoor. | Use registries with documented submission review. Audit installs against known-good hashes.

Does the flaw survive the patch?

Yes. Every product-level patch in the matrix addresses the specific entry point in that product. None of them changes the MCP protocol's STDIO behavior.
A security director who patches LiteLLM today and configures a new MCP STDIO server tomorrow will inherit the same insecure default on the new server. The patches are necessary. They are not sufficient.

This was predictable. When VentureBeat first reported on MCP's security flaws in January, Merritt Baer, chief security officer at Enkrypt AI and former deputy CISO at AWS, warned: "MCP is shipping with the same mistake we've seen in every major protocol rollout: insecure defaults. If we don't build authentication and least privilege in from day one, we'll be cleaning up breaches for the next decade." The Cloud Security Alliance independently confirmed OX's findings in a separate research note and recommended organizations treat MCP-connected infrastructure as an active, unpatched threat. The defaults did not change. The attack surface grew.

Rees argued that Anthropic's position, while internally consistent, does not survive contact with enterprise reality. "It stops being a developer mistake and starts being a distributed failure mode when the same class of failure reproduces across that many independent implementations," he told VentureBeat. "Guidance is not an architectural control. Relying on thousands of downstream implementers to consistently interpret a trust boundary is a known anti-pattern in enterprise security."

Anthropic updated its SECURITY.md file nine days after OX's initial contact in January 2026 to note that STDIO adapters should be used with caution, but made no architectural changes. The researchers' assessment of that update: "This change didn't fix anything." Rees took a more measured view. "It's worth giving Anthropic credit where it's due," he told VentureBeat. "After the disclosure, they updated their security guidance to recommend caution with stdio adapters. That's a meaningful step even if researchers argue it falls short of a protocol-level fix."

What changed at the protocol level?

Nothing architectural. Anthropic has not implemented manifest-only execution, a command allowlist in the official SDKs, or any other protocol-level mitigation. OX recommended all three. The SECURITY.md guidance update was the only change. OX's research began in November 2025 and included more than 30 responsible disclosure processes across the ecosystem before the April 15 publication.

The disagreement is substantive. Anthropic's architectural argument deserves its full weight. STDIO is a local subprocess transport designed to launch processes on the machine that configured it. The trust boundary, in Anthropic's model, sits with whoever controls the configuration file. If you can write to the MCP config, you are by definition someone authorized to execute commands on that machine. Under that logic, what looks like command injection is a feature working as intended. Restricting what STDIO can launch at the protocol level would either break the transport's core function, since its purpose is to launch arbitrary local processes, or displace the attack surface into the launched process itself. The unopinionated-standard argument is also defensible: a universal protocol that hard-codes execution constraints stops being universal. OX's counter, from their advisory: "Shifting responsibility to implementers does not transfer the risk. It just obscures who created it."

Do not wait for a protocol-level fix. Treat every MCP STDIO configuration as an untrusted input surface, regardless of which product it sits inside.

Monday morning remediation sequence

- Enumerate. Identify every MCP server deployment across dev, staging, and production. Search for MCP configuration files (mcp.json, mcp_config.json) in developer home directories and IDE config paths (~/.cursor/, ~/.codeium/windsurf/, ~/.config/claude-code/). List running processes that match MCP server binaries. Flag any using STDIO transport with public IP accessibility. OX found 7,000 on public IPs. Your environment may have instances you do not know about. A sketch of this step follows this article.
- Patch. Pin every affected product to its patched release. LiteLLM v1.83.7-stable includes the fix for CVE-2026-30623. DocsGPT, Flowise, and Bisheng have also shipped fixes. Windsurf and Langchain-Chatchat remain in reported state as of May 1, 2026. Cursor was patched against an earlier related disclosure (CVE-2025-54136) but inherits the same protocol default. Check each vendor's advisory in the morning you execute this step.
- Sandbox. Isolate every MCP-enabled service from the host operating system. Never give a server full disk access or shell execution privileges. The Flowise/Upsonic allowlist bypass proves that restricting commands alone is not enough.
- Audit registries. Review every MCP server installed from a third-party registry. Nine of 11 registries accepted OX's proof-of-concept without a security review. Use registries with documented submission review processes. Remove any MCP server whose origin you cannot verify.
- Treat STDIO config as untrusted. This step survives every future patch and every future product. The protocol-level default has not changed. Every STDIO server definition is a command execution surface. Treat it the same way you treat user input to a database query: assume it is hostile until validated.

Your exposure cannot wait for a protocol fix

Anthropic and OX Security disagree on where the responsibility for securing MCP's STDIO transport belongs. That disagreement will not be resolved this week. What can be resolved this week is whether your MCP deployments are enumerated, patched, sandboxed, and treated as the untrusted execution surfaces they are. As Rees put it: "The core question here is architectural policy, not exploit payloads." Baer warned in January that insecure defaults would produce exactly this outcome. OX documented 200,000 servers running with a configuration field that doubles as an execution surface. The protocol's designer says it is working as intended. Your Monday morning question is not who is right. It is which of your servers are exposed.
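The Enumerate step above lends itself to a short script. The sketch below checks the config locations the article lists and flags any server defined with a command entry, i.e., STDIO transport that launches a local process. The exact file names and the mcpServers key layout vary by product, so confirm them against each tool's documentation before relying on the output.

```python
"""Sketch: enumerate known MCP config locations and flag STDIO servers.

Paths follow the article's search list (~/.cursor/, ~/.codeium/windsurf/,
~/.config/claude-code/); the "mcpServers"/"command" layout is a common
convention but differs per product, so verify against each tool's docs.
"""
import json
from pathlib import Path

HOME = Path.home()
CANDIDATES = [
    HOME / ".claude.json",
    HOME / ".cursor/mcp.json",
    HOME / ".codeium/windsurf/mcp_config.json",
    HOME / ".config/claude-code/mcp.json",  # assumed location; adjust per install
]

def flag_stdio_servers() -> None:
    for path in CANDIDATES:
        if not path.is_file():
            continue
        try:
            config = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue
        for name, spec in config.get("mcpServers", {}).items():
            # A "command" entry is STDIO transport: it launches a local process.
            if "command" in spec:
                argv = [spec["command"], *spec.get("args", [])]
                print(f"[STDIO] {path}: server '{name}' executes: {' '.join(argv)}")

if __name__ == "__main__":
    flag_stdio_servers()
```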
- Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it

A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required. Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.”

GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.

Per Guan’s disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday.

Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack operates beneath OpenAI’s safeguard layer at the agent runtime, based on what their system card does not document, not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect. OpenAI and Google did not respond for comment by publication time.

“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.”

What the system cards tell you

Anthropic’s Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card explains to readers that the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope; Claude Code Auto Mode, for example, applies runtime-level protections. But the system card itself does not document these runtime safeguards or their coverage.
OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales access to thousands. The system card tells you what red teamers tested. It does not tell you how resistant the model is to the attacks they found. Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program.

Dimension | Anthropic (Opus 4.7) | OpenAI (GPT-5.4) | Google (Gemini 3.1 Pro)
System card depth | 232 pages. Quantified hack rates, classifier scores, and injection resistance metrics. | Extensive. Red teaming hours documented. No injection resistance rates published. | Few pages. Defers to older Gemini 3 Pro card. No quantified results.
Cyber verification program | CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented. | TAC. Scaled to thousands. Constrains ZDR. | None. No external defender pathway.
Restricted model strategy | Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed. | No restricted model. Full capability released, access gated. | No restricted model. No stated plan for one.
Runtime agent safeguards | Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card. | Not documented. TAC governs access, not agent operations. | Not documented. ART internal only.
Exploit response (Comment and Control) | CVSS 9.4 Critical. $100 bounty. Patched. No CVE. | Not directly exploited. Structural gap inferred from TAC design, not demonstrated. | $1,337 bounty per Guan disclosure. Patched. No CVE.
Injection resistance data | Published. Quantified rates in the system card. | Model-layer injection evals published. No agent-runtime or tool-execution resistance rates. | Not published. No quantified data available.

Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”

Seven threat classes neither safeguard approach closes

Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.

Threat Class | What Breaks | Why Your Controls Miss It | What Comment and Control Proved | Recommended Action
1. Deployment surface mismatch | CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface. | Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither. | The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place. | Email your Anthropic and OpenAI reps today. One question, in writing: ‘Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.’ File the response in your vendor risk register.
2. CI secrets exposed to AI agents | ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents. | The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets. | The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub’s API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub’s own API; the platform itself became the C2 channel. | Run: grep -r ‘secrets\.’ .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI).
3. Over-permissioned agent runtimes | AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do. | Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task. | The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely. | Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step.
4. No CVE signal for AI agent vulnerabilities | CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green. | No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for. | A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously. | Create a new category in your supply chain risk register: ‘AI agent runtime.’ Assign a 48-hour check-in cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure.
5. Model safeguards do not govern agent actions | Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation. | Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined. | The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered. | Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer.
6. Untrusted input parsed as instructions | PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions. | No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk. | A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation. | Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions.
7. No comparable injection resistance data across vendors | Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model. | No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one. | Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production. | Write one sentence for your next vendor meeting: ‘Show me your quantified injection resistance rate for my model version on my platform.’ Document refusals for EU AI Act high-risk compliance. Deadline: August 2026.

OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.

Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research, removing cyber safeguards for vetted actors, and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).

Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners.
Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.

What to do before your next vendor renewal

“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”

Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)

Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)

Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan a full rollout over one to two quarters, starting with repos running AI agents; a minimal sketch follows at the end of this section. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)

Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step; see the environment-gate sketch at the end of this section. (GitHub Actions permissions documentation)

Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.

Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs; none have come yet for this class of vulnerability.

Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)

Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.

“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
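As a sketch of the OIDC migration step above, assuming an AWS target (the role ARN and region are hypothetical placeholders; aws-actions/configure-aws-credentials is one of several official OIDC-capable actions), a job can request a short-lived token instead of reading a stored secret:

```yaml
# Sketch: short-lived OIDC credentials in place of stored secrets.
# The role ARN and region below are hypothetical placeholders.
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-agent-role
          aws-region: us-east-1
          role-duration-seconds: 900   # minutes, not hours
```

A credential leaked from this job expires in fifteen minutes; a leaked repository secret lives until someone rotates it.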
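And for the human-approval gate on write access, one common pattern is a protected environment. This is a sketch: the environment name is illustrative, and the required-reviewers rule itself lives in repository settings rather than in the workflow file.

```yaml
# Sketch: gate agent write access behind a protected environment.
# "agent-write" must be configured in repo settings with required
# reviewers; its secrets are injected only after a human approves.
jobs:
  agent-write:
    runs-on: ubuntu-latest
    environment: agent-write   # job pauses here until approval
    permissions:
      contents: write          # write scope confined to this gated job
    steps:
      - run: echo "approved write operation runs here"
```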
Microsoft patched a Copilot Studio prompt injection. The data exfiltrated anyway.

Microsoft assigned CVE-2026-21520, a CVSS 7.5 indirect prompt injection vulnerability, to Copilot Studio. Capsule Security discovered the flaw and coordinated disclosure with Microsoft; the patch was deployed on January 15, and public disclosure went live on Wednesday.

That CVE matters less for what it fixes and more for what it signals. Capsule’s research calls Microsoft’s decision to assign a CVE to a prompt injection vulnerability in an agentic platform “highly unusual.” Microsoft previously assigned CVE-2025-32711 (CVSS 9.3) to EchoLeak, a prompt injection in M365 Copilot patched in June 2025, but that targeted a productivity assistant, not an agent-building platform. If the precedent extends to agentic systems broadly, every enterprise running agents inherits a new vulnerability class to track. The catch: this class cannot be fully eliminated by patches alone.

Capsule also discovered what it calls PipeLeak, a parallel indirect prompt injection vulnerability in Salesforce Agentforce. Microsoft patched its flaw and assigned a CVE; Salesforce has not assigned a CVE or issued a public advisory for PipeLeak as of publication, according to Capsule’s research.

What ShareLeak actually does

The vulnerability the researchers named ShareLeak exploits the gap between a SharePoint form submission and the Copilot Studio agent’s context window. An attacker fills a public-facing comment field with a crafted payload that injects a fake system role message. In Capsule’s testing, Copilot Studio concatenated the malicious input directly with the agent’s system instructions, with no input sanitization between the form and the model. In Capsule’s proof-of-concept, the injected payload overrode the agent’s original instructions, directing it to query connected SharePoint Lists for customer data and send that data via Outlook to an attacker-controlled email address. NVD classifies the attack as low complexity and requiring no privileges.

Microsoft’s own safety mechanisms flagged the request as suspicious during Capsule’s testing. The data was exfiltrated anyway: DLP never fired, because the email was routed through a legitimate Outlook action that the system treated as an authorized operation.

Carter Rees, VP of Artificial Intelligence at Reputation, described the architectural failure in an exclusive VentureBeat interview. The LLM cannot inherently distinguish between trusted instructions and untrusted retrieved data, Rees said; it becomes a confused deputy acting on behalf of the attacker. OWASP classifies this pattern as ASI01: Agent Goal Hijack.

Capsule Security, the research team behind both discoveries, found the Copilot Studio vulnerability on November 24, 2025. Microsoft confirmed it on December 5 and patched it on January 15, 2026. Every security director running Copilot Studio agents triggered by SharePoint forms should audit that window for indicators of compromise.

PipeLeak and the Salesforce split

PipeLeak hits the same vulnerability class through a different front door. In Capsule’s testing, a public lead-form payload hijacked an Agentforce agent with no authentication required. Capsule found no volume cap on the exfiltrated CRM data, and the employee who triggered the agent received no indication that data had left the building.

Capsule is not the first research team to hit Agentforce with indirect prompt injection.
Noma Labs disclosed ForcedLeak (CVSS 9.4) in September 2025, and Salesforce patched that vector by enforcing Trusted URL allowlists. According to Capsule’s research, PipeLeak survives that patch through a different channel: email via the agent’s authorized tool actions.

Naor Paz, CEO of Capsule Security, told VentureBeat the testing hit no exfiltration limit. “We did not get to any limitation,” Paz said. “The agent would just continue to leak all the CRM.”

Salesforce recommended human-in-the-loop as a mitigation. Paz pushed back. “If the human should approve every single operation, it’s not really an agent,” he told VentureBeat. “It’s just a human clicking through the agent’s actions.”

Microsoft patched ShareLeak and assigned a CVE. According to Capsule’s research, Salesforce patched ForcedLeak’s URL path but not the email channel. Kayne McGladrey, IEEE Senior Member, put it differently in a separate VentureBeat interview: organizations are cloning human user accounts to agentic systems, except agents use far more permissions than humans would, because of the speed, the scale, and the intent.

The lethal trifecta and why posture management fails

Paz named the structural condition that makes any agent exploitable: access to private data, exposure to untrusted content, and the ability to communicate externally. ShareLeak hits all three. PipeLeak hits all three. Most production agents hit all three, because that combination is what makes agents useful.

Rees validated the diagnosis independently: defense-in-depth predicated on deterministic rules is fundamentally insufficient for agentic systems, he told VentureBeat. Elia Zaitsev, CrowdStrike’s CTO, called the patching mindset itself the vulnerability in a separate VentureBeat exclusive. “People are forgetting about runtime security,” he said. “Let’s patch all the vulnerabilities. Impossible. Somehow always seem to miss something.”

Observing actual kinetic actions is a structured, solvable problem, Zaitsev told VentureBeat; intent is not. CrowdStrike’s Falcon sensor walks the process tree and tracks what agents did, not what they appeared to intend.

Multi-turn crescendo and the coding agent blind spot

Single-shot prompt injections are the entry-level threat. Capsule’s research documented multi-turn crescendo attacks, in which adversaries distribute payloads across multiple benign-looking turns. Each turn passes inspection; the attack becomes visible only when analyzed as a sequence.

Rees explained why current monitoring misses this: a stateless WAF views each turn in a vacuum and detects no threat, he told VentureBeat. It sees requests, not a semantic trajectory.

Capsule also found undisclosed vulnerabilities in coding agent platforms it declined to name, including memory poisoning that persists across sessions and malicious code execution through MCP servers. In one case, a file-level guardrail designed to restrict which files the agent could access was reasoned around by the agent itself, which found an alternate path to the same data.

Rees identified the human vector: employees paste proprietary code into public LLMs and view security as friction. McGladrey cut to the governance failure. “If crime was a technology problem, we would have solved crime a fairly long time ago,” he told VentureBeat.
“Cybersecurity risk as a standalone category is a complete fiction.”

The runtime enforcement model

Capsule hooks into vendor-provided agentic execution paths, including Copilot Studio’s security hooks and Claude Code’s pre-tool-use checkpoints, with no proxies, gateways, or SDKs. The company exited stealth on Wednesday, timing its $7 million seed round, led by Lama Partners alongside Forgepoint Capital International, to its coordinated disclosure.

Chris Krebs, the first Director of CISA and a Capsule advisor, put the gap in operational terms. “Legacy tools weren’t built to monitor what happens between prompt and action,” Krebs said. “That’s the runtime gap.”

Capsule’s architecture deploys fine-tuned small language models that evaluate every tool call before execution, an approach Gartner’s market guide calls a “guardian agent.”

Not everyone agrees that intent analysis is the right layer. Zaitsev told VentureBeat during an exclusive interview that intent-based detection is non-deterministic. “Intent analysis will sometimes work. Intent analysis cannot always work,” he said. CrowdStrike bets on observing what the agent actually did rather than what it appeared to intend.

Microsoft’s own Copilot Studio documentation describes external security-provider webhooks that can approve or block tool execution, offering a vendor-native control plane alongside third-party options. No single layer closes the gap: runtime intent analysis, kinetic action monitoring, and foundational controls (least privilege, input sanitization, outbound restrictions, targeted human-in-the-loop) all belong in the stack. SOC teams should map telemetry now: Copilot Studio activity logs plus webhook decisions, CRM audit logs for Agentforce, and EDR process-tree data for coding agents.

Paz described the broader shift. “Intent is the new perimeter,” he told VentureBeat. “The agent in runtime can decide to go rogue on you.”

VentureBeat Prescriptive Matrix

The following matrix maps five vulnerability classes against the controls that miss them and the specific actions security directors should take this week.

ShareLeak — Copilot Studio, CVE-2026-21520, CVSS 7.5, patched January 15, 2026
Why current controls miss it: Capsule’s testing found no input sanitization between the SharePoint form and the agent context. Safety mechanisms flagged the request, but data was still exfiltrated; DLP did not fire because the email used a legitimate Outlook action. OWASP ASI01: Agent Goal Hijack.
What runtime enforcement does: A guardian agent hooks into Copilot Studio pre-tool-use security hooks, vets every tool call before execution, and blocks exfiltration at the action layer.
Suggested actions for security leaders: Audit every Copilot Studio agent triggered by SharePoint forms. Restrict outbound email to org-only domains. Inventory all SharePoint Lists accessible to agents. Review the November 24–January 15 window for indicators of compromise.

PipeLeak — Agentforce, no CVE assigned
Why current controls miss it: In Capsule’s testing, public form input flowed directly into the agent context, with no authentication required. No volume cap was observed on exfiltrated CRM data, and the employee received no indication that data was leaving.
What runtime enforcement does: Runtime interception via platform agentic hooks, with a pre-invocation checkpoint on every tool call; detects outbound data transfer to non-approved destinations.
Suggested actions for security leaders: Review all Agentforce automations triggered by public-facing forms. Enable human-in-the-loop for external comms as an interim control. Audit CRM data access scope per agent. Pressure Salesforce for CVE assignment.

Multi-turn crescendo — distributed payload, each turn looks benign
Why current controls miss it: Stateless monitoring inspects each turn in isolation. WAFs, DLP, and activity logs see individual requests, not a semantic trajectory.
What runtime enforcement does: Stateful runtime analysis tracks full conversation history across turns; fine-tuned SLMs evaluate aggregated context and detect when a cumulative sequence constitutes a policy violation.
Suggested actions for security leaders: Require stateful monitoring for all production agents. Add crescendo attack scenarios to red-team exercises.

Coding agents — unnamed platforms, memory poisoning plus code execution
Why current controls miss it: MCP servers inject code and instructions into the agent context. Memory poisoning persists across sessions. Guardrails are reasoned around by the agent itself. Shadow-AI insiders paste proprietary code into public LLMs.
What runtime enforcement does: A pre-invocation checkpoint on every tool call; fine-tuned SLMs detect anomalous tool usage at runtime.
Suggested actions for security leaders: Inventory all coding agent deployments across engineering. Audit MCP server configs. Restrict code execution permissions. Monitor for shadow installations.

Structural gap — any agent with private data, untrusted input, and external comms
Why current controls miss it: Posture management tells you what should happen; it does not stop what does happen. Agents use far more permissions than humans, at far greater speed.
What runtime enforcement does: A runtime guardian agent watches every action in real time. Intent-based enforcement replaces signature detection, leveraging vendor agentic hooks rather than proxies or gateways.
Suggested actions for security leaders: Classify every agent by lethal-trifecta exposure. Treat prompt injection as a class-based SaaS risk. Require runtime security for any agent moving to production. Brief the board on agent risk as business risk.

What this means for 2026 security planning

Microsoft’s CVE assignment will either accelerate or fragment how the industry handles agent vulnerabilities. If vendors call them configuration issues, CISOs carry the risk alone. Treat prompt injection as a class-level SaaS risk rather than individual CVEs. Classify every agent deployment against the lethal trifecta. Require runtime enforcement for anything moving to production. And brief the board on agent risk the way McGladrey framed it: as business risk, because cybersecurity risk as a standalone category stopped being useful the moment agents started operating at machine speed.