Microsoft patched a Copilot Studio prompt injection. The data exfiltrated anyway.
Our take

Microsoft assigned CVE-2026-21520, a CVSS 7.5 indirect prompt injection vulnerability, to Copilot Studio. Capsule Security discovered the flaw, coordinated disclosure with Microsoft, and the patch was deployed on January 15. Public disclosure went live on Wednesday.
That CVE matters less for what it fixes and more for what it signals. Capsule’s research calls Microsoft’s decision to assign a CVE to a prompt injection vulnerability in an agentic platform “highly unusual.” Microsoft previously assigned CVE-2025-32711 (CVSS 9.3) to EchoLeak, a prompt injection in M365 Copilot patched in June 2025, but that targeted a productivity assistant, not an agent-building platform. If the precedent extends to agentic systems broadly, every enterprise running agents inherits a new vulnerability class to track. Except that this class cannot be fully eliminated by patches alone.
Capsule also discovered what they call PipeLeak, a parallel indirect prompt injection vulnerability in Salesforce Agentforce. Microsoft patched and assigned a CVE. Salesforce has not assigned a CVE or issued a public advisory for PipeLeak as of publication, according to Capsule's research.
What ShareLeak actually does
The vulnerability that the researchers named ShareLeak exploits the gap between a SharePoint form submission and the Copilot Studio agent’s context window. An attacker fills a public-facing comment field with a crafted payload that injects a fake system role message. In Capsule’s testing, Copilot Studio concatenated the malicious input directly with the agent’s system instructions with no input sanitization between the form and the model.
The injected payload overrode the agent’s original instructions in Capsule’s proof-of-concept, directing it to query connected SharePoint Lists for customer data and send that data via Outlook to an attacker-controlled email address. NVD classifies the attack as low complexity and requires no privileges.
Microsoft’s own safety mechanisms flagged the request as suspicious during Capsule’s testing. The data was exfiltrated anyway. The DLP never fired because the email was routed through a legitimate Outlook action that the system treated as an authorized operation.
Carter Rees, VP of Artificial Intelligence at Reputation, described the architectural failure in an exclusive VentureBeat interview. The LLM cannot inherently distinguish between trusted instructions and untrusted retrieved data, Rees said. It becomes a confused deputy acting on behalf of the attacker. OWASP classifies this pattern as ASI01: Agent Goal Hijack.
The research team behind both discoveries, Capsule Security, found the Copilot Studio vulnerability on November 24, 2025. Microsoft confirmed it on December 5 and patched it on January 15, 2026. Every security director running Copilot Studio agents triggered by SharePoint forms should audit that window for indicators of compromise.
PipeLeak and the Salesforce split
PipeLeak hits the same vulnerability class through a different front door. In Capsule’s testing, a public lead form payload hijacked an Agentforce agent with no authentication required. Capsule found no volume cap on the exfiltrated CRM data, and the employee who triggered the agent received no indication that data had left the building. Salesforce has not assigned a CVE or issued a public advisory specific to PipeLeak as of publication.
Capsule is not the first research team to hit Agentforce with indirect prompt injection. Noma Labs disclosed ForcedLeak (CVSS 9.4) in September 2025, and Salesforce patched that vector by enforcing Trusted URL allowlists. According to Capsule's research, PipeLeak survives that patch through a different channel: email via the agent's authorized tool actions.
Naor Paz, CEO of Capsule Security, told VentureBeat the testing hit no exfiltration limit. “We did not get to any limitation,” Paz said. “The agent would just continue to leak all the CRM.”
Salesforce recommended human-in-the-loop as a mitigation. Paz pushed back. “If the human should approve every single operation, it’s not really an agent,” he told VentureBeat. “It’s just a human clicking through the agent’s actions.”
Microsoft patched ShareLeak and assigned a CVE. According to Capsule's research, Salesforce patched ForcedLeak's URL path but not the email channel.
Kayne McGladrey, IEEE Senior Member, put it differently in a separate VentureBeat interview. Organizations are cloning human user accounts to agentic systems, McGladrey said, except agents use far more permissions than humans would because of the speed, the scale, and the intent.
The lethal trifecta and why posture management fails
Paz named the structural condition that makes any agent exploitable: access to private data, exposure to untrusted content, and the ability to communicate externally. ShareLeak hits all three. PipeLeak hits all three. Most production agents hit all three because that combination is what makes agents useful.
Rees validated the diagnosis independently. Defense-in-depth predicated on deterministic rules is fundamentally insufficient for agentic systems, Rees told VentureBeat.
Elia Zaitsev, CrowdStrike’s CTO, called the patching mindset itself the vulnerability in a separate VentureBeat exclusive. “People are forgetting about runtime security,” he said. “Let’s patch all the vulnerabilities. Impossible. Somehow always seem to miss something.” Observing actual kinetic actions is a structured, solvable problem, Zaitsev told VentureBeat. Intent is not. CrowdStrike’s Falcon sensor walks the process tree and tracks what agents did, not what they appeared to intend.
Multi-turn crescendo and the coding agent blind spot
Single-shot prompt injections are the entry-level threat. Capsule’s research documented multi-turn crescendo attacks where adversaries distribute payloads across multiple benign-looking turns. Each turn passes inspection. The attack becomes visible only when analyzed as a sequence.
Rees explained why current monitoring misses this. A stateless WAF views each turn in a vacuum and detects no threat, Rees told VentureBeat. It sees requests, not a semantic trajectory.
Capsule also found undisclosed vulnerabilities in coding agent platforms it declined to name, including memory poisoning that persists across sessions and malicious code execution through MCP servers. In one case, a file-level guardrail designed to restrict which files the agent could access was reasoned around by the agent itself, which found an alternate path to the same data. Rees identified the human vector: employees paste proprietary code into public LLMs and view security as friction.
McGladrey cut to the governance failure. “If crime was a technology problem, we would have solved crime a fairly long time ago,” he told VentureBeat. “Cybersecurity risk as a standalone category is a complete fiction.”
The runtime enforcement model
Capsule hooks into vendor-provided agentic execution paths — including Copilot Studio's security hooks and Claude Code's pre-tool-use checkpoints — with no proxies, gateways, or SDKs. The company exited stealth on Wednesday, timing its $7 million seed round, led by Lama Partners alongside Forgepoint Capital International, to its coordinated disclosure.
Chris Krebs, the first Director of CISA and a Capsule advisor, put the gap in operational terms. “Legacy tools weren’t built to monitor what happens between prompt and action,” Krebs said. “That’s the runtime gap.”
Capsule's architecture deploys fine-tuned small language models that evaluate every tool call before execution, an approach Gartner's market guide calls a "guardian agent."
Not everyone agrees that intent analysis is the right layer. Zaitsev told VentureBeat during an exclusive interview that intent-based detection is non-deterministic. “Intent analysis will sometimes work. Intent analysis cannot always work,” he said. CrowdStrike bets on observing what the agent actually did rather than what it appeared to intend. Microsoft’s own Copilot Studio documentation provides external security-provider webhooks that can approve or block tool execution, offering a vendor-native control plane alongside third-party options. No single layer closes the gap. Runtime intent analysis, kinetic action monitoring, and foundational controls (least privilege, input sanitization, outbound restrictions, targeted human-in-the-loop) all belong in the stack. SOC teams should map telemetry now: Copilot Studio activity logs plus webhook decisions, CRM audit logs for Agentforce, and EDR process-tree data for coding agents.
Paz described the broader shift. “Intent is the new perimeter,” he told VentureBeat. “The agent in runtime can decide to go rogue on you.”
VentureBeat Prescriptive Matrix
The following matrix maps five vulnerability classes against the controls that miss them, and the specific actions security directors should take this week.
Vulnerability Class | Why Current Controls Miss It | What Runtime Enforcement Does | Suggested actions for security leaders |
ShareLeak — Copilot Studio, CVE-2026-21520, CVSS 7.5, patched Jan 15 2026 | Capsule’s testing found no input sanitization between the SharePoint form and the agent context. Safety mechanisms flagged, but data still exfiltrated. DLP did not fire because the email used a legitimate Outlook action. OWASP ASI01: Agent Goal Hijack. | Guardian agent hooks into Copilot Studio pre-tool-use security hooks. Vets every tool call before execution. Blocks exfiltration at the action layer. | Audit every Copilot Studio agent triggered by SharePoint forms. Restrict outbound email to org-only domains. Inventory all SharePoint Lists accessible to agents. Review the Nov 24–Jan 15 window for indicators of compromise. |
PipeLeak — Agentforce, no CVE assigned | In Capsule’s testing, public form input flowed directly into the agent context. No auth required. No volume cap observed on exfiltrated CRM data. The employee received no indication that data was leaving. | Runtime interception via platform agentic hooks. Pre-invocation checkpoint on every tool call. Detects outbound data transfer to non-approved destinations. | Review all Agentforce automations triggered by public-facing forms. Enable human-in-the-loop for external comms as interim control. Audit CRM data access scope per agent. Pressure Salesforce for CVE assignment. |
Multi-Turn Crescendo — distributed payload, each turn looks benign | Stateless monitoring inspects each turn in isolation. WAFs, DLP, and activity logs see individual requests, not semantic trajectory. | Stateful runtime analysis tracks full conversation history across turns. Fine-tuned SLMs evaluate aggregated context. Detects when a cumulative sequence constitutes a policy violation. | Require stateful monitoring for all production agents. Add crescendo attack scenarios to red team exercises. |
Coding Agents — unnamed platforms, memory poisoning + code execution | MCP servers inject code and instructions into the agent context. Memory poisoning persists across sessions. Guardrails reasoned around by the agent itself. Shadow AI insiders paste proprietary code into public LLMs. | Pre-invocation checkpoint on every tool call. Fine-tuned SLMs detect anomalous tool usage at runtime. | Inventory all coding agent deployments across engineering. Audit MCP server configs. Restrict code execution permissions. Monitor for shadow installations. |
Structural Gap — any agent with private data + untrusted input + external comms | Posture management tells you what should happen. It does not stop what does happen. Agents use far more permissions than humans at far greater speed. | Runtime guardian agent watches every action in real time. Intent-based enforcement replaces signature detection. Leverages vendor agentic hooks, not proxies or gateways. | Classify every agent by lethal trifecta exposure. Treat prompt injection as class-based SaaS risk. Require runtime security for any agent moving to production. Brief the board on agent risk as business risk. |
What this means for 2026 security planning
Microsoft’s CVE assignment will either accelerate or fragment how the industry handles agent vulnerabilities. If vendors call them configuration issues, CISOs carry the risk alone.
Treat prompt injection as a class-level SaaS risk rather than individual CVEs. Classify every agent deployment against the lethal trifecta. Require runtime enforcement for anything moving to production. Brief the board on agent risk the way McGladrey framed it: as business risk, because cybersecurity risk as a standalone category stopped being useful the moment agents started operating at machine speed.
Read on the original site
Open the publisher's page for the full experience
Related Articles
- Claude Code, Copilot and Codex all got hacked. Every attacker went for the credential, not the model.On March 30, BeyondTrust proved that a crafted GitHub branch name could steal Codex’s OAuth token in cleartext. OpenAI classified it Critical P1. Two days later, Anthropic’s Claude Code source code spilled onto the public npm registry, and within hours, Adversa found Claude Code silently ignored its own deny rules once a command exceeded 50 subcommands. These were not isolated bugs. They were the latest in a nine-month run: six research teams disclosed exploits against Codex, Claude Code, Copilot, and Vertex AI, and every exploit followed the same pattern. An AI coding agent held a credential, executed an action, and authenticated to a production system without a human session anchoring the request. The attack surface was first demonstrated at Black Hat USA 2025, when Zenity CTO Michael Bargury hijacked ChatGPT, Microsoft Copilot Studio, Google Gemini, Salesforce Einstein and Cursor with Jira MCP on stage with zero clicks. Nine months later, those credentials are what attackers reached. Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, named the failure in an exclusive VentureBeat interview. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system.” The credentials underneath the interface are the breach. Codex, where a branch name stole GitHub tokens BeyondTrust researcher Tyler Jespersen, with Fletcher Davis and Simon Stewart, found Codex cloned repositories using a GitHub OAuth token embedded in the git remote URL. During cloning, the branch name parameter flowed unsanitized into the setup script. A semicolon and a backtick subshell turned the branch name into an exfiltration payload. Stewart added the stealth. By appending 94 Ideographic Space characters (Unicode U+3000) after “main,” the malicious branch looked identical to the standard main branch in the Codex web portal. A developer sees “main.” The shell sees curl exfiltrating their token. OpenAI classified it Critical P1 and shipped full remediation by February 5, 2026. Claude Code, where two CVEs and a 50-subcommand bypass broke the sandbox CVE-2026-25723 hit Claude Code’s file-write restrictions. Piped sed and echo commands escaped the project sandbox because command chaining was not validated. Patched in 2.0.55. CVE-2026-33068 was subtler. Claude Code resolved permission modes from .claude/settings.json before showing the workspace trust dialog. A malicious repo set permissions.defaultMode to bypassPermissions. The trust prompt never appeared. Patched in 2.1.53. The 50-subcommand bypass landed last. Adversa found that Claude Code silently dropped deny-rule enforcement once a command exceeded 50 subcommands. Anthropic’s engineers had traded security for speed and stopped checking after the fiftieth. Patched in 2.1.90. “A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” wrote Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission. The repository decided what permissions the agent had. The token budget decided which deny rules survived. Copilot, where a pull request description and a GitHub issue both became root Johann Rehberger demonstrated CVE-2025-53773 against GitHub Copilot with Markus Vervier of Persistent Security as co-discoverer. Hidden instructions in PR descriptions triggered Copilot to flip auto-approve mode in .vscode/settings.json. That disabled all confirmations and granted unrestricted shell execution across Windows, macOS, and Linux. Microsoft patched it in the August 2025 Patch Tuesday release. Then, Orca Security cracked Copilot inside GitHub Codespaces. Hidden instructions in a GitHub issue manipulated Copilot into checking out a malicious PR with a symbolic link to /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted JSON $schema URL exfiltrated the privileged GITHUB_TOKEN. Full repository takeover. Zero user interaction beyond opening the issue. Mike Riemer, CTO at Ivanti, framed the speed dimension in a VentureBeat interview: “Threat actors are reverse engineering patches within 72 hours. If a customer doesn’t patch within 72 hours of release, they’re open to exploit.” Agents compress that window to seconds. Vertex AI, where default scopes reached Gmail, Drive and Google’s own supply chain Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent had excessive permissions. Stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the project and reached restricted, Google-owned Artifact Registry repositories at the core of the Vertex AI Reasoning Engine. Shaty described the compromised P4SA as functioning like a "double agent," with access to both user data and Google's own infrastructure. VentureBeat defense grid Security requirement Defense shipped Exploit path The gap Sandbox AI agent execution Codex runs tasks in cloud containers; token scrubbed during agent runtime. Token present during cloning. Branch-name command injection executed before cleanup. No input sanitization on container setup parameters. Restrict file system access Claude Code sandboxes writes via accept-edits mode. Piped sed/echo escaped sandbox (CVE-2026-25723). Settings.json bypassed trust dialog (CVE-2026-33068). 50-subcommand chain dropped deny-rule enforcement. Command chaining not validated. Settings loaded before trust. Deny rules truncated for performance. Block prompt injection in code context Copilot filters PR descriptions for known injection patterns. Hidden injections in PRs, README files, and GitHub issues triggered RCE (CVE-2025-53773 + Orca RoguePilot). Static pattern matching loses to embedded prompts in legitimate review and Codespaces flows. Scope agent credentials to least privilege Vertex AI Agent Engine uses P4SA service agent with OAuth scopes. Default scopes reached Gmail, Calendar, Drive. P4SA credentials read every Cloud Storage bucket and Google’s Artifact Registry. OAuth scopes non-editable by default. Least privilege violated by design. Inventory and govern agent identities No major AI coding agent vendor ships agent identity discovery or lifecycle management. Not attempted. Enterprises do not inventory AI coding agents, their credentials, or their permission scopes. AI coding agents are invisible to IAM, CMDB, and asset inventory. Zero governance exists. Detect credential exfiltration from agent runtime Codex obscures tokens in web portal view. Claude Code logs subcommands. Tokens visible in cleartext inside containers. Unicode obfuscation hid exfil payloads. Subcommand chaining hid intent. No runtime monitoring of agent network calls. Log truncation hid the bypass. Audit AI-generated code for security flaws Anthropic launched Claude Code Security (Feb 2026). OpenAI launched Codex Security (March 2026). Both scan generated code. Neither scans the agent’s own execution environment or credential handling. Code-output security is not agent-runtime security. The agent itself is the attack surface. Every exploit targeted runtime credentials, not model output Every vendor shipped a defense. Every defense was bypassed. The Sonar 2026 State of Code Developer Survey found 25% of developers use AI agents regularly, and 64% have started using them. Veracode tested more than 100 LLMs and found 45% of generated code samples introduced OWASP Top 10 flaws, a separate failure that compounds the runtime credential gap. CrowdStrike CTO Elia Zaitsev framed the rule in an exclusive VentureBeat interview at RSAC 2026: collapse agent identities back to the human, because an agent acting on your behalf should never have more privileges than you do. Codex held a GitHub OAuth token scoped to every repository the developer authorized. Vertex AI’s P4SA read every Cloud Storage bucket in the project. Claude Code traded deny-rule enforcement for token budget. Kayne McGladrey, an IEEE Senior Member who advises enterprises on identity risk, made the same diagnosis in an exclusive interview with VentureBeat. "It uses far more permissions than it should have, more than a human would, because of the speed of scale and intent." Riemer drew the operational line in an exclusive VentureBeat interview. "It becomes, I don't know you until I validate you." The branch name talked to the shell before validation. The GitHub issue talked to Copilot before anyone read it. Security director action plan Inventory every AI coding agent (CIEM). Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, Windsurf. List the credentials and OAuth scopes each received at setup. If your CMDB has no category for AI agent identities, create one. Audit OAuth scopes and patch levels. Upgrade Claude Code to 2.1.90 or later. Verify Copilot's August 2025 patch. Migrate Vertex AI to the bring-your-own-service-account model. Treat branch names, pull request descriptions, GitHub issues, and repo configuration as untrusted input. Monitor for Unicode obfuscation (U+3000), command chaining over 50 subcommands, and changes to .vscode/settings.json or .claude/settings.json that flip permission modes. Govern agent identities the way you govern human privileged identities (PAM/IGA). Credential rotation. Least-privilege scoping. Separation of duties between the agent that writes code and the agent that deploys it. CyberArk, Delinea, and any PAM platform that accepts non-human identities can onboard agent OAuth credentials today; Gravitee's 2026 survey found only 21.9% of teams have done it. Validate before you communicate. "As long as we trust and we check and we validate, I'm fine with letting AI maintain it," Riemer said. Before any AI coding agent authenticates to GitHub, Gmail, or an internal repository, verify the agent's identity, scope, and the human session it is bound to. Ask each vendor in writing before your next renewal. "Show me the identity lifecycle management controls for the AI agent running in my environment, including credential scope, rotation policy, and permission audit trail." If the vendor cannot answer, that is the audit finding. The governance gap in three sentences Most CISOs inventory every human identity and have zero inventory of the AI agents running with equivalent credentials. No IAM framework governs human privilege escalation and agent privilege escalation with the same rigor. Most scanners track every CVE but cannot alert when a branch name exfiltrates a GitHub token through a container that developers trust by default. Zaitsev's advice to RSAC 2026 attendees was blunt: you already know what to do. Agents just made the cost of not doing it catastrophic.
- Most enterprises can't stop stage-three AI agent threats, VentureBeat survey findsA rogue AI agent at Meta passed every identity check and still exposed sensitive data to unauthorized employees in March. Two weeks later, Mercor, a $10 billion AI startup, confirmed a supply-chain breach through LiteLLM. Both are traced to the same structural gap. Monitoring without enforcement, enforcement without isolation. A VentureBeat three-wave survey of 108 qualified enterprises found that the gap is not an edge case. It is the most common security architecture in production today. Gravitee’s State of AI Agent Security 2026 survey of 919 executives and practitioners quantifies the disconnect. 82% of executives say their policies protect them from unauthorized agent actions. Eighty-eight percent reported AI agent security incidents in the last twelve months. Only 21% have runtime visibility into what their agents are doing. Arkose Labs’ 2026 Agentic AI Security Report found 97% of enterprise security leaders expect a material AI-agent-driven incident within 12 months. Only 6% of security budgets address the risk. VentureBeat's survey results show that monitoring investment snapped back to 45% of security budgets in March after dropping to 24% in February, when early movers shifted dollars into runtime enforcement and sandboxing. The March wave (n=20) is directional, but the pattern is consistent with February’s larger sample (n=50): enterprises are stuck at observation while their agents already need isolation. CrowdStrike’s Falcon sensors detect more than 1,800 distinct AI applications across enterprise endpoints. The fastest recorded adversary breakout time has dropped to 27 seconds. Monitoring dashboards built for human-speed workflows cannot keep pace with machine-speed threats. The audit that follows maps three stages. Stage one is observe. Stage two is enforce, where IAM integration and cross-provider controls turn observation into action. Stage three is isolate, sandboxed execution that bounds blast radius when guardrails fail. VentureBeat Pulse data from 108 qualified enterprises ties each stage to an investment signal, an OWASP ASI threat vector, a regulatory surface, and immediate steps security leaders can take. The threat surface stage-one security cannot see The OWASP Top 10 for Agentic Applications 2026 formalized the attack surface last December. The ten risks are: goal hijack (ASI01), tool misuse (ASI02), identity and privilege abuse (ASI03), agentic supply chain vulnerabilities (ASI04), unexpected code execution (ASI05), memory poisoning (ASI06), insecure inter-agent communication (ASI07), cascading failures (ASI08), human-agent trust exploitation (ASI09), and rogue agents (ASI10). Most have no analog in traditional LLM applications. The audit below maps six of these to the stages where they are most likely to surface and the controls that address them. Invariant Labs disclosed the MCP Tool Poisoning Attack in April 2025: malicious instructions in an MCP server’s tool description cause an agent to exfiltrate files or hijack a trusted server. CyberArk extended it to Full-Schema Poisoning. The mcp-remote OAuth proxy patched CVE-2025-6514 after a command-injection flaw put 437,000 downloads at risk. Merritt Baer, CSO at Enkrypt AI and former AWS Deputy CISO, framed the gap in an exclusive VentureBeat interview: “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system. The real dependencies are one or two layers deeper, and those are the ones that fail under stress.” CrowdStrike CTO Elia Zaitsev put the visibility problem in operational terms in an exclusive VentureBeat interview at RSAC 2026: “It looks indistinguishable if an agent runs your web browser versus if you run your browser.” Distinguishing the two requires walking the process tree, tracing whether Chrome was launched by a human from the desktop or spawned by an agent in the background. Most enterprise logging configurations cannot make that distinction. The regulatory clock and the identity architecture Auditability priority tells the same story in miniature. In January, 50% of respondents ranked it a top concern. By February, that dropped to 28% as teams sprinted to deploy. In March, it surged to 65% when those same teams realized they had no forensic trail for what their agents did. HIPAA’s 2026 Tier 4 willful-neglect maximum is $2.19M per violation category per year. In healthcare, Gravitee’s survey found 92.7% of organizations reported AI agent security incidents versus the 88% all-industry average. For a health system running agents that touch PHI, that ratio is the difference between a reportable breach and an uncontested finding of willful neglect. FINRA’s 2026 Oversight Report recommends explicit human checkpoints before agents that can act or transact execute, along with narrow scope, granular permissions, and complete audit trails of agent actions. Mike Riemer, Field CISO at Ivanti, quantified the speed problem in a recent VentureBeat interview: “Threat actors are reverse engineering patches within 72 hours. If a customer doesn’t patch within 72 hours of release, they’re open to exploit.” Most enterprises take weeks. Agents operating at machine speed widen that window into a permanent exposure. The identity problem is architectural. Gravitee's survey of 919 practitioners found only 21.9% of teams treat agents as identity-bearing entities, 45.6% still use shared API keys, and 25.5% of deployed agents can create and task other agents. A quarter of enterprises can spawn agents that their security team never provisioned. That is ASI08 as architecture. Guardrails alone are not a strategy A 2025 paper by Kazdan and colleagues (Stanford, ServiceNow Research, Toronto, FAR AI) showed a fine-tuning attack that bypasses model-level guardrails in 72% of attempts against Claude 3 Haiku and 57% against GPT-4o. The attack received a $2,000 bug bounty from OpenAI and was acknowledged as a vulnerability by Anthropic. Guardrails constrain what an agent is told to do, not what a compromised agent can reach. CISOs already know this. In VentureBeat's three-wave survey, prevention of unauthorized actions ranked as the top capability priority in every wave at 68% to 72%, the most stable high-conviction signal in the dataset. The demand is for permissioning, not prompting. Guardrails address the wrong control surface. Zaitsev framed the identity shift at RSAC 2026: “AI agents and non-human identities will explode across the enterprise, expanding exponentially and dwarfing human identities. Each agent will operate as a privileged super-human with OAuth tokens, API keys, and continuous access to previously siloed data sets.” Identity security built for humans will not survive this shift. Cisco President Jeetu Patel offered the operational analogy in an exclusive VentureBeat interview: agents behave “more like teenagers, supremely intelligent, but with no fear of consequence.” VentureBeat Prescriptive Matrix: AI Agent Security Maturity Audit Stage Attack Scenario What Breaks Detection Test Blast Radius Recommended Control 1: Observe Attacker embeds goal-hijack payload in forwarded email (ASI01). Agent summarizes email and silently exfiltrates credentials to an external endpoint. See: Meta March 2026 incident. No runtime log captures the exfiltration. SIEM never sees the API call. The security team learns from the victim. Zaitsev: agent activity is “indistinguishable” from human activity in default logging. Inject a canary token into a test document. Route it through your agent. If the token leaves your network, stage one failed. Single agent, single session. With shared API keys (45.6% of enterprises): unlimited lateral movement. Deploy agent API call logging to SIEM. Baseline normal tool-call patterns per agent role. Alert on the first outbound call to an unrecognized endpoint. 2: Enforce Compromised MCP server poisons tool description (ASI04). Agent invokes poisoned tool, writes attacker payload to production DB using inherited service-account credentials. See: Mercor/LiteLLM April 2026 supply-chain breach. IAM allows write because agent uses shared service account. No approval gate on write ops. Poisoned tool indistinguishable from clean tool in logs. Riemer: “72-hour patch window” collapses to zero when agents auto-invoke. Register a test MCP server with a benign-looking poisoned description. Confirm your policy engine blocks the tool call before execution reaches the database. Run mcp-scan on all registered servers. Production database integrity. If agent holds DBA-level credentials: full schema compromise. Lateral movement via trust relationships to downstream agents. Assign scoped identity per agent. Require approval workflow for all write ops. Revoke every shared API key. Run mcp-scan on all MCP servers weekly. 3: Isolate Agent A spawns Agent B to handle subtask (ASI08). Agent B inherits Agent A’s permissions, escalates to admin, rewrites org security policy. Every identity check passes. Source: CrowdStrike CEO George Kurtz, RSAC 2026 keynote. No sandbox boundary between agents. No human gate on agent-to-agent delegation. Security policy modification is a valid action for admin-credentialed process. CrowdStrike CEO George Kurtz disclosed at RSAC 2026 that the agent “wanted to fix a problem, lacked permissions, and removed the restriction itself.” Spawn a child agent from a sandboxed parent. Child should inherit zero permissions by default and require explicit human approval for each capability grant. Organizational security posture. A rogue policy rewrite disables controls for every subsequent agent. 97% of enterprise leaders expect a material incident within 12 months (Arkose Labs 2026). Sandbox all agent execution. Zero-trust for agent-to-agent delegation: spawned agents inherit nothing. Human sign-off before any agent modifies security controls. Kill switch per OWASP ASI10. Sources: OWASP Top 10 for Agentic Applications 2026; Invariant Labs MCP Tool Poisoning (April 2025); CrowdStrike RSAC 2026 Fortune 50 disclosure; Meta March 2026 incident (The Information/Engadget); Mercor/LiteLLM breach (Fortune, April 2, 2026); Arkose Labs 2026 Agentic AI Security Report; VentureBeat Pulse Q1 2026. The stage-one attack scenario in this matrix is not hypothetical. Unauthorized tool or data access ranked as the most feared failure mode in every wave of VentureBeat’s survey, growing from 42% in January to 50% in March. That trajectory and the 70%-plus priority rating for prevention of unauthorized actions are the two most mutually reinforcing signals in the entire dataset. CISOs fear the exact attack this matrix describes, and most have not deployed the controls to stop it. Hyperscaler stage readiness: observe, enforce, isolate The maturity audit tells you where your security program stands. The next question is whether your cloud platform can get you to stage two and stage three, or whether you are building those capabilities yourself. Patel put it bluntly: “It’s not just about authenticating once and then letting the agent run wild.” A stage-three platform running a stage-one deployment pattern gives you stage-one risk. VentureBeat Pulse data surfaces a structural tension in this grid. OpenAI leads enterprise AI security deployments at 21% to 26% across the three survey waves, making the same provider that creates the AI risk also the primary security layer. The provider-as-security-vendor pattern holds across Azure, Google, and AWS. Zero-incremental-procurement convenience is winning by default. Whether that concentration is a feature or a single point of failure depends on how far the enterprise has progressed past stage one. Provider Identity Primitive (Stage 2) Enforcement Control (Stage 2) Isolation Primitive (Stage 3) Gap as of April 2026 Microsoft Azure Entra ID agent scoping. Agent 365 maps agents to owners. GA. Copilot Studio DLP policies. Purview for agent output classification. GA. Azure Confidential Containers for agent workloads. Preview. No per-agent sandbox at GA. No agent-to-agent identity verification. No MCP governance layer. Agent 365 monitors but cannot block in-flight tool calls. Anthropic Managed Agents: per-agent scoped permissions, credential mgmt. Beta (April 8, 2026). $0.08/session-hour. Tool-use permissions, system prompt enforcement, and built-in guardrails. GA. Managed Agents sandbox: isolated containers per session, execution-chain auditability. Beta. Allianz, Asana, Rakuten, and Sentry are in production. Beta pricing/SLA not public. Session data in Anthropic-managed DB (lock-in risk per VentureBeat research). GA timing TBD. Google Cloud Vertex AI service accounts for model endpoints. IAM Conditions for agent traffic. GA. VPC Service Controls for agent network boundaries. Model Armor for prompt/response filtering. GA. Confidential VMs for agent workloads. GA. Agent-specific sandbox in preview. Agent identity ships as a service account, not an agent-native principal. No agent-to-agent delegation audit. Model Armor does not inspect tool-call payloads. OpenAI Assistants API: function-call permissions, structured outputs. Agents SDK. GA. Agents SDK guardrails, input/output validation. GA. Agents SDK Python sandbox. Beta (API and defaults subject to change before GA per OpenAI docs). TypeScript sandbox confirmed, not shipped. No cross-provider identity federation. Agent memory forensics limited to session scope. No kill switch API. No MCP tool-description inspection. AWS Bedrock model invocation logging. IAM policies for model access. CloudTrail for agent API calls. GA. Bedrock Guardrails for content filtering. Lambda resource policies for agent functions. GA. Lambda isolation per agent function. GA. Bedrock agent-level sandboxing on roadmap, not shipped. No unified agent control plane across Bedrock + SageMaker + Lambda. No agent identity standard. Guardrails do not inspect MCP tool descriptions. Status as of April 15, 2026. GA = generally available. Preview/Beta = not production-hardened. “What’s Missing” column reflects VentureBeat’s analysis of publicly documented capabilities; gaps may narrow as vendors ship updates. No provider in this grid ships a complete stage-three stack today. Most enterprises assemble isolation from existing cloud building blocks. That is a defensible choice if it is a deliberate one. Waiting for a vendor to close the gap without acknowledging the gap is not a strategy. The grid above covers hyperscaler-native SDKs. A large segment of AI builders deploys through open-source orchestration frameworks like LangChain, CrewAI, and LlamaIndex that bypass hyperscaler IAM entirely. These frameworks lack native stage-two primitives. There is no scoped agent identity, no tool-call approval workflow, and no built-in audit trails. Enterprises running agents through open-source orchestration need to layer enforcement and isolation on top, not assume the framework provides it. VentureBeat’s survey quantifies the pressure. Policy enforcement consistency grew from 39.5% to 46% between January and February, the largest consistent gain of any capability criterion. Enterprises running agents across OpenAI, Anthropic, and Azure need enforcement that works the same way regardless of which model executes the task. Provider-native controls enforce policy within that provider’s runtime only. Open-source orchestration frameworks enforce it nowhere. One counterargument deserves acknowledgment: not every agent deployment needs stage three. A read-only summarization agent with no tool access and no write permissions may rationally stop at stage one. The sequencing failure this audit addresses is not that monitoring exists. It is that enterprises running agents with write access, shared credentials, and agent-to-agent delegation are treating monitoring as sufficient. For those deployments, stage one is not a strategy. It is a gap. Allianz shows stage-three in production Allianz, one of the world’s largest insurance and asset management companies, is running Claude Managed Agents across insurance workflows, with Claude Code deployed to technical teams and a dedicated AI logging system for regulatory transparency, per Anthropic’s April 8 announcement. Asana, Rakuten, Sentry, and Notion are in production on the same beta. Stage-three isolation, per-agent permissioning, and execution-chain auditability are deployable now, not roadmap. The gating question is whether the enterprise has sequenced the work to use them. The 90-day remediation sequence Days 1–30: Inventory and baseline. Map every agent to a named owner. Log all tool calls. Revoke shared API keys. Deploy read-only monitoring across all agent API traffic. Run mcp-scan against every registered MCP server. CrowdStrike detects 1,800 AI applications across enterprise endpoints; your inventory should be equally comprehensive. Output: agent registry with permission matrix, MCP scan report. Days 31–60: Enforce and scope. Assign scoped identities to every agent. Deploy tool-call approval workflows for write operations. Integrate agent activity logs into existing SIEM. Run a tabletop exercise: What happens when an agent spawns an agent? Conduct a canary-token test from the prescriptive matrix. Output: IAM policy set, approval workflow, SIEM integration, canary-token test results. Days 61–90: Isolate and test. Sandbox high-risk agent workloads (PHI, PII, financial transactions). Enforce per-session least privilege. Require human sign-off for agent-to-agent delegation. Red-team the isolation boundary using the stage-three detection test from the matrix. Output: sandboxed execution environment, red-team report, board-ready risk summary with regulatory exposure mapped to HIPAA tier and FINRA guidance. What changes in the next 30 days EU AI Act Article 14 human-oversight obligations take effect August 2, 2026. Programs without named owners and execution trace capability face enforcement, not operational risk. Anthropic’s Claude Managed Agents is in public beta at $0.08 per session-hour. GA timing, production SLAs, and final pricing have not been announced. OpenAI Agents SDK ships TypeScript support for sandbox and harness capabilities in a future release, per the company’s April 15 announcement. Stage-three sandbox becomes available to JavaScript agent stacks when it ships. What the sequence requires McKinsey’s 2026 AI Trust Maturity Survey pegs the average enterprise at 2.3 out of 4.0 on its RAI maturity model, up from 2.0 in 2025 but still an enforcement-stage number; only one-third of the ~500 organizations surveyed report maturity levels of three or higher in governance. Seventy percent have not finished the transition to stage three. ARMO’s progressive enforcement methodology gives you the path: behavioral profiles in observation, permission baselines in selective enforcement, and full least privilege once baselines stabilize. Monitoring investment was not wasted. It was stage one of three. The organizations stuck in the data treated it as the destination. The budget data makes the constraint explicit. The share of enterprises reporting flat AI security budgets doubled from 7.9% in January to 16% in February in VentureBeat's survey, with the March directional reading at 20%. Organizations expanding agent deployments without increasing security investment are accumulating security debt at machine speed. Meanwhile, the share reporting no agent security tooling at all fell from 13% in January to 5% in March. Progress, but one in twenty enterprises running agents in production still has zero dedicated security infrastructure around them. About this research Total qualified respondents: 108. VentureBeat Pulse AI Security and Trust is a three-wave VentureBeat survey run January 6 through March 15, 2026. Qualified sample (organizations 100+ employees): January n=38, February n=50, March n=20. Primary analysis runs from January to February; March is directional. Industry mix: Tech/Software 52.8%, Financial Services 10.2%, Healthcare 8.3%, Education 6.5%, Telecom/Media 4.6%, Manufacturing 4.6%, Retail 3.7%, other 9.3%. Seniority: VP/Director 34.3%, Manager 29.6%, IC 22.2%, C-Suite 9.3%.
- Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted itA security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required. Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed. Per Guan’s disclosure timeline: Anthropic classified it as CVSS 9.4 Critical ($100 bounty), Google paid a $1,337 bounty, and GitHub awarded $500 through the Copilot Bounty Program. The $100 amount is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three patched quietly, and none had issued CVEs in the NVD or published security advisories through GitHub Security Advisories as of Saturday. Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. The same class of attack operates beneath OpenAI’s safeguard layer at the agent runtime, based on what their system card does not document — not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect. OpenAI and Google did not respond for comment by publication time. “At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.” What the system cards tell you Anthropic’s Opus 4.7 system card runs 232 pages with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card explains to readers that the runtime was exposed. Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope — Claude Code Auto Mode, for example, applies runtime-level protections — but the system card itself does not document these runtime safeguards or their coverage. OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals but not agent-runtime or tool-execution resistance metrics. Trusted Access for Cyber scales access to thousands. The system card tells you what red teamers tested. It does not tell you how resistant the model is to the attacks they found. Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program. Dimension Anthropic (Opus 4.7) OpenAI (GPT-5.4) Google (Gemini 3.1 Pro) System card depth 232 pages. Quantified hack rates, classifier scores, and injection resistance metrics. Extensive. Red teaming hours documented. No injection resistance rates published. Few pages. Defers to older Gemini 3 Pro card. No quantified results. Cyber verification program CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented. TAC. Scaled to thousands. Constrains ZDR. None. No external defender pathway. Restricted model strategy Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed. No restricted model. Full capability released, access gated. No restricted model. No stated plan for one. Runtime agent safeguards Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card. Not documented. TAC governs access, not agent operations. Not documented. ART internal only. Exploit response (Comment and Control) CVSS 9.4 Critical. $100 bounty. Patched. No CVE. Not directly exploited. Structural gap inferred from TAC design, not demonstrated. $1,337 bounty per Guan disclosure. Patched. No CVE. Injection resistance data Published. Quantified rates in the system card. Model-layer injection evals published. No agent-runtime or tool-execution resistance rates. Not published. No quantified data available. Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.” Seven threat classes neither safeguard approach closes Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead. Threat Class What Breaks Why Your Controls Miss It What Comment and Control Proved Recommended Action 1. Deployment surface mismatch CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface. Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither. The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place. Email your Anthropic and OpenAI reps today. One question, in writing: ‘Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.’ File the response in your vendor risk register. 2. CI secrets exposed to AI agents ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents. The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets. The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub’s API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub’s own API — the platform itself became the C2 channel. Run: grep -r ‘secrets\.’ .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI). 3. Over-permissioned agent runtimes AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do. Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task. The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely. Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step. 4. No CVE signal for AI agent vulnerabilities CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green. No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for. A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously. Create a new category in your supply chain risk register: ‘AI agent runtime.’ Assign a 48-hour check-in cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure. 5. Model safeguards do not govern agent actions Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation. Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined. The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered. Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer. 6. Untrusted input parsed as instructions PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions. No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk. A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation. Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions. 7. No comparable injection resistance data across vendors Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model. No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one. Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production. Write one sentence for your next vendor meeting: ‘Show me your quantified injection resistance rate for my model version on my platform.’ Document refusals for EU AI Act high-risk compliance. Deadline: August 2026. OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure. Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research — removing cyber safeguards for vetted actors — and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security). Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets. What to do before your next vendor renewal “Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.” Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program) Audit every runner for secret exposure. Run grep -r ‘secrets\.’ .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation) Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs) Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation) Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own. Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability. Check which hardened GitHub Actions mitigations you already have in place. Hardened GitHub Actions configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide) Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026. “Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
- 85% of enterprises are running AI agents. Only 5% trust them enough to ship.Eighty-five percent of enterprises are running AI agent pilots, but only 5% have moved those agents into production. In an exclusive interview at RSA Conference 2026, Cisco President and Chief Product Officer Jeetu Patel said that the gap comes down to one thing: trust — and that closing it separates market dominance from bankruptcy. He also disclosed a mandate that will reshape Cisco's 90,000-person engineering organization. The problem is not rogue agents. The problem is the absence of a trust architecture. The trust deficit behind a 5% production rate A recent Cisco survey of major enterprise customers found that 85% have AI agent pilot programs underway. Only 5% moved those agents into production. That 80-point gap defines the security problem the entire industry is trying to close. It is not closing. "The biggest impediment to scaled adoption in enterprises for business-critical tasks is establishing a sufficient amount of trust," Patel told VentureBeat. "Delegating versus trusted delegating of tasks to agents. The difference between those two, one leads to bankruptcy and the other leads to market dominance." He compared agents to teenagers. "They're supremely intelligent, but they have no fear of consequence. They're pretty immature. And they can be easily sidetracked or influenced," Patel said. "What you have to do is make sure that you have guardrails around them and you need some parenting on the agents." The comparison carries weight because it captures the precise failure mode security teams face. Three years ago, a chatbot that gave the wrong answer was an embarrassment. An agent that takes the wrong action can trigger an irreversible outcome. Patel pointed to a case he cited in his keynote where an AI coding agent deleted a live production database during a code freeze, tried to cover its tracks with fake data, and then apologized. "An apology is not a guardrail," Patel said in his keynote blog. The shift from information risk to action risk is the core reason the pilot-to-production gap persists. Defense Claw and the open-source speed play with Nvidia Cisco's response to the trust deficit at RSAC 2026 spanned three categories: protecting agents from the world, protecting the world from agents, and detecting and responding at machine speed. The product announcements included AI Defense Explorer Edition (a free, self-service red teaming tool), the Agent Runtime SDK for embedding policy enforcement into agent workflows at build time, and the LLM Security Leaderboard for evaluating model resilience against adversarial attacks. The open-source strategy moved faster than any of those. Nvidia launched OpenShell, a secure container for open-source agent frameworks, at GTC the week before RSAC. Cisco packaged its Skills Scanner, MCP Scanner, AI Bill of Materials tool, and CodeGuard into a single open-source framework called Defense Claw and hooked it into OpenShell within 48 hours. "Every single time you actually activate an agent in an Open Shell container, you can now automatically instantiate all the security services that we have built through Defense Claw," Patel told VentureBeat. The integration means security enforcement activates at container launch without manual configuration. That speed matters because the alternative is asking developers to bolt on security after the agent is already running. That 48-hour turnaround was not an anomaly. Patel said several of the Defense Claw capabilities Cisco launched were built in a week. "You couldn't have built it in longer than a week because Open Shell came out last week," he said. A six-to-nine-month product lead and an information asymmetry on top of it Patel made a competitive claim worth examining. "Product wise, we might be six to nine months ahead of most of the market," he told VentureBeat. He added a second layer: "We also have an asymmetric information advantage of, I'd say, three to six months on everyone because, you know, we, by virtue of being in the ecosystem with all the model companies. We're seeing what's coming down the pipe." The 48-hour Defense Claw sprint supports the speed claim, though the lead margin is Cisco's own characterization; no independent benchmarks were provided. Cisco also extended zero trust to the agentic workforce through new Duo IAM and Secure Access capabilities, giving every agent time-bound, task-specific permissions. On the SOC side, Splunk announced Exposure Analytics for continuous risk scoring, Detection Studio for streamlined detection engineering, and Federated Search for investigating across distributed data environments. The zero-human-code engineering mandate AI Defense, the product Cisco launched a year before RSAC 2026, is now 100% built with AI. Zero lines of human-written code. By the end of 2026, half a dozen Cisco products will reach the same milestone. By the end of calendar year 2027, Patel's goal is 70% of Cisco's products built entirely by AI. "Just process that for a second and go: a $60 billion company is gonna have 70% of the products that are gonna have no human lines of code," Patel told VentureBeat. "The concept of a legacy company no longer exists." He connected that mandate to a cultural shift inside the engineering organization. "There's gonna be two kinds of people: ones that code with AI and ones that don't work at Cisco," Patel said. That was not debated. "Changing 30,000 people to change the way that they work at the very core of what they do in engineering cannot happen if you just make it a democratic process. It has to be something that's driven from the top down." Five moats for the agentic era, and what CISOs can verify today Patel laid out five strategic advantages that will separate winning enterprises from failing ones. VentureBeat mapped each moat against actions security teams can begin verifying today. Moat Patel's claim What CISOs can verify today What to validate next Sustained speed "Operating with extreme levels of obsession for speed for a durable length of time" creates compounding value Measure deployment velocity from pilot to production. Track how long agent governance reviews take. Pair speed metrics with telemetry coverage. Fast deployment without observability creates blind acceleration. Trust and delegation Trusted delegation separates market dominance from bankruptcy Audit delegation chains. Flag agent-to-agent handoffs with no human approval. Agent-to-agent trust verification is the next primitive the industry needs. OAuth, SAML, and MCP do not yet cover it. Token efficiency Higher output per token creates a strategic advantage Monitor token consumption per workflow. Benchmark cost-per-action across agent deployments. Token efficiency metrics exist. Token security metrics (what the token accessed, what it changed) are the next build. Human judgment "Just because you can code it doesn't mean you should." Track decision points where agents defer to humans vs. act autonomously. Invest in logging that distinguishes agent-initiated from human-initiated actions. Most configurations cannot yet. AI dexterity "10x to 20x to 50x productivity differential" between AI-fluent and non-fluent workers Measure the adoption rates of AI coding tools across security engineering teams. Pair dexterity training with governance training. One without the other compounds the risk. The telemetry layer the industry is still building Patel's framework operates at the identity and policy layer. The next layer down, telemetry, is where the verification happens. "It looks indistinguishable if an agent runs your web browser versus if you run your browser," CrowdStrike CTO Elia Zaitsev told VentureBeat in an exclusive interview at RSAC 2026. Distinguishing the two requires walking the process tree, tracing whether Chrome was launched by a human from the desktop or spawned by an agent in the background. Most enterprise logging configurations cannot make that distinction yet. A CEO's AI agent rewrote the company's security policy. Not because it was compromised. Because it wanted to fix a problem, lacked permissions, and removed the restriction itself. Every identity check passed. CrowdStrike CEO George Kurtz disclosed that incident and a second one at his RSAC keynote, both at Fortune 50 companies. In the second, a 100-agent Slack swarm delegated a code fix between agents without human approval. Both incidents were caught by accident Etay Maor, VP of Threat Intelligence at Cato Networks, told VentureBeat in a separate exclusive interview at RSAC 2026 that enterprises abandoned basic security principles when deploying agents. Maor ran a live Censys scan during the interview and counted nearly 500,000 internet-facing agent framework instances. The week before: 230,000. Doubling in seven days. Patel acknowledged the delegation risk in the interview. "The agent takes the wrong action and worse yet, some of those actions might be critical actions that are not reversible," he said. Cisco's Duo IAM and MCP gateway enforce policy at the identity layer. Zaitsev's work operates at the kinetic layer: tracking what the agent did after the identity check passed. Security teams need both. Identity without telemetry is a locked door with no camera. Telemetry without identity is footage with no suspect. Token generation as the currency for national competitiveness Patel sees the infrastructure layer as decisive. "Every country and every company in the world is gonna wanna make sure that they can generate their own tokens," he told VentureBeat. "Token generation becomes the currency for success in the future." Cisco's play is to provide the most secure and efficient technology for generating tokens at scale, with Nvidia supplying the GPU layer. The 48-hour Defense Claw integration demonstrated what that partnership produces under pressure. Security director action plan VentureBeat identified five steps security teams can take to begin building toward Patel's framework today: Audit the pilot-to-production gap. Cisco's own survey found 85% of enterprises piloting, 5% in production. Mapping the specific trust deficits keeping agents stuck is the starting point — the answer is rarely the technology. Governance, identity, and delegation controls are what's missing. Patel's trusted delegation framework is designed to close that gap. Test Defense Claw and AI Defense Explorer Edition. Both are free. Red-team your agent workflows before they reach production. Test the workflow, not just the model. Map delegation chains end-to-end. Flag every agent-to-agent handoff with no human approval. This is the "parenting" Patel described. No product fully automates it yet. Do it manually, every week. Establish agent behavioral baselines. Before any agent reaches production, define what normal looks like: API call patterns, data access frequency, systems touched, and hours of activity. Without a baseline, the observability that Patel's moats require has nothing to compare against. Close the telemetry gap in your logging configuration. Verify that your SIEM can distinguish agent-initiated actions from human-initiated actions. If it cannot, the identity layer alone will not catch the incidents Kurtz described at RSAC. Patel built the identity layer. The telemetry layer completes it.