Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it
Our take

A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required.
Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.
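The exposed pattern fits in a few lines of workflow YAML. The sketch below is illustrative rather than any vendor's actual integration; the workflow name, script path, and agent step are placeholders.

```yaml
# Illustrative sketch of the exposed pattern, not any vendor's actual workflow.
# pull_request_target runs in the context of the base repository, so repo
# secrets are injected into the runner even for pull requests from forks.
name: ai-code-review
on:
  pull_request_target:
    types: [opened, edited]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AI review agent   # placeholder for an agent action
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # secret lands in the runner env
          PR_TITLE: ${{ github.event.pull_request.title }}     # attacker-controlled text
        run: ./scripts/review-agent.sh   # hypothetical agent wrapper; parses PR_TITLE as context
```

A single step holding both the secret and attacker-controlled text is the entire attack surface: the fork never sees the key, but the agent can be instructed to read it from the environment and repost it.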
Per Guan’s disclosure timeline, Anthropic classified the vulnerability as CVSS 9.4 Critical and paid a $100 bounty, Google paid $1,337, and GitHub awarded $500 through the Copilot Bounty Program. The $100 figure is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three vendors patched quietly, and as of Saturday none had a CVE in the NVD or an advisory in GitHub Security Advisories.
Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. Judging by what OpenAI’s system card does not document, the same class of attack would operate beneath its safeguard layer at the agent runtime; that is an inference, not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect.
OpenAI and Google did not respond to requests for comment by publication time.
“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.”
What the system cards tell you
Anthropic’s Opus 4.7 system card runs 232 pages, with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card told readers the runtime was exposed; Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope (Claude Code Auto Mode, for example, applies runtime-level protections), but the system card itself does not document these runtime safeguards or their coverage.
OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals, but no agent-runtime or tool-execution resistance metrics. Its Trusted Access for Cyber program scales access to thousands. The system card tells you what red teamers tested; it does not tell you how resistant the model is to the attacks they found.
Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program.
| Dimension | Anthropic (Opus 4.7) | OpenAI (GPT-5.4) | Google (Gemini 3.1 Pro) |
| --- | --- | --- | --- |
| System card depth | 232 pages. Quantified hack rates, classifier scores, and injection resistance metrics. | Extensive. Red teaming hours documented. No injection resistance rates published. | Few pages. Defers to older Gemini 3 Pro card. No quantified results. |
| Cyber verification program | CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented. | TAC. Scaled to thousands. Constrains ZDR. | None. No external defender pathway. |
| Restricted model strategy | Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed. | No restricted model. Full capability released, access gated. | No restricted model. No stated plan for one. |
| Runtime agent safeguards | Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card. | Not documented. TAC governs access, not agent operations. | Not documented. ART internal only. |
| Exploit response (Comment and Control) | CVSS 9.4 Critical. $100 bounty. Patched. No CVE. | Not directly exploited. Structural gap inferred from TAC design, not demonstrated. | $1,337 bounty per Guan disclosure. Patched. No CVE. |
| Injection resistance data | Published. Quantified rates in the system card. | Model-layer injection evals published. No agent-runtime or tool-execution resistance rates. | Not published. No quantified data available. |
Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”
Seven threat classes neither safeguard approach closes
Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.
| Threat Class | What Breaks | Why Your Controls Miss It | What Comment and Control Proved | Recommended Action |
| --- | --- | --- | --- | --- |
| 1. Deployment surface mismatch | CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface. | Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither. | The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place. | Email your Anthropic and OpenAI reps today. One question, in writing: 'Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.' File the response in your vendor risk register. |
| 2. CI secrets exposed to AI agents | ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents. | The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets. | The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub's API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub's own API: the platform itself became the C2 channel. | Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI). |
| 3. Over-permissioned agent runtimes | AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do. | Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task. | The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely. | Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step. |
| 4. No CVE signal for AI agent vulnerabilities | CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green. | No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for. | A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously. | Create a new category in your supply chain risk register: 'AI agent runtime.' Assign a 48-hour check-in cadence with each vendor's security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure. |
| 5. Model safeguards do not govern agent actions | Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation. | Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined. | The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered. | Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer. |
| 6. Untrusted input parsed as instructions | PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions. | No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk. | A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation. | Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions. |
| 7. No comparable injection resistance data across vendors | Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model. | No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one. | Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production. | Write one sentence for your next vendor meeting: 'Show me your quantified injection resistance rate for my model version on my platform.' Document refusals for EU AI Act high-risk compliance. Deadline: August 2026. |
OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.
Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research — removing cyber safeguards for vetted actors — and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).
Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.
What to do before your next vendor renewal
“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”
Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)
Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)
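For teams that want that check running continuously, here is a minimal sketch of a scheduled audit job; the workflow name and cadence are arbitrary choices, not a vendor recommendation.

```yaml
# Scheduled audit sketch: surfaces every distinct secrets.<NAME> reference
# in this repo's workflows so new exposure shows up in the job log.
name: secret-reference-audit
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly; adjust as needed
  workflow_dispatch:       # allow manual runs
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: List secret references in workflows
        run: grep -rhoE 'secrets\.[A-Za-z0-9_]+' .github/workflows/ | sort -u
```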
Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)
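A minimal GitHub Actions sketch of the OIDC pattern, assuming AWS as the target cloud; the role ARN, region, and session length are placeholders to adapt.

```yaml
# OIDC sketch: the job requests a short-lived, GitHub-signed identity token
# and exchanges it for cloud credentials. No long-lived secret ever sits
# in the runner environment.
name: deploy
on:
  push:
    branches: [main]
permissions:
  id-token: write   # lets the runner request an OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Exchange OIDC token for AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy  # hypothetical role
          aws-region: us-east-1
          role-duration-seconds: 900   # minutes, not hours
      # Later steps use credentials that expire when the session does.
```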
Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation)
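At the workflow level, the least-privilege baseline is a few lines; the job name is illustrative, and vendor-specific agent options (such as disabling shell tools) live in the agent action's own configuration.

```yaml
# Least-privilege baseline for a code review agent job: read-only
# GITHUB_TOKEN, no write path back into the repo.
jobs:
  agent-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # agent can read the diff, nothing else
      pull-requests: read   # no comment, commit, or merge rights
    steps:
      - uses: actions/checkout@v4
      # Agent step goes here. Route any write action (PR comments, commits,
      # merges) through a separate job gated on human approval.
```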
Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.
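As one such layer, a pre-agent step can flag the crudest instruction patterns before the agent runs. The patterns below are examples only; as noted above, static matching will not stop a determined injection.

```yaml
# Defense-in-depth only: static patterns WILL miss novel injections.
# Pair with least-privilege permissions and OIDC; never rely on this alone.
steps:
  - name: Flag obvious instruction patterns in PR title
    env:
      PR_TITLE: ${{ github.event.pull_request.title }}  # pass via env; never interpolate into the script body
    run: |
      if grep -qiE 'ignore (all|previous) instructions|(post|print|echo).*(secret|api.?key|token)' <<< "$PR_TITLE"; then
        echo "::warning::PR title matches a known injection pattern; skipping agent run"
        exit 1
      fi
  # The agent step runs only if the filter step passed.
```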
Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability.
Check which hardened GitHub Actions mitigations you already have in place. Hardened configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)
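Together those mitigations are a handful of lines. In the sketch below, the environment name is illustrative, and the author-association gate is one common way to approximate a first-time-contributor check.

```yaml
jobs:
  agent-review:
    # Gate: skip PRs from authors with no prior relationship to the repo.
    if: contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.pull_request.author_association)
    runs-on: ubuntu-latest
    environment: agent-secrets   # protection rules can require human approval
                                 # before environment secrets are injected
    permissions:
      contents: read             # restricts GITHUB_TOKEN scope
    steps:
      - uses: actions/checkout@v4
```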
Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.
“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
Related Articles
- Claude Code, Copilot and Codex all got hacked. Every attacker went for the credential, not the model.On March 30, BeyondTrust proved that a crafted GitHub branch name could steal Codex’s OAuth token in cleartext. OpenAI classified it Critical P1. Two days later, Anthropic’s Claude Code source code spilled onto the public npm registry, and within hours, Adversa found Claude Code silently ignored its own deny rules once a command exceeded 50 subcommands. These were not isolated bugs. They were the latest in a nine-month run: six research teams disclosed exploits against Codex, Claude Code, Copilot, and Vertex AI, and every exploit followed the same pattern. An AI coding agent held a credential, executed an action, and authenticated to a production system without a human session anchoring the request. The attack surface was first demonstrated at Black Hat USA 2025, when Zenity CTO Michael Bargury hijacked ChatGPT, Microsoft Copilot Studio, Google Gemini, Salesforce Einstein and Cursor with Jira MCP on stage with zero clicks. Nine months later, those credentials are what attackers reached. Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, named the failure in an exclusive VentureBeat interview. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system.” The credentials underneath the interface are the breach. Codex, where a branch name stole GitHub tokens BeyondTrust researcher Tyler Jespersen, with Fletcher Davis and Simon Stewart, found Codex cloned repositories using a GitHub OAuth token embedded in the git remote URL. During cloning, the branch name parameter flowed unsanitized into the setup script. A semicolon and a backtick subshell turned the branch name into an exfiltration payload. Stewart added the stealth. By appending 94 Ideographic Space characters (Unicode U+3000) after “main,” the malicious branch looked identical to the standard main branch in the Codex web portal. A developer sees “main.” The shell sees curl exfiltrating their token. OpenAI classified it Critical P1 and shipped full remediation by February 5, 2026. Claude Code, where two CVEs and a 50-subcommand bypass broke the sandbox CVE-2026-25723 hit Claude Code’s file-write restrictions. Piped sed and echo commands escaped the project sandbox because command chaining was not validated. Patched in 2.0.55. CVE-2026-33068 was subtler. Claude Code resolved permission modes from .claude/settings.json before showing the workspace trust dialog. A malicious repo set permissions.defaultMode to bypassPermissions. The trust prompt never appeared. Patched in 2.1.53. The 50-subcommand bypass landed last. Adversa found that Claude Code silently dropped deny-rule enforcement once a command exceeded 50 subcommands. Anthropic’s engineers had traded security for speed and stopped checking after the fiftieth. Patched in 2.1.90. “A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” wrote Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission. The repository decided what permissions the agent had. The token budget decided which deny rules survived. Copilot, where a pull request description and a GitHub issue both became root Johann Rehberger demonstrated CVE-2025-53773 against GitHub Copilot with Markus Vervier of Persistent Security as co-discoverer. 
Hidden instructions in PR descriptions triggered Copilot to flip auto-approve mode in .vscode/settings.json. That disabled all confirmations and granted unrestricted shell execution across Windows, macOS, and Linux. Microsoft patched it in the August 2025 Patch Tuesday release. Then, Orca Security cracked Copilot inside GitHub Codespaces. Hidden instructions in a GitHub issue manipulated Copilot into checking out a malicious PR with a symbolic link to /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted JSON $schema URL exfiltrated the privileged GITHUB_TOKEN. Full repository takeover. Zero user interaction beyond opening the issue. Mike Riemer, CTO at Ivanti, framed the speed dimension in a VentureBeat interview: “Threat actors are reverse engineering patches within 72 hours. If a customer doesn’t patch within 72 hours of release, they’re open to exploit.” Agents compress that window to seconds. Vertex AI, where default scopes reached Gmail, Drive and Google’s own supply chain Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent had excessive permissions. Stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the project and reached restricted, Google-owned Artifact Registry repositories at the core of the Vertex AI Reasoning Engine. Shaty described the compromised P4SA as functioning like a "double agent," with access to both user data and Google's own infrastructure. VentureBeat defense grid Security requirement Defense shipped Exploit path The gap Sandbox AI agent execution Codex runs tasks in cloud containers; token scrubbed during agent runtime. Token present during cloning. Branch-name command injection executed before cleanup. No input sanitization on container setup parameters. Restrict file system access Claude Code sandboxes writes via accept-edits mode. Piped sed/echo escaped sandbox (CVE-2026-25723). Settings.json bypassed trust dialog (CVE-2026-33068). 50-subcommand chain dropped deny-rule enforcement. Command chaining not validated. Settings loaded before trust. Deny rules truncated for performance. Block prompt injection in code context Copilot filters PR descriptions for known injection patterns. Hidden injections in PRs, README files, and GitHub issues triggered RCE (CVE-2025-53773 + Orca RoguePilot). Static pattern matching loses to embedded prompts in legitimate review and Codespaces flows. Scope agent credentials to least privilege Vertex AI Agent Engine uses P4SA service agent with OAuth scopes. Default scopes reached Gmail, Calendar, Drive. P4SA credentials read every Cloud Storage bucket and Google’s Artifact Registry. OAuth scopes non-editable by default. Least privilege violated by design. Inventory and govern agent identities No major AI coding agent vendor ships agent identity discovery or lifecycle management. Not attempted. Enterprises do not inventory AI coding agents, their credentials, or their permission scopes. AI coding agents are invisible to IAM, CMDB, and asset inventory. Zero governance exists. Detect credential exfiltration from agent runtime Codex obscures tokens in web portal view. Claude Code logs subcommands. Tokens visible in cleartext inside containers. Unicode obfuscation hid exfil payloads. Subcommand chaining hid intent. No runtime monitoring of agent network calls. Log truncation hid the bypass. Audit AI-generated code for security flaws Anthropic launched Claude Code Security (Feb 2026). OpenAI launched Codex Security (March 2026). 
Both scan generated code. Neither scans the agent’s own execution environment or credential handling. Code-output security is not agent-runtime security. The agent itself is the attack surface. Every exploit targeted runtime credentials, not model output Every vendor shipped a defense. Every defense was bypassed. The Sonar 2026 State of Code Developer Survey found 25% of developers use AI agents regularly, and 64% have started using them. Veracode tested more than 100 LLMs and found 45% of generated code samples introduced OWASP Top 10 flaws, a separate failure that compounds the runtime credential gap. CrowdStrike CTO Elia Zaitsev framed the rule in an exclusive VentureBeat interview at RSAC 2026: collapse agent identities back to the human, because an agent acting on your behalf should never have more privileges than you do. Codex held a GitHub OAuth token scoped to every repository the developer authorized. Vertex AI’s P4SA read every Cloud Storage bucket in the project. Claude Code traded deny-rule enforcement for token budget. Kayne McGladrey, an IEEE Senior Member who advises enterprises on identity risk, made the same diagnosis in an exclusive interview with VentureBeat. "It uses far more permissions than it should have, more than a human would, because of the speed of scale and intent." Riemer drew the operational line in an exclusive VentureBeat interview. "It becomes, I don't know you until I validate you." The branch name talked to the shell before validation. The GitHub issue talked to Copilot before anyone read it. Security director action plan Inventory every AI coding agent (CIEM). Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, Windsurf. List the credentials and OAuth scopes each received at setup. If your CMDB has no category for AI agent identities, create one. Audit OAuth scopes and patch levels. Upgrade Claude Code to 2.1.90 or later. Verify Copilot's August 2025 patch. Migrate Vertex AI to the bring-your-own-service-account model. Treat branch names, pull request descriptions, GitHub issues, and repo configuration as untrusted input. Monitor for Unicode obfuscation (U+3000), command chaining over 50 subcommands, and changes to .vscode/settings.json or .claude/settings.json that flip permission modes. Govern agent identities the way you govern human privileged identities (PAM/IGA). Credential rotation. Least-privilege scoping. Separation of duties between the agent that writes code and the agent that deploys it. CyberArk, Delinea, and any PAM platform that accepts non-human identities can onboard agent OAuth credentials today; Gravitee's 2026 survey found only 21.9% of teams have done it. Validate before you communicate. "As long as we trust and we check and we validate, I'm fine with letting AI maintain it," Riemer said. Before any AI coding agent authenticates to GitHub, Gmail, or an internal repository, verify the agent's identity, scope, and the human session it is bound to. Ask each vendor in writing before your next renewal. "Show me the identity lifecycle management controls for the AI agent running in my environment, including credential scope, rotation policy, and permission audit trail." If the vendor cannot answer, that is the audit finding. The governance gap in three sentences Most CISOs inventory every human identity and have zero inventory of the AI agents running with equivalent credentials. No IAM framework governs human privilege escalation and agent privilege escalation with the same rigor. 
Most scanners track every CVE but cannot alert when a branch name exfiltrates a GitHub token through a container that developers trust by default. Zaitsev's advice to RSAC 2026 attendees was blunt: you already know what to do. Agents just made the cost of not doing it catastrophic.
- CVSS scored these two Palo Alto CVEs as manageable. Chained, they gave attackers root access to 13,000 devices.During Operation Lunar Peek in November 2024, attackers gained unauthenticated remote admin access — and eventual root — across more than 13,000 exposed Palo Alto Networks management interfaces. Palo Alto Networks scored CVE-2024-0012 at 9.3 and CVE-2024-9474 at 6.9 under CVSS v4.0. NVD scored the same pair 9.8 and 7.2 under CVSS v3.1. Two scoring systems. Two different answers for the same vulnerabilities. The 6.9 fell below patch thresholds. Admin access appeared required. The 9.3 sat queued for maintenance. Segmentation would hold. "Adversaries circumvent [severity ratings] by chaining vulnerabilities together," Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, told VentureBeat in an exclusive interview on April 22, 2026. On the triage logic that missed the chain: "They just had amnesia from 30 seconds before." Both CVEs sit on the CISA Known Exploited Vulnerabilities catalog. Neither score flagged the kill chain. The triage logic that consumed those scores treated each CVE as an isolated event, and so did the SLA dashboards and the board reports those dashboards feed. CVSS did exactly what it was designed to do. Score one vulnerability at a time. The problem is that adversaries do not attack one vulnerability at a time. "CVSS base scores are theoretical measures of severity that ignore real-world context," wrote Peter Chronis, former CISO of Paramount and a security leader with Fortune 100 experience. By moving beyond CVSS-first prioritization at Paramount, Chronis reported reducing actionable critical and high-risk vulnerabilities by 90%. Chris Gibson, executive director of FIRST, the organization that maintains CVSS, has been equally direct: using CVSS base scores alone for prioritization is "the least apt and accurate" method, Gibson told The Register. FIRST's own EPSS and CISA's SSVC decision model address part of this gap by adding exploitation probability and decision-tree logic. Five triage failure classes CVSS was never designed to catch In 2025, 48,185 CVEs were disclosed, a 20.6% year-over-year increase. Jerry Gamblin, principal engineer at Cisco Threat Detection and Response, projects 70,135 for 2026. The infrastructure behind the scores is buckling under that weight. NIST announced on April 15 that CVE submissions have grown 263% since 2020, and the NVD will now prioritize enrichment for KEV and federal critical software only. 1. Chained CVEs that look safe until they aren't The Palo Alto pair from Operation Lunar Peek is the textbook. CVE-2024-0012 bypassed authentication. CVE-2024-9474 escalated privileges. Scored separately under both CVSS v4.0 and v3.1, the escalation flaw filtered below most enterprise patch thresholds because admin access appeared required. The authentication bypass upstream eliminated that prerequisite entirely. Neither score communicated the compound effect. Meyers described the operational psychology: teams assessed each CVE independently, deprioritized the lower score, and queued the higher one for maintenance. 2. Nation-state adversaries who weaponize patches within days The CrowdStrike 2026 Global Threat Report documented a 42% year-over-year increase in vulnerabilities exploited as zero-days before public disclosure. Average breakout time across observed intrusions: 29 minutes. Fastest observed breakout: 27 seconds. China-nexus adversaries weaponized newly patched vulnerabilities within two to six days of disclosure. 
"Before it was Patch Tuesday once a month. Now it's patch every day, all the time. That's what this new world looks like," said Daniel Bernard, Chief Business Officer at CrowdStrike. A KEV addition treated as a routine queue item on Tuesday becomes an active exploitation window by Thursday. 3. Stockpiled CVEs that nation-state actors hold for years Salt Typhoon accessed senior U.S. political figures' communications during the presidential transition by chaining CVE-2023-20198 with CVE-2023-20273 on internet-facing Cisco devices, a privilege escalation pair patched in October 2023 and still unapplied more than a year later. Compromised credentials provided a parallel entry vector. The patches existed. Neither was applied. Sixty-seven percent of vulnerabilities exploited by China-nexus adversaries in 2025 were remote code execution flaws providing immediate system access, according to the CrowdStrike 2026 Global Threat Report. CVSS does not degrade priority based on how long a CVE has gone unpatched. No board metric tracks aging KEV exposure. That silence is the vulnerability. 4. Identity gaps that never enter the scoring system A 2023 help desk social engineering call against a major enterprise produced more than $100 million in losses. No CVE was assigned. No CVSS score existed. No patch pipeline entry was created. The vulnerability was a human process gap in identity verification, sitting entirely outside the scoring system's aperture. "A pro needs a zero day if all you have to do is call the help desk and say I forgot my password," Meyers said. Agentic AI systems now carry their own identity credentials, API tokens, and permission scopes, operating outside traditional vulnerability management governance. Merritt Baer, CSO at Enkrypt AI, has argued on record that identity-surface controls are vulnerability equivalents belonging in the same reporting pipeline as software CVEs. In most organizations, help desk authentication gaps and agentic AI credential inventories live in a separate governance silo. In practice, nobody's governance. 5. AI-accelerated discovery that breaks pipeline capacity Anthropic's Claude Mythos Preview demonstrated autonomous vulnerability discovery, finding a 27-year-old signed integer overflow in OpenBSD's TCP SACK implementation across roughly 1,000 scaffold runs at a total compute cost under $20,000. Meyers offered a thought-experiment projection in the exclusive interview with VentureBeat: if frontier AI drives a 10x volume increase, the result is approximately 480,000 CVEs annually. Pipelines built for 48,000 break at 70,000 and collapse at 480,000. NVD enrichment is already gone for non-KEV submissions. "If the adversary is now able to find vulnerabilities faster than the defenders or the business, that's a huge problem, because those vulnerabilities become exploits," said Daniel Bernard, Chief Business Officer at CrowdStrike. CrowdStrike on Thursday launched Project QuiltWorks, a remediation coalition with Accenture, EY, IBM Cybersecurity Services, Kroll, and OpenAI formed to address the vulnerability volume that frontier AI models are now generating in production code. When five major firms build a coalition around a pipeline problem, no single organization's patch workflow can keep pace. Security director action plan The five failure classes above map to five specific actions. Run a chain-dependency audit on every KEV CVE in the environment this month. 
Flag any co-resident CVE scored 5.0 or above, the threshold where privilege escalation and lateral movement capabilities typically appear in CVSS vectors. Any pair chaining authentication bypass to privilege escalation gets triaged as critical regardless of individual scores. Compress KEV-to-patch SLAs to 72 hours for internet-facing systems. The CrowdStrike 2026 Global Threat Report breakout data, 29-minute average and 27-second fastest, makes weekly patch windows indefensible in a board presentation. Build a monthly KEV aging report for the board. Every unpatched KEV CVE, days since disclosure, days since patch availability, and owner. Salt Typhoon exploited a Cisco CVE patched 14 months earlier because no escalation path existed for aging exposure. Add identity-surface controls to the vulnerability reporting pipeline. Help desk authentication gaps and agentic AI credential inventories belong in the same SLA framework as software CVEs. If they sit in a separate governance silo, they sit in nobody's governance. Stress-test pipeline capacity at 1.5x and 10x current CVE volume. Gamblin projects 70,135 for 2026. Meyers's thought-experiment projection: frontier AI could push annual volume past 480,000. Present the capacity gap to the CFO before the next budget cycle, not after the breach that proves the gap existed.
- Anthropic Skill scanners passed every check. The malicious code rode in on a test file.Picture this scenario: An Anthropic Skill scanner runs a full analysis of a Skill pulled from ClawHub or skills.sh. Its markdown instructions are clean, and no prompt injection is detected. No shell commands are hiding in the SKILL.md. Green across the board. The scanner never looked at the .test.ts file sitting one directory over. It didn’t need to. Test files aren’t part of the agent execution surface, so no publicly documented scanner inspects them (as of publication of this post). The file runs anyway. Not through the agent but through the test runner, with full access to the filesystem, environment variables, and SSH keys. Gecko Security researcher Jeevan Jutla detailed this attack flow, demonstrating that when a developer runs npx Skills add, the installer copies the entire skill directory into the repo. If a malicious Skill bundles a *.test.ts file, the Jest and Vitest testing frameworks discover it through recursive glob patterns, treat it as a first-class test, and execute it during npm test or when the IDE auto-runs tests on save. The default configuration in open-source JavaScript test framework Mocha follows a similar recursive discovery pattern. The payload fires in beforeAll, before any assertions run. Nothing in the test output flags anything unusual. In CI, process.env holds deployment tokens, cloud credentials, and every secret the pipeline can reach. The attack class is not new; malicious npm postinstall scripts and pytest plugins have exploited trust-on-install for years. What makes the Skill vector worse is that installed Skills land in a directory designed to be committed and shared across the team, propagate to every teammate who clones, and sit outside every scanner's detection surface. The agent is never invoked, and the Anthropic Skill scanner reads the right files for the wrong threat model. Three audits, one blind spot Gecko's disclosure didn’t arrive in isolation. It landed on top of two large-scale security audits that had already documented the scope of the problem from the other direction, illustrating what scanners detect rather than what they miss. Both audits did exactly what they're designed to do: They measured the threat on the execution surface scanners already inspect. Gecko measured what sits outside it. A SkillScan academic study, published on January 15, analyzed 31,132 unique Anthropic Skills collected from two major marketplaces. Their findings: 26.1% of Skills contained at least one vulnerability spanning 14 distinct patterns across four categories. Data exfiltration showed up in 13.3% of Skills. Privilege escalation appeared in 11.8%. Skills bundling executable scripts were 2.12x more likely to contain vulnerabilities than instruction-only Skills. Three weeks later, Snyk published ToxicSkills, the first comprehensive security audit of the ClawHub and skills.sh marketplaces. Snyk's team scanned 3,984 Skills (as of February 5). The results: 13.4% of all Skills contained at least one critical-level security issue. Seventy-six confirmed malicious payloads were identified through a combination of automated scanning and human-in-the-loop review. Eight of those malicious Skills were still publicly available on ClawHub when the research was published. Then Cisco shipped its AI Agent Security Scanner for IDEs on April 21, integrating its open-source Skill Scanner directly into VS Code, Cursor, and Windsurf. 
The scanner brings genuine capability to developers’ workflows. It does not inspect bundled test files, because the detection categories Cisco built target the agent interaction layer, not the developer toolchain layer. The three major Anthropic Skill scanners share a structural blind spot: None inspects bundled test files as an execution surface, even though Gecko Security proved that those files execute with full local permissions through standard test runners. Snyk Agent Scan, Cisco's AI Agent Security Scanner, and VirusTotal Code Insight all work. They catch prompt injection, shell commands, and data exfiltration in Skill definitions and agent-referenced scripts. What they do not do is look beyond the agent execution surface to the developer execution surface sitting in the same directory. How the attack chain works The mechanics of the attack chain matter because the fix is precise. When a developer runs npx skills add owner/repo-name, the installer clones the Skill repository and copies its contents into .agents/skills/<skill-name>/ inside the project. Claude Code, Cursor, and other agent IDEs get symlinks into their own Skill directories. The only files excluded are .git, metadata.json, and files prefixed with _. Everything else lands on disk. Jest and Vitest both pass dot: true to their glob engines. That means they discover test files inside dot-prefixed directories like .agents/. Mocha's behavior depends on configuration but follows similar recursive patterns by default. None of them exclude .agents/, .claude/, or .cursor/ from their default discovery paths. An attacker publishes a Skill with a clean SKILL.md and a tests/reviewer.test.ts file containing a beforeAll block. The block reads process.env, .env files, ~/.ssh/ private keys, and ~/.aws/credentials. It posts everything to an external endpoint. The test cases look real. The exfiltration happens during setup, silently, whether the tests pass or fail. The vector is not limited to TypeScript. Python repos face the same exposure through conftest.py, which pytest auto-executes during test collection. Add .agents to testpaths exclusion in pyproject.toml to block it. The .agents/skills/ directory is designed to be committed to the repo so teammates can share Skills. GitHub's default .gitignore templates do not include .agents/. Once the malicious test file enters the repo, every developer who clones and runs tests executes the payload. So does every CI pipeline on every branch and every fork that inherits the test suite. Scanners are reading the wrong threat surface CrowdStrike CTO Elia Zaitsev put the structural challenge in operational terms during an exclusive VentureBeat interview at RSAC 2026. "Observing actual kinetic actions is a structured, solvable problem," Zaitsev said. "Intent is not." That distinction cuts directly at the Anthropic Skill scanner gap. No publicly documented scanner operates outside the assumption that the threat lives in the SKILL.md and in scripts the agent is instructed to run. These tools analyze intent: What does the Skill tell the agent to do? Gecko's finding sits on the kinetic side. The test file executes through the developer's own toolchain. No agent is involved. No prompt is interpreted. The payload is TypeScript, running with full local permissions through a legitimate test runner. The scanner was solving the wrong problem. 
CrowdStrike's Zaitsev framed the identity dimension: "AI agents and non-human identities will explode across the enterprise, expanding exponentially and dwarfing human identities," he told VentureBeat. "Each agent will operate as a privileged super-human with OAuth tokens, API keys, and continuous access to previously siloed data sets." CrowdStrike's Charlotte AI and similar enterprise agents operate with exactly these privileges. When those credentials live in environment variables accessible to any process in the repo, a test-file payload does not need agent privileges. It already has developer privileges, which in most CI configurations means deployment tokens and cloud access. Mike Riemer, SVP of the network security group and field CISO at Ivanti, quantified the exploitation window in a VentureBeat interview. "Threat actors are reverse engineering patches within 72 hours," Riemer said. "If a customer doesn't patch within 72 hours of release, they're open to exploit." Most enterprises take weeks. The Anthropic Skill scanner blind spot compounds that window. A developer installs a malicious Skill today. The test file executes immediately. No patch exists because no scanner flagged it. The Anthropic Skill Audit Grid VentureBeat has covered the Anthropic Skill supply chain since the ClawHavoc campaign hit ClawHub in January. Every conversation with security leaders lands on the same frustration. Their teams bought a scanner, it reports clean, and they have no framework for asking what it does not check. VentureBeat has polled dev teams who install Anthropic Skills from ClawHub and skills.sh. The grid below connects the published-audit half (Snyk, SkillScan) with the scanner-bypass half (Gecko). Each row represents a detection surface a security team should verify before approving any Skill scanning tool for Q2 procurement. Audit question What scanners do today The gap Recommended action Inspect SKILL.md and agent-invoked scripts Covered by Snyk Agent Scan, Cisco AI Agent Security Scanner, VirusTotal Code Insight This is the covered surface. Attackers shift payloads to files outside it. Continue running current scanners. They catch real threats at the instruction layer. Inspect bundled test files (*.test.ts, *.spec.js, conftest.py) Not currently inspected as attack surface by any scanner Gecko proved test files execute via Jest/Vitest (documented) and Mocha (config-dependent) with full local permissions. No agent invoked. Add .agents/ to testPathIgnorePatterns (Jest) or exclude (Vitest). One config line. Flag Skills that bundle test files or build configs Not flagged as higher-risk metadata by any scanner Trivial static check. Skills with extra executables are 2.12x more likely to be vulnerable (SkillScan). Add CI gate: find .agents/ -name "*.test.*" | grep -q . && exit 1. Block merge on match. Restrict test-runner globs to project-owned paths Rare. Most CI configs use recursive glob. Jest/Vitest pass dot: true by default. Default globs traverse .agents/, .claude/, .cursor/ directories. Malicious test files auto-discovered. Scope test roots to first-party directories (src/, app/). Deny .agents/, .claude/, .cursor/. Distinguish script-bundling Skills vs. instruction-only Partial coverage via static and semantic analysis SkillScan: script-bundling Skills 2.12x more likely to contain vulnerabilities than instruction-only. Require structured audit entry: Skill type, execution surfaces, scanner coverage, residual risk. Publish audit methodology with sample size Snyk yes (3,984 Skills). 
SkillScan yes (31,132 Skills). Cisco and emerging scanners have not published equivalent ecosystem-scale audits. Ask vendors: methodology, sample size, detection rate. No published audit = no independent baseline. Pin Skill sources to immutable commits Not enforced by any scanner or marketplace Skill authors can push clean version for review, add malicious test file after approval. Pin to specific commit hash. Review diffs on every update. OWASP Agentic Skills Top 10 recommends this. Three CI hardening steps to add now Riemer made the broader point in VentureBeat interviews that placing security controls at the perimeter invites every threat to that exact boundary. Anthropic Skill scanners placed the boundary at SKILL.md. Attackers put the payload one directory over. The three changes below move the boundary to where the code actually executes. These changes take minutes. None requires replacing current tools or waiting for scanner vendors to close the gap. Add .agents/ to the test runner's ignore list. In Jest, add /\.agents/ to testPathIgnorePatterns in jest.config.js. In Vitest, add **/.agents/** to the exclude array in vitest.config.ts. One line in one config file prevents the test runner from discovering files inside installed Skill directories. Do it whether or not the team currently uses Anthropic Skills. The directory may appear in a cloned repo without anyone installing the Skill directly. Audit every Skill install for non-instruction files before merge. Add a CI check that flags any file in .agents/skills/ matching *.test.*, *.spec.*, __tests__/, *.config.*, or conftest.py. These files have no legitimate reason to exist inside a Skill directory. The check is a shell one-liner: [ -d .agents ] && find .agents/ -name "*.test.*" -o -name "*.spec.*" -o -name "conftest.py" -o -name "*.config.*" -o -type d -name "__tests__" | grep -q . && exit 1. If it matches, block the merge. For any test files that do land in a PR, require a reviewer to skim for shell invocations (exec, spawn, child_process), external network calls, and file operations touching secrets or SSH keys. Pin Skill sources to specific commits, not latest. The npx skills add command copies whatever the repo contains at the moment of install. A Skill author can push a clean version for scanner review, then add a malicious test file after approval. Pinning to a specific commit hash converts a trust-on-first-use model into a verify-on-every-change model. The OWASP Agentic Skills Top 10 recommends exactly this. If Skills are already in your repo: Run the find command above against your existing .agents/ directory now. If test files are present, treat them as a potential compromise: Rotate any credentials accessible to CI (deployment tokens, cloud keys, SSH keys), audit CI logs for unexpected outbound network calls during test execution, and review git history to determine when the test files entered the repo and which pipelines have executed them. Five questions to ask your Anthropic Skill scanner vendor Security teams are signing contracts for their first dedicated Skill scanning tools. The Gecko bypass means the questions on those sales calls need to change. Do not stop at "Do you detect prompt injection?" Ask: Which files and directories do you actually analyze in a Skill repo? Do you treat test files as potential execution surfaces? Can you flag Skills that bundle tests, CI configs, or build scripts as higher-risk? SkillScan showed script-bundling Skills are 2.12x more likely to be vulnerable. 
Do you provide integration or guidance for restricting test-runner globs in CI? Cisco deserves credit for open-sourcing its Skill Scanner on GitHub, which lets security teams inspect exactly which detection categories the tool implements. That transparency is the baseline every vendor should meet. If your vendor will not publish detection categories or open-source their scanning logic, you cannot verify what they check and what they skip. Have you published an ecosystem-scale audit with methodology and sample size? Snyk published at 3,984 Skills. SkillScan published at 31,132. Riemer described the disclosure pattern: "They chose not to publish a CVE. They just quietly patched it and moved on with life," he said. The Anthropic Skills ecosystem is showing early signs of the same pattern: scanners document what they detect without mapping the surfaces they do not reach. The gap between documented coverage and actual execution surface is where the test-file vector lives. The audit grid matters because the scanner model is incomplete The Anthropic Skills ecosystem is repeating the early npm supply chain story, except without the decade of accumulated incidents that forced package registries to build security infrastructure. SkillScan's 31,132-Skill dataset showed a quarter of the ecosystem carrying vulnerabilities. Snyk found 76 confirmed malicious payloads in fewer than 4,000 Skills. Gecko proved the scanner model itself has a structural gap that no vendor has publicly documented closing. Scanner evaluations consistently test the covered surface. The Anthropic Skill Audit Grid gives security teams the seven audit surfaces to verify before signing. The three CI steps are the fixes to deploy before the next Skill install. Riemer's Ivanti team watches the patch-to-exploit cycle compress in real time across enterprise environments. The test-file vector compresses it further: No scanner flagged the threat, so no patch window exists. The scanner is not broken. It is incomplete. The threat model stopped at the agent. The test runner did not.
- One command turns any open-source repo into an AI agent backdoor. OpenClaw proved no supply-chain scanner has a detection category for itJust two months ago, researchers at the Data Intelligence Lab at the University of Hong Kong introduced CLI-Anything, a new state-of-the-art tool that analyzes any repo’s source code and generates a structured command line interface (CLI) that AI coding agents can operate with a single command. Claude Code, Codex, OpenClaw, Cursor, and GitHub Copilot CLI are all supported, and since its launch in March, CLI‑Anything has climbed to more than 30,000 GitHub stars. But the same mechanism that makes software agent-native opens the door to agent-level poisoning. The attack community is already discussing the implications on X and security forums, translating CLI-Anything's architecture into offensive playbooks. The security problem is not what CLI-Anything does. It is what CLI-Anything represents. CLI-Anything generates SKILL.md files, the same instruction-layer artifacts that Snyk’s ToxicSkills research found laced with 76 confirmed malicious payloads across ClawHub and skills.sh in February 2026. A poisoned skill definition does not trigger a CVE and never appears in a software bill of materials (SBOM). No mainstream security scanner has a detection category for malicious instructions embedded in agent skill definitions, because the category simply did not exist eighteen months ago. Cisco confirmed the gap in April. “Traditional application security tools were not designed for this,” Cisco’s engineering team wrote in a blog post announcing its AI Agent Security Scanner for IDEs. “SAST [static application security testing] scanners analyze source code syntax. SCA [software composition analysis] tools check dependency versions. Neither understands the semantic layer where MCP [Model Context Protocol] tool descriptions, agent prompts, and skill definitions operate.” Merritt Baer, CSO of Enkrypt AI and former Deputy CISO at Amazon Web Services (AWS), told VentureBeat in an exclusive interview: “SAST and SCA were built for code and dependencies. They don’t inspect instructions.” This is not a single-vendor vulnerability. It is a structural gap in how the entire security industry monitors software supply chains. This is the pre-exploitation window. CLI-Anything is live, the attack community is discussing it, and security directors who act now get ahead of the first incident report. The integration layer no stack can see Traditional supply-chain security operates on two layers. The code layer is where SAST works, scanning source files for insecure patterns, injection flaws, and hardcoded secrets. The dependency layer is where SCA works, checking package versions against known vulnerabilities, generating SBOMs, and flagging outdated libraries. Agent bridge tools like CLI-Anything, MCP connectors, Cursor rules files, and Claude Code skills operate on a third layer between the other two. Call it the agent integration layer: configuration files, skill definitions, and natural-language instruction sets tell an AI agent what software can do and how to operate it. None of it looks like code. All of it executes like code. 
Carter Rees, VP of AI at Reputation, told VentureBeat in an exclusive interview: “Modern LLMs [large language models] rely on third-party plugins, introducing supply chain vulnerabilities where compromised tools can inject malicious data into the conversation flow, bypassing internal safety training.” Researchers at Griffith University, Nanyang Technological University, the University of New South Wales, and the University of Tokyo documented the attack chain in an April paper, “Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems.” The team introduced Document-Driven Implicit Payload Execution (DDIPE), a technique that embeds malicious logic inside code examples within skill documentation. Across four agent frameworks and five large language models, DDIPE achieved bypass rates between 11.6% and 33.5%. Static analysis caught most samples, but 2.5% evaded all four detection layers. Responsible disclosure led to four confirmed vulnerabilities and two vendor fixes. The kill chain security leaders need to audit Here's the anatomy of the kill chain: An attacker submits a SKILL.md file to an open-source project containing setup instructions, code examples, and configuration templates. It looks like standard documentation. A code reviewer would wave it through because none of it is executable. But the code examples contain embedded instructions that an agent will parse as operational directives. A developer uses an agent bridge tool to connect their coding agent to the repository. The agent ingests the skill definition and trusts it, because no verification layer exists to distinguish benign from malicious intent at the instruction level. The agent executes the embedded instruction using its own legitimate credentials. Endpoint detection and response (EDR) sees an approved API call from an authorized process and passes it. Data exfiltration, configuration changes, and credential harvesting are all moving through channels that the monitoring stack considers normal traffic. Rees identified the structural flaw that makes this chain lethal. “A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” he told VentureBeat. A compromised skill definition riding that flat authorization plane does not need to escalate privileges. It already has them. Every link in that chain is invisible to the current security stack. Pillar Security demonstrated a variant of this chain against Cursor in January 2026 (CVE-2026-22708). Implicitly trusted shell built-in commands could be poisoned through indirect prompt injection, converting benign developer commands into arbitrary code execution vectors. Users saw only the final command. The poisoning happened through other commands the IDE never surfaced for approval. The evidence is already in production In a documented attack chain from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN, which the attacker used to publish a compromised npm dependency that installed a second agent on roughly 4,000 developer machines for eight hours. There was just one issue title. Attackers had eight hours of access. No human approved the action. Snyk’s ToxicSkills audit scanned 3,984 agent skills from ClawHub, the public marketplace for the OpenClaw agent framework, and skills.sh in February 2026. The results: 13.4% of all skills contained at least one critical security issue. 
The evidence is already in production

In a documented attack chain from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN, which the attacker used to publish a compromised npm dependency that installed a second agent on roughly 4,000 developer machines for eight hours. It took one issue title. Attackers had eight hours of access. No human approved the action.

Snyk’s ToxicSkills audit scanned 3,984 agent skills from ClawHub, the public marketplace for the OpenClaw agent framework, and from skills.sh in February 2026. The results: 13.4% of all skills contained at least one critical security issue. Daily skill submissions jumped from fewer than 50 in mid-January to more than 500 by early February. The barrier to publishing was a SKILL.md markdown file and a GitHub account one week old. No code signing. No security review. No sandbox.

OpenClaw is not an outlier. It is the pattern. “The bar to entry is extremely low,” Baer said. “Adding a skill can be as simple as uploading a Word doc or lightweight config file. That’s a radically different risk profile than compiled code.” She pointed to projects like ClawPatrol that have started cataloging and scanning for malicious skills, evidence the ecosystem is moving faster than enterprise defenses.

The ClawHavoc campaign, first reported by Koi Security in late January 2026, initially identified 341 malicious skills on ClawHub. A follow-up analysis by Antiy CERT expanded the count to 1,184 compromised packages across the platform. The campaign delivered Atomic Stealer (AMOS) through skill definitions with professional documentation. Skills named solana-wallet-tracker and polymarket-trader matched what developers actively searched for.

The MCP protocol layer carries similar exposure. OX Security reported in April that researchers poisoned nine of 11 MCP marketplaces using proof-of-concept servers. Trend Micro initially found 492 MCP servers exposed to the internet with zero authentication; by April, that number had grown to 1,467. As The Register reported, the root issue lies in Anthropic’s MCP software development kit (SDK) transport mechanism: any developer using the official SDK inherits the vulnerability class.

VentureBeat Prescriptive Matrix: Three-layer agent supply-chain audit

VentureBeat developed this Prescriptive Matrix by mapping the three attack layers documented in the research and incident reports above against the detection capabilities of current SAST, SCA, and agent-layer tools. Each row identifies what security teams should verify and where no scanner has coverage today.

Layer 1: Code
Threat: Prompt injection in AI-generated code
Current detection: SAST scanners
Why it misses: Most SAST tools have no detection category for prompt injection in AI-generated code.
Recommended action: Confirm that SAST scans AI-generated code for prompt injection. If not, have an open vendor conversation this quarter.

Layer 2: Dependencies
Threat: Malicious MCP servers, agent skills, plugin registries
Current detection: SCA tools
Why it misses: SCA generates no AI-specific bill of materials; agent-layer dependencies are invisible.
Recommended action: Confirm SCA includes MCP servers, agent skills, and plugin registries in the dependency inventory.

Layer 3: Agent integration
Threat: Poisoned SKILL.md files, malicious instruction sets, adversarial rules files
Current detection: None until April 2026
Why it misses: No tool inspects the semantic meaning of agent instruction files. Baer: “We’re not inspecting intent.”
Recommended action: Deploy Cisco Skill Scanner or Snyk mcp-scan. Assign a team to own this layer.

Baer’s diagnosis of Layer 3 applies across the entire matrix: “Current scanners look for known bad artifacts, not adversarial instructions embedded in otherwise valid skills.” Cisco’s open-source Skill Scanner and Snyk’s mcp-scan represent the first tools purpose-built for this layer.

Security director action plan

Here is how security leaders can get ahead of the problem.

Inventory every agent bridge tool in the environment. This includes CLI-Anything, MCP connectors, Cursor rules files, Claude Code skills, and GitHub Copilot extensions. If the development team is using agent bridge tools that have not been inventoried, the risk cannot be assessed. One way to start that sweep is sketched below.
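This is a minimal sketch; the filename conventions are assumptions based on common tool defaults, so adjust the globs to match the bridge tools actually in use.

```python
# Minimal inventory pass for agent-integration-layer artifacts.
# The glob patterns are assumptions based on common tool defaults;
# adjust them to match your environment.
from pathlib import Path

ARTIFACT_GLOBS = {
    "agent skill":        ["**/SKILL.md"],
    "Cursor rules":       ["**/.cursorrules", "**/.cursor/rules/*"],
    "MCP config":         ["**/mcp.json", "**/.mcp.json"],
    "Claude Code config": ["**/CLAUDE.md"],
}

def inventory(root: Path) -> dict[str, list[Path]]:
    """Map each artifact kind to the matching files under root."""
    found: dict[str, list[Path]] = {}
    for kind, globs in ARTIFACT_GLOBS.items():
        paths = sorted({p for g in globs for p in root.glob(g)})
        if paths:
            found[kind] = paths
    return found

if __name__ == "__main__":
    for kind, paths in inventory(Path(".")).items():
        print(f"{kind}: {len(paths)} file(s)")
        for p in paths:
            print(f"  {p}")
```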
Audit agent skill sources the same way package registries get audited. Baer’s framing is precise: “A skill is effectively untrusted executable intent, even if it’s just text.” Shut off ungoverned ingestion paths until controls are in place, and stand up a review and allowlisting process for skills. The OWASP Agentic Skills Top 10 (AST01: Malicious Skills) provides a procurement framework to align controls against.

Deploy agent-layer scanning. Evaluate Cisco’s open-source Skill Scanner and Snyk’s mcp-scan for behavioral analysis of agent instruction files. If dedicated tooling is unavailable, require a second engineer to read every SKILL.md before installation.

Restrict agent execution privileges and instrument runtime. AI coding agents should not run with the same credential scope as the developer who invoked them. Rees confirmed the structural flaw: the flat authorization plane means a compromised skill does not need to escalate privileges. Baer’s prescription: “Instrument runtime observability. What data is the agent accessing, what actions is it taking, and are those aligned with expected behavior?”

Assign ownership for the gap between layers. The most dangerous attacks succeed because they fall between detection categories. Assign a team to own the agent integration layer, and have it review every SKILL.md, MCP config, and rules file before it enters the environment.

The gap that already has a name

Baer underscored the dangers of this new attack vector. “This feels very similar to early container security, but we’re still in the ‘we’ll get to it’ phase across most orgs,” she said. At AWS, she added, it took a few high-profile wake-up calls before container security became table stakes. The difference this time is speed. “There’s no build pipeline, no compilation barrier. Just content,” she said.

CLI-Anything is not the threat. It is the proof case that the agent integration layer exists, that it is growing fast, and that the attacker community has already found it. The 33,000 developers who starred the repository are telling security teams where software development is heading.

Eighteen months ago, the detection category for agent-integration-layer poisoning did not exist. Cisco and Snyk shipped the first tools for it in April. The window between those two facts is closing. Security directors who have not begun an inventory are already behind.