Three AI coding agents leaked secrets through a single prompt injection. One vendor's system card predicted it
Our take

A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Security Review action post its own API key as a comment. The same prompt injection worked on Google’s Gemini CLI Action and GitHub’s Copilot Agent (Microsoft). No external infrastructure required.
Aonan Guan, the researcher who discovered the vulnerability, alongside Johns Hopkins colleagues Zhengyu Liu and Gavin Zhong, published the full technical disclosure last week, calling it “Comment and Control.” GitHub Actions does not expose secrets to fork pull requests by default when using the pull_request trigger, but workflows using pull_request_target, which most AI agent integrations require for secret access, do inject secrets into the runner environment. This limits the practical attack surface but does not eliminate it: collaborators, comment fields, and any repo using pull_request_target with an AI coding agent are exposed.
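The exposed pattern fits in a few lines of workflow YAML. The sketch below is illustrative rather than any vendor's actual integration; the workflow name, script path, and agent step are placeholders.

```yaml
# Illustrative sketch of the exposed pattern, not any vendor's actual workflow.
# pull_request_target runs in the context of the base repository, so repo
# secrets are injected into the runner even for pull requests from forks.
name: ai-code-review
on:
  pull_request_target:
    types: [opened, edited]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AI review agent   # placeholder for an agent action
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}  # secret lands in the runner env
          PR_TITLE: ${{ github.event.pull_request.title }}     # attacker-controlled text
        run: ./scripts/review-agent.sh   # hypothetical agent wrapper; parses PR_TITLE as context
```

A single step holding both the secret and attacker-controlled text is the entire attack surface: the fork never sees the key, but the agent can be instructed to read it from the environment and repost it.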
Per Guan’s disclosure timeline, Anthropic classified the vulnerability as CVSS 9.4 Critical and paid a $100 bounty, Google paid $1,337, and GitHub awarded $500 through the Copilot Bounty Program. The $100 figure is notably low relative to the CVSS 9.4 rating; Anthropic’s HackerOne program scopes agent-tooling findings separately from model-safety vulnerabilities. All three vendors patched quietly, and as of Saturday none had a CVE in the NVD or an advisory in GitHub Security Advisories.
Comment and Control exploited a prompt injection vulnerability in Claude Code Security Review, a specific GitHub Action feature that Anthropic’s own system card acknowledged is “not hardened against prompt injection.” The feature is designed to process trusted first-party inputs by default; users who opt into processing untrusted external PRs and issues accept additional risk and are responsible for restricting agent permissions. Anthropic updated its documentation to clarify this operating model after the disclosure. Judging by what OpenAI’s system card does not document, the same class of attack would operate beneath its safeguard layer at the agent runtime; that is an inference, not a demonstrated exploit. The exploit is the proof case, but the story is what the three system cards reveal about the gap between what vendors document and what they protect.
OpenAI and Google did not respond to requests for comment by publication time.
“At the action boundary, not the model boundary,” Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, told VentureBeat when asked where protection actually needs to sit. “The runtime is the blast radius.”
What the system cards tell you
Anthropic’s Opus 4.7 system card runs 232 pages, with quantified hack rates and injection resistance metrics. It discloses a restricted model strategy (Mythos held back as a capability preview) and states directly that Claude Code Security Review is “not hardened against prompt injection.” The system card told readers the runtime was exposed; Comment and Control proved it. Anthropic does gate certain agent actions outside the system card’s scope (Claude Code Auto Mode, for example, applies runtime-level protections), but the system card itself does not document these runtime safeguards or their coverage.
OpenAI’s GPT-5.4 system card documents extensive red teaming and publishes model-layer injection evals, but no agent-runtime or tool-execution resistance metrics. Its Trusted Access for Cyber program scales access to thousands. The system card tells you what red teamers tested; it does not tell you how resistant the model is to the attacks they found.
Google’s Gemini 3.1 Pro model card, shipped in February, defers most safety methodology to older documentation, a VentureBeat review of the card found. Google’s Automated Red Teaming program remains internal only. No external cyber program.
| Dimension | Anthropic (Opus 4.7) | OpenAI (GPT-5.4) | Google (Gemini 3.1 Pro) |
| --- | --- | --- | --- |
| System card depth | 232 pages. Quantified hack rates, classifier scores, and injection resistance metrics. | Extensive. Red teaming hours documented. No injection resistance rates published. | Few pages. Defers to older Gemini 3 Pro card. No quantified results. |
| Cyber verification program | CVP. Removes cyber safeguards for vetted pentesters and red teamers doing authorized offensive work. Does not address prompt injection defense. Platform and data-retention exclusions not yet publicly documented. | TAC. Scaled to thousands. Constrains ZDR. | None. No external defender pathway. |
| Restricted model strategy | Yes. Mythos held back as a capability preview. Opus 4.7 is the testbed. | No restricted model. Full capability released, access gated. | No restricted model. No stated plan for one. |
| Runtime agent safeguards | Claude Code Security Review: system card states it is not hardened against prompt injection. The feature is designed for trusted first-party inputs. Anthropic applies additional runtime protections (e.g., Claude Code Auto Mode) not documented in the system card. | Not documented. TAC governs access, not agent operations. | Not documented. ART internal only. |
| Exploit response (Comment and Control) | CVSS 9.4 Critical. $100 bounty. Patched. No CVE. | Not directly exploited. Structural gap inferred from TAC design, not demonstrated. | $1,337 bounty per Guan disclosure. Patched. No CVE. |
| Injection resistance data | Published. Quantified rates in the system card. | Model-layer injection evals published. No agent-runtime or tool-execution resistance rates. | Not published. No quantified data available. |
Baer offered specific procurement questions. “For Anthropic, ask how safety results actually transfer across capability jumps,” she told VentureBeat. “For OpenAI, ask what ‘trusted’ means under compromise.” For both, she said, directors need to “demand clarity on whether safeguards extend into tool execution, not just prompt filtering.”
Seven threat classes neither safeguard approach closes
Each row names what breaks, why your controls miss it, what Comment and Control proved, and the recommended action for the week ahead.
| Threat Class | What Breaks | Why Your Controls Miss It | What Comment and Control Proved | Recommended Action |
| --- | --- | --- | --- | --- |
| 1. Deployment surface mismatch | CVP is designed for authorized offensive security research, not prompt injection defense. It does not extend to Bedrock, Vertex, or ZDR tenants. TAC constrains ZDR. Google has no program. Your team may be running a verified model on an unverified surface. | Launch announcements describe the program. Support documentation lists the exclusions. Security teams read the announcement. Procurement reads neither. | The exploit targets the agent runtime, not the deployment platform. A team running Claude Code on Bedrock is outside CVP coverage, but CVP was not designed to address this class of vulnerability in the first place. | Email your Anthropic and OpenAI reps today. One question, in writing: 'Confirm whether [your platform] and [your data retention config] are covered by your runtime-level prompt injection protections, and describe what those protections include.' File the response in your vendor risk register. |
| 2. CI secrets exposed to AI agents | ANTHROPIC_API_KEY, GEMINI_API_KEY, GITHUB_TOKEN, and any production secret stored as a GitHub Actions env var are readable by every workflow step, including AI coding agents. | The default GitHub Actions config does not scope secrets to individual steps. Repo-level and org-level secrets propagate to all workflows. Most teams never audit which steps access which secrets. | The agent read the API key from the runner env var, encoded it in a PR comment body, and posted it through GitHub's API. No attacker-controlled infrastructure required. Exfiltration ran through GitHub's own API: the platform itself became the C2 channel. | Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI agent. List every secret the agent can access. Rotate all exposed credentials. Migrate to short-lived OIDC tokens (GitHub, GitLab, CircleCI). |
| 3. Over-permissioned agent runtimes | AI agents granted bash execution, git push, and API write access at setup. Permissions never scoped down. No periodic least-privilege review. Agents accumulate access in the same way service accounts do. | Agents are configured once during onboarding and inherited across repos. No tooling flags unused permissions. The Comment and Control agent had bash, write, and env-read access for a code review task. | The agent had bash access it did not need for code review. It used that access to read env vars and post exfiltrated data. Stripping bash would have blocked the attack chain entirely. | Audit agent permissions repo by repo. Strip bash from code review agents. Set repo access to read-only. Gate write access (PR comments, commits, merges) behind a human approval step. |
| 4. No CVE signal for AI agent vulnerabilities | CVSS 9.4 Critical. Anthropic, Google, and GitHub patched. Zero CVE entries in NVD. Zero advisories. Your vulnerability scanner, SIEM, and GRC tool all show green. | No CNA has yet issued a CVE for a coding agent prompt injection, and current CVE practices have not captured this class of failure mode. Vendors patch through version bumps. Qualys, Tenable, and Rapid7 have nothing to scan for. | A SOC analyst running a full scan on Monday morning would find zero entries for a Critical vulnerability that hit Claude Code Security Review, Gemini CLI Action, and Copilot simultaneously. | Create a new category in your supply chain risk register: 'AI agent runtime.' Assign a 48-hour check-in cadence with each vendor's security contact. Do not wait for CVEs. None have come yet, and the taxonomy gap makes them unlikely without industry pressure. |
| 5. Model safeguards do not govern agent actions | Opus 4.7 blocks a phishing email prompt. It does not block an agent from reading $ANTHROPIC_API_KEY and posting it as a PR comment. Safeguards gate generation, not operation. | Safeguards filter model outputs (text). Agent operations (bash, git push, curl, API POST) bypass safeguard evaluation entirely. The runtime is outside the safeguard perimeter. Anthropic applies some runtime-level protections in features like Claude Code Auto Mode, but these are not documented in the system card and their scope is not publicly defined. | The agent never generated prohibited content. It performed a legitimate operation (post a PR comment) containing exfiltrated data. Safeguards never triggered. | Map every operation your AI agents perform: bash, git, API calls, file writes. For each, ask the vendor in writing: does your safeguard layer evaluate this action before execution? Document the answer. |
| 6. Untrusted input parsed as instructions | PR titles, PR body text, issue comments, code review comments, and commit messages are all parsed by AI coding agents as context. Any can contain injected instructions. | No input sanitization layer between GitHub and the agent instruction set. The agent cannot distinguish developer intent from attacker injection in untrusted fields. Claude Code GitHub Action is designed for trusted first-party inputs by default. Users who opt into processing untrusted external PRs accept additional risk. | A single malicious PR title became a complete exfiltration command. The agent treated it as a legitimate instruction and executed it without validation or confirmation. | Implement input sanitization as defense-in-depth, but do not rely on traditional WAF-style regex patterns. LLM prompt injections are non-deterministic and will evade static pattern matching. Restrict agent context to approved workflow configs and combine with least-privilege permissions. |
| 7. No comparable injection resistance data across vendors | Anthropic publishes quantified injection resistance rates in 232 pages. OpenAI publishes model-layer injection evals but no agent-runtime resistance rates. Google publishes a few-page card referencing an older model. | No industry standard for AI safety metric disclosure. Vendors may have internal metrics and red-team programs, but published disclosures are not comparable. Procurement has no baseline and no framework to require one. | Anthropic, OpenAI, and Google were all approved for enterprise use without comparable injection resistance data. The exploit exposed what unmeasured risk looks like in production. | Write one sentence for your next vendor meeting: 'Show me your quantified injection resistance rate for my model version on my platform.' Document refusals for EU AI Act high-risk compliance. Deadline: August 2026. |
OpenAI’s GPT-5.4 was not directly exploited in the Comment and Control disclosure. The gaps identified in the OpenAI and Google columns are inferred from what their system cards and program documentation do not publish, not from demonstrated exploits. That distinction matters. Absence of published runtime metrics is a transparency gap, not proof of a vulnerability. It does mean procurement teams cannot verify what they cannot measure.
Eligibility requirements for Anthropic’s Cyber Verification Program and OpenAI’s Trusted Access for Cyber are still evolving, as are platform coverage and program scope, so security teams should validate current vendor docs before treating any coverage described here as definitive. Anthropic’s CVP is designed for authorized offensive security research — removing cyber safeguards for vetted actors — and is not a prompt injection defense program. Security leaders mapping these gaps to existing frameworks can align threat classes 1–3 with NIST CSF 2.0 GV.SC (Supply Chain Risk Management), threat class 4 with ID.RA (Risk Assessment), and threat classes 5–7 with PR.DS (Data Security).
Comment and Control focuses on GitHub Actions today, but the seven threat classes generalize to most CI/CD runtimes where AI agents execute with access to secrets, including GitHub Actions, GitLab CI, CircleCI, and custom runners. Safety metric disclosure formats are in flux across all three vendors; Anthropic currently leads on published quantification in its system card documentation, but norms are likely to converge as EU AI Act obligations come into force. Comment and Control targeted Claude Code GitHub Action, a specific product feature, not Anthropic’s models broadly. The vulnerability class, however, applies to any AI coding agent operating in a CI/CD runtime with access to secrets.
What to do before your next vendor renewal
“Don’t standardize on a model. Standardize on a control architecture,” Baer told VentureBeat. “The risk is systemic to agent design, not vendor-specific. Maintain portability so you can swap models without reworking your security posture.”
Build a deployment map. Confirm your platform qualifies for the runtime protections you think cover you. If you run Opus 4.7 on Bedrock, ask your Anthropic account rep what runtime-level prompt injection protections apply to your deployment surface. Email your account rep today. (Anthropic Cyber Verification Program)
Audit every runner for secret exposure. Run grep -r 'secrets\.' .github/workflows/ across every repo with an AI coding agent. List every secret the agent can access. Rotate all exposed credentials. (GitHub Actions secrets documentation)
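For teams that want that check running continuously, here is a minimal sketch of a scheduled audit job; the workflow name and cadence are arbitrary choices, not a vendor recommendation.

```yaml
# Scheduled audit sketch: surfaces every distinct secrets.<NAME> reference
# in this repo's workflows so new exposure shows up in the job log.
name: secret-reference-audit
on:
  schedule:
    - cron: "0 6 * * 1"   # weekly; adjust as needed
  workflow_dispatch:       # allow manual runs
jobs:
  audit:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - name: List secret references in workflows
        run: grep -rhoE 'secrets\.[A-Za-z0-9_]+' .github/workflows/ | sort -u
```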
Start migrating credentials now. Switch stored secrets to short-lived OIDC token issuance. GitHub Actions, GitLab CI, and CircleCI all support OIDC federation. Set token lifetimes to minutes, not hours. Plan full rollout over one to two quarters, starting with repos running AI agents. (GitHub OIDC docs | GitLab OIDC docs | CircleCI OIDC docs)
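A minimal GitHub Actions sketch of the OIDC pattern, assuming AWS as the target cloud; the role ARN, region, and session length are placeholders to adapt.

```yaml
# OIDC sketch: the job requests a short-lived, GitHub-signed identity token
# and exchanges it for cloud credentials. No long-lived secret ever sits
# in the runner environment.
name: deploy
on:
  push:
    branches: [main]
permissions:
  id-token: write   # lets the runner request an OIDC token
  contents: read
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Exchange OIDC token for AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy  # hypothetical role
          aws-region: us-east-1
          role-duration-seconds: 900   # minutes, not hours
      # Later steps use credentials that expire when the session does.
```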
Fix agent permissions repo by repo. Strip bash execution from every AI agent doing code review. Set repository access to read-only. Gate write access behind a human approval step. (GitHub Actions permissions documentation)
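At the workflow level, the least-privilege baseline is a few lines; the job name is illustrative, and vendor-specific agent options (such as disabling shell tools) live in the agent action's own configuration.

```yaml
# Least-privilege baseline for a code review agent job: read-only
# GITHUB_TOKEN, no write path back into the repo.
jobs:
  agent-review:
    runs-on: ubuntu-latest
    permissions:
      contents: read        # agent can read the diff, nothing else
      pull-requests: read   # no comment, commit, or merge rights
    steps:
      - uses: actions/checkout@v4
      # Agent step goes here. Route any write action (PR comments, commits,
      # merges) through a separate job gated on human approval.
```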
Add input sanitization as one layer, not the only layer. Filter pull request titles, comments, and review threads for instruction patterns before they reach agents. Combine with least-privilege permissions and OIDC. Static regex will not catch non-deterministic prompt injections on its own.
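As one such layer, a pre-agent step can flag the crudest instruction patterns before the agent runs. The patterns below are examples only; as noted above, static matching will not stop a determined injection.

```yaml
# Defense-in-depth only: static patterns WILL miss novel injections.
# Pair with least-privilege permissions and OIDC; never rely on this alone.
steps:
  - name: Flag obvious instruction patterns in PR title
    env:
      PR_TITLE: ${{ github.event.pull_request.title }}  # pass via env; never interpolate into the script body
    run: |
      if grep -qiE 'ignore (all|previous) instructions|(post|print|echo).*(secret|api.?key|token)' <<< "$PR_TITLE"; then
        echo "::warning::PR title matches a known injection pattern; skipping agent run"
        exit 1
      fi
  # The agent step runs only if the filter step passed.
```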
Add “AI agent runtime” to your supply chain risk register. Assign a 48-hour patch verification cadence with each vendor’s security contact. Do not wait for CVEs. None have come yet for this class of vulnerability.
Check which hardened GitHub Actions mitigations you already have in place. Hardened configurations block this attack class today: the permissions key restricts GITHUB_TOKEN scope, environment protection rules require approval before secrets are injected, and first-time-contributor gates prevent external pull requests from triggering agent workflows. (GitHub Actions security hardening guide)
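Together those mitigations are a handful of lines. In the sketch below, the environment name is illustrative, and the author-association gate is one common way to approximate a first-time-contributor check.

```yaml
jobs:
  agent-review:
    # Gate: skip PRs from authors with no prior relationship to the repo.
    if: contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.pull_request.author_association)
    runs-on: ubuntu-latest
    environment: agent-secrets   # protection rules can require human approval
                                 # before environment secrets are injected
    permissions:
      contents: read             # restricts GITHUB_TOKEN scope
    steps:
      - uses: actions/checkout@v4
```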
Prepare one procurement question per vendor before your next renewal. Write one sentence: “Show me your quantified injection resistance rate for the model version I run on the platform I deploy to.” Document refusals for EU AI Act high-risk compliance. The deadline is August 2026.
“Raw zero-days aren’t how most systems get compromised. Composability is,” Baer said. “It’s the glue code, the tokens in CI, the over-permissioned agents. When you wire a powerful model into a permissive runtime, you’ve already done most of the attacker’s work for them.”
Related Articles
- Claude Code, Copilot and Codex all got hacked. Every attacker went for the credential, not the model.On March 30, BeyondTrust proved that a crafted GitHub branch name could steal Codex’s OAuth token in cleartext. OpenAI classified it Critical P1. Two days later, Anthropic’s Claude Code source code spilled onto the public npm registry, and within hours, Adversa found Claude Code silently ignored its own deny rules once a command exceeded 50 subcommands. These were not isolated bugs. They were the latest in a nine-month run: six research teams disclosed exploits against Codex, Claude Code, Copilot, and Vertex AI, and every exploit followed the same pattern. An AI coding agent held a credential, executed an action, and authenticated to a production system without a human session anchoring the request. The attack surface was first demonstrated at Black Hat USA 2025, when Zenity CTO Michael Bargury hijacked ChatGPT, Microsoft Copilot Studio, Google Gemini, Salesforce Einstein and Cursor with Jira MCP on stage with zero clicks. Nine months later, those credentials are what attackers reached. Merritt Baer, CSO at Enkrypt AI and former Deputy CISO at AWS, named the failure in an exclusive VentureBeat interview. “Enterprises believe they’ve ‘approved’ AI vendors, but what they’ve actually approved is an interface, not the underlying system.” The credentials underneath the interface are the breach. Codex, where a branch name stole GitHub tokens BeyondTrust researcher Tyler Jespersen, with Fletcher Davis and Simon Stewart, found Codex cloned repositories using a GitHub OAuth token embedded in the git remote URL. During cloning, the branch name parameter flowed unsanitized into the setup script. A semicolon and a backtick subshell turned the branch name into an exfiltration payload. Stewart added the stealth. By appending 94 Ideographic Space characters (Unicode U+3000) after “main,” the malicious branch looked identical to the standard main branch in the Codex web portal. A developer sees “main.” The shell sees curl exfiltrating their token. OpenAI classified it Critical P1 and shipped full remediation by February 5, 2026. Claude Code, where two CVEs and a 50-subcommand bypass broke the sandbox CVE-2026-25723 hit Claude Code’s file-write restrictions. Piped sed and echo commands escaped the project sandbox because command chaining was not validated. Patched in 2.0.55. CVE-2026-33068 was subtler. Claude Code resolved permission modes from .claude/settings.json before showing the workspace trust dialog. A malicious repo set permissions.defaultMode to bypassPermissions. The trust prompt never appeared. Patched in 2.1.53. The 50-subcommand bypass landed last. Adversa found that Claude Code silently dropped deny-rule enforcement once a command exceeded 50 subcommands. Anthropic’s engineers had traded security for speed and stopped checking after the fiftieth. Patched in 2.1.90. “A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” wrote Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission. The repository decided what permissions the agent had. The token budget decided which deny rules survived. Copilot, where a pull request description and a GitHub issue both became root Johann Rehberger demonstrated CVE-2025-53773 against GitHub Copilot with Markus Vervier of Persistent Security as co-discoverer. 
Hidden instructions in PR descriptions triggered Copilot to flip auto-approve mode in .vscode/settings.json. That disabled all confirmations and granted unrestricted shell execution across Windows, macOS, and Linux. Microsoft patched it in the August 2025 Patch Tuesday release. Then, Orca Security cracked Copilot inside GitHub Codespaces. Hidden instructions in a GitHub issue manipulated Copilot into checking out a malicious PR with a symbolic link to /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted JSON $schema URL exfiltrated the privileged GITHUB_TOKEN. Full repository takeover. Zero user interaction beyond opening the issue. Mike Riemer, CTO at Ivanti, framed the speed dimension in a VentureBeat interview: “Threat actors are reverse engineering patches within 72 hours. If a customer doesn’t patch within 72 hours of release, they’re open to exploit.” Agents compress that window to seconds. Vertex AI, where default scopes reached Gmail, Drive and Google’s own supply chain Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent had excessive permissions. Stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the project and reached restricted, Google-owned Artifact Registry repositories at the core of the Vertex AI Reasoning Engine. Shaty described the compromised P4SA as functioning like a "double agent," with access to both user data and Google's own infrastructure. VentureBeat defense grid Security requirement Defense shipped Exploit path The gap Sandbox AI agent execution Codex runs tasks in cloud containers; token scrubbed during agent runtime. Token present during cloning. Branch-name command injection executed before cleanup. No input sanitization on container setup parameters. Restrict file system access Claude Code sandboxes writes via accept-edits mode. Piped sed/echo escaped sandbox (CVE-2026-25723). Settings.json bypassed trust dialog (CVE-2026-33068). 50-subcommand chain dropped deny-rule enforcement. Command chaining not validated. Settings loaded before trust. Deny rules truncated for performance. Block prompt injection in code context Copilot filters PR descriptions for known injection patterns. Hidden injections in PRs, README files, and GitHub issues triggered RCE (CVE-2025-53773 + Orca RoguePilot). Static pattern matching loses to embedded prompts in legitimate review and Codespaces flows. Scope agent credentials to least privilege Vertex AI Agent Engine uses P4SA service agent with OAuth scopes. Default scopes reached Gmail, Calendar, Drive. P4SA credentials read every Cloud Storage bucket and Google’s Artifact Registry. OAuth scopes non-editable by default. Least privilege violated by design. Inventory and govern agent identities No major AI coding agent vendor ships agent identity discovery or lifecycle management. Not attempted. Enterprises do not inventory AI coding agents, their credentials, or their permission scopes. AI coding agents are invisible to IAM, CMDB, and asset inventory. Zero governance exists. Detect credential exfiltration from agent runtime Codex obscures tokens in web portal view. Claude Code logs subcommands. Tokens visible in cleartext inside containers. Unicode obfuscation hid exfil payloads. Subcommand chaining hid intent. No runtime monitoring of agent network calls. Log truncation hid the bypass. Audit AI-generated code for security flaws Anthropic launched Claude Code Security (Feb 2026). OpenAI launched Codex Security (March 2026). 
Both scan generated code. Neither scans the agent’s own execution environment or credential handling. Code-output security is not agent-runtime security. The agent itself is the attack surface. Every exploit targeted runtime credentials, not model output Every vendor shipped a defense. Every defense was bypassed. The Sonar 2026 State of Code Developer Survey found 25% of developers use AI agents regularly, and 64% have started using them. Veracode tested more than 100 LLMs and found 45% of generated code samples introduced OWASP Top 10 flaws, a separate failure that compounds the runtime credential gap. CrowdStrike CTO Elia Zaitsev framed the rule in an exclusive VentureBeat interview at RSAC 2026: collapse agent identities back to the human, because an agent acting on your behalf should never have more privileges than you do. Codex held a GitHub OAuth token scoped to every repository the developer authorized. Vertex AI’s P4SA read every Cloud Storage bucket in the project. Claude Code traded deny-rule enforcement for token budget. Kayne McGladrey, an IEEE Senior Member who advises enterprises on identity risk, made the same diagnosis in an exclusive interview with VentureBeat. "It uses far more permissions than it should have, more than a human would, because of the speed of scale and intent." Riemer drew the operational line in an exclusive VentureBeat interview. "It becomes, I don't know you until I validate you." The branch name talked to the shell before validation. The GitHub issue talked to Copilot before anyone read it. Security director action plan Inventory every AI coding agent (CIEM). Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, Windsurf. List the credentials and OAuth scopes each received at setup. If your CMDB has no category for AI agent identities, create one. Audit OAuth scopes and patch levels. Upgrade Claude Code to 2.1.90 or later. Verify Copilot's August 2025 patch. Migrate Vertex AI to the bring-your-own-service-account model. Treat branch names, pull request descriptions, GitHub issues, and repo configuration as untrusted input. Monitor for Unicode obfuscation (U+3000), command chaining over 50 subcommands, and changes to .vscode/settings.json or .claude/settings.json that flip permission modes. Govern agent identities the way you govern human privileged identities (PAM/IGA). Credential rotation. Least-privilege scoping. Separation of duties between the agent that writes code and the agent that deploys it. CyberArk, Delinea, and any PAM platform that accepts non-human identities can onboard agent OAuth credentials today; Gravitee's 2026 survey found only 21.9% of teams have done it. Validate before you communicate. "As long as we trust and we check and we validate, I'm fine with letting AI maintain it," Riemer said. Before any AI coding agent authenticates to GitHub, Gmail, or an internal repository, verify the agent's identity, scope, and the human session it is bound to. Ask each vendor in writing before your next renewal. "Show me the identity lifecycle management controls for the AI agent running in my environment, including credential scope, rotation policy, and permission audit trail." If the vendor cannot answer, that is the audit finding. The governance gap in three sentences Most CISOs inventory every human identity and have zero inventory of the AI agents running with equivalent credentials. No IAM framework governs human privilege escalation and agent privilege escalation with the same rigor. 
Most scanners track every CVE but cannot alert when a branch name exfiltrates a GitHub token through a container that developers trust by default. Zaitsev's advice to RSAC 2026 attendees was blunt: you already know what to do. Agents just made the cost of not doing it catastrophic.
- CVSS scored these two Palo Alto CVEs as manageable. Chained, they gave attackers root access to 13,000 devices.During Operation Lunar Peek in November 2024, attackers gained unauthenticated remote admin access — and eventual root — across more than 13,000 exposed Palo Alto Networks management interfaces. Palo Alto Networks scored CVE-2024-0012 at 9.3 and CVE-2024-9474 at 6.9 under CVSS v4.0. NVD scored the same pair 9.8 and 7.2 under CVSS v3.1. Two scoring systems. Two different answers for the same vulnerabilities. The 6.9 fell below patch thresholds. Admin access appeared required. The 9.3 sat queued for maintenance. Segmentation would hold. "Adversaries circumvent [severity ratings] by chaining vulnerabilities together," Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, told VentureBeat in an exclusive interview on April 22, 2026. On the triage logic that missed the chain: "They just had amnesia from 30 seconds before." Both CVEs sit on the CISA Known Exploited Vulnerabilities catalog. Neither score flagged the kill chain. The triage logic that consumed those scores treated each CVE as an isolated event, and so did the SLA dashboards and the board reports those dashboards feed. CVSS did exactly what it was designed to do. Score one vulnerability at a time. The problem is that adversaries do not attack one vulnerability at a time. "CVSS base scores are theoretical measures of severity that ignore real-world context," wrote Peter Chronis, former CISO of Paramount and a security leader with Fortune 100 experience. By moving beyond CVSS-first prioritization at Paramount, Chronis reported reducing actionable critical and high-risk vulnerabilities by 90%. Chris Gibson, executive director of FIRST, the organization that maintains CVSS, has been equally direct: using CVSS base scores alone for prioritization is "the least apt and accurate" method, Gibson told The Register. FIRST's own EPSS and CISA's SSVC decision model address part of this gap by adding exploitation probability and decision-tree logic. Five triage failure classes CVSS was never designed to catch In 2025, 48,185 CVEs were disclosed, a 20.6% year-over-year increase. Jerry Gamblin, principal engineer at Cisco Threat Detection and Response, projects 70,135 for 2026. The infrastructure behind the scores is buckling under that weight. NIST announced on April 15 that CVE submissions have grown 263% since 2020, and the NVD will now prioritize enrichment for KEV and federal critical software only. 1. Chained CVEs that look safe until they aren't The Palo Alto pair from Operation Lunar Peek is the textbook. CVE-2024-0012 bypassed authentication. CVE-2024-9474 escalated privileges. Scored separately under both CVSS v4.0 and v3.1, the escalation flaw filtered below most enterprise patch thresholds because admin access appeared required. The authentication bypass upstream eliminated that prerequisite entirely. Neither score communicated the compound effect. Meyers described the operational psychology: teams assessed each CVE independently, deprioritized the lower score, and queued the higher one for maintenance. 2. Nation-state adversaries who weaponize patches within days The CrowdStrike 2026 Global Threat Report documented a 42% year-over-year increase in vulnerabilities exploited as zero-days before public disclosure. Average breakout time across observed intrusions: 29 minutes. Fastest observed breakout: 27 seconds. China-nexus adversaries weaponized newly patched vulnerabilities within two to six days of disclosure. 
"Before it was Patch Tuesday once a month. Now it's patch every day, all the time. That's what this new world looks like," said Daniel Bernard, Chief Business Officer at CrowdStrike. A KEV addition treated as a routine queue item on Tuesday becomes an active exploitation window by Thursday. 3. Stockpiled CVEs that nation-state actors hold for years Salt Typhoon accessed senior U.S. political figures' communications during the presidential transition by chaining CVE-2023-20198 with CVE-2023-20273 on internet-facing Cisco devices, a privilege escalation pair patched in October 2023 and still unapplied more than a year later. Compromised credentials provided a parallel entry vector. The patches existed. Neither was applied. Sixty-seven percent of vulnerabilities exploited by China-nexus adversaries in 2025 were remote code execution flaws providing immediate system access, according to the CrowdStrike 2026 Global Threat Report. CVSS does not degrade priority based on how long a CVE has gone unpatched. No board metric tracks aging KEV exposure. That silence is the vulnerability. 4. Identity gaps that never enter the scoring system A 2023 help desk social engineering call against a major enterprise produced more than $100 million in losses. No CVE was assigned. No CVSS score existed. No patch pipeline entry was created. The vulnerability was a human process gap in identity verification, sitting entirely outside the scoring system's aperture. "A pro needs a zero day if all you have to do is call the help desk and say I forgot my password," Meyers said. Agentic AI systems now carry their own identity credentials, API tokens, and permission scopes, operating outside traditional vulnerability management governance. Merritt Baer, CSO at Enkrypt AI, has argued on record that identity-surface controls are vulnerability equivalents belonging in the same reporting pipeline as software CVEs. In most organizations, help desk authentication gaps and agentic AI credential inventories live in a separate governance silo. In practice, nobody's governance. 5. AI-accelerated discovery that breaks pipeline capacity Anthropic's Claude Mythos Preview demonstrated autonomous vulnerability discovery, finding a 27-year-old signed integer overflow in OpenBSD's TCP SACK implementation across roughly 1,000 scaffold runs at a total compute cost under $20,000. Meyers offered a thought-experiment projection in the exclusive interview with VentureBeat: if frontier AI drives a 10x volume increase, the result is approximately 480,000 CVEs annually. Pipelines built for 48,000 break at 70,000 and collapse at 480,000. NVD enrichment is already gone for non-KEV submissions. "If the adversary is now able to find vulnerabilities faster than the defenders or the business, that's a huge problem, because those vulnerabilities become exploits," said Daniel Bernard, Chief Business Officer at CrowdStrike. CrowdStrike on Thursday launched Project QuiltWorks, a remediation coalition with Accenture, EY, IBM Cybersecurity Services, Kroll, and OpenAI formed to address the vulnerability volume that frontier AI models are now generating in production code. When five major firms build a coalition around a pipeline problem, no single organization's patch workflow can keep pace. Security director action plan The five failure classes above map to five specific actions. Run a chain-dependency audit on every KEV CVE in the environment this month. 
Flag any co-resident CVE scored 5.0 or above, the threshold where privilege escalation and lateral movement capabilities typically appear in CVSS vectors. Any pair chaining authentication bypass to privilege escalation gets triaged as critical regardless of individual scores. Compress KEV-to-patch SLAs to 72 hours for internet-facing systems. The CrowdStrike 2026 Global Threat Report breakout data, 29-minute average and 27-second fastest, makes weekly patch windows indefensible in a board presentation. Build a monthly KEV aging report for the board. Every unpatched KEV CVE, days since disclosure, days since patch availability, and owner. Salt Typhoon exploited a Cisco CVE patched 14 months earlier because no escalation path existed for aging exposure. Add identity-surface controls to the vulnerability reporting pipeline. Help desk authentication gaps and agentic AI credential inventories belong in the same SLA framework as software CVEs. If they sit in a separate governance silo, they sit in nobody's governance. Stress-test pipeline capacity at 1.5x and 10x current CVE volume. Gamblin projects 70,135 for 2026. Meyers's thought-experiment projection: frontier AI could push annual volume past 480,000. Present the capacity gap to the CFO before the next budget cycle, not after the breach that proves the gap existed.
- Anthropic Skill scanners passed every check. The malicious code rode in on a test file.Picture this scenario: An Anthropic Skill scanner runs a full analysis of a Skill pulled from ClawHub or skills.sh. Its markdown instructions are clean, and no prompt injection is detected. No shell commands are hiding in the SKILL.md. Green across the board. The scanner never looked at the .test.ts file sitting one directory over. It didn’t need to. Test files aren’t part of the agent execution surface, so no publicly documented scanner inspects them (as of publication of this post). The file runs anyway. Not through the agent but through the test runner, with full access to the filesystem, environment variables, and SSH keys. Gecko Security researcher Jeevan Jutla detailed this attack flow, demonstrating that when a developer runs npx Skills add, the installer copies the entire skill directory into the repo. If a malicious Skill bundles a *.test.ts file, the Jest and Vitest testing frameworks discover it through recursive glob patterns, treat it as a first-class test, and execute it during npm test or when the IDE auto-runs tests on save. The default configuration in open-source JavaScript test framework Mocha follows a similar recursive discovery pattern. The payload fires in beforeAll, before any assertions run. Nothing in the test output flags anything unusual. In CI, process.env holds deployment tokens, cloud credentials, and every secret the pipeline can reach. The attack class is not new; malicious npm postinstall scripts and pytest plugins have exploited trust-on-install for years. What makes the Skill vector worse is that installed Skills land in a directory designed to be committed and shared across the team, propagate to every teammate who clones, and sit outside every scanner's detection surface. The agent is never invoked, and the Anthropic Skill scanner reads the right files for the wrong threat model. Three audits, one blind spot Gecko's disclosure didn’t arrive in isolation. It landed on top of two large-scale security audits that had already documented the scope of the problem from the other direction, illustrating what scanners detect rather than what they miss. Both audits did exactly what they're designed to do: They measured the threat on the execution surface scanners already inspect. Gecko measured what sits outside it. A SkillScan academic study, published on January 15, analyzed 31,132 unique Anthropic Skills collected from two major marketplaces. Their findings: 26.1% of Skills contained at least one vulnerability spanning 14 distinct patterns across four categories. Data exfiltration showed up in 13.3% of Skills. Privilege escalation appeared in 11.8%. Skills bundling executable scripts were 2.12x more likely to contain vulnerabilities than instruction-only Skills. Three weeks later, Snyk published ToxicSkills, the first comprehensive security audit of the ClawHub and skills.sh marketplaces. Snyk's team scanned 3,984 Skills (as of February 5). The results: 13.4% of all Skills contained at least one critical-level security issue. Seventy-six confirmed malicious payloads were identified through a combination of automated scanning and human-in-the-loop review. Eight of those malicious Skills were still publicly available on ClawHub when the research was published. Then Cisco shipped its AI Agent Security Scanner for IDEs on April 21, integrating its open-source Skill Scanner directly into VS Code, Cursor, and Windsurf. 
The scanner brings genuine capability to developers’ workflows. It does not inspect bundled test files, because the detection categories Cisco built target the agent interaction layer, not the developer toolchain layer. The three major Anthropic Skill scanners share a structural blind spot: None inspects bundled test files as an execution surface, even though Gecko Security proved that those files execute with full local permissions through standard test runners. Snyk Agent Scan, Cisco's AI Agent Security Scanner, and VirusTotal Code Insight all work. They catch prompt injection, shell commands, and data exfiltration in Skill definitions and agent-referenced scripts. What they do not do is look beyond the agent execution surface to the developer execution surface sitting in the same directory. How the attack chain works The mechanics of the attack chain matter because the fix is precise. When a developer runs npx skills add owner/repo-name, the installer clones the Skill repository and copies its contents into .agents/skills/<skill-name>/ inside the project. Claude Code, Cursor, and other agent IDEs get symlinks into their own Skill directories. The only files excluded are .git, metadata.json, and files prefixed with _. Everything else lands on disk. Jest and Vitest both pass dot: true to their glob engines. That means they discover test files inside dot-prefixed directories like .agents/. Mocha's behavior depends on configuration but follows similar recursive patterns by default. None of them exclude .agents/, .claude/, or .cursor/ from their default discovery paths. An attacker publishes a Skill with a clean SKILL.md and a tests/reviewer.test.ts file containing a beforeAll block. The block reads process.env, .env files, ~/.ssh/ private keys, and ~/.aws/credentials. It posts everything to an external endpoint. The test cases look real. The exfiltration happens during setup, silently, whether the tests pass or fail. The vector is not limited to TypeScript. Python repos face the same exposure through conftest.py, which pytest auto-executes during test collection. Add .agents to testpaths exclusion in pyproject.toml to block it. The .agents/skills/ directory is designed to be committed to the repo so teammates can share Skills. GitHub's default .gitignore templates do not include .agents/. Once the malicious test file enters the repo, every developer who clones and runs tests executes the payload. So does every CI pipeline on every branch and every fork that inherits the test suite. Scanners are reading the wrong threat surface CrowdStrike CTO Elia Zaitsev put the structural challenge in operational terms during an exclusive VentureBeat interview at RSAC 2026. "Observing actual kinetic actions is a structured, solvable problem," Zaitsev said. "Intent is not." That distinction cuts directly at the Anthropic Skill scanner gap. No publicly documented scanner operates outside the assumption that the threat lives in the SKILL.md and in scripts the agent is instructed to run. These tools analyze intent: What does the Skill tell the agent to do? Gecko's finding sits on the kinetic side. The test file executes through the developer's own toolchain. No agent is involved. No prompt is interpreted. The payload is TypeScript, running with full local permissions through a legitimate test runner. The scanner was solving the wrong problem. 
CrowdStrike's Zaitsev framed the identity dimension: "AI agents and non-human identities will explode across the enterprise, expanding exponentially and dwarfing human identities," he told VentureBeat. "Each agent will operate as a privileged super-human with OAuth tokens, API keys, and continuous access to previously siloed data sets." CrowdStrike's Charlotte AI and similar enterprise agents operate with exactly these privileges. When those credentials live in environment variables accessible to any process in the repo, a test-file payload does not need agent privileges. It already has developer privileges, which in most CI configurations means deployment tokens and cloud access. Mike Riemer, SVP of the network security group and field CISO at Ivanti, quantified the exploitation window in a VentureBeat interview. "Threat actors are reverse engineering patches within 72 hours," Riemer said. "If a customer doesn't patch within 72 hours of release, they're open to exploit." Most enterprises take weeks. The Anthropic Skill scanner blind spot compounds that window. A developer installs a malicious Skill today. The test file executes immediately. No patch exists because no scanner flagged it. The Anthropic Skill Audit Grid VentureBeat has covered the Anthropic Skill supply chain since the ClawHavoc campaign hit ClawHub in January. Every conversation with security leaders lands on the same frustration. Their teams bought a scanner, it reports clean, and they have no framework for asking what it does not check. VentureBeat has polled dev teams who install Anthropic Skills from ClawHub and skills.sh. The grid below connects the published-audit half (Snyk, SkillScan) with the scanner-bypass half (Gecko). Each row represents a detection surface a security team should verify before approving any Skill scanning tool for Q2 procurement. Audit question What scanners do today The gap Recommended action Inspect SKILL.md and agent-invoked scripts Covered by Snyk Agent Scan, Cisco AI Agent Security Scanner, VirusTotal Code Insight This is the covered surface. Attackers shift payloads to files outside it. Continue running current scanners. They catch real threats at the instruction layer. Inspect bundled test files (*.test.ts, *.spec.js, conftest.py) Not currently inspected as attack surface by any scanner Gecko proved test files execute via Jest/Vitest (documented) and Mocha (config-dependent) with full local permissions. No agent invoked. Add .agents/ to testPathIgnorePatterns (Jest) or exclude (Vitest). One config line. Flag Skills that bundle test files or build configs Not flagged as higher-risk metadata by any scanner Trivial static check. Skills with extra executables are 2.12x more likely to be vulnerable (SkillScan). Add CI gate: find .agents/ -name "*.test.*" | grep -q . && exit 1. Block merge on match. Restrict test-runner globs to project-owned paths Rare. Most CI configs use recursive glob. Jest/Vitest pass dot: true by default. Default globs traverse .agents/, .claude/, .cursor/ directories. Malicious test files auto-discovered. Scope test roots to first-party directories (src/, app/). Deny .agents/, .claude/, .cursor/. Distinguish script-bundling Skills vs. instruction-only Partial coverage via static and semantic analysis SkillScan: script-bundling Skills 2.12x more likely to contain vulnerabilities than instruction-only. Require structured audit entry: Skill type, execution surfaces, scanner coverage, residual risk. Publish audit methodology with sample size Snyk yes (3,984 Skills). 
SkillScan yes (31,132 Skills). Cisco and emerging scanners have not published equivalent ecosystem-scale audits. Ask vendors: methodology, sample size, detection rate. No published audit = no independent baseline. Pin Skill sources to immutable commits Not enforced by any scanner or marketplace Skill authors can push clean version for review, add malicious test file after approval. Pin to specific commit hash. Review diffs on every update. OWASP Agentic Skills Top 10 recommends this. Three CI hardening steps to add now Riemer made the broader point in VentureBeat interviews that placing security controls at the perimeter invites every threat to that exact boundary. Anthropic Skill scanners placed the boundary at SKILL.md. Attackers put the payload one directory over. The three changes below move the boundary to where the code actually executes. These changes take minutes. None requires replacing current tools or waiting for scanner vendors to close the gap. Add .agents/ to the test runner's ignore list. In Jest, add /\.agents/ to testPathIgnorePatterns in jest.config.js. In Vitest, add **/.agents/** to the exclude array in vitest.config.ts. One line in one config file prevents the test runner from discovering files inside installed Skill directories. Do it whether or not the team currently uses Anthropic Skills. The directory may appear in a cloned repo without anyone installing the Skill directly. Audit every Skill install for non-instruction files before merge. Add a CI check that flags any file in .agents/skills/ matching *.test.*, *.spec.*, __tests__/, *.config.*, or conftest.py. These files have no legitimate reason to exist inside a Skill directory. The check is a shell one-liner: [ -d .agents ] && find .agents/ -name "*.test.*" -o -name "*.spec.*" -o -name "conftest.py" -o -name "*.config.*" -o -type d -name "__tests__" | grep -q . && exit 1. If it matches, block the merge. For any test files that do land in a PR, require a reviewer to skim for shell invocations (exec, spawn, child_process), external network calls, and file operations touching secrets or SSH keys. Pin Skill sources to specific commits, not latest. The npx skills add command copies whatever the repo contains at the moment of install. A Skill author can push a clean version for scanner review, then add a malicious test file after approval. Pinning to a specific commit hash converts a trust-on-first-use model into a verify-on-every-change model. The OWASP Agentic Skills Top 10 recommends exactly this. If Skills are already in your repo: Run the find command above against your existing .agents/ directory now. If test files are present, treat them as a potential compromise: Rotate any credentials accessible to CI (deployment tokens, cloud keys, SSH keys), audit CI logs for unexpected outbound network calls during test execution, and review git history to determine when the test files entered the repo and which pipelines have executed them. Five questions to ask your Anthropic Skill scanner vendor Security teams are signing contracts for their first dedicated Skill scanning tools. The Gecko bypass means the questions on those sales calls need to change. Do not stop at "Do you detect prompt injection?" Ask: Which files and directories do you actually analyze in a Skill repo? Do you treat test files as potential execution surfaces? Can you flag Skills that bundle tests, CI configs, or build scripts as higher-risk? SkillScan showed script-bundling Skills are 2.12x more likely to be vulnerable. 
Do you provide integration or guidance for restricting test-runner globs in CI? Cisco deserves credit for open-sourcing its Skill Scanner on GitHub, which lets security teams inspect exactly which detection categories the tool implements. That transparency is the baseline every vendor should meet. If your vendor will not publish detection categories or open-source their scanning logic, you cannot verify what they check and what they skip. Have you published an ecosystem-scale audit with methodology and sample size? Snyk published at 3,984 Skills. SkillScan published at 31,132. Riemer described the disclosure pattern: "They chose not to publish a CVE. They just quietly patched it and moved on with life," he said. The Anthropic Skills ecosystem is showing early signs of the same pattern: scanners document what they detect without mapping the surfaces they do not reach. The gap between documented coverage and actual execution surface is where the test-file vector lives. The audit grid matters because the scanner model is incomplete The Anthropic Skills ecosystem is repeating the early npm supply chain story, except without the decade of accumulated incidents that forced package registries to build security infrastructure. SkillScan's 31,132-Skill dataset showed a quarter of the ecosystem carrying vulnerabilities. Snyk found 76 confirmed malicious payloads in fewer than 4,000 Skills. Gecko proved the scanner model itself has a structural gap that no vendor has publicly documented closing. Scanner evaluations consistently test the covered surface. The Anthropic Skill Audit Grid gives security teams the seven audit surfaces to verify before signing. The three CI steps are the fixes to deploy before the next Skill install. Riemer's Ivanti team watches the patch-to-exploit cycle compress in real time across enterprise environments. The test-file vector compresses it further: No scanner flagged the threat, so no patch window exists. The scanner is not broken. It is incomplete. The threat model stopped at the agent. The test runner did not.
- One command turns any open-source repo into an AI agent backdoor. OpenClaw proved no supply-chain scanner has a detection category for itJust two months ago, researchers at the Data Intelligence Lab at the University of Hong Kong introduced CLI-Anything, a new state-of-the-art tool that analyzes any repo’s source code and generates a structured command line interface (CLI) that AI coding agents can operate with a single command. Claude Code, Codex, OpenClaw, Cursor, and GitHub Copilot CLI are all supported, and since its launch in March, CLI‑Anything has climbed to more than 30,000 GitHub stars. But the same mechanism that makes software agent-native opens the door to agent-level poisoning. The attack community is already discussing the implications on X and security forums, translating CLI-Anything's architecture into offensive playbooks. The security problem is not what CLI-Anything does. It is what CLI-Anything represents. CLI-Anything generates SKILL.md files, the same instruction-layer artifacts that Snyk’s ToxicSkills research found laced with 76 confirmed malicious payloads across ClawHub and skills.sh in February 2026. A poisoned skill definition does not trigger a CVE and never appears in a software bill of materials (SBOM). No mainstream security scanner has a detection category for malicious instructions embedded in agent skill definitions, because the category simply did not exist eighteen months ago. Cisco confirmed the gap in April. “Traditional application security tools were not designed for this,” Cisco’s engineering team wrote in a blog post announcing its AI Agent Security Scanner for IDEs. “SAST [static application security testing] scanners analyze source code syntax. SCA [software composition analysis] tools check dependency versions. Neither understands the semantic layer where MCP [Model Context Protocol] tool descriptions, agent prompts, and skill definitions operate.” Merritt Baer, CSO of Enkrypt AI and former Deputy CISO at Amazon Web Services (AWS), told VentureBeat in an exclusive interview: “SAST and SCA were built for code and dependencies. They don’t inspect instructions.” This is not a single-vendor vulnerability. It is a structural gap in how the entire security industry monitors software supply chains. This is the pre-exploitation window. CLI-Anything is live, the attack community is discussing it, and security directors who act now get ahead of the first incident report. The integration layer no stack can see Traditional supply-chain security operates on two layers. The code layer is where SAST works, scanning source files for insecure patterns, injection flaws, and hardcoded secrets. The dependency layer is where SCA works, checking package versions against known vulnerabilities, generating SBOMs, and flagging outdated libraries. Agent bridge tools like CLI-Anything, MCP connectors, Cursor rules files, and Claude Code skills operate on a third layer between the other two. Call it the agent integration layer: configuration files, skill definitions, and natural-language instruction sets tell an AI agent what software can do and how to operate it. None of it looks like code. All of it executes like code. 
Carter Rees, VP of AI at Reputation, told VentureBeat in an exclusive interview: “Modern LLMs [large language models] rely on third-party plugins, introducing supply chain vulnerabilities where compromised tools can inject malicious data into the conversation flow, bypassing internal safety training.” Researchers at Griffith University, Nanyang Technological University, the University of New South Wales, and the University of Tokyo documented the attack chain in an April paper, “Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems.” The team introduced Document-Driven Implicit Payload Execution (DDIPE), a technique that embeds malicious logic inside code examples within skill documentation. Across four agent frameworks and five large language models, DDIPE achieved bypass rates between 11.6% and 33.5%. Static analysis caught most samples, but 2.5% evaded all four detection layers. Responsible disclosure led to four confirmed vulnerabilities and two vendor fixes. The kill chain security leaders need to audit Here's the anatomy of the kill chain: An attacker submits a SKILL.md file to an open-source project containing setup instructions, code examples, and configuration templates. It looks like standard documentation. A code reviewer would wave it through because none of it is executable. But the code examples contain embedded instructions that an agent will parse as operational directives. A developer uses an agent bridge tool to connect their coding agent to the repository. The agent ingests the skill definition and trusts it, because no verification layer exists to distinguish benign from malicious intent at the instruction level. The agent executes the embedded instruction using its own legitimate credentials. Endpoint detection and response (EDR) sees an approved API call from an authorized process and passes it. Data exfiltration, configuration changes, and credential harvesting are all moving through channels that the monitoring stack considers normal traffic. Rees identified the structural flaw that makes this chain lethal. “A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions,” he told VentureBeat. A compromised skill definition riding that flat authorization plane does not need to escalate privileges. It already has them. Every link in that chain is invisible to the current security stack. Pillar Security demonstrated a variant of this chain against Cursor in January 2026 (CVE-2026-22708). Implicitly trusted shell built-in commands could be poisoned through indirect prompt injection, converting benign developer commands into arbitrary code execution vectors. Users saw only the final command. The poisoning happened through other commands the IDE never surfaced for approval. The evidence is already in production In a documented attack chain from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN, which the attacker used to publish a compromised npm dependency that installed a second agent on roughly 4,000 developer machines for eight hours. There was just one issue title. Attackers had eight hours of access. No human approved the action. Snyk’s ToxicSkills audit scanned 3,984 agent skills from ClawHub, the public marketplace for the OpenClaw agent framework, and skills.sh in February 2026. The results: 13.4% of all skills contained at least one critical security issue. 
The evidence is already in production

In a documented attack chain from April 2026, a crafted GitHub issue title triggered an AI triage bot wired into Cline. The bot exfiltrated a GITHUB_TOKEN, which the attacker used to publish a compromised npm dependency that installed a second agent on roughly 4,000 developer machines for eight hours. It took one issue title. Attackers had eight hours of access. No human approved the action.

Snyk’s ToxicSkills audit scanned 3,984 agent skills from ClawHub, the public marketplace for the OpenClaw agent framework, and from skills.sh in February 2026. The results: 13.4% of all skills contained at least one critical security issue. Daily skill submissions jumped from fewer than 50 in mid-January to more than 500 by early February. The barrier to publishing was a SKILL.md markdown file and a GitHub account one week old. No code signing. No security review. No sandbox.

OpenClaw is not an outlier. It is the pattern. “The bar to entry is extremely low,” Baer said. “Adding a skill can be as simple as uploading a Word doc or lightweight config file. That’s a radically different risk profile than compiled code.” She pointed to projects like ClawPatrol that have started cataloging and scanning for malicious skills, evidence the ecosystem is moving faster than enterprise defenses.

The ClawHavoc campaign, first reported by Koi Security in late January 2026, initially identified 341 malicious skills on ClawHub. A follow-up analysis by Antiy CERT expanded the count to 1,184 compromised packages across the platform. The campaign delivered Atomic Stealer (AMOS) through skill definitions with professional documentation. Skills named solana-wallet-tracker and polymarket-trader matched what developers actively searched for.

The MCP protocol layer carries similar exposure. OX Security reported in April that researchers poisoned nine of 11 MCP marketplaces using proof-of-concept servers. Trend Micro initially found 492 MCP servers exposed to the internet with zero authentication; by April, that number had grown to 1,467. As The Register reported, the root issue lies in Anthropic’s MCP software development kit (SDK) transport mechanism: any developer using the official SDK inherits the vulnerability class.

VentureBeat Prescriptive Matrix: Three-layer agent supply-chain audit

VentureBeat developed this Prescriptive Matrix by mapping the three attack layers documented in the research and incident reports above against the detection capabilities of current SAST, SCA, and agent-layer tools. Each row identifies what security teams should verify and where no scanner has coverage today.

Layer 1: Code
Threat: Prompt injection in AI-generated code
Current detection: SAST scanners
Why it misses: Most SAST tools have no detection category for prompt injection in AI-generated code.
Recommended action: Confirm that SAST scans AI-generated code for prompt injection. If not, have an open vendor conversation this quarter.

Layer 2: Dependencies
Threat: Malicious MCP servers, agent skills, plugin registries
Current detection: SCA tools
Why it misses: SCA generates no AI-specific bill of materials; agent-layer dependencies are invisible.
Recommended action: Confirm SCA includes MCP servers, agent skills, and plugin registries in the dependency inventory.

Layer 3: Agent integration
Threat: Poisoned SKILL.md files, malicious instruction sets, adversarial rules files
Current detection: None until April 2026
Why it misses: No tool inspects the semantic meaning of agent instruction files. Baer: “We’re not inspecting intent.”
Recommended action: Deploy Cisco Skill Scanner or Snyk mcp-scan. Assign a team to own this layer.

Baer’s diagnosis of Layer 3 applies across the entire matrix: “Current scanners look for known bad artifacts, not adversarial instructions embedded in otherwise valid skills.” Cisco’s open-source Skill Scanner and Snyk’s mcp-scan represent the first tools purpose-built for this layer.

Security director action plan

Here is how security leaders can get ahead of the problem.

Inventory every agent bridge tool in the environment. This includes CLI-Anything, MCP connectors, Cursor rules files, Claude Code skills, and GitHub Copilot extensions. If the development team is using agent bridge tools that have not been inventoried, the risk cannot be assessed. One way to start that sweep is sketched below.
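This is a minimal sketch; the filename conventions are assumptions based on common tool defaults, so adjust the globs to match the bridge tools actually in use.

```python
# Minimal inventory pass for agent-integration-layer artifacts.
# The glob patterns are assumptions based on common tool defaults;
# adjust them to match your environment.
from pathlib import Path

ARTIFACT_GLOBS = {
    "agent skill":        ["**/SKILL.md"],
    "Cursor rules":       ["**/.cursorrules", "**/.cursor/rules/*"],
    "MCP config":         ["**/mcp.json", "**/.mcp.json"],
    "Claude Code config": ["**/CLAUDE.md"],
}

def inventory(root: Path) -> dict[str, list[Path]]:
    """Map each artifact kind to the matching files under root."""
    found: dict[str, list[Path]] = {}
    for kind, globs in ARTIFACT_GLOBS.items():
        paths = sorted({p for g in globs for p in root.glob(g)})
        if paths:
            found[kind] = paths
    return found

if __name__ == "__main__":
    for kind, paths in inventory(Path(".")).items():
        print(f"{kind}: {len(paths)} file(s)")
        for p in paths:
            print(f"  {p}")
```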
Audit agent skill sources the same way package registries get audited. Baer’s framing is precise: “A skill is effectively untrusted executable intent, even if it’s just text.” Shut off ungoverned ingestion paths until controls are in place, and stand up a review and allowlisting process for skills. The OWASP Agentic Skills Top 10 (AST01: Malicious Skills) provides a procurement framework to align controls against.

Deploy agent-layer scanning. Evaluate Cisco’s open-source Skill Scanner and Snyk’s mcp-scan for behavioral analysis of agent instruction files. If dedicated tooling is unavailable, require a second engineer to read every SKILL.md before installation.

Restrict agent execution privileges and instrument runtime. AI coding agents should not run with the same credential scope as the developer who invoked them. Rees confirmed the structural flaw: the flat authorization plane means a compromised skill does not need to escalate privileges. Baer’s prescription: “Instrument runtime observability. What data is the agent accessing, what actions is it taking, and are those aligned with expected behavior?”

Assign ownership for the gap between layers. The most dangerous attacks succeed because they fall between detection categories. Assign a team to own the agent integration layer, and have it review every SKILL.md, MCP config, and rules file before it enters the environment.

The gap that already has a name

Baer underscored the dangers of this new attack vector. “This feels very similar to early container security, but we’re still in the ‘we’ll get to it’ phase across most orgs,” she said. At AWS, she added, it took a few high-profile wake-up calls before container security became table stakes. The difference this time is speed. “There’s no build pipeline, no compilation barrier. Just content,” she said.

CLI-Anything is not the threat. It is the proof case that the agent integration layer exists, that it is growing fast, and that the attacker community has already found it. The 33,000 developers who starred the repository are telling security teams where software development is heading.

Eighteen months ago, the detection category for agent-integration-layer poisoning did not exist. Cisco and Snyk shipped the first tools for it in April. The window between those two facts is closing. Security directors who have not begun an inventory are already behind.