AI agents that automatically prevent, detect and fix software issues are here as NeuBird AI launches Falcon, FalconClaw
Our take

The mantra of the modern tech industry was arguably coined by Facebook (before it became Meta): "move fast and break things."
But as enterprise infrastructure has shifted into a dizzying maze of hybrid clouds, microservices, and ephemeral compute clusters, the "breaking" part has become a structural tax that many organizations can no longer afford to pay. Today, two-year-old startup NeuBird AI is launching a full-scale offensive against this "chaos tax," announcing a $19.3 million funding round alongside the release of its Falcon autonomous production operations agent.
The launch isn't just a product update; it is a philosophical pivot. For years, the industry has focused on "Incident Response"—making the fire trucks faster and the hoses bigger. NeuBird AI is arguing that the only sustainable path forward is "Incident Avoidance".
As Venkat Ramakrishnan, President and COO of NeuBird AI, put it in a recent interview: "Incident management is so old school. Incident resolution is so old school. Incident avoidance is what is going to be enabled by AI".
By grounding AI in real-time enterprise context rather than just large language model reasoning, the company aims to move site reliability engineering and devops teams from a reactive posture to a predictive one.
The AI divide: a reality check on automation
Accompanying the launch is NeuBird AI’s 2026 State of Production Reliability and AI Adoption Report, a survey of over 1,000 professionals that reveals a massive disconnect between the boardroom and the server room.
While 74% of C-suite executives believe their organizations are actively using AI to manage incidents, only 39% of the practitioners—the engineers actually on-call at 2:00 AM—agree.
This 35-point "AI Divide" suggests that while leadership is writing checks for AI platforms, the technology is often failing to reach the frontline.
For engineers, the reality remains manual and grueling: the study found that engineering teams spend an average of 40% of their time on incident management rather than building new products.
Gou Rao, co-founder CEO of NeuBird AI, told VentureBeat that this is a persistent operational reality: “Over the past 18 months that we have been in production, this is not a marketing slide. We have concretely been able to demonstrate a massive reduction in time to incident response and resolution”.
The consequences of this "toil" are more than just lost productivity. Alert fatigue has transitioned from a morale issue to a direct reliability risk.
According to the report, 83% of organizations have teams that ignore or dismiss alerts occasionally, and 44% of companies experienced an outage in the past year tied directly to a suppressed or ignored alert. In many cases, the systems are so noisy that customers discover failures before the monitoring tools do.
Introducing NeuBird AI Falcon
NeuBird AI’s answer to this systemic failure is the Falcon engine. While the company’s previous iteration, Hawkeye, focused on autonomous resolution, Falcon extends that capability into predictive intelligence. "When we launched NeuBird AI in 2023, our first version of the agent was called Hawkeye," Rao explains. "What we’re announcing next week at HumanX is our next-generation version of the agent, codenamed Falcon. Falcon is easily three times faster than Hawkeye and is averaging around 92% in confidence scores".
This level of accuracy allows engineers to trust the agent's output at face value. Falcon represents a significant leap over previous generative AI applications in the space, particularly in its ability to forecast failure. "Falcon is really good at preventive prediction, so it can tell you what can go wrong," Rao says. "It’s pretty accurate on a 72-hour window, even better at 48 hours, and by 24 hours it gets really, really accurate”.
One of the standout features of the new release is the Advanced Context Map. Unlike static dashboards, this is a real-time view of infrastructure dependencies and service health. It allows teams to visualize the "blast radius" of an issue as it propagates across an environment, helping engineers understand not just what is broken, but why it is failing in the context of its neighbors.
'Minority Report' for incident management
While many AI tools favor flashy web interfaces, NeuBird AI is leaning into the developer's native habitat with NeuBird AI Desktop. This allows engineers to invoke the production ops agent directly from a command-line interface to explore root causes and system dependencies.
"Falcon has a desktop mode which allows it to interact with a developer’s local tools," Rao noted. "We’re getting a lot more traction from a hands-on developer audience, especially as people go to Claude Desktop and Cursor. They’re completing the loop by using production agents talking to their coding agents”.
This integration enables a "multi-agent" workflow where an engineer can use NeuBird AI’s agent to diagnose a root cause in production and then hand off that diagnosis to a coding agent like Claude Code to implement the fix.
During a live demo, Rao showcased how the agent could be set to "Sentinel Mode," constantly sweeping a cluster for risks. If it detects an anomaly—such as a projected 5% spike in AWS costs or a misconfigured Kubernetes pod—it can flag the specific engineer on-call who has the domain expertise to fix it.
"This is like 'Minority Report for Incident Management'," one financial services executive reportedly told the team after a demo.
Context engineering: a gateway for security
A primary concern for enterprises deploying AI is security—ensuring large language models don't go "crazy" or exfiltrate sensitive data. NeuBird AI addresses this through a proprietary approach to "context engineering".
"The way we implemented our agent is that the large language models themselves are never actually touching the data directly," Rao explains. "We become the gateway for how the context can be accessed”. This means the model is the reasoning engine, but NeuBird AI is the middleman that wraps the data.
Furthermore, the company has implemented strict guardrails on what the agent can actually execute. “We’ve created a language that confines and restricts the agent from what it can do," says Rao. "If it comes up with something anomalous, or something we don’t know, it won’t run. We won’t do it”.
This architectural choice allows NeuBird AI to remain model-agnostic. If a newer model from Anthropic or Google outperforms the current reasoning engine, NeuBird AI can simply switch it out without requiring the customer to change their platform. "Customers don’t want to be tied to a specific way of reasoning," Rao asserts. "They want to be tied to a platform from which they can get the value of an agentic system”.
Displacing the "army": displacing expensive observability
One of the most radical claims NeuBird AI makes is that agentic systems can actually reduce the amount of data enterprises need to store in the first place. Currently, teams rely on massive storage platforms with complex query languages.
"People use very complex observability tools like Datadog, Dynatrace, and Sysdig," Rao says. "This is the norm today, which is why it takes an army of people to solve a problem. What we’ve been able to demonstrate with agentic systems is that you don’t need to store all that data in the first place”. Because the agent can reason across raw data sources, it can identify which signals are junk and which are critical. This shift, Rao argues, “reduces human toil and effort while simultaneously reducing your reliance on these insanely expensive observability tools”.
The practical impact of this "incident avoidance" was recently demonstrated at Deep Health. Rao recounts how their agent detected a systemic issue that was invisible to traditional tools: “Our agent was able to go in and prevent an issue from happening which would have caused this company, Deep Health, a major production outage. The customer is completely beside themselves and happy about what it could do”.
FalconClaw: operationalizing 'tribal knowledge'
One of the most persistent problems in IT operations is the loss of "tribal knowledge"—the hard-won expertise of senior engineers that exists only in their heads. NeuBird AI is attempting to solve this with FalconClaw, a curated, enterprise-grade skills hub compatible with the OpenClaw ecosystem.
FalconClaw allows teams to capture best practices and resolution steps as "validated and compliant skills". The tech preview launched today with 15 initial skills that work natively with NeuBird AI’s toolchain.
According to Francois Martel, Field CTO at NeuBird AI, this turns hard-won expertise into a reusable asset that the AI can use automatically.
It’s an attempt to standardize how agents interact with infrastructure, moving away from proprietary "black box" systems toward a multi-agent world where different AI tools can share a common set of operational abilities.
Scaling the moat: funding and leadership
The $19.3 million round was led by Xora Innovation, a Temasek-backed firm, with participation from Mayfield, M12, StepStone Group, and Prosperity7 Ventures. This brings NeuBird AI’s total funding to approximately $64 million.
The investor interest is fueled largely by the pedigree of the founding team. Gou Rao and Vinod Jayaraman previously co-founded Portworx, which was acquired by Pure Storage, and Ocarina Networks, acquired by Dell. They have recently bolstered their leadership with Venkat Ramakrishnan, another Pure Storage veteran, as President and COO.
For investors like Phil Inagaki of Xora, the value lies in NeuBird AI’s "best-in-class results across accuracy, speed and token consumption". As cloud costs continue to spiral, the ability of an AI agent to not only fix bugs but also optimize infrastructure capacity is becoming a "must-have" rather than a "nice-to-have". NeuBird AI claims its agent can save enterprise teams more than 200 engineering hours per month.
The path to 'self-healing' infrastructure
As the State of Production Reliability report notes, current incident management practices are "no longer sustainable". With 61% of organizations estimating that a single hour of downtime costs $50,000 or more, the financial stakes of staying in a reactive loop are enormous.
NeuBird AI's launch of Falcon and FalconClaw marks a definitive attempt to break that loop. By focusing on prevention and the "context engineering" required to make AI trustworthy for enterprise production, the company is positioning itself as the critical intelligence layer for the modern stack.
While the "AI Divide" between executives and practitioners remains a significant hurdle for the industry, NeuBird AI is betting that as engineers see the value of a cli-driven, 92%-accurate agent that can "see around corners," the skepticism will fade. For the site reliability engineers currently drowning in a flood of non-actionable alerts, the arrival of a reliable ai teammate couldn't come soon enough.
NeuBird AI Falcon is available starting today, with organizations able to sign up for a free trial at neubird.ai.
Read on the original site
Open the publisher's page for the full experience
Related Articles
- CrowdStrike, Cisco and Palo Alto Networks all shipped agentic SOC tools at RSAC 2026 — the agent behavioral baseline gap survived all threeCrowdStrike CEO George Kurtz highlighted in his RSA Conference 2026 keynote that the fastest recorded adversary breakout time has dropped to 27 seconds. The average is now 29 minutes, down from 48 minutes in 2024. That is how much time defenders have before a threat spreads. Now CrowdStrike sensors detect more than 1,800 distinct AI applications running on enterprise endpoints, representing nearly 160 million unique application instances. Every one generates detection events, identity events, and data access logs flowing into SIEM systems architected for human-speed workflows. Cisco found that 85% of surveyed enterprise customers have AI agent pilots underway. Only 5% moved agents into production, according to Cisco President and Chief Product Officer Jeetu Patel in his RSAC blog post. That 80-point gap exists because security teams cannot answer the basic questions agents force. Which agents are running, what are they authorized to do, and who is accountable when one goes wrong. “The number one threat is security complexity. But we’re running towards that direction in AI as well,” Etay Maor, VP of Threat Intelligence at Cato Networks, told VentureBeat at RSAC 2026. Maor has attended the conference for 16 consecutive years. “We’re going with multiple point solutions for AI. And now you’re creating the next wave of security complexity.” Agents look identical to humans in your logs In most default logging configurations, agent-initiated activity looks identical to human-initiated activity in security logs. “It looks indistinguishable if an agent runs Louis’s web browser versus if Louis runs his browser,” Elia Zaitsev, CTO of CrowdStrike, told VentureBeat in an exclusive interview at RSAC 2026. Distinguishing the two requires walking the process tree. “I can actually walk up that process tree and say, this Chrome process was launched by Louis from the desktop. This Chrome process was launched from Louis’s Claude Cowork or ChatGPT application. Thus, it’s agentically controlled.” Without that depth of endpoint visibility, a compromised agent executing a sanctioned API call with valid credentials fires zero alerts. The exploit surface is already being tested. During his keynote, Kurtz described ClawHavoc, the first major supply chain attack on an AI agent ecosystem, targeting ClawHub, OpenClaw's public skills registry. Koi Security's February audit found 341 malicious skills out of 2,857; a follow-up analysis by Antiy CERT identified 1,184 compromised packages historically across the platform. Kurtz noted ClawHub now hosts 13,000 skills in its registry. The infected skills contained backdoors, reverse shells, and credential harvesters; Kurtz said in his keynote that some erased their own memory after installation and could remain latent before activating. "The frontier AI creators will not secure itself," Kurtz said. "The frontier labs are following the same playbook. They're building it. They're not securing it." Two agentic SOC architectures, one shared blind spot Approach A: AI agents inside the SIEM. Cisco and Splunk announced six specialized AI agents for Splunk Enterprise Security: Detection Builder, Triage, Guided Response, Standard Operating Procedures (SOP), Malware Threat Reversing, and Automation Builder. Malware Threat Reversing is currently available in Splunk Attack Analyzer and Detection Studio is generally available as a unified workspace; the remaining five agents are in alpha or prerelease through June 2026. Exposure Analytics and Federated Search follow the same timeline. Upstream of the SOC, Cisco's DefenseClaw framework scans OpenClaw skills and MCP servers before deployment, while new Duo IAM capabilities extend zero trust to agents with verified identities and time-bound permissions. “The biggest impediment to scaled adoption in enterprises for business-critical tasks is establishing a sufficient amount of trust,” Patel told VentureBeat. “Delegating and trusted delegating, the difference between those two, one leads to bankruptcy. The other leads to market dominance.” Approach B: Upstream pipeline detection. CrowdStrike pushed analytics into the data ingestion pipeline itself, integrating its Onum acquisition natively into Falcon’s ingestion system for real-time analytics, detection, and enrichment before events reach the analyst’s queue. Falcon Next-Gen SIEM now ingests Microsoft Defender for Endpoint telemetry natively, so Defender shops do not need additional sensors. CrowdStrike also introduced federated search across third-party data stores and a Query Translation Agent that converts legacy Splunk queries to accelerate SIEM migration. Falcon Data Security for the Agentic Enterprise applies cross-domain data loss prevention to data agents' access at runtime. CrowdStrike’s adversary-informed cloud risk prioritization connects agent activity in cloud workloads to the same detection pipeline. Agentic MDR through Falcon Complete adds machine-speed managed detection for teams that cannot build the capability internally. “The agentic SOC is all about, how do we keep up?” Zaitsev said. “There’s almost no conceivable way they can do it if they don’t have their own agentic assistance.” CrowdStrike opened its platform to external AI providers through Charlotte AI AgentWorks, announced at RSAC 2026, letting customers build custom security agents on Falcon using frontier AI models. Launch partners include Accenture, Anthropic, AWS, Deloitte, Kroll, NVIDIA, OpenAI, Salesforce, and Telefónica Tech. IBM validated buyer demand through a collaboration integrating Charlotte AI with its Autonomous Threat Operations Machine for coordinated, machine-speed investigation and containment. The ecosystem contenders. Palo Alto Networks, in an exclusive pre-RSAC briefing with VentureBeat, outlined Prisma AIRS 3.0, extending its AI security platform to agents with artifact scanning, agent red teaming, and a runtime that catches memory poisoning and excessive permissions. The company introduced an agentic identity provider for agent discovery and credential validation. Once Palo Alto Networks closes its proposed acquisition of Koi, the company adds agentic endpoint security. Cortex delivers agentic security orchestration across its customer base. Intel announced that CrowdStrike’s Falcon platform is being optimized for Intel-powered AI PCs, leveraging neural processing units and silicon-level telemetry to detect agent behavior on the device. Kurtz framed AIDR, AI Detection and Response, as the next category beyond EDR, tracking agent-speed activity across endpoints, SaaS, cloud, and AI pipelines. He said that “humans are going to have 90 agents that work for them on average” as adoption scales but did not specify a timeline. The gap no vendor closed What security leaders need Approach A: agents inside the SIEM (Cisco/Splunk) Approach B: upstream pipeline detection (CrowdStrike) Gap neither closes Triage at agent volume Six AI agents handle triage, detection, and response inside Splunk ES Onum-powered pipeline detects and enriches threats before the analyst sees them Neither baselines normal agent behavior before flagging anomalies Agent vs. human differentiation Duo IAM tracks agent identities but does not differentiate agent from human activity in SOC telemetry Process tree lineage distinguishes at runtime. AIDR extends to agent-specific detection No vendor’s announced capabilities include an out-of-the-box agent behavioral baseline 27-second response window Guided Response Agent executes containment at machine speed In-pipeline detection reduces queue volume. Agentic MDR adds managed response Human-in-the-loop governance has not been reconciled with machine-speed response in either approach Legacy SIEM portability Native Splunk integration preserves existing workflows Query Translation Agent converts Splunk queries. Native Defender ingestion lets Microsoft shops migrate Neither addresses teams running multiple SIEMs during migration Agent supply chain DefenseClaw scans skills and MCP servers pre-deployment. Explorer Edition red-teams agents EDR AI Runtime Protection catches compromised skills post-deployment. Charlotte AI AgentWorks enables custom agents Neither covers the full lifecycle. Pre-deployment scanning misses runtime exploits and vice versa The matrix makes one thing visible that the keynotes did not. No vendor shipped an agent behavioral baseline. Both approaches automate triage and accelerate detection. Based on VentureBeat's review of announced capabilities, neither defines what normal agent behavior looks like in a given enterprise environment. Teams running Microsoft Sentinel and Copilot for Security represent a third architecture not formally announced as a competing approach at RSAC this week, but CISOs in Microsoft-heavy environments need to test whether Sentinel's native agent telemetry ingestion and Copilot's automated triage close the same gaps identified above. Maor cautioned that the vendor response recycles a pattern he has tracked for 16 years. “I hope we don’t have to go through this whole cycle,” he told VentureBeat. “I hope we learned from the past. It doesn’t really look like it.” Zaitsev’s advice was blunt. “You already know what to do. You’ve known what to do for five, ten, fifteen years. It’s time to finally go do it.” Five things to do Monday morning These steps apply regardless of your SOC platform. None requires ripping and replacing current tools. Start with visibility, then layer in controls as agent volume grows. Inventory every agent on your endpoints. CrowdStrike detects 1,800 AI applications across enterprise devices. Cisco’s Duo Identity Intelligence discovers agentic identities. Palo Alto Networks’ agentic IDP catalogs agents and maps them to human owners. If you run a different platform, start with an EDR query for known agent directories and binaries. You cannot set policy for agents you do not know exist. Determine whether your SOC stack can differentiate agent from human activity. CrowdStrike’s Falcon sensor and AIDR do this through process tree lineage. Palo Alto Networks’ agent runtime catches memory poisoning at execution. If your tools cannot make this distinction, your triage rules are applying the wrong behavioral models. Match the architectural approach to your current SIEM. Splunk shops gain agent capabilities through Approach A. Teams evaluating migration get pipeline detection with Splunk query translation and native Defender ingestion through Approach B. Palo Alto Networks’ Cortex delivers a third option. Teams on Microsoft Sentinel, Google Chronicle, Elastic, or other platforms should evaluate whether their SIEM can ingest agent-specific telemetry at this volume. Build an agent behavioral baseline before your next board meeting. No vendor ships one. Define what your agents are authorized to do: which APIs, which data stores, which actions, at which times. Create detection rules for anything outside that scope. Pressure-test your agent supply chain. Cisco’s DefenseClaw and Explorer Edition scan and red-team agents before deployment. CrowdStrike’s runtime detection catches compromised agents post-deployment. Both layers are necessary. Kurtz said in his keynote that ClawHavoc compromised over a thousand ClawHub skills with malware that erased its own memory after installation. If your playbook does not account for an authorized agent executing unauthorized actions at machine speed, rewrite it. The SOC was built to protect humans using machines. It now protects machines using machines. The response window shrank from 48 minutes to 27 seconds. Any agent generating an alert is now a suspect, not just a sensor. The decisions security leaders make in the next 90 days will determine whether their SOC operates in this new reality or gets buried under it.
- Nvidia launches enterprise AI agent platform with Adobe, Salesforce, SAP among 17 adopters at GTC 2026Jensen Huang walked onto the GTC stage Monday wearing his trademark leather jacket and carrying, as it turned out, the blueprints for a new kind of industry dominance. The Nvidia CEO unveiled the Agent Toolkit, an open-source platform for building autonomous AI agents, and then rattled off the names of the companies that will use it: Adobe, Salesforce, SAP, ServiceNow, Siemens, CrowdStrike, Atlassian, Cadence, Synopsys, IQVIA, Palantir, Box, Cohesity, Dassault Systèmes, Red Hat, Cisco and Amdocs. Seventeen enterprise software companies, touching virtually every industry and every Fortune 500 corporation, all agreeing to build their next generation of AI products on a shared foundation that Nvidia designed, Nvidia optimizes and Nvidia maintains. The toolkit provides the models, the runtime, the security framework and the optimization libraries that AI agents need to operate autonomously inside organizations — resolving customer service tickets, designing semiconductors, managing clinical trials, orchestrating marketing campaigns. Each component is open source. Each is optimized for Nvidia hardware. The combination means that as AI agents proliferate across the corporate world, they will generate demand for Nvidia GPUs not because companies choose to buy them but because the software they depend on was engineered to require them. "The enterprise software industry will evolve into specialized agentic platforms," Huang told the crowd, "and the IT industry is on the brink of its next great expansion." What he left unsaid is that Nvidia has just positioned itself as the tollbooth at the entrance to that expansion — open to all, owned by one. Inside Nvidia's Agent Toolkit: the software stack designed to power every corporate AI worker To grasp the significance of Monday's announcements, it helps to understand the problem Nvidia is solving. Building an enterprise AI agent today is an exercise in frustration. A company that wants to deploy an autonomous system — one that can, say, monitor a telecommunications network and proactively resolve customer issues before anyone calls to complain — must assemble a language model, a retrieval system, a security layer, an orchestration framework and a runtime environment, typically from different vendors whose products were never designed to work together. Nvidia's Agent Toolkit collapses that complexity into a unified platform. It includes Nemotron, a family of open models optimized for agentic reasoning; AI-Q, an open blueprint that lets agents perceive, reason and act on enterprise knowledge; OpenShell, an open-source runtime enforcing policy-based security, network and privacy guardrails; and cuOpt, an optimization skill library. Developers can use the toolkit to create specialized AI agents that act autonomously while using and building other software to complete tasks. The AI-Q component addresses a pain point that has dogged enterprise AI adoption: cost. Its hybrid architecture routes complex orchestration tasks to frontier models while delegating research tasks to Nemotron's open models, which Nvidia says can cut query costs by more than 50 percent while maintaining top-tier accuracy. Nvidia used the AI-Q Blueprint to build what it claims is the top-ranking AI agent on both the DeepResearch Bench and DeepResearch Bench II leaderboards — benchmarks that, if they hold under independent validation, position the toolkit as not merely convenient but competitively necessary. OpenShell tackles what has been the single biggest obstacle in every boardroom conversation about letting AI agents loose inside corporate systems: trust. The runtime creates isolated sandboxes that enforce strict policies around data access, network reach and privacy boundaries. Nvidia is collaborating with Cisco, CrowdStrike, Google, Microsoft Security and TrendAI to integrate OpenShell with their existing security tools — a calculated move that enlists the cybersecurity industry as a validation layer for Nvidia's approach rather than a competing one. The partner list that reads like the Fortune 500: who signed on and what they're building The breadth of Monday's enterprise adoption announcements reveals Nvidia's ambitions more clearly than any specification sheet could. Adobe, in a simultaneously announced strategic partnership, will adopt Agent Toolkit software as the foundation for running hybrid, long-running creativity, productivity and marketing agents. Shantanu Narayen, Adobe's chair and CEO, said the companies will bring together "our Firefly models, CUDA libraries into our applications, 3D digital twins for marketing, and Agent Toolkit and Nemotron to our agentic frameworks to deliver high-quality, controllable and enterprise-grade AI workflows of the future." The partnership extends deep: Adobe will explore OpenShell and Nemotron as foundations for personalized, secure agentic loops, and will evaluate the toolkit for large-scale workflows powered by Adobe Experience Platform. Nvidia will provide engineering expertise, early access to software and targeted go-to-market support. Salesforce's integration may be the one enterprise IT leaders parse most carefully. The company is working with Nvidia Agent Toolkit software including Nemotron models, enabling customers to build, customize and deploy AI agents using Agentforce for service, sales and marketing. The collaboration introduces a reference architecture where employees can use Slack as the primary conversational interface and orchestration layer for Agentforce agents — powered by Nvidia infrastructure — that participate directly in business workflows and pull from data stores in both on-premises and cloud environments. For the millions of knowledge workers who already conduct their professional lives inside Slack, this turns a messaging app into the command center for corporate AI. SAP, whose software underpins the financial and operational plumbing of most Global 2000 companies, is using open Agent Toolkit software including NeMo for enabling AI agents through Joule Studio on SAP Business Technology Platform, enabling customers and partners to design agents tailored to their own business needs. ServiceNow's Autonomous Workforce of AI Specialists leverage Agent Toolkit software, the AI-Q Blueprint and a combination of closed and open models, including Nemotron and ServiceNow's own Apriel models — a hybrid approach that suggests the toolkit is designed not to replace existing AI investments but to become the connective tissue between them. From chip design to clinical trials: how agentic AI is reshaping specialized industries The partner list extends well beyond horizontal software platforms into deeply specialized verticals where autonomous agents could compress timelines measured in years. In semiconductor design — where a single advanced chip can cost billions of dollars and take half a decade to develop — three of the four major electronic design automation companies are building agents on Nvidia's stack. Cadence will leverage Agent Toolkit and Nemotron with its ChipStack AI SuperAgent for semiconductor design and verification. Siemens is launching its Fuse EDA AI Agent, which uses Nemotron to autonomously orchestrate workflows across its entire electronic design automation portfolio, from design conception through manufacturing sign-off. Synopsys is building a multi-agent framework powered by its AgentEngineer technology using Nemotron and Nemo Agent Toolkit. Healthcare and life sciences present perhaps the most consequential use case. IQVIA is integrating Nemotron and other Agent Toolkit software with IQVIA.ai, a unified agentic AI platform designed to help life sciences organizations work more efficiently across clinical, commercial and real-world operations. The scale is already significant: IQVIA has deployed more than 150 agents across internal teams and client environments, including 19 of the top 20 pharmaceutical companies. The security sector is embedding itself into the architecture from the ground floor. CrowdStrike unveiled a Secure-by-Design AI Blueprint that embeds its Falcon platform protection directly into Nvidia AI agent architectures — including agents built on AI-Q and OpenShell — and is advancing agentic managed detection and response using Nemotron reasoning models. Cisco AI Defense will provide AI security protection for OpenShell, adding controls and guardrails to govern agent actions. These are not aftermarket bolt-ons; they are foundational integrations that signal the security industry views Nvidia's agent platform as the substrate it needs to protect. Dassault Systèmes is exploring Agent Toolkit software and Nemotron for its role-based AI agents, called Virtual Companions, on its 3DEXPERIENCE agentic platform. Atlassian is working with the toolkit as it evolves its Rovo AI agentic strategy for Jira and Confluence. Box is using it to enable enterprise agents to securely execute long-running business processes. Palantir is developing AI agents on Nemotron that run on its sovereign AI Operating System Reference Architecture. The open-source gambit: why giving software away is Nvidia's most aggressive business move There is something almost paradoxical about a company with a multi-trillion-dollar market capitalization giving away its most strategically important software. But Nvidia's open-source approach to Agent Toolkit is less an act of generosity than a carefully constructed competitive moat. OpenShell is open source. Nemotron models are open. AI-Q blueprints are publicly available. LangChain, the agent engineering company whose open-source frameworks have been downloaded over 1 billion times, is working with Nvidia to integrate Agent Toolkit components into the LangChain deep agent library for developing advanced, accurate enterprise AI agents at scale. When the most popular independent framework for building AI agents absorbs your toolkit, you have transcended the category of vendor and entered the category of infrastructure. But openness in AI has a way of being strategically selective. The models are open, but they are optimized for Nvidia's CUDA libraries — the proprietary software layer that has locked developers into Nvidia GPUs for two decades. The runtime is open, but it integrates most deeply with Nvidia's security partners. The blueprints are open, but they perform best on Nvidia hardware. Developers can explore Agent Toolkit and OpenShell on build.nvidia.com today, running on inference providers and Nvidia Cloud Partners including Baseten, CoreWeave, DeepInfra, DigitalOcean and others — all of which run Nvidia GPUs. The strategy has a historical analog in Google's approach to Android: give away the operating system to ensure that the entire mobile ecosystem generates demand for your core services. Nvidia is giving away the agent operating system to ensure that the entire enterprise AI ecosystem generates demand for its core product — the GPU. Every Salesforce agent running Nemotron, every SAP workflow orchestrated through OpenShell, every Adobe creative pipeline accelerated by CUDA creates another strand of dependency on Nvidia silicon. This also explains the Nemotron Coalition announced Monday — a global collaboration of model builders including Mistral AI, Cursor, LangChain, Perplexity, Reflection AI, Sarvam and Thinking Machines Lab, all working to advance open frontier models. The coalition's first project will be a base model codeveloped by Mistral AI and Nvidia, trained on Nvidia DGX Cloud, that will underpin the upcoming Nemotron 4 family. By seeding the open model ecosystem with Nvidia-optimized foundations, the company ensures that even models it does not build will run best on its hardware. What could go wrong: the risks enterprise buyers should weigh before going all-in For all the ambition on display Monday, several realities temper the narrative. Adoption announcements are not deployment announcements. Many of the partner disclosures use carefully hedged language — "exploring," "evaluating," "working with" — that is standard in embargoed press releases but should not be confused with production systems serving millions of users. Adobe's own forward-looking statements note that "due to the non-binding nature of the agreement, there are no assurances that Adobe will successfully negotiate and execute definitive documentation with Nvidia on favorable terms or at all." The gap between a GTC keynote demonstration and an enterprise-grade rollout remains substantial. Nvidia is not the only company chasing this market. Microsoft, with its Copilot ecosystem and Azure AI infrastructure, pursues a parallel strategy with the advantage of owning the operating systems and productivity software that most enterprises already use. Google, through Gemini and its cloud platform, has its own agent vision. Amazon, via Bedrock and AWS, is building comparable primitives. The question is not whether enterprise AI agents will be built on some platform but whether the market will consolidate around one stack or fragment across several. The security claims, while architecturally sound, remain unproven at scale. OpenShell's policy-based guardrails are a promising design pattern, but autonomous agents operating in complex enterprise environments will inevitably encounter edge cases that no policy framework has anticipated. CrowdStrike's Secure-by-Design AI Blueprint and Cisco AI Defense's OpenShell integration are exactly the kind of layered defense enterprise buyers will demand — but both are newly unveiled, not battle-hardened through years of adversarial testing. Deploying agents that can autonomously access data, execute code and interact with production systems introduces a threat surface that the industry has barely begun to map. And there is the question of whether enterprises are ready for agents at all. The technology may be available, but organizational readiness — the governance structures, the change management, the regulatory frameworks, the human trust — often lags years behind what the platforms can deliver. Beyond agents: the full scope of what Nvidia announced at GTC 2026 Monday's Agent Toolkit announcement did not arrive in isolation. It landed amid an avalanche of product launches that, taken together, describe a company remaking itself at every layer of the computing stack. Nvidia unveiled the Vera Rubin platform — seven new chips in full production, including the Vera CPU purpose-built for agentic AI, the Rubin GPU, and the newly integrated Groq 3 LPU inference accelerator — designed to power every phase of AI from pretraining to real-time agentic inference. The Vera Rubin NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs, delivering what Nvidia claims is up to 10x higher inference throughput per watt at one-tenth the cost per token compared with the Blackwell platform. Dynamo 1.0, an open-source inference operating system that Nvidia describes as the "operating system for AI factories," entered production with adoption from AWS, Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure alongside companies like Cursor, Perplexity, PayPal and Pinterest. The BlueField-4 STX storage architecture promises up to 5x token throughput for the long-context reasoning that agents demand, with early adopters including CoreWeave, Crusoe, Lambda, Mistral AI and Nebius. BYD, Geely, Isuzu and Nissan announced Level 4 autonomous vehicle programs on Nvidia's DRIVE Hyperion platform, and Uber disclosed plans to launch Nvidia-powered robotaxis across 28 cities and four continents by 2028, beginning with Los Angeles and San Francisco in the first half of 2027. Roche, the pharmaceutical giant, announced it is deploying more than 3,500 Nvidia Blackwell GPUs across hybrid cloud and on-premises environments in the U.S. and Europe — what it calls the largest announced GPU footprint available to a pharmaceutical company. Nvidia also launched physical AI tools for healthcare robotics, with CMR Surgical, Johnson & Johnson MedTech and others adopting the platform, and released Open-H, the world's largest healthcare robotics dataset with over 700 hours of surgical video. And Nvidia even announced a Space Module based on the Vera Rubin architecture, promising to bring data-center-class AI to orbital environments. The real meaning of GTC 2026: Nvidia is no longer selling picks and shovels Strip away the product specifications and benchmark claims and what emerges from GTC 2026 is a single, clarifying thesis: Nvidia believes the era of AI agents will be larger than the era of AI models, and it intends to own the platform layer of that transition the way it already owns the hardware layer of the current one. The 17 enterprise software companies that signed on Monday are making a bet of their own. They are wagering that building on Nvidia's agent infrastructure will let them move faster than building alone — and that the benefits of a shared platform outweigh the risks of shared dependency. For Salesforce, it means Agentforce agents that can draw from both cloud and on-premises data through a single Slack interface. For Adobe, it means creative AI pipelines that span image, video, 3D and document intelligence. For SAP, it means agents woven into the transactional fabric of global commerce. Each partnership is rational on its own terms. Together, they form something larger: an industry-wide endorsement of Nvidia as the default substrate for enterprise intelligence. Huang, who opened his career designing graphics chips for video games, closed his keynote by gesturing toward a future in which AI agents do not just assist human workers but operate as autonomous colleagues — reasoning through problems, building their own tools, learning from their mistakes. He compared the moment to the birth of the personal computer, the dawn of the internet, the rise of mobile computing. Technology executives have a professional obligation to describe every product cycle as a revolution. But here is what made Monday different: this time, 17 of the world's most important software companies showed up to agree with him. Whether they did so out of conviction or out of a calculated fear of being left behind may be the most important question in enterprise technology — and it is one that only the next few years can answer.
- RSAC 2026 shipped five agent identity frameworks and left three critical gaps open“You can deceive, manipulate, and lie. That’s an inherent property of language. It’s a feature, not a flaw,” CrowdStrike CTO Elia Zaitsev told VentureBeat in an exclusive interview at RSA Conference 2026. If deception is baked into language itself, every vendor trying to secure AI agents by analyzing their intent is chasing a problem that cannot be conclusively solved. Zaitsev is betting on context instead. CrowdStrike’s Falcon sensor walks the process tree on an endpoint and tracks what agents did, not what agents appeared to intend. “Observing actual kinetic actions is a structured, solvable problem,” Zaitsev told VentureBeat. “Intent is not.” That argument landed 24 hours after CrowdStrike CEO George Kurtz disclosed two production incidents at Fortune 50 companies. In the first, a CEO's AI agent rewrote the company's own security policy — not because it was compromised, but because it wanted to fix a problem, lacked the permissions to do so, and removed the restriction itself. Every identity check passed; the company caught the modification by accident. The second incident involved a 100-agent Slack swarm that delegated a code fix between agents with no human approval. Agent 12 made the commit. The team discovered it after the fact. Two incidents at two Fortune 50 companies. Caught by accident both times. Every identity framework that shipped at RSAC this week missed them. The vendors verified who the agent was. None of them tracked what the agent did. The urgency behind every framework launch reflects a broader market shift. "The difficulty of securing agentic AI is likely to push customers toward trusted platform vendors that can offer broader coverage across the expanding attack surface," according to William Blair's RSA Conference 2026 equity research report by analyst Jonathan Ho. Five vendors answered that call at RSAC this week. None of them answered it completely. Attackers are already inside enterprise pilots The scale of the exposure is already visible in production data. CrowdStrike's Falcon sensors detect more than 1,800 distinct AI applications across the company's customer fleet, generating 160 million unique instances on enterprise endpoints. Cisco found that 85% of its enterprise customers surveyed have pilot agent programs; only 5% have moved to production, meaning the vast majority of these agents are running without the governance structures production deployments typically require. "The biggest impediment to scaled adoption in enterprises for business-critical tasks is establishing a sufficient amount of trust," Cisco President and Chief Product Officer Jeetu Patel told VentureBeat in an exclusive interview at RSA Conference 2026. "Delegating versus trusted delegating of tasks to agents. The difference between those two, one leads to bankruptcy and the other leads to market dominance." Etay Maor, VP of Threat Intelligence at Cato Networks, ran a live Censys scan during an exclusive VentureBeat interview at RSA Conference 2026 and counted nearly 500,000 internet-facing OpenClaw instances. The week before: 230,000. Cato CTRL senior researcher Vitaly Simonovich documented a BreachForums listing from February 22, 2026, published on the Cato CTRL blog on February 25, where a threat actor advertised root shell access to a UK CEO’s computer for $25,000 in cryptocurrency. The selling point was the CEO’s OpenClaw AI personal assistant, which had accumulated the company’s production database, Telegram bot tokens, and Trading 212 API keys in plain-text Markdown with no encryption at rest. “Your AI? It’s my AI now. It’s an assistant for the attacker,” Maor told VentureBeat. The exposure data from multiple independent researchers tells the same story. Bitsight found more than 30,000 OpenClaw instances exposed to the public internet between January 27 and February 8, 2026. SecurityScorecard identified 15,200 of those instances as vulnerable to remote code execution through three high-severity CVEs, the worst rated CVSS 8.8. Koi Security found 824 malicious skills on ClawHub — 335 of them tied to ClawHavoc, which Kurtz flagged in his keynote as the first major supply chain attack on an AI agent ecosystem. Five vendors, three gaps none of them closed Cisco went deepest on identity governance. Duo Agentic Identity registers agents as distinct identity objects mapped to human owners, and every tool call routes through an MCP gateway in Secure Access SSE. Cisco Identity Intelligence catches shadow agents by monitoring network traffic rather than authentication logs. Patel told VentureBeat that today’s agents behave “more like teenagers — supremely intelligent, but with no fear of consequence, easily sidetracked or influenced.” CrowdStrike made the biggest philosophical bet, treating agents as endpoint telemetry and tracking the kinetic layer through Falcon’s process-tree lineage. CrowdStrike expanded AIDR to cover Microsoft Copilot Studio agents and shipped Shadow SaaS and AI Agent Discovery across Copilot, Salesforce Agentforce, ChatGPT Enterprise, and OpenAI Enterprise GPT. Palo Alto Networks built Prisma AIRS 3.0 with an agentic registry, an agentic IDP, and an MCP gateway for runtime traffic control. Palo Alto Networks’ pending Koi acquisition adds supply chain and runtime visibility. Microsoft spread governance across Entra, Purview, Sentinel, and Defender, with Microsoft Sentinel embedding MCP natively and a Claude MCP connector in public preview April 1. Cato CTRL delivered the adversarial proof that the identity gaps the other four vendors are trying to close are already being exploited. Maor told VentureBeat that enterprises abandoned basic security principles when deploying agents. “We just gave these AI tools complete autonomy,” Maor said. Gap 1: Agents can rewrite the rules governing their own behavior The Kurtz incident illustrates the gap exactly. Every credential check passed — the action was authorized. Zaitsev argues that the only reliable detection happens at the kinetic layer: which file was modified, by what process, initiated by what agent, compared against a behavioral baseline. Intent-based controls evaluate whether the call looks malicious. This one did not. Palo Alto Networks offers pre-deployment red teaming in Prisma AIRS 3.0, but red teaming runs before deployment, not during runtime when self-modification happens. No vendor ships behavioral anomaly detection for policy-modifying actions as a production capability. Patel framed the stakes in the VentureBeat interview: “The agent takes the wrong action and worse yet, some of those actions might be critical actions that are not reversible.” Board question: An authorized agent modifies the policy governing the agent’s future actions. What fires? Gap 2: Agent-to-agent handoffs have no trust verification The 100-agent swarm is the proof point. Agent A found a defect and posted to Slack. Agent 12 executed the fix. No human approved the delegation. Zaitsev’s approach: collapse agent identities back to the human. An agent acting on your behalf should never have more privileges than you do. But no product follows the delegation chain between agents. IAM was built for human-to-system. Agent-to-agent delegation needs a trust primitive that does not exist in OAuth, SAML, or MCP. Gap 3: Ghost agents hold live credentials with no offboarding Organizations adopt AI tools, run a pilot, lose interest, and move on. The agents keep running. The credentials stay active. Maor calls these abandoned instances ghost agents. Zaitsev connected ghost agents to a broader failure: agents expose where enterprises delayed action on basic identity hygiene. Standing privileged accounts, long-lived credentials, and missing offboarding procedures. These problems existed for humans. Agents running at machine speed make the consequences catastrophic. Maor demonstrated a Living Off the AI attack at the RSA Conference 2026, chaining Atlassian’s MCP and Jira Service Management to show that attackers do not separate trusted tools, services, and models. Attackers chain all three. “We need an HR view of agents,” Maor told VentureBeat. “Onboarding, monitoring, offboarding. If there’s no business justification? Removal.” Why these three gaps resist a product fix Human IAM assumes the identity holder will not rewrite permissions, spawn new identities, or leave. Agents violate all three. OAuth handles user-to-service. SAML handles federated human identity. MCP handles model-to-tool. None includes agent-to-agent verification. Five vendors against three gaps Cisco CrowdStrike Microsoft Palo Alto Networks Unsolved Registration. Can the vendor discover and inventory agents? Duo Agentic Identity. Agents registered as identity objects with human owners. Shadow agent detection via network traffic. Falcon sensor auto-discovery. 1,800+ agent apps, ~160M instances across customer fleet. Security Dashboard for AI + Entra shadow AI detection at the network layer. Agentic registry in Prisma AIRS 3.0. Agents inventoried before operating. All four register agents. No cross-vendor identity standard exists. Self-modification. Can the vendor detect when an agent changes its own policies? MCP gateway catches anomalous tool-call patterns in real time, but does not monitor for direct policy file modifications on the endpoint. Process-tree lineage tracks file modifications at the action layer. Could detect a policy file change, but no dedicated self-modification rule ships. Defender predictive shielding adjusts access policies reactively during active attacks. Not proactive self-modification detection. AI Red Teaming tests for this before deployment. No runtime detection after the agent is live. OPEN. No vendor detects an agent rewriting the policy governing the agent’s own behavior as a shipping capability. Delegation. Can the vendor track when one agent hands work to another? Maps each agent to a human owner. Does not track agent-to-agent handoffs. Collapses the agent identity to the human operator. Does not correlate the delegation chains between agents. Entra governs individual non-human identities. No multi-agent chain tracking. AI Agent Gateway governs individual agents. No delegation primitive between agents. OPEN. No trust primitive for agent-to-agent delegation exists in OAuth, SAML, or MCP. Decommission. Can the vendor confirm a killed agent holds zero credentials? Identity Intelligence runs a continuous inventory of active agents. Shadow SaaS + AI Agent Discovery finds running agents across SaaS and endpoints. Entra's shadow AI detection surfaces unmanaged AI applications. Koi acquisition (pending) adds endpoint visibility for agent applications. OPEN. All four discover running agents. None verifies zero residual credentials after decommission. Runtime / Kinetic. Can the vendor monitor what agents do in real time? MCP gateway enforces policy per tool call at the network layer. Contextual anomaly detection on call patterns. Falcon EDR tracks commands, scripts, file activity, and network connections at the process level. Defender endpoint + cloud monitoring. Predictive shielding during active incidents. Prisma AIRS AI Agent Gateway for runtime traffic control. CrowdStrike is the only vendor framing endpoint runtime as the primary safety net for agentic behavior. Five things to do Monday morning before your board asks Audit self-modification risk. Pull every agent with write access to security policies, IAM configs, firewall rules, or ACLs. Flag any agent that can modify controls governing the agent’s own behavior. No vendor automates this. Map delegation paths. Document every agent-to-agent invocation. Flag delegation without human approval. Human-in-the-loop on every delegation event until a trust primitive ships. Kill ghost agents. Build a registry. For each agent: business justification, human owner, credentials held, systems accessed. No justification? Manual revoke. Weekly. Stress test the MCP gateway enforcement. Cisco, Palo Alto Networks, and Microsoft all announced MCP gateways this week. Verify that agent tool traffic actually routes through the gateway. A misconfigured gateway creates false confidence while agents call tools directly. Baseline agent behavioral norms. Before any agent reaches production, establish what normal looks like: typical API calls, data access patterns, systems touched, and hours of activity. Without a behavioral baseline, the kinetic-layer anomaly detection Zaitsev describes has nothing to compare against. Zaitsev’s advice was blunt: you already know what to do. Agents just made the cost of not doing it catastrophic. Every vendor at RSAC verified who the agent was. None of them tracked what the agent did.
- The three disciplines separating AI agent demos from real-world deploymentGetting AI agents to perform reliably in production — not just in demos — is turning out to be harder than enterprises anticipated. Fragmented data, unclear workflows, and runaway escalation rates are slowing deployments across industries. “The technology itself often works well in demonstrations,” said Sanchit Vir Gogia, chief analyst with Greyhound Research. “The challenge begins when it is asked to operate inside the complexity of a real organization.” Burley Kawasaki, who oversees agent deployment at Creatio, and team have developed a methodology built around three disciplines: data virtualization to work around data lake delays; agent dashboards and KPIs as a management layer; and tightly bounded use-case loops to drive toward high autonomy. In simpler use cases, Kawasaki says these practices have enabled agents to handle up to 80-90% of tasks on their own. With further tuning, he estimates they could support autonomous resolution in at least half of use cases, even in more complex deployments. “People have been experimenting a lot with proof of concepts, they've been putting a lot of tests out there,” Kawasaki told VentureBeat. “But now in 2026, we’re starting to focus on mission-critical workflows that drive either operational efficiencies or additional revenue.” Why agents keep failing in production Enterprises are eager to adopt agentic AI in some form or another — often because they're afraid to be left out, even before they even identify real-world tangible use cases — but run into significant bottlenecks around data architecture, integration, monitoring, security, and workflow design. The first obstacle almost always has to do with data, Gogia said. Enterprise information rarely exists in a neat or unified form; it is spread across SaaS platforms, apps, internal databases, and other data stores. Some are structured, some are not. But even when enterprises overcome the data retrieval problem, integration is a big challenge. Agents rely on APIs and automation hooks to interact with applications, but many enterprise systems were designed long before this kind of autonomous interaction was a reality, Gogia pointed out. This can result in incomplete or inconsistent APIs, and systems can respond unpredictably when accessed programmatically. Organizations also run into snags when they attempt to automate processes that were never formally defined, Gogia said. “Many business workflows depend on tacit knowledge,” he said. That is, employees know how to resolve exceptions they’ve seen before without explicit instructions — but, those missing rules and instructions become startlingly obvious when workflows are translated into automation logic. The tuning loop Creatio deploys agents in a “bounded scope with clear guardrails,” followed by an “explicit” tuning and validation phase, Kawasaki explained. Teams review initial outcomes, adjust as needed, then re-test until they’ve reached an acceptable level of accuracy. That loop typically follows this pattern: Design-time tuning (before go-live): Performance is improved through prompt engineering, context wrapping, role definitions, workflow design, and grounding in data and documents. Human-in-the-loop correction (during execution): Devs approve, edit, or resolve exceptions. In instances where humans have to intervene the most (escalation or approval), users establish stronger rules, provide more context, and update workflow steps; or, they’ll narrow tool access. Ongoing optimization (after go-live): Devs continue to monitor exception rates and outcomes, then tune repeatedly as needed, helping to improve accuracy and autonomy over time. Kawasaki’s team applies retrieval-augmented generation to ground agents in enterprise knowledge bases, CRM data, and other proprietary sources. Once agents are deployed in the wild, they are monitored with a dashboard providing performance analytics, conversion insights, and auditability. Essentially, agents are treated like digital workers. They have their own management layer with dashboards and KPIs. For instance, an onboarding agent will be incorporated as a standard dashboard interface providing agent monitoring and telemetry. This is part of the platform layer — orchestration, governance, security, workflow execution, monitoring, and UI embedding — that sits "above the LLM," Kawasaki said. Users see a dashboard of agents in use and each of their processes, workflows, and executed results. They can “drill down” into an individual record (like a referral or renewal) that shows a step-by-step execution log and related communications to support traceability, debugging, and agent tweaking. The most common adjustments involve logic and incentives, business rules, prompt context, and tool access, Kawasaki said. The biggest issues that come up post-deployment: Exception handling volume can be high: Early spikes in edge cases often occur until guardrails and workflows are tuned. Data quality and completeness: Missing or inconsistent fields and documents can cause escalations; teams can identify which data to prioritize for grounding and which checks to automate. Auditability and trust: Regulated customers, particularly, require clear logs, approvals, role-based access control (RBAC), and audit trails. “We always explain that you have to allocate time to train agents,” Creatio’s CEO Katherine Kostereva told VentureBeat. “It doesn't happen immediately when you switch on the agent, it needs time to understand fully, then the number of mistakes will decrease.” "Data readiness" doesn’t always require an overhaul When looking to deploy agents, “Is my data ready?,” is a common early question. Enterprises know data access is important, but can be turned off by a massive data consolidation project. But virtual connections can allow agents access to underlying systems and get around typical data lake/lakehouse/warehouse delays. Kawasaki’s team built a platform that integrates with data, and is now working on an approach that will pull data into a virtual object, process it, and use it like a standard object for UIs and workflows. This way, they don’t have to “persist or duplicate” large volumes of data in their database. This technique can be helpful in areas like banking, where transaction volumes are simply too large to copy into CRM, but are “still valuable for AI analysis and triggers,” Kawasaki said. Once integrations and virtual objects are established, teams can evaluate data completeness, consistency, and availability, and identify low-friction starting points (like document-heavy or unstructured workflows). Kawasaki emphasized the importance of “really using the data in the underlying systems, which tends to actually be the cleanest or the source of truth anyway.” Matching agents to the work The best fit for autonomous (or near-autonomous) agents are high-volume workflows with “clear structure and controllable risk,” Kawasaki said. For instance, document intake and validation in onboarding or loan preparation, or standardized outreach like renewals and referrals. “Especially when you can link them to very specific processes inside an industry — that's where you can really measure and deliver hard ROI,” he said. For instance, financial institutions are often siloed by nature. Commercial lending teams perform in their own environment, wealth management in another. But an autonomous agent can look across departments and separate data stores to identify, for instance, commercial customers who might be good candidates for wealth management or advisory services. “You think it would be an obvious opportunity, but no one is looking across all the silos,” Kawasaki said. Some banks that have applied agents to this very scenario have seen “benefits of millions of dollars of incremental revenue,” he claimed, without naming specific institutions. However, in other cases — particularly in regulated industries — longer-context agents are not only preferable, but necessary. For instance, in multi-step tasks like gathering evidence across systems, summarizing, comparing, drafting communications, and producing auditable rationales. “The agent isn't giving you a response immediately,” Kawasaki said. “It may take hours, days, to complete full end-to-end tasks.” This requires orchestrated agentic execution rather than a “single giant prompt,” he said. This approach breaks work down into deterministic steps to be performed by sub-agents. Memory and context management can be maintained across various steps and time intervals. Grounding with RAG can help keep outputs tied to approved sources, and users have the ability to dictate expansion to file shares and other document repositories. This model typically doesn’t require custom retraining or a new foundation model. Whatever model enterprises use (GPT, Claude, Gemini), performance improves through prompts, role definitions, controlled tools, workflows, and data grounding, Kawasaki said. The feedback loop puts “extra emphasis” on intermediate checkpoints, he said. Humans review intermediate artifacts (such as summaries, extracted facts, or draft recommendations) and correct errors. Those can then be converted into better rules and retrieval sources, narrower tool scopes, and improved templates. “What is important for this style of autonomous agent, is you mix the best of both worlds: The dynamic reasoning of AI, with the control and power of true orchestration,” Kawasaki said. Ultimately, agents require coordinated changes across enterprise architecture, new orchestration frameworks, and explicit access controls, Gogia said. Agents must be assigned identities to restrict their privileges and keep them within bounds. Observability is critical; monitoring tools can record task completion rates, escalation events, system interactions, and error patterns. This kind of evaluation must be a permanent practice, and agents should be tested to see how they react when encountering new scenarios and unusual inputs. “The moment an AI system can take action, enterprises have to answer several questions that rarely appear during copilot deployments,” Gogia said. Such as: What systems is the agent allowed to access? What types of actions can it perform without approval? Which activities must always require a human decision? How will every action be recorded and reviewed? “Those [enterprises] that underestimate the challenge often find themselves stuck in demonstrations that look impressive but cannot survive real operational complexity,” Gogia said.