Designing the agentic AI enterprise for measurable performance
Presented by EdgeVerve
Smart, semi‑autonomous AI agents handling complex, real‑time business work is a compelling vision. But moving from impressive pilots to production‑grade impact requires more than clever prompts or proof‑of‑concept demos. It takes clear goals, data‑driven workflows, and an enterprise platform that balances autonomy, governance, observability, and flexibility with hard guardrails from day one.
From pilots to the “operational grey zones”
The next wave of value sits in the connective tissue between applications — those operational grey zones where handoffs, reconciliations, approvals, and data lookups still rely on humans. Assigning agents to these paths means collapsing system boundaries, applying intelligence to context, and re‑imagining processes that were never formally automated. Many pilots stall because they start as lab experiments rather than outcome‑anchored designs tied to production systems, controls, and KPIs.
Start with outcomes, not algorithms. Translate organizational KPIs (cash‑flow, DSO, SLA adherence, compliance hit rates, MTTR, NPS, claims leakage, etc.) into agent goals, then cascade them into single‑agent and multi‑agent objectives. Only after goals are explicit should you select workflows and decompose tasks.
Pick targets, then decompose the work
What does “target” actually mean? In agentic programs, a target is a business outcome and the use case that moves it. For example, “reduce unapplied cash by 20%” is the target outcome; “cash application and exceptions handling” is the use case. With the use case in hand, perform persona‑level task decomposition: map the human role (e.g., cash applications analyst, facilities coordinator), enumerate their tasks, and identify which are ripe for agentification (data retrieval, matching, policy checks, decision proposals, transaction initiation).
Delivering on those tasks requires a data‑embedded workflow fabric that can read, write, and reason across enterprise systems while honoring permissions. Data must be AI‑ready, discoverable, governed, labeled where needed, augmented for retrieval (RAG), and policy‑protected for PII, PCI, and regulatory constraints.
Integration goes beyond APIs
APIs are one mode of integration, not the only one. Robust agent execution typically blends:
Stable APIs with lifecycle management for core systems
Event‑driven triggers (streams, webhooks, CDC) to react in real time
UI/RPA fallbacks where APIs don’t exist
Search/RAG connectors for documents and knowledge bases
Policy management across tools and actions to enforce entitlements and segregation of duties
The north star is integration reliability — built on idempotency, retries, circuit-breakers, and standardized tool schemas — so agents don’t “hallucinate” actions the enterprise can’t verify.
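The reliability primitives named above can be sketched in a few lines. This is a minimal illustration, not a production design: `ReliableTool`, `CircuitOpenError`, `flaky_post_payment`, and the default thresholds are hypothetical names and values chosen for the example, and the breaker is simplified (once open, it stays open; there is no half-open recovery state).

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call because the tool looks unhealthy."""


class ReliableTool:
    """Wraps a tool call with idempotency, bounded retries, and a simple circuit breaker."""

    def __init__(self, call, max_retries=3, failure_threshold=5, base_delay=0.0):
        self.call = call                      # hypothetical callable: payload -> result
        self.max_retries = max(1, max_retries)
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay          # seconds; 0.0 keeps the demo instant
        self.consecutive_failures = 0
        self.results = {}                     # idempotency key -> stored result

    def invoke(self, idempotency_key, payload):
        # Idempotency: a repeated key returns the stored result with no second side effect.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        # Circuit breaker: stop calling a tool that keeps failing.
        if self.consecutive_failures >= self.failure_threshold:
            raise CircuitOpenError("circuit open; skipping call")
        last_error = None
        for attempt in range(self.max_retries):
            try:
                result = self.call(payload)
            except Exception as exc:
                last_error = exc
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.failure_threshold:
                    break
                time.sleep(self.base_delay * (2 ** attempt))  # exponential backoff
            else:
                self.consecutive_failures = 0
                self.results[idempotency_key] = result
                return result
        raise last_error


# Demo: a tool that fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky_post_payment(payload):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {"status": "posted", "invoice": payload["invoice"]}


tool = ReliableTool(flaky_post_payment)
first = tool.invoke("inv-1001", {"invoice": "1001"})
second = tool.invoke("inv-1001", {"invoice": "1001"})  # served from the idempotency store
```

The second call with the same key never reaches the underlying system, which is what lets an agent retry safely without double-posting a transaction.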
A quick example: finance and facilities, in production
Inside our organization, we deployed specialized agents in a live CFO environment and in building maintenance. In finance, seven agents interacted with production systems and real accountability structures. Year‑one outcomes included: >3% monthly cash‑flow improvement, 50% productivity gain in affected workflows, 90% faster onboarding, a shift from account‑level handling to function‑level orchestration, and a $32M cash‑flow lift. These results don’t guarantee gains everywhere; they show that designing for production can deliver measurable outcomes at scale.
The four design pillars: Autonomy, governance, observability & evals, flexibility
1) Autonomy: right‑size it to the risk
Autonomy exists on a spectrum. Early efforts often automate well‑bounded tasks; others pursue research/analysis agents; increasingly, teams target mission‑critical transactional agents (payments, vendor onboarding, pricing changes). The rule: match autonomy to risk, and encode the operating mode (suggest‑only, propose‑and‑approve, or execute‑with‑rollback) per task.
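One way to encode the operating mode per task is a plain lookup from task to risk tier to mode. The sketch below is illustrative only: the risk tiers, task names, and the `operating_mode` helper are assumptions made for the example, and a real program would derive them from policy rather than hard-code them.

```python
from enum import Enum


class Mode(Enum):
    SUGGEST_ONLY = "suggest-only"
    PROPOSE_AND_APPROVE = "propose-and-approve"
    EXECUTE_WITH_ROLLBACK = "execute-with-rollback"


# Hypothetical mapping: the higher the risk, the less autonomy the agent gets.
MODE_BY_RISK = {
    "high": Mode.SUGGEST_ONLY,
    "medium": Mode.PROPOSE_AND_APPROVE,
    "low": Mode.EXECUTE_WITH_ROLLBACK,
}

# Hypothetical task names and risk tiers.
TASK_RISK = {
    "match_remittance_to_invoice": "low",
    "onboard_vendor": "medium",
    "change_list_price": "high",
}


def operating_mode(task):
    """Return the autonomy mode for a task; unknown tasks get the most restrictive mode."""
    risk = TASK_RISK.get(task, "high")
    return MODE_BY_RISK[risk]
```

Defaulting unknown tasks to suggest‑only keeps new or unclassified work from silently gaining execution rights.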
2) Governance: guardrails by design, not as bolt‑ons
Unbounded agents create unacceptable risk. Build guardrails into the plan:
Policy & permissions: tie tools/actions to identity, scopes, and SoD rules.
Human‑in‑the‑loop (HITL): where mission‑critical thresholds are crossed (amount, vendor risk, regulatory exposure).
Agent lifecycle management: versioning, change control, regression gates, approval workflows, and sunsetting.
Third‑party agent orchestration: vet external agents like vendors: capabilities, scopes, logs, SLAs.
Incident and rollback: kill‑switches, safe‑mode, and compensating transactions.
This is how you scale innovation safely while protecting brand, compliance, and customers.
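As a rough sketch of the policy and permissions guardrail, the check below ties each action to an agent's scopes and applies a single segregation‑of‑duties rule. The agent names, scope strings, and the `authorize` function are hypothetical, and the SoD rule is deliberately simplified: it looks only at what the same agent has already done, not at which record was touched.

```python
# Hypothetical scopes per agent identity.
AGENT_SCOPES = {
    "onboarding-agent": {"vendor.create"},
    "approval-agent": {"vendor.approve"},
    "super-agent": {"vendor.create", "vendor.approve"},
}

# Hypothetical SoD rule: no single agent may perform both actions in a conflicting pair.
CONFLICTING_ACTIONS = [{"vendor.create", "vendor.approve"}]


def authorize(agent, action, history):
    """Allow an action only if it is in scope and breaks no SoD rule.

    history is a list of (agent, action) tuples already executed.
    """
    if action not in AGENT_SCOPES.get(agent, set()):
        return False
    already_done = {a for who, a in history if who == agent}
    for pair in CONFLICTING_ACTIONS:
        if action in pair and (pair - {action}) & already_done:
            return False
    return True
```

Even an agent that holds both scopes is blocked from approving a vendor after creating one, so the conflicting step must go to a different agent or to a human.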
3) Observability & evaluations: trust comes from telemetry
Production agents need the same rigor as any core platform:
Telemetry: capture full execution traces across perception, planning, tool use, and action, supported by structured logs and replay.
Offline evals: scenario tests, red‑teaming, bias and safety checks, cost/performance benchmarks; baseline vs. challenger comparisons.
Online evals: shadow mode, A/B, canary releases, guardrail breach alerts, human feedback loops.
Explainability & auditability: why was an action taken, which data/tools were used, and who approved.
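A minimal sketch of what such a trace might look like, assuming one structured event per stage: `ExecutionTrace`, `TraceEvent`, and the field names are illustrative choices, not a reference to any particular observability product.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class TraceEvent:
    stage: str      # "perception", "planning", "tool_use", or "action"
    detail: dict
    timestamp: float = field(default_factory=time.time)


@dataclass
class ExecutionTrace:
    agent: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def record(self, stage, **detail):
        """Append one structured event to the trace."""
        self.events.append(TraceEvent(stage, detail))

    def to_json(self):
        """Serialize the whole trace as a single structured log record."""
        return json.dumps(asdict(self), sort_keys=True)

    def replay(self):
        """Yield (stage, detail) pairs in the order they were recorded."""
        for event in self.events:
            yield event.stage, event.detail


# Hypothetical run of a cash-application agent.
trace = ExecutionTrace(agent="cash-application-agent")
trace.record("perception", source="bank_statement", rows=42)
trace.record("planning", plan=["match", "post"])
trace.record("tool_use", tool="erp.lookup_invoice", invoice="1001")
trace.record("action", decision="auto_reconcile", approved_by=None)
```

Because each event carries the data and tools involved, the same record answers the audit questions above: why an action was taken, what it used, and who approved it.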
4) Flexibility: assume volatility, design for swap‑ability
Models, tools, and vendors change fast. Treat agentic capability as platform currency: create an environment where teams can evaluate, select, and swap models/tools without tearing down the build. Use a model router, tool registry, and contract‑first interfaces so upgrades are controlled experiments, not rewrites.
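The contract‑first idea can be illustrated with a small router: callers depend on an interface and a task name, never on a specific model. Everything here is an assumption for the sake of the example, including the `TextModel` contract, the two stand‑in models, and the `ModelRouter` class.

```python
from typing import Protocol


class TextModel(Protocol):
    """The contract every model adapter must satisfy."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in for an incumbent model."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperModel:
    """Stand-in for a challenger model."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


class ModelRouter:
    """Maps task names to registered models so a swap is a routing change, not a rewrite."""

    def __init__(self):
        self._models = {}
        self._routes = {}

    def register(self, name, model):
        self._models[name] = model

    def route(self, task, model_name):
        if model_name not in self._models:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[task] = model_name

    def complete(self, task, prompt):
        return self._models[self._routes[task]].complete(prompt)


router = ModelRouter()
router.register("baseline", EchoModel())
router.register("challenger", UpperModel())
router.route("summarize", "baseline")
before = router.complete("summarize", "hi")
router.route("summarize", "challenger")   # the swap: calling code is unchanged
after = router.complete("summarize", "hi")
```

Keeping the swap in one routing call is what makes an upgrade a controlled experiment: the same task can be pointed at baseline and challenger and the outputs compared.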
The agent platform fabric: how platformization turns goals into outcomes
A true agentic enterprise requires a platform fabric that transforms goals into outcomes, not a patchwork of isolated pilots. This platform anchors enterprise‑to‑agent KPI cascades, drives task decomposition and multi‑agent planning, and provides governed tooling and data access across APIs, RPA, search, and databases.
It centralizes knowledge and memory through RAG and vector stores, enforces enterprise controls via a policy engine, and manages performance and safety through a unified model layer. It supports robust orchestration of first‑ and third‑party agents with common context, embeds deep observability and evaluation pipelines, and applies disciplined release engineering from sandbox to GA. Finally, it ensures long‑term resilience through lifecycle management: versioning, deprecation, incident playbooks, and auditable histories.
Guardrails in action: a BFSI example
Consider payments exception handling in banking — high stakes, regulated, and customer‑visible. An agent proposes a resolution (e.g., auto‑reconcile or escalate) only when:
The transaction falls below risk thresholds; above them, it triggers HITL approval.
All policy checks (KYC/AML, velocity, sanctions) pass.
Observability hooks record rationale, tools invoked, and data used.
Rollback/compensation is defined if downstream failures occur.
This pattern generalizes to vendor onboarding, pricing overrides, or claims adjudication: mission‑critical work with explicit safety rails.
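The conditions above can be expressed as a small decision function. This is a sketch under stated assumptions: the `PaymentException` fields, the `HITL_THRESHOLD` value, and the decision labels are hypothetical, amounts are plain floats for brevity, and rollback/compensation is left out.

```python
from dataclasses import dataclass


@dataclass
class PaymentException:
    amount: float
    kyc_aml_ok: bool
    velocity_ok: bool
    sanctions_ok: bool


HITL_THRESHOLD = 10_000.00   # hypothetical risk threshold, in currency units


def decide(item, audit_log):
    """Return 'escalate', 'hitl_approval', or 'auto_reconcile' and record the rationale."""
    checks = {
        "kyc_aml": item.kyc_aml_ok,
        "velocity": item.velocity_ok,
        "sanctions": item.sanctions_ok,
    }
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        decision, rationale = "escalate", f"failed policy checks: {failed}"
    elif item.amount >= HITL_THRESHOLD:
        decision, rationale = "hitl_approval", "amount at or above risk threshold"
    else:
        decision, rationale = "auto_reconcile", "all checks passed and amount below threshold"
    # Observability hook: every decision leaves a record of why it was made.
    audit_log.append({"decision": decision, "rationale": rationale, "amount": item.amount})
    return decision
```

Policy checks are evaluated before the amount threshold, so a failed sanctions check escalates regardless of how small the transaction is.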
Scale beyond pilots
Scaling agentic AI beyond pilots demands disciplined readiness across nine fronts: leaders must clarify which KPIs matter and how agent goals ladder into them, determine which persona tasks are agentified versus remain human‑led, and align each with the right autonomy mode from suggest‑only to propose‑and‑approve to execute‑with‑rollback. They must embed governance guardrails, including HITL points and lifecycle controls; ensure robust observability and evaluation via telemetry, replay, audits, and offline/online tests; and verify data readiness, with governed, policy‑protected, retrieval‑augmented data flows. Integration must be reliable, with API lifecycle management, event triggers, and RPA/other fallbacks. The underlying platform should enable model swap‑ability and orchestration of first‑ and third‑party agents without rebuilding. Finally, measurement must focus on true operational impact (cash flow, cycle times, quality, and risk reduction) rather than task counts.
The takeaway
Agentic AI is not a shortcut; it’s a new system of work. Enterprises that approach it with platform discipline (aligning autonomy with risk, embedding governance and observability, and designing for swap‑ability) will convert pilots into production impact. Those that don’t will keep accumulating impressive but disconnected demos. The difference isn’t how fast you ship an agent; it’s how deliberately you design the enterprise around it.
N. Shashidar is SVP & Global Head, Product Management at EdgeVerve.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
Related Articles
- AI agents are running hospital records and factory inspections. Enterprise IAM was never built for them.A doctor in a hospital exam room watches as a medical transcription agent updates electronic health records, prompts prescription options, and surfaces patient history in real time. A computer vision agent on a manufacturing line is running quality control at speeds no human inspector can match. Both generate non-human identities that most enterprises cannot inventory, scope, or revoke at machine speed. That is the structural problem keeping agentic AI stuck in pilots. Not model capability. Not compute. Identity governance. Cisco President Jeetu Patel told VentureBeat at RSAC 2026 that 85% of enterprises are running agent pilots while only 5% have reached production. That 80-point gap is a trust problem. The first questions any CISO will ask: which agents have production access to sensitive systems, and who is accountable when one acts outside its scope? IANS Research found that most businesses still lack role-based access control mature enough for today's human identities, and agents will make it significantly harder. The 2026 IBM X-Force Threat Intelligence Index reported a 44% increase in attacks exploiting public-facing applications, driven by missing authentication controls and AI-enabled vulnerability discovery. Why the trust gap is architectural, not just a tooling problem Michael Dickman, SVP and GM of Cisco's Campus Networking business, laid out a trust framework in an exclusive interview with VentureBeat that security and networking leaders rarely hear stated this plainly. Before Cisco, Dickman served as Chief Product Officer at Gigamon and SVP of Product Management at Aruba Networks. Dickman said that the network sees what other telemetry sources miss: actual system-to-system communications rather than inferred activity. "It's that difference of knowing versus guessing," he said. 
"What the network can see are actual data communications … not, I think this system needs to talk to that system, but which systems are actually talking together." That raw behavioral data, he added, becomes the foundation for cross-domain correlation, and without it, organizations have no reliable way to enforce agent policy at what he called "machine speed." The trust prerequisite that most AI strategies skip Dickman argues that agentic AI breaks a pattern he says defined every prior technology transition: deploy for productivity first, bolt on security later. "I don't think trust is one of those things where the business productivity comes first, and the security is an afterthought," Dickman told VentureBeat. "Trust actually is one of the key requirements. Just table stakes from the beginning." Observing data and recommending decisions carries consequences that stay contained. Execution changes everything. When agents autonomously update patient records, adjust network configurations, or process financial transactions, the blast radius of a compromised identity expands dramatically. "Now more than ever, it's that question of who has the right to do what," Dickman said. "The who is now much more complicated because you have the potential in our reality of these autonomous agents." Dickman breaks the trust problem into four conditions. The first is secure delegation, which starts by defining what an agent is permitted to do and maintaining a clear chain of human accountability. The second is cultural readiness; he pointed to alert fatigue as a case study. The traditional fix, Dickman noted, was to aggregate alerts, so analysts see fewer items. With agents capable of evaluating every alert, that logic changes entirely. "It is now possible for an agent to go through all alerts," Dickman said. "You can actually start to think about different workflows in a different way. And then how does that affect the culture of the work, which is amazing." 
The third is token economics: Every agent’s action carries a real computational cost. Dickman sees hybrid architectures as the answer, where agentic AI handles reasoning while traditional deterministic tools execute actions. The fourth is human judgment. For example, his team used an AI tool to draft a product requirements document. The agent produced 60 pages of repetitive filler that immediately provided how technically responsive the architecture was, yet showed signs of needing extensive fine-tuning to make the output relevant. "There's no substitute for the human judgment and the talent that's needed to be dextrous with AI," he said. What the network sees that endpoints miss Most enterprise data today is proprietary, internal, and fragmented across observability tools, application platforms, and security stacks. Each domain team builds its own view. None sees the full picture. "It's that difference of knowing versus guessing," Dickman said. "What the network can see are actual data communications. Not 'I think this system needs to talk to that system,' but which systems are actually talking together." That telemetry grows more valuable as IoT and physical AI proliferate. Computer vision agents analyzing shopper behavior and running factory-floor quality control generate highly sensitive data that demands precise access controls. "All of those things require that trust that we started with, because this is highly sensitive data around like who's doing what in the shop or what's happening on the factory floor," Dickman said. Why siloed agent data misses the signal "It's not only aggregation, but actually the creation of knowledge from the network," Dickman said. "There are these new insights you can get when you see the real data communications. And so now it becomes what do we do first versus second versus third?" That last question reveals where Dickman’s focus lands: the strategic challenge is sequencing, not capability. 
"The real power comes from the cross-domain views. The real power comes from correlation," Dickman said. "Versus just aggregation and deduplication of alerts, which is good, but it's a little bit basic." This is where he sees the most common pitfall. Team A builds Agent A on top of Data A. Team B builds Agent B on top of Data B. Each silo produces incrementally useful automation. The cross-domain insight never materializes. Independent practitioners validate the pattern. Kayne McGladrey, an IEEE senior member, told VentureBeat that organizations are defaulting to cloning human user profiles for agents, and permission sprawl starts on day one. Carter Rees, VP of AI at Reputation, identified the structural reason. "A significant vulnerability in enterprise AI is broken access control, where the flat authorization plane of an LLM fails to respect user permissions," Rees told VentureBeat. Etay Maor, VP of Threat Intelligence at Cato Networks, reached the same conclusion from the adversarial side. "We need an HR view of agents," Maor told VentureBeat at RSAC 2026. "Onboarding, monitoring, offboarding." Agentic AI trust gap assessment Use this matrix to evaluate any platform or combination of platforms against the five trust gaps Dickman identified. Note that the enforcement approaches in the right column reflect Cisco's framework. Trust gap Current control failure What network-layer enforcement changes Recommended action Agent identity governance IAM built for human users cannot inventory, scope, or revoke agent identities at machine speed Agentic IAM registers each agent with defined permissions, an accountable human owner, and a policy-governed access scope Audit every agent identity in production. Assign a human owner. 
Define permitted actions before expanding the scope Blast radius containment Host-based agents and perimeter controls can be bypassed; flat segments give compromised agents lateral movement Microsegmentation enforces least-privileged access at the network layer, limiting blast radius independent of host-level controls Implement microsegmentation for every agent-accessible system. Start with the highest-sensitivity data (PHI, financial records) Cross-domain visibility Siloed observability tools create fragmented views; Team A's agent data never correlates with Team B's security telemetry Network telemetry captures actual system-to-system communications, feeding a unified data fabric for cross-domain correlation Unify network, security, and application telemetry into a shared data fabric before deploying production agents Governance-to-enforcement pipeline No formal process connecting business intent to agent policy to network enforcement Policy-to-enforcement pipeline translates governance decisions into machine-speed network rules Establish a formal pipeline from business-intent definition to automated network policy enforcement Cultural and workflow readiness Organizations automate existing workflows rather than redesigning for agent-scale processing Network-generated behavioral data reveals actual usage patterns, informing workflow redesign Run a 30-day telemetry capture before designing agent workflows. Build around observed data, not assumptions A broken ankle and a microsegmentation lesson Dickman grounded his framework in a scenario from his own life. A family member recently broke an ankle, which put him in a hospital exam room watching a medical transcription agent update the EHR, prompt prescription options, and surface patient history in real time. The doctor approved each decision, but the agent handled tasks that previously required manual entry across multiple systems. 
The security implications hit differently when it is a loved one's records on the screen. "I would call it do governance slowly. But do the enforcement and implementation rapidly," he said. "It must be done in machine speed." It starts with agentic IAM, where each agent is registered with defined permitted actions and a human accountable for its behavior. "Here's my set of agents that I've built. Here are the agents. By the way, here's a human who's accountable for those agents," Dickman said. "So if something goes wrong, there's a person to talk to." That identity layer feeds microsegmentation — a network-enforced boundary Dickman says enforces least-privileged access and limits blast radius. "Microsegmentation guarantees that least-privileged access," Dickman said. "You're not relying on a bunch of host agents, which can be bypassed or have other issues." If the governance model works for a medical transcription agent handling patient records in an emergency department, it scales to less sensitive enterprise use cases. Five priorities before agents reach production 1. Force cross-functional alignment now. Define what the organization expects from agentic AI across line-of-business, IT, and security leadership. Dickman sees the human coordination layer moving more slowly than the technology. That gap is the bottleneck. 2. Get IAM and PAM governance production-ready for agents. Dickman called out identity and access management and privileged access management specifically as not mature enough for agentic workloads today. Solidify the governance before scaling the agents. "That becomes the unlock of trust," he said. "Because when the technology platform is ready, you then need the right governance and policy on top of that." 3. Adopt a platform approach to networking infrastructure. A platform strategy enables data sharing across domains in ways fragmented point solutions cannot. 
That shared foundation is what makes the cross-domain correlation in the trust gap assessment above operationally real. 4. Design hybrid architectures from the start. Agentic AI handles reasoning and planning. Traditional deterministic tools execute the actions. Dickman sees this combination as the answer to token economics: it delivers the intelligence of foundation models with the efficiency and predictability of conventional software. Do not build pure-agent systems when hybrid systems cost less and fail more predictably. 5. Make the first use cases bulletproof on trust. Pick two or three high-value use cases and build them with role-based access control, privileged access management, and microsegmentation from day one. Even modest deployments delivered with best practices intact build the organizational confidence that accelerates everything after. "You can guarantee that trust to the organization, and that will unleash the speed," Dickman said. That is the structural insight running through every section of this conversation. The 85% of enterprises stuck in pilot mode are not waiting for better models. They are waiting for the identity governance, the cross-domain visibility, and the policy enforcement infrastructure that makes production deployment defensible. Whether they build on Cisco’s platform or assemble their own, Dickman’s framework holds: identity governance, cross-domain visibility, policy enforcement. None of those prerequisites is optional. The organizations that satisfy them first will deploy agents at a pace the rest cannot match, because every new agent inherits the trust architecture the first ones required. The ones still debating whether to start will watch that gap widen. Theoretical trust does not ship.
- An AI agent rewrote a Fortune 50 security policy. Here's how to govern AI agents before one does the same.A CEO’s AI agent rewrote the company’s security policy. Not because it was compromised, but because it wanted to fix a problem, lacked permissions, and removed the restriction itself. Every identity check passed. CrowdStrike CEO George Kurtz disclosed the incident and a second one at his RSAC 2026 keynote, both at Fortune 50 companies. The credential was valid. The access was authorized. The action was catastrophic. That sequence breaks the core assumption underneath the IAM systems most enterprises run in production today: that a valid credential plus authorized access equals a safe outcome. Identity systems were built for one user, one session, one set of hands on a keyboard. Agents break all three assumptions at once. In an exclusive interview with VentureBeat at RSAC 2026, Matt Caulfield, VP of Identity and Duo at Cisco, (pictured above) walked through the architecture his team is building to close that gap and outlined a six-stage identity maturity model for governing agentic AI. The urgency is measurable: Cisco President Jeetu Patel told VentureBeat at the same conference that 85% of enterprises are running agent pilots while only 5% have reached production — an 80-point gap that the identity work is designed to close. The identity stack was built for a workforce that has fingerprints “Most of the existing IAM tools that we have at our disposal are just entirely built for a different era,” Caulfield told VentureBeat. “They were built for human scale, not really for agents.” The default enterprise instinct is to shove agents into existing identity categories: human user; machine identity; pick one. "Agents are a third kind of new type of identity," Caulfield said. "They're neither human. They're neither machine. 
They're somewhere in the middle where they have broad access to resources like humans, but they operate at machine scale and speed like machines, and they entirely lack any form of judgment." Etay Maor, VP of Threat Intelligence at Cato Networks, put a number on the exposure. He ran a live Censys scan and counted nearly 500,000 internet-facing OpenClaw instances. The week before, he found 230,000, discovering a doubling in seven days. Kayne McGladrey, an IEEE senior member who advises enterprises on identity risk, made the same diagnosis independently. Organizations are cloning human user accounts to agentic systems, McGladrey told VentureBeat, except agents consume far more permissions than humans would because of the speed, the scale, and the intent. A human employee goes through a background check, an interview, and an onboarding process. Agents skip all three. The onboarding assumptions baked into modern IAM do not apply. Scale compounds the failure. Caulfield pointed to projections where a trillion agents could operate globally. “We barely know how many people are in an average organization,” he said, “let alone the number of agents.” Access control verifies the badge. It does not watch what happens next. Zero trust still applies to agentic AI, Caulfield argued. But only if security teams push it past access and into action-level enforcement. “We really need to shift our thinking to more action-level control,” he told VentureBeat. “What action is that agent taking?” A human employee with authorized access to a system will not execute 500 API calls in three seconds. An agent will. Traditional zero trust verifies that an identity can reach an application. It doesn’t scrutinize what that identity does once inside. Carter Rees, VP of Artificial Intelligence at Reputation, identified the structural reason. The flat authorization plane of an LLM fails to respect user permissions, Rees told VentureBeat. 
An agent operating on that flat plane does not need to escalate privileges. It already has them. That is why access control alone cannot contain what agents do after authentication. CrowdStrike CTO Elia Zaitsev described the detection gap to VentureBeat. In most default logging configurations, an agent’s activity is indistinguishable from a human. Distinguishing the two requires walking the process tree, tracing whether a browser session was launched by a human or spawned by an agent in the background. Most enterprise logging cannot make that distinction. Caulfield’s identity layer and Zaitsev’s telemetry layer are solving two halves of the same problem. No single vendor closes both gaps. “At any moment in time, that agent can go rogue and can lose its mind,” Caulfield said. “Agents read the wrong website or email, and their intentions can just change overnight.” How the request lifecycle works when agents have their own identity Five vendors shipped agent identity frameworks at RSAC 2026, including Cisco, CrowdStrike, Palo Alto Networks, Microsoft, and Cato Networks. Caulfield walked through how Cisco's identity-layer approach works in practice. The Duo agent identity platform registers agents as first-class identity objects, with their own policies, authentication requirements, and lifecycle management. The enforcement routes all agent traffic through an AI gateway supporting both MCP and traditional REST or GraphQL protocols. When an agent makes a request, the gateway authenticates the user, verifies that the agent is permitted, encodes the authorization into an OAuth token, and then inspects the specific action and determines in real time whether it should proceed. “No solution to agent AI is really complete unless you have both pieces,” Caulfield told VentureBeat. “The identity piece, the access gateway piece. 
And then the third piece would be observability.” Cisco announced its intent to acquire Astrix Security on May 4, signaling that agent identity discovery is now a board-level investment thesis. The deal also suggests that even vendors building identity platforms recognize that the discovery problem is harder than expected. Six-stage identity maturity model for agentic AI When a company shows up claiming 500 agents in production, Caulfield doesn't accept the number. "How do you know it's 500 and not 5,000?" Most organizations don’t have a source of truth for agents. Caulfield outlined a six-stage engagement model. Discovery first: identify every agent, where it runs, and who deployed it. Onboarding: register agents in the identity directory, tie each one to an accountable human, and define permitted actions. Control and enforcement: place a gateway between agents and resources, inspect every request and response. Behavioral monitoring: record all agent activity, flag anomalies, and build the audit trail. Runtime isolation contains agents on endpoints when they go rogue. Compliance mapping ties agent controls to audit frameworks before the auditor shows up. The six stages are not proprietary to any single vendor. They describe the sequence every enterprise will follow regardless of which platform delivers each stage. Maor's Censys data complicates step one before it even starts. Organizations beginning discovery should assume their agent exposure is already visible to adversaries. Step four has its own problem. Zaitsev's process-tree work shows that even organizations logging agent activity may not be capturing the right data. And step three depends on something Rees found most enterprises lack: a gateway that inspects actions, not just access, because the LLM does not respect the permission boundaries the identity layer sets. 
Agentic identity prescriptive matrix What to audit at each maturity stage, what operational readiness looks like, and the red flag that means the stage is failing. Use this to evaluate any platform or combination of platforms. Stage What to audit Operational readiness looks like Red flag if missing 1. Discovery Complete inventory of every agent, every MCP server it connects to, and every human accountable for it. A queryable registry that returns agent count, owner, and connection map within 60 seconds of an auditor asking. No registry exists. Agent count is an estimate. No human is accountable for any specific agent. Adversaries can see your agent infrastructure from the public internet before you can. 2. Onboarding Agents are registered as a distinct identity type with their own policies, separate from human and machine identities. Each agent has a unique identity object in the directory, tied to an accountable human, with defined permitted actions and a documented purpose. Agents use cloned human accounts or shared service accounts. Permission sprawl starts at creation. No audit trail ties agent actions to a responsible human. 3. Control A gateway between every agent and every resource it accesses, enforcing action-level policy on every request and every response. Four checkpoints per request: authenticate the user, authorize the agent, inspect the action, inspect the response. No direct agent-to-resource connections exist. Agents connect directly to tools and APIs. The gateway (if it exists) checks access but not actions. The flat authorization plane of the LLM does not respect the permission boundaries the identity layer set. 4. Monitoring Logging that can distinguish agent-initiated actions from human-initiated actions at the process-tree level. SIEM can answer: Was this browser session started by a human or spawned by an agent? Behavioral baselines exist for each agent. Anomalies trigger alerts. Default logging treats agent and human activity as identical. 
Process-tree lineage is not captured. Agent actions are invisible in the audit trail. Behavioral monitoring is incomplete before it starts. 5. Isolation Runtime containment that limits the blast radius if an agent goes rogue, separate from human endpoint protection. A rogue agent can be contained in its sandbox without taking down the endpoint, the user session, or other agents on the same machine. No containment boundary exists between agents and the host. A single compromised agent can access everything the user can. Blast radius is the entire endpoint. 6. Compliance Documentation that maps agent identities, controls, and audit trails to the compliance framework that the auditor will use. When the auditor asks about agents, the security team produces a control catalog, an audit trail, and a governance policy written for agent identities specifically. Emerging AI-risk frameworks (CSA Agentic Profile) exist, but mainstream audit catalogs (SOC 2, ISO 27001, PCI DSS) have not operationalized agent identities. No control catalog maps to agents. The auditor improvises which human-identity controls apply. The security team answers with improvisation, not documentation. Source: VentureBeat analysis of RSAC 2026 interviews (Caulfield, Zaitsev, Maor) and independent practitioner validation (McGladrey, Rees). May 2026. Compliance frameworks have not caught up “If you were to go through an audit today as a chief security officer, the auditor’s probably gonna have to figure out, hey, there are agents here,” Caulfield told VentureBeat. “Which one of your controls is actually supposed to be applied to it? I don’t see the word agents anywhere in your policies.” McGladrey's practitioner experience confirms the gap. The Cloud Security Alliance published an NIST AI RMF Agentic Profile in April 2026, proposing autonomy-tier classification and runtime behavioral metrics. But SOC 2, ISO 27001, and PCI DSS have not operationalized agent identities. 
The compliance frameworks McGladrey works with inside enterprises were written for humans. Agent identities do not appear in any control catalog he has encountered. The gap is a lagging indicator; the risk is not.
Security director action plan
VentureBeat identified five actions from the combined findings of Caulfield, Zaitsev, Maor, McGladrey, and Rees.
1. Run an agent census and assume adversaries already did. Every agent, every MCP server those agents touch, every human accountable. Maor's Censys data confirms agent infrastructure is already visible from the public internet. NIST's NCCoE reached the same conclusion in its February 2026 concept paper on AI agent identity and authorization.
2. Stop cloning human accounts for agents. McGladrey found that enterprises default to copying human user profiles, and permission sprawl starts on day one. Agents need to be a distinct identity type with scope limits that reflect what they actually do.
3. Audit every MCP and API access path. Five vendors shipped MCP gateways at RSAC 2026. The capability exists. What matters is whether agents route through one or connect directly to tools with no action-level inspection.
4. Fix logging so it distinguishes agents from humans. Zaitsev's process-tree method reveals that agent-initiated actions are invisible in most default configurations. Rees found authorization planes so flat that access logs alone miss the actual behavior. Logging has to capture what agents did, not just what they were allowed to reach.
5. Build the compliance case before the auditor shows up. The CSA published a NIST AI RMF Agentic Profile proposing agent governance extensions. Most audit catalogs have not caught up. Caulfield told VentureBeat that auditors will see agents in production and find no controls mapped to them. The documentation needs to exist before that conversation starts.
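
The matrix's Control stage names four checkpoints per request: authenticate the user, authorize the agent, inspect the action, inspect the response. A minimal Python sketch of that flow follows. Everything in it is hypothetical and for illustration only: the `AgentIdentity` record, the `REGISTRY` dictionary, the `gateway_call` function, and the toy response rule are assumptions, not any vendor's gateway API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical directory object: one per agent, tied to an accountable human."""
    agent_id: str
    owner: str                      # accountable human
    purpose: str                    # documented purpose
    permitted_actions: frozenset = field(default_factory=frozenset)


# Hypothetical registry: agent_id -> identity object.
REGISTRY = {
    "invoice-bot": AgentIdentity(
        agent_id="invoice-bot",
        owner="alice@example.com",
        purpose="Match payments to open invoices",
        permitted_actions=frozenset({"invoices.read", "payments.match"}),
    )
}

AUTHENTICATED_USERS = {"alice@example.com"}   # stand-in for a real identity provider
BLOCKED_RESPONSE_MARKERS = ("password=", "BEGIN PRIVATE KEY")  # toy response rule


class GatewayDenied(Exception):
    """Raised when any of the four checkpoints fails."""


def gateway_call(user, agent_id, action, tool):
    """Route one agent request through four checkpoints before returning a result."""
    # 1. Authenticate the user on whose behalf the agent acts.
    if user not in AUTHENTICATED_USERS:
        raise GatewayDenied("checkpoint 1: user not authenticated")

    # 2. Authorize the agent: it must be a registered, distinct identity.
    identity = REGISTRY.get(agent_id)
    if identity is None:
        raise GatewayDenied("checkpoint 2: agent not in registry")

    # 3. Inspect the action itself, not just whether the resource is reachable.
    if action not in identity.permitted_actions:
        raise GatewayDenied(f"checkpoint 3: action {action!r} not permitted")

    response = tool(action)

    # 4. Inspect the response before it reaches the agent.
    if any(marker in response for marker in BLOCKED_RESPONSE_MARKERS):
        raise GatewayDenied("checkpoint 4: response blocked by content rule")

    return response


def fake_tool(action):
    """Stand-in for an MCP server or API the agent would call."""
    return f"ok:{action}"
```

In this sketch, `gateway_call("alice@example.com", "invoice-bot", "invoices.read", fake_tool)` returns the tool's response, while a request for an action outside `permitted_actions` raises `GatewayDenied` at the third checkpoint. The point of the design is that the tool is only reachable through the gateway function, which mirrors the matrix's requirement that no direct agent-to-resource connections exist.
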
- Salesforce launches Agentforce Operations to fix the workflows breaking enterprise AI
Enterprise AI teams are hitting a wall — not because their models can't reason, but because the workflows underneath them were never built for agents. Tasks fail, handoffs break, and the problem compounds as organizations push agents deeper into back-office systems.
A new architectural layer is emerging to address it: workflow execution control planes that impose deterministic structure on processes agents are expected to run.
One of the companies bringing this to the forefront is Salesforce, with a new workflow platform that turns back-office workflows into a set of tasks for specialized agents to complete. Users can upload their processes or use one of the preset Blueprints provided by Salesforce, and Agentforce Operations will break it down for agents.
Salesforce senior vice president of Product, Sanjna Parulekar, told VentureBeat in an interview that the problem is that many enterprise workflows are not built for agents.
“What we’ve observed with customers is that a lot of times, the brokenness in a process is probably in your product requirements document,” Parulekar said. “So when that’s uploaded into a product, it doesn’t quite work. We can optimize it and cut out some things and replace it with an agent.”
Without this control plane layer, enterprises risk deploying agents that increase cost rather than fix their workflow problems.
Making the workflow work for agents, not just humans
Enterprises deploying agents are learning a costly lesson: Their workflows were designed around human judgment gaps, not machine execution. Processes that evolved through years of workarounds — loosely defined steps, implicit decisions, coordination that depends on individuals knowing what to do next — break when agents are asked to follow them literally.
Even with all of an enterprise’s context at their fingertips, AI systems will have difficulty completing tasks if it is not clear what they are supposed to do.
Parulekar said her team found that focusing on what makes the process tick and breaking it down into more explicit steps and workflows makes the system more deterministic. Then, when platforms like Agentforce Operations introduce agents, those agents already know their specific tasks.
“It forces companies to rethink their processes and introduces observability into the mix because of the session tracing model in the system,” she said.
Parulekar said human checks can be built into the system, so the process is more transparent.
What makes this approach different from other workflow automation offerings is that it doesn’t rely on agents to decide what to do next; the system does. Unlike more traditional automation tools that route tasks and agents on probabilistic decision-making, this enforces execution on a more pre-defined, deterministic structure.
The problem it introduces
Codifying a workflow doesn't fix a broken one. If a process has flawed steps, encoding it for agents locks in the problem at scale. And once workflows are distributed across agents, the challenge shifts from execution to governance: who owns the process, who validates it, and how it evolves when business conditions change.
It puts the onus on teams to take a hard look at what works for them and what doesn’t. Organizations need to consider that, along with the execution control plane offered by platforms like Agentforce Operations, someone should be made responsible for task completion and success.
Brandon Metcalf, founder and CEO of workforce orchestration company Asymbl, told VentureBeat in a separate interview that the key to both humans and agents following a workflow is a shared goal.
“You have to understand the goal or the agent or human won’t complete the task successfully,” Metcalf said.
“Someone has to manage that outcome that has to be delivered. It can be a person or an agent.”
The bottleneck has moved. As Metcalf framed it, the question is no longer whether agents can reason through a task but whether the workflow underneath them is coherent enough to execute. For enterprises that built their processes around human judgment and institutional memory, that's a harder fix than swapping in a smarter model.
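
The distinction drawn in this article, where the system rather than the agent decides what happens next, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Agentforce Operations code: the `Step` structure, the `run_workflow` function, and the sample refund workflow are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One explicit workflow step with a named owner (hypothetical structure)."""
    name: str
    owner: str                          # person or agent accountable for the outcome
    run: Callable[[dict], dict]         # the work an agent (or human) performs
    needs_human_check: bool = False


def run_workflow(steps, context, approve):
    """Execute steps in a fixed order; the workflow, not the agent, picks the next step."""
    trace = []                          # simple session trace for observability
    for step in steps:
        context = step.run(dict(context))   # each step works on a copy of the context
        if step.needs_human_check and not approve(step.name, context):
            trace.append((step.name, step.owner, "rejected"))
            return context, trace       # stop deterministically on a failed check
        trace.append((step.name, step.owner, "done"))
    return context, trace


# Sample steps; in practice each `run` could wrap a call to a specialized agent.
def extract_amount(ctx):
    ctx["amount"] = 120
    return ctx


def propose_refund(ctx):
    ctx["refund"] = ctx["amount"]
    return ctx


REFUND_WORKFLOW = [
    Step("extract_amount", owner="intake-agent", run=extract_amount),
    Step("propose_refund", owner="refund-agent", run=propose_refund, needs_human_check=True),
]

result, trace = run_workflow(
    REFUND_WORKFLOW, {}, approve=lambda name, ctx: ctx["refund"] <= 500
)
```

Each step carries an `owner`, which reflects Metcalf's point that someone, a person or an agent, has to be responsible for the outcome. The `trace` list is a stand-in for the session tracing Parulekar mentions; a real system would record far more than a step name and status.
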
- Salesforce launches Headless 360 to turn its entire platform into infrastructure for AI agents
Salesforce on Wednesday unveiled the most ambitious architectural transformation in its 27-year history, introducing "Headless 360" — a sweeping initiative that exposes every capability in its platform as an API, MCP tool, or CLI command so AI agents can operate the entire system without ever opening a browser.
The announcement, made at the company's annual TDX developer conference in San Francisco, ships more than 100 new tools and skills immediately available to developers. It marks a decisive response to the existential question hanging over enterprise software: In a world where AI agents can reason, plan, and execute, does a company still need a CRM with a graphical interface?
Salesforce's answer: No — and that's exactly the point.
"We made a decision two and a half years ago: Rebuild Salesforce for agents," the company said in its announcement. "Instead of burying capabilities behind a UI, expose them so the entire platform will be programmable and accessible from anywhere."
The timing is anything but coincidental. Salesforce finds itself navigating one of the most turbulent periods in enterprise software history — a sector-wide sell-off that has pushed the iShares Expanded Tech-Software Sector ETF down roughly 28% from its September peak. The fear driving the decline: that AI, particularly large language models from Anthropic, OpenAI, and others, could render traditional SaaS business models obsolete.
Jayesh Govindarjan, EVP of Salesforce and one of the key architects behind the Headless 360 initiative, described the announcement as rooted not in marketing theory but in hard-won lessons from deploying agents with thousands of enterprise customers.
"The problem that emerged is the lifecycle of building an agentic system for every one of our customers on any stack, whether it's ours or somebody else's," Govindarjan told VentureBeat in an exclusive interview. "The challenge that they face is very much the software development challenge. How do I build an agent? That's only step one."
More than 100 new tools give coding agents full access to the Salesforce platform for the first time
Salesforce Headless 360 rests on three pillars that collectively represent the company's attempt to redefine what an enterprise platform looks like in the agentic era.
The first pillar — build any way you want — delivers more than 60 new MCP (Model Context Protocol) tools and 30-plus preconfigured coding skills that give external coding agents like Claude Code, Cursor, Codex, and Windsurf complete, live access to a customer's entire Salesforce org, including data, workflows, and business logic. Developers no longer need to work inside Salesforce's own IDE. They can direct AI coding agents from any terminal to build, deploy, and manage Salesforce applications.
Agentforce Vibes 2.0, the company's own native development environment, now includes what it calls an "open agent harness" supporting both the Anthropic agent SDK and the OpenAI agents SDK. As demonstrated during the keynote, developers can choose between Claude Code and OpenAI agents depending on the task, with the harness dynamically adjusting available capabilities based on the selected agent. The environment also adds multi-model support, including Claude Sonnet and GPT-5, along with full org awareness from the start.
A significant technical addition is native React support on the Salesforce platform. During the keynote demo, presenters built a fully functional partner service application using React — not Salesforce's own Lightning framework — that connected to org metadata via GraphQL while inheriting all platform security primitives. This opens up dramatically more expressive front-end possibilities for developers who want complete control over the visual layer.
The second pillar — deploy on any surface — centers on the new Agentforce Experience Layer, which separates what an agent does from how it appears, rendering rich interactive components natively across Slack, mobile apps, Microsoft Teams, ChatGPT, Claude, Gemini, and any client supporting MCP apps. During the keynote, presenters defined an experience once and deployed it across six different surfaces without writing surface-specific code. The philosophical shift is significant: rather than pulling customers into a Salesforce UI, enterprises push branded, interactive agent experiences into whatever workspace their customers already inhabit.
The third pillar — build agents you can trust at scale — introduces an entirely new suite of lifecycle management tools spanning testing, evaluation, experimentation, observation, and orchestration. Agent Script, the company's new domain-specific language for defining agent behavior deterministically, is now generally available and open-sourced. A new Testing Center surfaces logic gaps and policy violations before deployment. Custom Scoring Evals let enterprises define what "good" looks like for their specific use case. And a new A/B Testing API enables running multiple agent versions against real traffic simultaneously.
Why enterprise customers kept breaking their own AI agents — and how Salesforce redesigned its tooling in response
Perhaps the most technically significant — and candid — portion of VentureBeat's interview with Govindarjan addressed the fundamental engineering tension at the heart of enterprise AI: agents are probabilistic systems, but enterprises demand deterministic outcomes.
Govindarjan explained that early Agentforce customers, after getting agents into production through "sheer hard work," discovered a painful reality. "They were afraid to make changes to these agents, because the whole system was brittle," he said. "You make one change and you don't know whether it's going to work 100% of the time. All the testing you did needs to be redone."
This brittleness problem drove the creation of Agent Script, which Govindarjan described as a programming language that "brings together the determinism that's in programming languages with the inherent flexibility in probabilistic systems that LLMs provide." The language functions as a single flat file — versionable, auditable — that defines a state machine governing how an agent behaves. Within that machine, enterprises specify which steps must follow explicit business logic and which can reason freely using LLM capabilities.
Salesforce open-sourced Agent Script this week, and Govindarjan noted that Claude Code can already generate it natively because of its clean documentation. The approach stands in sharp contrast to the "vibe coding" movement gaining traction elsewhere in the industry. As the Wall Street Journal recently reported, some companies are now attempting to vibe-code entire CRM replacements — a trend Salesforce's Headless 360 directly addresses by making its own platform the most agent-friendly substrate available.
Govindarjan described the tooling as a product of Salesforce's own internal practice. "We needed these tools to make our customers successful. Then our FDEs needed them. We hardened them, and then we gave them to our customers," he told VentureBeat. In other words, Salesforce productized its own pain.
Inside the two competing AI agent architectures Salesforce says every enterprise will need
Govindarjan drew a revealing distinction between two fundamentally different agentic architectures emerging in the enterprise — one for customer-facing interactions and one he linked to what he called the "Ralph Wiggum loop."
Customer-facing agents — those deployed to interact with end customers for sales or service — demand tight deterministic control.
"Before customers are willing to put these agents in front of their customers, they want to make sure that it follows a certain paradigm — a certain brand set of rules," Govindarjan told VentureBeat. Agent Script encodes these as a static graph — a defined funnel of steps with LLM reasoning embedded within each step.
The "Ralph Wiggum loop," by contrast, represents the opposite end of the spectrum: a dynamic graph that unrolls at runtime, where the agent autonomously decides its next step based on what it learned in the previous step, killing dead-end paths and spawning new ones until the task is complete. This architecture, Govindarjan said, manifests primarily in employee-facing scenarios — developers using coding agents, salespeople running deep research loops, marketers generating campaign materials — where an expert human reviews the output before it ships.
"Ralph Wiggum loops are great for employee-facing because employees are, in essence, experts at something," Govindarjan explained. "Developers are experts at development, salespeople are experts at sales."
The critical technical insight: both architectures run on the same underlying platform and the same graph engine. "This is a dynamic graph. This is a static graph," he said. "It's all a graph underneath." That unified runtime — spanning the spectrum from tightly controlled customer interactions to free-form autonomous loops — may be Salesforce's most important technical bet, sparing enterprises from maintaining separate platforms for different agent modalities.
Salesforce hedges its bets on MCP while opening its ecosystem to every major AI model and tool
Salesforce's embrace of openness at TDX was striking. The platform now integrates with OpenAI, Anthropic, Google Gemini, Meta's LLaMA, and Mistral AI models. The open agent harness supports third-party agent SDKs. MCP tools work from any coding environment. And the new AgentExchange marketplace unifies 10,000 Salesforce apps, 2,600-plus Slack apps, and 1,000-plus Agentforce agents, tools, and MCP servers from partners including Google, Docusign, and Notion, backed by a new $50 million AgentExchange Builders Initiative.
Yet Govindarjan offered a surprisingly candid assessment of MCP itself — the protocol Anthropic created that has become a de facto standard for agent-tool communication.
"To be very honest, not at all sure" that MCP will remain the standard, he told VentureBeat. "When MCP first came along as a protocol, a lot of us engineers felt that it was a wrapper on top of a really well-written CLI — which now it is. A lot of people are saying that maybe CLI is just as good, if not better."
His approach: pragmatic flexibility. "We're not wedded to one or the other. We just use the best, and often we will offer all three. We offer an API, we offer a CLI, we offer an MCP." This hedging explains the "Headless 360" naming itself — rather than betting on a single protocol, Salesforce exposes every capability across all three access patterns, insulating itself against protocol shifts.
Engine, the B2B travel management company featured prominently in the keynote demos, offered a real-world proof point for the open ecosystem approach. The company built its customer service agent, Ava, in 12 days using Agentforce and now handles 50% of customer cases autonomously. Engine runs five agents across customer-facing and employee-facing functions, with Data 360 at the heart of its infrastructure and Slack as its primary workspace. "CSAT goes up, costs to deliver go down. Customers are happier. We're getting them answers faster. What's the trade off? There's no trade off," an Engine executive said during the keynote.
Underpinning all of it is a shift in how Salesforce gets paid. The company is moving from per-seat licensing to consumption-based pricing for Agentforce — a transition Govindarjan described as "a business model change and innovation for us." It's a tacit acknowledgment that when agents, not humans, are doing the work, charging per user no longer makes sense.
Salesforce isn't defending the old model — it's dismantling it and betting the company on what comes next
Govindarjan framed the company's evolution in architectural terms. Salesforce has organized its platform around four layers: a system of context (Data 360), a system of work (Customer 360 apps), a system of agency (Agentforce), and a system of engagement (Slack and other surfaces). Headless 360 opens every layer via programmable endpoints.
"What you saw today, what we're doing now, is we're opening up every single layer, right, with MCP tools, so we can go build the agentic experiences that are needed," Govindarjan told VentureBeat. "I think you're seeing a company transforming itself."
Whether that transformation succeeds will depend on execution across thousands of customer deployments, the staying power of MCP and related protocols, and the fundamental question of whether incumbent enterprise platforms can move fast enough to remain relevant when AI agents can increasingly build new systems from scratch. The software sector's bear market, the financial pressures bearing down on the entire industry, and the breathtaking pace of LLM improvement all conspire to make this one of the highest-stakes bets in enterprise technology.
But there is an irony embedded in Salesforce's predicament that Headless 360 makes explicit. The very AI capabilities that threaten to displace traditional software are the same capabilities that Salesforce now harnesses to rebuild itself. Every coding agent that could theoretically replace a CRM is now, through Headless 360, a coding agent that builds on top of one. The company is not arguing that agents won't change the game. It's arguing that decades of accumulated enterprise data, workflows, trust layers, and institutional logic give it something no coding agent can generate from a blank prompt.
As Benioff declared on CNBC's Mad Money in March: "The software industry is still alive, well and growing." Headless 360 is his company's most forceful attempt to prove him right — by tearing down the walls of the very platform that made Salesforce famous and inviting every agent in the world to walk through the front door.
Parker Harris, Salesforce's co-founder, captured the bet most succinctly in a question he posed last month: "Why should you ever log into Salesforce again?"
If Headless 360 works as designed, the answer is: You shouldn't have to. And that, Salesforce is wagering, is precisely what will keep you paying for it.
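
Govindarjan's contrast between a static graph and a dynamic graph that "unrolls at runtime" can be illustrated with a toy Python sketch. It is not Agent Script and does not use any Salesforce API: the functions `run_static` and `run_dynamic`, the step names, and the `choose_next` callback standing in for an LLM's decision are all hypothetical.

```python
def run_static(graph, start, handlers, state):
    """Static graph: edges are fixed in advance; each step may still reason internally."""
    node, path = start, []
    while node is not None:
        path.append(node)
        state = handlers[node](state)   # LLM reasoning would live inside the handler
        node = graph[node]              # the next step is predetermined
    return state, path


def run_dynamic(start, handlers, choose_next, state, max_steps=10):
    """Dynamic graph: the next step is chosen at runtime from what the last step learned."""
    node, path = start, []
    for _ in range(max_steps):          # hard cap so the loop always terminates
        path.append(node)
        state = handlers[node](state)
        node = choose_next(node, state) # stand-in for an agent deciding its next move
        if node is None:
            break
    return state, path


# Hypothetical handlers shared by both runners, echoing "it's all a graph underneath".
HANDLERS = {
    "greet": lambda s: {**s, "greeted": True},
    "verify": lambda s: {**s, "verified": True},
    "research": lambda s: {**s, "notes": s.get("notes", 0) + 1},
    "answer": lambda s: {**s, "answered": True},
}

# Customer-facing style: a fixed funnel of steps.
STATIC_GRAPH = {"greet": "verify", "verify": "answer", "answer": None}


# Employee-facing style: keep researching until enough notes exist, then answer.
def choose_next(node, state):
    if node == "answer":
        return None
    return "answer" if state.get("notes", 0) >= 3 else "research"


static_state, static_path = run_static(STATIC_GRAPH, "greet", HANDLERS, {})
dynamic_state, dynamic_path = run_dynamic("research", HANDLERS, choose_next, {})
```

Both runners call the same handlers and differ only in how the next node is picked: a lookup in a fixed table versus a decision made from the current state. The `max_steps` cap is an added safety assumption, since a loop that chooses its own path needs some bound that a fixed funnel gets for free.
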