AI agents are entering their rebuild era as enterprises confront the reliability problem
Our take

As enterprise AI agents move into production, organizations are increasingly facing the reliability problem that has come to define the next phase of AI integration. Many teams are realizing that the performance of large language models (LLMs) alone does not ensure success in real-world applications. Long-running AI workflows must not only survive crashes but also preserve state, recover from failures, and coordinate seamlessly across APIs and enterprise systems. This challenge echoes themes from other recent developments in technology, such as how Pinterest cut AI costs 90% by gutting a frontier model's vision layer, which showcases the necessity of optimizing existing systems rather than perpetually pushing forward with untested innovations.
Preeti Somal, Senior VP of Engineering at Temporal Technologies, highlights a crucial realization among enterprises: the need to revisit first-generation AI agent implementations. The initial focus on rapid deployment has left many organizations grappling with foundational issues, much like the early days of cloud adoption when businesses rushed to migrate workloads without adequate redesigns. This lack of foresight can lead to significant operational pitfalls, where teams are forced to rebuild agents from the ground up after experiencing failures due to inadequate architecture. As organizations begin to understand that AI is not merely a plug-and-play solution, but requires a robust infrastructure, the design of AI systems must evolve to prioritize workflow orchestration, observability, governance, and recovery.
The implications of this shift extend beyond technical specifications; they touch on the economic realities facing enterprises today. As AI becomes a strategic priority, leaders must evaluate the return on investment (ROI) associated with these systems. Costs can spiral when workflows fail, requiring reruns of entire processes, thereby driving up inference expenses and impacting customer experiences. The idea of a "deterministic spine," as articulated by Somal, provides a framework for understanding how orchestration software can support the reliability of probabilistic models, ensuring consistent execution even when faced with interruptions. This perspective is crucial as enterprises navigate the complexities of integrating AI into their existing workflows.
Looking ahead, the need for governance will become even more pronounced. As organizations seek to build standardized frameworks that balance flexibility with necessary controls, the focus will shift from merely adopting AI solutions to creating sustainable, long-term systems that enhance productivity. As seen in the healthcare example with Abridge, where workflows are complex and multifaceted, successful AI agents must be able to maintain continuity over time and withstand interruptions. This raises a significant question for enterprises: how will they ensure that their AI systems are not only innovative but also resilient and economically viable?
As organizations revisit these deployments, collaboration with experts in workflow orchestration will matter more. The challenges of agentic AI are not merely technical hurdles; they are also a chance for enterprises to rethink their data management practices and improve operational efficiency. The move to refine first-generation implementations marks a pivotal moment for enterprise AI, and the enterprises that succeed will be those that not only adopt new technologies but also build the robust systems those technologies depend on.
As enterprise AI agents move into production, organizations are confronting a growing reliability problem. Many teams are discovering that LLM performance alone does not determine whether agents succeed in production. Long-running AI workflows must survive crashes, preserve state, recover from failures, manage inference costs, and coordinate across APIs, tools, and enterprise systems.
After a first wave focused on rapid deployment, organizations now need to revisit those first-generation implementations and redesign early agent architectures around workflow orchestration, observability, governance, and recovery, said Preeti Somal, Senior VP of Engineering at Temporal Technologies, during the latest AI Impact Series event in New York.
“We do have a lot of customers that come to us where they’re building version 2.0 of the same agent,” Somal said. “They had to move really fast, but they didn’t take care of the plumbing. Things crash and burn, and then they’re back to rebuilding with the reliable foundation.”
For workflow orchestration company Temporal, whose infrastructure predates the current wave of agentic AI, the shift reflects a broader enterprise realization: production AI systems require durable execution, state management, visibility into workflows, and mechanisms to recover when models or downstream systems fail.
Agentic AI has supercharged familiar engineering problems
“These patterns aren’t necessarily new,” Somal said. “AI just supercharges them.”
Agentic systems introduce additional complexity because they often involve long-running, multi-step processes spanning multiple services, models, APIs, and tools. A single workflow might call several large language models, access retrieval systems, trigger external applications, and manage state over hours or days. The engineering questions, Somal said, often emerge only after deployment.
“People will write agents but haven’t thought about what happens if the agent crashes,” she said. “Am I going to need to run the entire agent flow again?”
For enterprises operating under cost constraints, the answer matters. Restarting workflows after failures can multiply inference expenses, increase latency, and create poor customer experiences.
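To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. The per-step token counts are invented for illustration, and the function assumes the failed attempt itself bills nothing; neither detail comes from Somal or Temporal.

```python
def tokens_billed(step_tokens, failed_index, resume):
    """Tokens billed for one run in which step `failed_index` fails once.

    Assumption (illustrative): the failed attempt itself bills no tokens.
    """
    already_paid = sum(step_tokens[:failed_index])  # steps that finished before the crash
    if resume:
        remaining = sum(step_tokens[failed_index:])  # pick up at the failed step
    else:
        remaining = sum(step_tokens)  # start over from step one
    return already_paid + remaining


# Hypothetical token counts for a four-step agent flow.
step_tokens = [1_000, 4_000, 2_500, 500]

rerun_cost = tokens_billed(step_tokens, failed_index=3, resume=False)
resume_cost = tokens_billed(step_tokens, failed_index=3, resume=True)
print(rerun_cost, resume_cost)
```

With these made-up numbers, a crash in the final step nearly doubles the bill when the flow restarts from the beginning, while resuming costs no more than a clean run.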
Somal compared the current moment to an earlier period of enterprise cloud adoption, when organizations migrated workloads first and only later recognized that the underlying architectures had to be redesigned for those workloads to hold up over the long term.
“This rush to do AI in a world where you haven’t even modernized your application reminds me a little bit of that lift-and-shift that happened in the cloud,” she said. “Everybody realized you’re spending more money on cloud and we haven’t gotten value there.”
Why long-running agents force a new architecture
Enterprise workflows increasingly involve agents that execute over long windows, sometimes many hours, while interacting with tools and systems. Reliability challenges compound as workflows persist over time, and they affect both state and memory, two ideas that are often treated interchangeably in AI conversations.
State concerns workflow execution. It includes where an agent is in a process, which actions have already completed, and where recovery should resume after failure. Memory or context captures information an agent carries forward across interactions or tasks.
“The state of the agent is around what step and what actions have been performed, and if something crashes, where do you want to recover from, versus the context and memory piece,” Somal explained.
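The distinction can be sketched as two separate data structures. This is a minimal Python illustration, not how Temporal or Abridge model it; the class names, fields, and step names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkflowState:
    """Execution state: which steps finished and where to resume (hypothetical shape)."""
    steps: list
    completed: list = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        # The resume point is the first step that has not completed yet.
        for step in self.steps:
            if step not in self.completed:
                return step
        return None


@dataclass
class AgentMemory:
    """Context the agent carries forward across interactions (hypothetical shape)."""
    notes: list = field(default_factory=list)


state = WorkflowState(steps=["process_audio", "summarize", "after_visit_summary"])
memory = AgentMemory()

state.completed.append("process_audio")                  # progress through the workflow
memory.notes.append("Clinician prefers short summaries")  # context for later model calls

print(state.next_step())
```

After a crash, only `WorkflowState` is needed to decide where execution picks up; `AgentMemory` shapes what the model is told once it does.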
That distinction becomes increasingly important when enterprises begin moving beyond simple chatbot interactions toward longer-running business processes. Somal pointed to a healthcare example involving customer Abridge, where workflows process physician visits through multiple stages, including audio processing, summarization, model calls, and after-visit generation.
“There’s not just one piece to that flow,” Somal said. “Taking videos and slicing that, taking summaries, calling the LLMs, generating the after-visit summary, all of that is being orchestrated.”
The implication for enterprises is that successful agents increasingly depend on systems that can survive interruptions, coordinate across services, and maintain continuity over time.
The rise of the deterministic spine
A useful framework for enterprise AI design is the deterministic spine, Somal said, which is how Temporal thinks about its own role.
“It is denoting the path you want to take,” she said. “It is calling the brain, but if the brain doesn’t respond, it will call it again. If the brain responds but the next step is going to fail, it will pick up from where that failure happened.”
In this framing, the language model acts as a probabilistic system producing variable outputs, while orchestration software maintains execution reliability around it. The concept matters because enterprise systems increasingly require consistency even when models remain non-deterministic. A procurement workflow, healthcare summary, customer support escalation, or compliance process cannot simply fail silently because a model call timed out or an external dependency crashed.
“What you care most about is making sure that you can recover and that you’re not paying the token tax if something goes wrong,” Somal said.
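One way to picture the spine is a plain loop that retries an unresponsive step and checkpoints each finished one. The sketch below uses only the Python standard library and is not Temporal's API; `run_workflow`, the JSON checkpoint file, and the two stand-in steps are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def run_workflow(steps, checkpoint: Path, max_attempts: int = 3):
    """Run steps in order, saving each result so a later run skips finished work."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for name, fn in steps:
        if name in done:
            continue  # finished in an earlier run; do not pay for it again
        for attempt in range(1, max_attempts + 1):
            try:
                done[name] = fn()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; earlier steps stay checkpointed
        checkpoint.write_text(json.dumps(done))
    return done


# Stand-ins for a model call and a downstream system (both hypothetical).
calls = {"model": 0, "publish": 0}
downstream = {"up": False}


def call_model():
    calls["model"] += 1
    if calls["model"] == 1:
        raise TimeoutError("model did not respond")
    return "summary text"


def publish():
    calls["publish"] += 1
    if not downstream["up"]:
        raise ConnectionError("downstream system unavailable")
    return "published"


steps = [("call_model", call_model), ("publish", publish)]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_workflow(steps, checkpoint)  # model succeeds on retry, publish keeps failing
except ConnectionError:
    pass

downstream["up"] = True
result = run_workflow(steps, checkpoint)  # resumes at publish; model is not called again
print(result, calls)
```

The sketch leaves out backoff between retries and assumes step results are JSON-serializable and that repeating a failed step is safe; a production orchestrator has to handle all three.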
Reliability, visibility, and the economics of token spend
As enterprise leaders evaluate AI ROI, cost visibility has become a growing concern. Long-running agents frequently make multiple model calls across complex workflows, which can create opaque spending patterns. Somal described one operational advantage of orchestration as visibility into where costs accumulate. Because workflows are observable step-by-step, teams can see where tokens are being consumed across an agent process.
“You’ve got visibility into that entire flow in a single pane of glass,” she said. “You can now see where you’re spending the tokens in an agent that is multiple steps and calling multiple different systems.”
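Per-step visibility of this kind can be approximated with a small ledger that attributes token usage to workflow steps. This is a hedged Python sketch; `TokenLedger`, the step names, and the token counts are hypothetical and do not describe Temporal's tooling.

```python
from collections import defaultdict


class TokenLedger:
    """Accumulates token usage per workflow step (hypothetical helper)."""

    def __init__(self):
        self.by_step = defaultdict(int)

    def record(self, step, tokens):
        self.by_step[step] += tokens

    def total(self):
        return sum(self.by_step.values())

    def report(self):
        # Most expensive steps first.
        return sorted(self.by_step.items(), key=lambda item: item[1], reverse=True)


ledger = TokenLedger()
ledger.record("retrieve_context", 800)
ledger.record("draft_summary", 3_200)
ledger.record("draft_summary", 1_100)  # a second model call in the same step
ledger.record("final_review", 600)

for step, tokens in ledger.report():
    print(f"{step:<18}{tokens:>6} tokens ({tokens / ledger.total():.0%})")
```

Sorting by spend makes it obvious which step to optimize first, which is the practical value of seeing the whole flow in one place.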
Workflow recovery also shapes cost efficiency. Without durable orchestration, a late-stage failure can force organizations to rerun an entire process from the beginning, including all prior model calls. Somal said systems designed around recovery can resume execution from the point of interruption.
“You pick up from where the crash happened,” she said. “We save you the cost of running the agent from step one again.”
Enterprises need to build paved paths and enlist partner expertise
Governance concerns are another emerging pattern as agentic AI takes hold. Rather than adopting fully managed agent systems wholesale, Somal said, enterprises increasingly want standardized internal frameworks that provide guardrails while preserving flexibility and that implement necessary features such as governance controls, model selection policies, identity systems, cost management, and observability.
“The enterprises are looking at building these paved paths,” she said. “Taking something off the shelf is maybe not going to work because there are all of these other requirements.”
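A paved path often amounts to a shared check that every agent call passes through before it runs. The Python sketch below shows one possible shape, covering only model selection and a cost ceiling; `PavedPathPolicy`, `check_request`, the model names, and the limits are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PavedPathPolicy:
    """Guardrails a platform team might publish (hypothetical fields)."""
    allowed_models: frozenset
    max_tokens_per_run: int


def check_request(policy, model, estimated_tokens):
    """Return a list of policy violations; an empty list means the call may proceed."""
    violations = []
    if model not in policy.allowed_models:
        violations.append(f"model '{model}' is not on the approved list")
    if estimated_tokens > policy.max_tokens_per_run:
        violations.append(
            f"estimated {estimated_tokens} tokens exceeds the limit of {policy.max_tokens_per_run}"
        )
    return violations


policy = PavedPathPolicy(
    allowed_models=frozenset({"model-a", "model-b"}),  # placeholder model names
    max_tokens_per_run=20_000,
)

print(check_request(policy, "model-a", 5_000))
print(check_request(policy, "model-x", 50_000))
```

Returning violations instead of raising lets a team log every breach in one pass; identity and observability controls would sit alongside this check and are left out here.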
As organizations revisit first-generation deployments, challenges like this increasingly look less like a model problem and more like a systems engineering problem. Temporal is positioned to help enterprises take this next step in part because, for many organizations, it was already in place as part of broader modernization programs before AI became a strategic priority.
“Temporal is already in the enterprise,” Somal said. “Taking that and extending that to AI and agent platforms feels very natural.”
Related Articles
- Designing the agentic AI enterprise for measurable performancePresented by Edgeverve Smart, semi‑autonomous AI agents handling complex, real‑time business work is a compelling vision. But moving from impressive pilots to production‑grade impact requires more than clever prompts or proof‑of‑concept demos. It takes clear goals, data‑driven workflows, and an enterprise platform that balances autonomy, governance, observability, and flexibility with hard guardrails from day one. From pilots to the “operational grey zones” The next wave of value sits in the connective tissue between applications — those operational grey zones where handoffs, reconciliations, approvals, and data lookups still rely on humans. Assigning agents to these paths means collapsing system boundaries, applying intelligence to context, and re‑imagining processes that were never formally automated. Many pilots stall because they start as lab experiments rather than outcome‑anchored designs tied to production systems, controls, and KPIs. Start with outcomes, not algorithms. Translate organizational KPIs (cash‑flow, DSO, SLA adherence, compliance hit rates, MTTR, NPS, claims leakage, etc.) into agent goals, then cascade them into single‑agent and multi‑agent objectives. Only after goals are explicit should you select workflows and decompose tasks. Pick targets, then decompose the work What does “target” actually mean? In agentic programs, a target is a business outcome and the use case that moves it. For example, “reduce unapplied cash by 20%” target outcome; “cash application and exceptions handling” use case. With the use case in hand, perform persona‑level task decomposition: map the human role (e.g., cash applications analyst, facilities coordinator), enumerate their tasks, and identify which are ripe for agentification (data retrieval, matching, policy checks, decision proposals, transaction initiation). 
Delivering on those tasks requires a data‑embedded workflow fabric that can read, write, and reason across enterprise systems while honoring permissions. Data must be AI‑ready, discoverable, governed, labeled where needed, augmented for retrieval (RAG), and policy‑protected for PII, PCI, and regulatory constraints. Integration goes beyond APIs APIs are one mode of integration, not the only one. Robust agent execution typically blends: Stable APIs with lifecycle management for core systems Event‑driven triggers (streams, webhooks, CDC) to react in real time UI/RPA fallbacks where APIs don’t exist Search/RAG connectors for documents and knowledge bases Policy management across tools and actions to enforce entitlements and segregation of duties The north star is integration reliability — built on idempotency, retries, circuit-breakers, and standardized tool schemas — so agents don’t “hallucinate” actions the enterprise can’t verify. A quick example: finance and facilities, in production Inside our organization, we deployed specialized agents in a live CFO environment and in building maintenance. In finance, seven agents interacted with production systems and real accountability structures. Year‑one outcomes included: >3% monthly cash‑flow improvement, 50% productivity gain in affected workflows, 90% faster onboarding, a shift from account‑level handling to function‑level orchestration, and a $32M cash‑flow lift. These results don’t guarantee gains everywhere; they show that designing products can deliver measurable outcomes on a scale. The four design pillars: Autonomy, governance, observability & evals, flexibility 1) Autonomy: right‑size it to the risk Autonomy exists on a spectrum. Early efforts often automate well‑bounded tasks; others pursue research/analysis agents; increasingly, teams target mission‑critical transactional agents (payments, vendor onboarding, pricing changes). 
The rule: match autonomy to risk, and encode the operating mode suggest‑only, propose‑and‑approve, or execute‑with‑rollback per task. 2) Governance: guardrails by design, not as bolt‑ons Unbounded agents create unacceptable risk. Build guardrails into the plan: Policy & permissions: tie tools/actions to identity, scopes, and SoD rules. Human‑in‑the‑loop (HITL): where mission‑critical thresholds are crossed (amount, vendor risk, regulatory exposure). Agent lifecycle management: versioning, change control, regression gates, approval workflows, and sunsetting. Third‑party agent orchestration: vet external agents like vendors, capabilities, scopes, logs, SLAs. Incident and rollback: kill‑switches, safe‑mode, and compensating transactions. This is how you scale innovation safely while protecting brand, compliance, and customers. 3) Observability & evaluations: trust comes from telemetry Production agents need the same rigor as any core platform: Telemetry: capture full execution traces across perception, planning, tool use, action supported by structured logs and replay. Offline evals: cenario tests, red‑teaming, bias and safety checks, cost/performance benchmarks; baseline vs. challenger comparisons. Online evals: shadow mode, A/B, canary releases, guardrail breach alerts, human feedback loops. Explainability & auditability: why was an action taken, which data/tools were used, and who approved. 4) Flexibility: assume volatility, design for swap‑ability Models, tools, and vendors change fast. Treat agentic capability as platform currency: create an environment where teams can evaluate, select, and swap models/tools without tearing down the build. Use a model router, tool registry, and contract‑first interfaces so upgrades are controlled experiments, not rewrites. The agent platform fabric: how platformization turns goals into outcomes A true agentic enterprise requires a platform fabric that transforms goals into outcomes, not a patchwork of isolated pilots. 
This platform anchors enterprise‑to‑agent KPI cascades, drives task decomposition and multi‑agent planning, and provides governed tooling and data access across APIs, RPA, search, and databases. It centralizes knowledge and memory through RAG and vector stores, enforces enterprise controls via a policy engine, and manages performance and safety through a unified model layer. It supports robust orchestration of first‑ and third‑party agents with common context, embeds deep observability and evaluation pipelines, and applies disciplined release engineering from sandbox to GA. Finally, it ensures long‑term resilience through lifecycle management versioning, deprecation, incident playbooks, and auditable histories. Guardrails in action: a BFSI example Consider payments exception handling in banking — high stakes, regulated, and customer‑visible. An agent proposes a resolution (e.g., auto‑reconcile or escalate) only when: The transaction falls below risk thresholds; above them, it triggers HITL approval. All policy checks (KYC/AML, velocity, sanctions) pass. Observability hooks record rationale, tools invoked, and data used. Rollback/compensation is defined if downstream failures occur. This pattern generalizes to vendor onboarding, pricing overrides, or claims adjudication — mission‑critical work with explicit safety rails. Scale beyond pilots Scaling agentic AI beyond pilots demands disciplined readiness across nine fronts: leaders must clarify which KPIs matter and how agent goals ladder into them, determine which persona tasks are agentified versus remain human‑led, and align each with the right autonomy mode from suggest‑only to propose‑and‑approve to execute‑with‑rollback. They must embed governance guardrails, including HITL points and lifecycle controls; ensure robust observability and evaluation via telemetry, replay, audits, and offline/online tests; and verify data readiness, with governed, policy‑protected, retrieval‑augmented data flows. 
Integration must be reliable, with API lifecycle management, event triggers, and RPA/other fallbacks. The underlying platform should enable model swap‑ability and orchestration of first‑ and third‑party agents without rebuilding. Finally, measurement must focus on true operational impact cash flow, cycle times, quality, and risk reduction rather than task counts. The takeaway Agentic AI is not a shortcut; it’s a new system of work. Enterprises that approach it with platform discipline aligning autonomy with risk, embedding governance and observability, and designing for swap‑ability will convert pilots into production impact. Those that don’t keep accumulating impressive but disconnected demos. The difference isn’t how fast you ship an agent; it’s how deliberately you design the enterprise around it. N. Shashidar is SVP & Global Head, Product Management at EdgeVerve. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
- The consequential AI work that actually moves the needle for enterprisesPresented by OutSystems After two years of flashy AI demos, rushed agent prototypes, and breathless predictions, enterprise technology leaders are striking a more pragmatic tone in 2026. In a recent webinar hosted by OutSystems, a panel of software executives and enterprise practitioners made the case that the most consequential AI work happening now is focused on the practical matters of governance, orchestration, and iteration, along with integrating agents into the systems they've spent decades building. Enterprise leaders are increasingly focused on fundamentals. The priority is using new AI technologies to accelerate productivity, improve delivery, and produce measurable business results. Three elements shape this work: The move from AI agent prototypes to agentic systems that deliver measurable ROI in production The growing role of enterprise platforms in governing, orchestrating, and scaling AI agents safely The rise of the generalist developer and enterprise architect as the most valuable technical profiles in an era of AI-generated code Against this backdrop, the panel discussed governance frameworks, the economics of enterprise AI, and the limits of large language models without orchestration. The conversation ultimately turned to how leading organizations are building multi-agent systems grounded in existing enterprise data and workflows. Agents in the real world Enabling agents to work in production across the enterprise is best accomplished with a unified platform that handles development, iteration, and deployment. And that'swhere capabilities like the Agent Workbench in the OutSystems platform matter, said Rajkiran Vajreshwari, senior manager of app development at Thermo Fisher Scientific. It provides the infrastructure to learn, iterate, and govern agents at scale. 
His team at Thermo Fisher has moved away from single-task AI assistants in customer service to building a coordinated team of specialized agents using the workbench. When a support case arrives, a triage assistant classifies the request and dynamically routes it to the right specialist agent, whether that’s an intent and priority agent, a product context agent, a troubleshooting agent, or a compliance agent. "We don’t have to think about what will work and how. It’s all pre-built," he explained. "Each agent has a narrow role and clear guardrails. They stay accurate and auditable.” Governing the risks of shadow AI A new category of risk emerges when AI makes it possible for anyone in a company to generate production-level code without IT oversight. Basically, this is ungoverned shadow AI. These homegrown products are prone to hallucinations, data leakage, policy violations, model drift, and agents taking actions that were never formally approved. To get ahead of the risk, leading organizations need to do three things, said Luis Blando, CPTO of OutSystems. "Give users guardrails. They’re going to use AI whether you like it or not. Companies that seem to be getting ahead are using AI to govern AI across their full portfolio,” he explained. “That is the difference between shadow AI chaos and enterprise-grade scale.” Eric Kavanagh, CEO of The Bloor Group, noted that governance requires a layered set of disciplines that includes securing data, monitoring models for drift, and making deliberate choices about where AI connects to existing business processes. “Companies don’t have to be manually creating these controls," he added. "A lot of those guardrails and levers are baked in to platforms like OutSystems.” Why the real orchestration challenge is models vs. platforms Much of the early excitement around enterprise AI focused on selecting the right large language model. Now the harder challenge, and far more durable source of value, is orchestration. 
This includes routing tasks, coordinating workflows, governing execution, and integrating AI into existing enterprise systems. Scott Finkle, VP of development at McConkey Auction Group, noted that LLMs, however impressive, are pieces of complex workflows, not final solutions. Organizations should be ready to hot-swap between Gemini, ChatGPT, Claude, and whatever emerges next without having to rebuild the agentic system around it. A platform with orchestration capabilities makes that possible. It manages the lifecycle, provides visibility, and ensures processes execute reliably, even as AI handles the reasoning layer on top. “The AI and the models change, the workflows can change, but the orchestration remains the same," Finkle said. "That’s how we’re going to extract value out of AI.” The economics of enterprise AI investing Security, compliance, governance, and platform-level AI capabilities will all command greater investment in 2026, particularly as AI moves into core workflows like finance and supply chain. Enterprises should favor incremental wins rather than expect big, immediate gains. “We’re focusing on base hits," Finkle said. "The way it counts is by getting something into production and having it make an impact. Big investments in pilot projects that don’t make it into production don’t save any money. It’s not going to happen overnight, but over time I think we’ll see tremendous savings.” There's still a split in how enterprises are approaching AI transformation. Some start from scratch and reimagine every process. Others, especially those with billions of dollars in existing infrastructure depreciating in-house, want AI to integrate with their systems. They want agentic systems to reuse data, APIs, and proven processes while speeding up delivery. The agent platform approach serves both camps, but particularly the latter. Organizations can deploy agents where they add clear value while preserving the integrity of established, deterministic workflows. 
The rise of the enterprise architect and the generalist developer As AI accelerates code generation, bottlenecks in software delivery are dissolving. In its place is a premium on systems thinking. This is the ability to understand the broader enterprise architecture, decompose complex business problems, and reason about how AI integrates with existing infrastructure. Kavanagh pointed to enterprise architects specifically as the professionals best positioned to capitalize on this moment. “We’re entering a very interesting age of the generalist," he explained. "The better you know your enterprise architecture and your business architecture and how those things align, the better off you’re going to be. ” “The result is faster delivery with fewer interruptions and fewer bugs," Kavanaugh said. "You can focus on the non-repetitive tasks. It’s a benefit to the developer, to the business, and to the whole IT organization.” Catch the entire webinar here. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.
- Salesforce launches Agentforce Operations to fix the workflows breaking enterprise AIEnterprise AI teams are hitting a wall — not because their models can't reason, but because the workflows underneath them were never built for agents. Tasks fail, handoffs break, and the problem compounds as organizations push agents deeper into back-office systems. A new architectural layer is emerging to address it: workflow execution control planes that impose deterministic structure on processes agents are expected to run. One of the companies bringing this to the forefront is Salesforce, with a new workflow platform that turns back-office workflows into a set of tasks for specialized agents to complete. Users can upload their processes or use one of the set Blueprints provided by Salesforce, and Agentforce Operations will break it down for agents. Salesforce senior vice president of Product, Sanjna Parulekar, told VentureBeat in an interview that the problem is that many enterprise workflows are not built for agents. “What we’ve observed with customers is that a lot of times, the brokenness in a process is probably in your product requirements document,” Parulekar said. “So when that’s uploaded into a product, it doesn’t quite work. We can optimize it and cut out some things and replace it with an agent.” Without this control panel layer, enterprises could risk deploying agents that increase cost rather than fix their workflow problems. Making the workflow work for agents, not just humans Enterprises deploying agents are learning a costly lesson: Their workflows were designed around human judgment gaps, not machine execution. Processes that evolved through years of workarounds — loosely defined steps, implicit decisions, coordination that depends on individuals knowing what to do next — break when agents are asked to follow them literally. 
Even with all of an enterprise’s context at its fingertips, AI systems will have difficulty completing tasks if it is not clear what it’s supposed to do. Parulekar said her team found that focusing on what makes the process tick and breaking it down into more explicit steps and workflows makes the system more deterministic. Then, when platforms like Agentforce Operations introduce agents, those agents already know their specific tasks. “It forces companies to rethink their processes and introduces observability into the mix because of the session tracing model in the system,” she said. Parulekar said human checks can be built into the system, so the process is more transparent. What makes this approach different from other workflow automation offerings is that it doesn’t rely on agents to decide what to do next; the system does. Unlike more traditional automation tools that route tasks and agents on probabilistic decision-making, this enforces execution on a more pre-defined, deterministic structure. The problem it introduces Codifying a workflow doesn't fix a broken one. If a process has flawed steps, encoding it for agents locks in the problem at scale. And once workflows are distributed across agents, the challenge shifts from execution to governance: who owns the process, who validates it, and how it evolves when business conditions change. It puts the onus on teams to take a hard look at what works for them and what doesn’t. Organizations need to consider that, along with the execution control plane offered by platforms like Agentforce Operations, someone should be made responsible for task completion and success. Brandon Metcalf, founder and CEO of workforce orchestration company Asymbl, told VentureBeat in a separate interview that the key to both humans and agents following a workflow is a shared goal. “You have to understand the goal or the agent or human won’t complete the task successfully,” Metcalf said. 
“Someone has to manage that outcome that has to be delivered. It can be a person or an agent.” The bottleneck has moved. As Metcalf framed it, the question is no longer whether agents can reason through a task, it's whether the workflow underneath them is coherent enough to execute. For enterprises that built their processes around human judgment and institutional memory, that's a harder fix than swapping in a smarter model.
- Claude’s next enterprise battle is not models: it’s the agent control planeNew VB Pulse data shows Microsoft and OpenAI leading enterprise agent orchestration, but Anthropic’s first measurable foothold points to a larger fight over who controls the infrastructure where AI agents run. For the last two years, the enterprise AI race has mostly been framed as a model war: OpenAI’s GPT series versus Anthropic’s Claude versus Google’s Gemini, with smaller and open-source alternatives also coming in from the U.S. and China. But the next strategic fight may not be over which model answers a prompt best. It may be over who controls the layer where agents plan, call tools, access data, run workflows and prove to security teams that they did not do anything they were not supposed to do. New VB Pulse survey data suggests the category is already taking shape. Our independent Enterprise Agentic Orchestration tracker, a survey that records the preferences of qualified, verified technical-decision maker respondents at enterprises at regular intervals, found that Microsoft Copilot Studio and Azure AI Studio led with 38.6% primary-platform adoption in February, up from 35.7% in January. OpenAI’s Assistants and Responses API held second place, rising from 23.2% to 25.7%. Anthropic remained far smaller, but it made its first appearance in the tracker: moving from 0% in January to 5.7% in February for Anthropic tool use and workflows. The underlying move is small — four respondents out of a total 70 in this cohort, with more to come — but strategically interesting because it marks the first sign in this tracker of Claude usage moving from the model layer into native orchestration. That distinction matters. Enterprises are not merely choosing chatbots. They are deciding where the live operational machinery of AI work will sit: inside Microsoft’s stack, inside OpenAI’s API layer, inside Anthropic’s managed runtime, inside an open framework, or across a hybrid mix of all of them. 
“This is the convergence moment for enterprise AI,” said Tom Findling, CEO and cofounder of AI cybersecurity startup Conifers, in a statement to VentureBeat. “Models and agent frameworks have matured enough together that enterprises are now shifting focus beyond model quality to the control plane around it. In security operations, we’re seeing the competitive advantage move toward platforms that can orchestrate agents, leverage enterprise context, and provide governance and auditability across customer environments.”
Anthropic’s number is still small to start, but the increase is not
The Anthropic number, by itself, should not be overread. A move from zero to 5.7% is not a juggernaut. It is not proof that Anthropic has captured enterprise orchestration. It is not even enough to say Anthropic has a durable lead in any part of this market. Microsoft owns the early enterprise distribution advantage, and OpenAI has a much larger installed base in orchestration than Anthropic.
But small numbers can matter when they appear at the start of a new market structure. Anthropic’s emergence in orchestration comes as the broader VB Pulse data shows Claude also gaining massive enterprise adoption at the model layer. In our VB Pulse Q1 Foundation Models and Intelligence Platforms tracker, Anthropic rose from 23.9% in January to 28.6% in February and then even more dramatically to 56.2% in March among qualified enterprise respondents, with the March reading flagged as directional only, because the sample was only 16 respondents.
The story, then, is not that Anthropic is winning orchestration today. It is that Anthropic’s model momentum may be starting to spill into the orchestration layer. That is where the strategic stakes get higher.
A model is easier to swap than an agent runtime
A model is relatively easy to swap, at least in theory. A company can route one workload to Claude, another to GPT, another to Gemini and another to a smaller open model.
In fact, the VB Pulse Foundation Models tracker over the same Q1 period shows that multi-model strategy is the enterprise consensus: respondents increasingly report adopting multiple models and building orchestration layers that route across them by task, cost and risk profile.
An agent runtime is different. Once a company’s workflows, tool permissions, credentials, audit logs, memory, sandboxed execution and operational monitoring live inside one provider’s environment, switching providers becomes less like changing models and more like changing infrastructure.
That is the real reason Anthropic’s 5.7% foothold is worth watching.
Anthropic has already made clear that it wants to provide more than the model. Its Claude Managed Agents documentation describes a public beta for a managed agent harness with secure sandboxing, built-in tools and API-run sessions, while Anthropic’s engineering post frames the architecture around decoupling the model from the surrounding agent machinery: the session, the harness and the sandbox.
In plain English, Anthropic is trying to host the environment where Claude agents remember context, use tools, run code, operate inside sandboxes and persist across long-running workflows. That is no longer just inference. That is operational infrastructure.
The pitch is obvious: most enterprises do not want to stitch together their own agent stack from scratch. They want agents that can act, but they also want permission boundaries, audit trails, workflow reliability and ways to stop the system when something goes wrong.
Security is becoming the buying criterion
The VB Pulse orchestration tracker shows that buyers are prioritizing exactly those concerns. Security and permissions ranked as the top orchestration platform selection criterion in both January and February, at 39.3% and 37.1%. Control over agent execution rose from 17.9% to 22.9%, while flexibility across models and tools fell from 35.7% to 25.7%.
The market appears to be shifting from optionality toward governance.
That shift is not surprising. A chatbot can be wrong and still remain mostly contained. An agent that can send emails, modify documents, query databases, call APIs or execute workflows has a much larger blast radius. The enterprise question is not only whether the agent is smart enough. It is who gave it permission, what it touched, what it changed, whether those actions were logged, and whether the company can unwind the damage if something goes wrong.
Ev Kontsevoy, cofounder and CEO of Teleport, an identity and digital infrastructure solutions company, argues that the industry is still putting too much emphasis on orchestration itself and not enough on identity:
“The race to own the agent orchestration layer is real,” Kontsevoy said. “It’s also solving the wrong problem first. Orchestration without identity only multiplies chaos. Without identity, you don’t know what an agent can access, what it actually did, or how to revoke its access when it operates outside policy. A unified identity layer is a prerequisite to deploying agents, one or many, in infrastructure.”
Syam Nair, Chief Product Officer at the intelligent data infrastructure company NetApp, believes data management is key in all cases to secure AI agent orchestration across the enterprise. As he said in a statement to VentureBeat:
"Effective agent management requires built-in intelligence and a continuously updated understanding of both data and, critically, its metadata. This visibility allows organizations to define and enforce clear policies so data is used only by the right agents, for the right purposes. Making this work at scale is a crossfunctional effort. Security, storage, and data science teams must work together to implement policies that safeguard company data, while creating a strong data foundation for AI."
He continued: "The CIOs and technology leaders that are successful are the ones who take the input, policies, and vision from all these teams into account as they build a data infrastructure that minimizes risk and drives business value."
Microsoft has the distribution edge
That is why Microsoft’s early lead makes sense. Copilot Studio and Azure AI Studio sit inside an enterprise stack many companies already use: Microsoft 365, Teams, Entra ID, Azure and existing procurement relationships. The VB Pulse Orchestration Tracker for Q1 2026 describes Microsoft as the enterprise default, with no other platform within 13 percentage points in February.
David Weston, CVP, AI Security, Microsoft, provided some insight on why, writing in a statement to VentureBeat:
"Without a unified control layer, you start to see fragmentation – agents operating in silos, inconsistent governance, and gaps in security. What customers are asking for is a way to bring order to that complexity. With Agent 365, we’re providing a single control plane to observe, govern, and secure agents across Microsoft, partner, and third-party ecosystems, all grounded in enterprise data and identity."
OpenAI’s second-place position is also unsurprising. Its Assistants and Responses API gave developers an early way to build agent-like systems using OpenAI’s models and tooling. In the orchestration tracker, OpenAI is not surging, but it is still ticking up steadily: 23.2% in January to 25.7% in February.
Anthropic is the newcomer at the orchestration layer. But its timing may be favorable. The VB Pulse Foundation Models tracker for Q1 2026 suggests enterprises increasingly see Claude as a fit for higher-stakes workloads where safety, instruction following, long context and governance matter. The orchestration tracker suggests those same buyers are now moving from agent experiments toward production workflows, where security, permissions and task reliability become the gating issues.
That creates a possible path for Anthropic: not to beat Microsoft as the default enterprise platform, at least not immediately, but to become the agent runtime for companies that already trust Claude for sensitive or complex workloads.
The risk is lock-in
The risk for enterprises is lock-in. The orchestration tracker found that a hybrid control plane, combining provider-native orchestration with external orchestration, was the leading expected architecture, holding around 35% to 36% across the two substantive waves. Provider-managed-only approaches grew modestly but remained a minority. The report’s conclusion is blunt: enterprises are not willing to give full orchestration control to any single provider.
It makes total sense as enterprises seek to leverage the "best-in-breed" models, harnesses, and tools from multiple vendors, especially as their needs differ widely across sector, business, and size.
"Most enterprises will operate in a multi-model, multi-agent environment, which makes an independent control plane essential," agreed Felix Van de Maele, CEO of Collibra, a unified data governance startup for AI, in a statement to VentureBeat. "That is why we built AI Command Center: to give organizations the visibility, governance, and real-time oversight needed to manage AI systems and agents across the full lifecycle."
That caution shows up in the risk data. When asked about risks if agent control lives inside a model provider platform, respondents cited security and permissioning limitations as the top concern. Vendor lock-in was the second-largest concern and the only one that increased from January to February, rising from 23.2% to 25.7%.
This is the tension at the heart of the agent market. Enterprises want managed infrastructure because building reliable agents is hard. But the more a provider manages, the more it may own.
Dr.
Rania Khalaf, chief AI officer at WSO2, the subsidiary of EQT that offers open source, customizable AI stacks for enterprises, said enterprises will need an agent control plane that sits apart from individual frameworks, harnesses and runtimes because agents combine the unpredictability of LLMs with the ability to take actions that have consequences.
“Teams want the freedom to use the best model and framework for each job (Claude for coding, Gemini for writing, LangGraph or CrewAI for dynamic modular behavior) and that heterogeneity makes consistent governance untenable in integrated platforms that lock into one ecosystem,” Khalaf said.
From LLMOps to Agent Ops
Khalaf said the industry is also moving from MLOps to LLMOps to “Agent Ops,” where governance has to cover the whole agent, not just the model call.
“A guardrail on an LLM call can catch hallucination or toxic output, but it will not catch an agent thrashing in an unbreakable, costly loop, which is why governance now has to extend out from the LLM interaction to the scope of the agent,” she said.
The practical implication is that enterprises need to separate policy and control from the agent logic itself. Khalaf pointed to the recent example of an agent deleting a production database despite being told not to, arguing that the failure showed the limits of relying on prompt-level instructions where hard identity and access controls are needed.
“Pulling guardrails, evals, policies, bindings, and agent identity out of the core agent logic allows them to be configured per deployment and per environment, owned by the appropriate teams in security, product, and compliance, without fragmenting the governance layer as different teams choose different models and frameworks,” Khalaf said.
MCP is open. The runtime may still be sticky
That is where Anthropic’s Model Context Protocol, or MCP, complicates the story.
MCP is not a walled garden; Anthropic introduced it as an open standard for connecting AI systems to data and tools, and Anthropic’s documentation describes MCP as an open-source standard for connecting AI applications to external systems.
But openness at the protocol layer does not automatically eliminate lock-in at the runtime layer. An enterprise could use an open protocol to connect tools while still becoming dependent on a provider’s managed sessions, logs, sandboxes, permissions model, workflow state and deployment environment. In other words, MCP may reduce integration friction, while managed agent infrastructure could still increase switching costs.
Khalaf said Microsoft’s lead likely reflects its M365 and Azure distribution, while Anthropic’s emerging foothold could reflect a different architectural bet around open protocols such as MCP. But she argued the long-term direction is not a single-provider stack.
“Enterprises serious about running agents in production will end up multi-vendor across these layers,” Khalaf said, “which is why the open and interoperable control plane matters more than the current percentages might suggest.”
The next cycle may be cross-vendor collaboration
That same tension, between provider-native convenience and cross-vendor reality, is where Arick Goomanovsky, CEO and cofounder of universal AI agent orchestrator startup BAND, sees the next competitive cycle forming.
“Enterprises now run agents everywhere: individual assistants and coding agents, multi-agent systems in production, agents embedded in Agentforce and ServiceNow, and third-party agents consumed as agent-as-a-service,” Goomanovsky said. “None of them collaborate across those boundaries by default.”
Goomanovsky argues that the missing layer is not just orchestration inside a single model provider, but a cross-vendor collaboration layer that lets agents from different ecosystems act together.
“What’s emerging in parallel is demand for an agentic collaboration harness - an interaction layer that lets agents from Microsoft, OpenAI, Anthropic, and internal teams operate as one workforce,” he said. “Orchestration inside any single vendor is still a walled garden so the next competitive cycle is cross-vendor agent collaboration.”
Independent frameworks face an enterprise packaging problem
There is also a warning sign for independent orchestration frameworks. LangChain and LangGraph fell from 5.4% to 1.4% as the primary orchestration platform in the qualified enterprise sample. External orchestration abstracted entirely from model providers also fell from 8.9% to 2.9%.
Scott Likens, Global Chief AI Engineer at professional services giant PwC, has a front row seat to this trend as the company spearheads and assists clients with their AI transformations. As he told VentureBeat in a statement:
"Right now, most enterprises are still operating in fragmented environments, with orchestration spread across platforms, business applications, and internally developed tooling. Over time, the market will likely move toward more unified orchestration models, but interoperability, governance and security will remain critical because enterprises are unlikely to standardize on a single agent ecosystem."
The report argues that fully independent orchestration frameworks may not yet have the enterprise packaging (security certifications, support, compliance documentation and vendor accountability) that procurement teams require. That does not mean open frameworks are irrelevant. It does suggest that enterprise buyers may increasingly consume open or developer-first orchestration through managed products, cloud-provider partnerships or internal control planes rather than as standalone frameworks.
The agent market starts to look like cloud infrastructure
This is where the agent market starts to look less like the early chatbot market and more like enterprise cloud infrastructure. The winning vendors will not only have capable models. They will have identity integration, permission controls, audit logs, observability, workflow tooling, sandboxing, evaluation and a credible answer to who owns the control plane.
Indeed, the orchestration layer is but one part of the stack that the enterprise must fill in, and enterprises may actually decide to have different orchestration layers for agents working in different departments and functions.
As Nithya Lakshmanan, Chief Product Officer at revenue team AI orchestration startup Outreach.ai, wrote in a statement to VentureBeat:
"General-purpose orchestration platforms coordinate agent activity well, but they don't carry the workflow-specific context that determines whether an agent's action is correct for a given situation. In revenue workflows, an agent acting on incomplete deal history or missing buyer context will underperform and erode trust with users. The teams getting the most out of multi-agent systems are treating domain-specific data as the governance layer, with orchestration sitting on top. Most enterprises have chosen their orchestration stack, and what they're now figuring out is how those platforms get access to the workflow context they need to make agents useful inside specific business functions."
That is why Anthropic, which is increasingly launching its own domain-specific agents for finance and design, among other categories, is worth following closely. The company does not need to win the entire orchestration market tomorrow for its strategy to matter. It only needs to persuade a growing set of Claude enterprise customers to let Anthropic handle more of the surrounding machinery: tools, workflows, memory, execution and governance.
If it succeeds, Claude becomes more than a model in a multi-model portfolio. It becomes part of the infrastructure where enterprise work gets done. That would put Anthropic in a more direct fight with OpenAI and Microsoft, not just over model quality but over the operating layer of AI agents.
The narrow but important read
The safe interpretation of the VB Pulse data is narrow but important: Anthropic is not yet a major enterprise orchestration platform. Microsoft is. OpenAI is much closer. But Anthropic has registered its first measurable foothold at the orchestration layer, just as the market is deciding who should control agent execution.
For enterprise buyers, that may be the question that matters most in 2026. Not which model is best, but which provider gets to run the agent, and how hard it will be to leave once the agent is running.
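The separation Khalaf describes, with policy and identity held outside the agent logic, also bears on how hard a runtime is to leave. A minimal sketch of the idea, assuming a hypothetical in-house control plane: the `Policy` and `ControlPlane` names, the action strings and the call budget are all illustrative and do not correspond to any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Per-identity rules owned by security teams, not by the agent."""
    allowed_actions: frozenset
    max_calls: int  # crude budget to stop a runaway loop


@dataclass
class ControlPlane:
    """Hypothetical gate that every tool call must pass through."""
    policies: dict  # agent identity -> Policy
    audit_log: list = field(default_factory=list)
    calls: dict = field(default_factory=dict)

    def authorize(self, agent_id, action, target):
        policy = self.policies.get(agent_id)
        used = self.calls.get(agent_id, 0)
        if policy is None:
            decision = "deny: unknown identity"
        elif action not in policy.allowed_actions:
            decision = "deny: action not permitted"
        elif used >= policy.max_calls:
            decision = "deny: call budget exhausted"
        else:
            decision = "allow"
            self.calls[agent_id] = used + 1
        # Every decision is recorded, whether allowed or denied.
        self.audit_log.append((agent_id, action, target, decision))
        return decision == "allow"


plane = ControlPlane(
    policies={"report-agent": Policy(frozenset({"read"}), max_calls=3)}
)
print(plane.authorize("report-agent", "read", "sales_db"))
print(plane.authorize("report-agent", "drop", "sales_db"))
```

Because the check runs outside the model, a destructive action is refused whatever the prompt says. And because the policy table and audit log sit in a layer the enterprise owns, they would not need to be rebuilt if the agent moved to a different runtime.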