6 min read, from VentureBeat

Stanford's DeLM cuts multi-agent task costs 50% — without a central orchestrator

Our take

Stanford researchers have demonstrated a significant advancement in multi-agent AI systems with DeLM, a decentralized language model that slashes task costs by 50%—all without relying on a central orchestrator. This framework challenges the conventional AI paradigm requiring a "boss" agent, revealing that direct agent coordination via a shared knowledge base can be both effective and far more efficient.

The prevailing architectural paradigm in AI development often assumes a hierarchical structure: a central orchestrator dictating the flow of information and tasks among a network of agents. This model, while seemingly intuitive, may introduce unnecessary bottlenecks and inefficiencies. Stanford’s recent work on DeLM, a decentralized language model, challenges this assumption head-on, proposing a system where agents coordinate directly, eliminating the need for a central "boss." This shift is particularly relevant given the rapid growth of AI-powered workflows and the increasing demands on computational resources. Consider Probably, which is building more reliable AI to prevent factual errors (see "Probably raises $9M to build a more reliable kind of AI"), or Plaud, which is working to improve meeting notetaking with AI (see "Plaud says its software business topped $100M in ARR after shipping over 2M AI notetakers"). Both illustrate the growing complexity and resource intensity of even seemingly narrow AI applications, which makes DeLM’s efficiency gains all the more significant.

The core innovation of DeLM lies in its shared knowledge base, a curated repository of "gists," or information summaries, accessible to all agents. Instead of funneling every update through a central controller, agents directly contribute to and draw from this shared context, fostering parallel exploration and collaboration. This approach addresses an inherent limitation of centralized systems, where the orchestrator can become a communication bottleneck and may dilute or distort valuable information. On SWE-bench Verified, the framework performed 10.5% better than the strongest baseline while cutting cost per task by roughly 50%, and on LongBench‑v2 Multi‑Doc QA it had the highest accuracy across four model families. Those results suggest that the decentralized model isn’t just a theoretical curiosity but a practical option for scaling AI tasks. They also highlight the growing importance of efficient resource usage as AI models become more complex and computationally expensive, a trend that even the space industry is keenly aware of (see "SpaceX is public: Everything you need to know post-IPO").

Beyond the immediate cost savings, DeLM’s architecture offers a fundamentally more robust and adaptable approach to multi-agent AI. The ability for agents to share failures, inherit constraints, and avoid redundant exploration represents a significant leap forward in coordination efficiency. The "coarse-to-fine" access to information, allowing agents to selectively unfold details as needed, further optimizes resource utilization and prevents information overload. This level of granularity and adaptability has particular implications for complex tasks such as software engineering and long-context reasoning, where the ability to manage and leverage vast amounts of information is critical. The framework’s modular design also allows for easier integration with existing LLMs and workflows, potentially accelerating its adoption across various industries.

Ultimately, DeLM's success prompts a crucial re-evaluation of architectural assumptions in AI development. The results strongly suggest that relinquishing centralized control, and embracing a more decentralized, collaborative model, can unlock substantial performance gains and cost efficiencies. As AI becomes increasingly integrated into every facet of business and daily life, the question is no longer whether decentralization is desirable, but how quickly we can move beyond the established hierarchical patterns and embrace more adaptive, efficient, and scalable AI architectures. The future likely holds a proliferation of agentic systems; will DeLM's approach become a foundational paradigm, or will other decentralized architectures emerge to challenge its dominance?

One of the assumptions behind today’s AI frameworks is that agents require a “boss” at the center; this orchestrator runs the show, routes requests, and makes sure the whole system doesn’t descend into chaos.

That assumption may be wrong, and the cost of carrying it could be measured in inference dollars and coordination latency. A new Stanford framework called a decentralized language model, or DeLM, is built on the premise that agents can coordinate directly, without routing every update through a central controller.

DeLM's shared knowledge base serves as a “common communication substrate” so that agents can build upon one another’s verified progress without having to route every interaction through a main agent to “merge, filter, and rebroadcast,” Yuzhen Mao and Azalia Mirhoseini, co-developers of the framework, explain in a research paper.

It’s a system that’s not only possible, but desirable in certain instances. “Agents can build on prior findings, avoid repeated failures, preserve constraints, and recover detailed evidence only when needed.”

The challenges of traditional multi-agent systems

In a typical centralized multi-agent system, a main agent breaks tasks into subtasks, assigns them out to multiple sub-agents in parallel, waits for responses, merges and summarizes intermediate progress, then launches a next wave of orders based on collected context.

While this is a natural way to scale LLM reasoning, the Stanford researchers argue that it scales poorly. Every useful finding, partial finding, and failure must be reported back to the main agent, which then determines what information to merge and rebroadcast to the agents below it.

“As the number of subtasks grows, this controller becomes a communication and integration bottleneck,” Mao and Mirhoseini write. Further, the main orchestrator may “dilute, omit, or distort” useful information, leading to lost progress.

This bottleneck also occurs in long-context reasoning scenarios. Once it receives reports back from sub-agents, a main agent will typically group related concepts, data points, and other materials together in an unsupervised learning loop. It may then pre-assign these "evidence clusters" to sub-agents before knowing what surfaced material is actually relevant or whether it’s combined correctly.

When a sub-agent receives insufficient context as a result, it essentially gets confused and returns to the main agent, kicking off another retrieval or delegation round. “This back-and-forth makes coordination slower, more iterative, and increasingly constrained by a single overloaded main agent,” the researchers write.
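To make the bottleneck concrete, here is a minimal Python sketch of one centralized wave. It is an illustration only, not code from the Stanford paper: `MainAgent`, `run_wave`, and the 40-character truncation standing in for a lossy merge are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MainAgent:
    """Hypothetical central controller: every report passes through it."""
    inbox: list = field(default_factory=list)
    messages_handled: int = 0

    def collect(self, reports):
        # One inbound message per sub-agent report.
        self.inbox.extend(reports)
        self.messages_handled += len(reports)

    def rebroadcast(self, num_subagents):
        # Assumed lossy merge: keep only the first 40 characters of each report.
        summary = [report[:40] for report in self.inbox]
        # One outbound message per sub-agent.
        self.messages_handled += num_subagents
        self.inbox.clear()
        return summary


def run_wave(main, subtasks):
    """Run one wave: sub-agents report, the main agent merges and rebroadcasts."""
    reports = [f"finding for {task}" for task in subtasks]
    main.collect(reports)
    return main.rebroadcast(len(subtasks))
```

In this toy model the controller handles two messages per subtask per wave, so its load grows with the number of subtasks while each sub-agent's load stays constant.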

What DeLM addresses and how it works

DeLM, by contrast, is built around parallel agents, a shared context, and a task queue.

Shared context is essentially a curated store of “gists,” or information summaries that other agents might find useful. These include verified and evidence-based findings alongside partial findings and documented failures; they also point to detailed evidence that agents can pull from based on their specific task.

The task queue, in turn, is the set of pending subtasks that agents can claim independently.

“Agents write compact, verified updates into a shared context that later agents can read directly,” the researchers write. Useful findings, failures, and constraints accumulate as a “shared problem state,” rather than passing through a central controller.
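As a rough sketch of the two structures described above, assuming a simple in-memory design: the class and field names (`Gist`, `SharedContext`, `TaskQueue`, `kind`, `evidence_ref`) are hypothetical and do not come from the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gist:
    summary: str        # compact update that other agents read by default
    kind: str           # assumed labels: "finding", "partial", or "failure"
    evidence_ref: str   # pointer to the detailed evidence behind the summary


@dataclass
class SharedContext:
    gists: list = field(default_factory=list)

    def write(self, gist: Gist) -> None:
        self.gists.append(gist)

    def read(self) -> list:
        # Agents read the compact summaries directly, with no controller in between.
        return [g.summary for g in self.gists]


class TaskQueue:
    def __init__(self, tasks):
        self._pending = deque(tasks)

    def claim(self) -> Optional[str]:
        # Any agent may claim the next pending subtask; None means the queue is empty.
        try:
            return self._pending.popleft()
        except IndexError:
            return None
```

An agent in this sketch would loop on `claim()`, call `read()` before starting work, and `write()` a gist when it finishes.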

The pipeline looks like this:

  • Initialization: Inputs are broken into different work units and added to a queue.

  • Parallel execution: Agents work independently and in tandem, pulling tasks and reading shared context as they progress.

  • Compression and verification: Results are compressed into reusable “gists” that are checked against supporting evidence. Only gists that are fully verified are shared with the group.

  • Additional work (if needed): When the queue is emptied, the last agent to return an answer inspects all the shared context to determine whether further work is required.

  • Final step: The last agent determines that no more steps are required and returns the final answer.
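The steps above can be sketched in a few lines of Python using threads as stand-ins for agents. This is a simplified illustration under stated assumptions, not the researchers' implementation: `solve` and `verify` are hypothetical placeholders for an LLM call and an evidence check, and the "additional work" step is omitted, with the caller playing the role of the last agent.

```python
import queue
import threading


def solve(task, context_snapshot):
    """Placeholder for an LLM call; returns a compact gist and its raw evidence."""
    evidence = f"raw trace for {task}"
    gist = f"{task}: done"
    return gist, evidence


def verify(gist, evidence):
    """Placeholder check: the task named in the gist must appear in the evidence."""
    return gist.split(":")[0] in evidence


def run(work_units, num_agents=3):
    # Initialization: break the input into work units and enqueue them.
    tasks = queue.Queue()
    for unit in work_units:
        tasks.put(unit)

    shared = []                 # shared context of verified gists
    lock = threading.Lock()

    def agent():
        while True:
            try:
                task = tasks.get_nowait()   # claim a ready task independently
            except queue.Empty:
                return
            with lock:
                snapshot = list(shared)     # read the shared context
            gist, evidence = solve(task, snapshot)
            if verify(gist, evidence):      # only verified gists are shared
                with lock:
                    shared.append(gist)

    # Parallel execution: all agents run concurrently.
    threads = [threading.Thread(target=agent) for _ in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Final step (simplified): return the accumulated shared context.
    return sorted(shared)
```

Note that no thread ever sends a message to another thread or to a coordinator; all exchange happens through the queue and the shared list.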

Agents “exchange progress through shared state, asynchronously claim ready tasks, and scale more adaptively as the number of subtasks grows,” the researchers explain.

How DeLM performs in the wild

With DeLM, agents can avoid redundant exploration; reuse and build on each other’s discoveries and failures; and focus on unresolved issues.

The framework can be particularly useful in software engineering test-time scaling, when models are given time to “think” to improve their reasoning and problem-solving capabilities. Different agents can explore their own hypotheses or pursue reasoning paths in parallel, while still sharing intermediate progress. One example is concurrent debugging.

DeLM is also suitable for long-context reasoning and multi-document question-answering; agents can examine their own evidence clusters (collections of papers, code, or other materials) simultaneously while maintaining a “global compact view” of accumulated evidence.

The researchers contend that it makes agentic tasks more accurate and significantly cheaper. This is backed by its performance on real-world benchmarks: On SWE-bench Verified — which evaluates how well AI models and agents solve real-world software engineering problems — it performed 10.5% better than the strongest baseline and reduced cost per task by roughly 50%.

But it can go beyond coding: On LongBench‑v2 Multi‑Doc QA — which assesses LLMs’ ability to handle long-context, real-world problems — DeLM had the highest accuracy across four model families, including GPT‑5.4, Claude Sonnet, Gemini Flash, and DeepSeek‑V4‑Pro.

DeLM outperforms the baselines on SWE-bench Verified for a number of reasons, as Mao detailed on X.

First, agents share failures. In ordinary parallel runs, when one agent follows the wrong path, that failure stays private, and subsequent agents may waste time (and money) pursuing the same dead end. But with DeLM, failed hypotheses are written into shared context.

“Later agents can read them as constraints, avoid repeated exploration, and redirect their search toward more promising fixes,” Mao said.

Additionally, constraints, once verified, are immediately added to agents’ shared context. This means they become a binding shared state. “Later agents inherit them, build around them, and avoid repeating globally invalid simplifications,” Mao said.
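A toy Python example shows how shared failures can steer later agents. Everything here is hypothetical (the `explore` function, the candidate fixes, and the `try_fix` callback), and agents run one after another for simplicity rather than in parallel.

```python
def pick_hypothesis(candidates, shared_failures):
    """Return the first candidate not already recorded as a failure."""
    for candidate in candidates:
        if candidate not in shared_failures:
            return candidate
    return None


def explore(candidates, try_fix, num_agents):
    """Each agent reads the shared failures, tries one hypothesis, records a miss."""
    shared_failures = set()
    attempts = []
    for _ in range(num_agents):
        hypothesis = pick_hypothesis(candidates, shared_failures)
        attempts.append(hypothesis)
        if hypothesis is not None and not try_fix(hypothesis):
            # The failed hypothesis becomes a constraint for later agents.
            shared_failures.add(hypothesis)
    return attempts, shared_failures
```

For example, with candidates `["patch parser", "patch cache", "patch serializer"]` and a `try_fix` that succeeds only on the last one, three agents try three different fixes. Without the shared set, every agent would start from `"patch parser"` and repeat the same dead end.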

Crucially, DeLM keeps shared progress compact enough to reuse. It is unfoldable, meaning agents see short gists by default, but can choose to unfold them into more detailed summaries and raw evidence.

As the researchers note, providing all raw documents and traces gives agents the maximum amount of information, but that can overwhelm their context windows and ultimately increase costs.

“If agents shared full traces, each worker would need to read long command histories, file dumps, failed edits, and intermediate reasoning, turning coordination itself into another long-context bottleneck,” Mao said.

On the other hand, while sharing compact summaries is cheaper, important details and evidence can be lost, resulting in less reliable reasoning.

Unfolding, therefore, provides “coarse-to-fine” opt-in access: agents pay for detail only when a task calls for it, which can improve accuracy while keeping cost down.
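One way to picture unfolding is as an entry with three levels of detail, where character count serves as a crude proxy for context cost. This is a sketch under those assumptions; `UnfoldableGist`, `view`, and `context_cost` are hypothetical names, not the framework's API.

```python
from dataclasses import dataclass


@dataclass
class UnfoldableGist:
    gist: str       # level 0: short gist, shown by default
    summary: str    # level 1: more detailed summary
    evidence: str   # level 2: raw evidence

    def view(self, level=0):
        """Return the entry at the requested level of detail (clamped to 0..2)."""
        level = max(0, min(level, 2))
        return (self.gist, self.summary, self.evidence)[level]


def context_cost(entries, levels):
    """Total characters an agent reads; `levels` maps entry index to unfold level."""
    return sum(len(e.view(levels.get(i, 0))) for i, e in enumerate(entries))
```

An agent that unfolds only the one entry relevant to its task reads far less than one given every raw trace, which is the trade-off between full traces and compact summaries described above.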

Ultimately, with a framework like DeLM, agents can be more efficient because they are prevented from repeatedly reading the same documents or rerunning the same failed analysis; more effective because useful findings are propagated across parallel threads; and more robust because they only share verified claims.

For enterprise builders, DeLM challenges a core assumption: that every multi-agent workflow needs a central controller. The SWE-bench and LongBench-v2 results suggest the decentralized model isn't just theoretically cleaner — it's faster, more accurate, and roughly half the cost.
