The AI scaffolding layer is collapsing. LlamaIndex's CEO explains what survives.
Our take

The landscape of AI and data management is undergoing a significant transformation, especially as the once-essential scaffolding layer for developing large language model (LLM) applications begins to collapse. As Jerry Liu, co-founder and CEO of LlamaIndex, articulates in a recent episode of the VentureBeat Beyond the Pilot podcast, this shift is not a setback but rather a natural evolution of the technology. The diminishing need for complex frameworks, such as indexing layers and retrieval pipelines, signals a movement toward simplicity and efficiency in building LLM applications. This evolution raises important questions about the future roles of developers and the tools they will rely on, particularly as we see a growing emphasis on context as a competitive advantage. For deeper insights on the implications of context in AI, you may want to explore Why AI breaks without context — and how to fix it.
As Liu highlights, models have reached a point where they can reason over vast amounts of unstructured data, and they are getting better at it than humans. They can also self-correct and plan across multiple steps, which changes how developers work with them. Liu says about 95% of LlamaIndex's code is now generated by AI, with engineers describing what they want in natural language rather than writing the code by hand. That shift widens access to advanced AI capabilities and challenges the traditional role of the software engineer.
With the scaffolding layer collapsing, context has emerged as the key differentiator. Liu argues that agents must be able to decipher file formats and extract the right information, and he contends that LlamaIndex is well positioned because of its work on agentic document processing via optical character recognition (OCR). More accurate and cheaper parsing matters more as the volume of data locked in documents keeps growing. Liu also advises companies to keep their tech stacks modular and flexible so they can take advantage of each new model release. For a related perspective on the implications of proprietary systems in AI, consider reading "Anthropic wants to own your agent's memory, evals, and orchestration — and that should make enterprises nervous."
Looking ahead, it is essential for businesses and developers to embrace this new paradigm. The traditional methods of building AI applications are becoming outdated, and a shift toward more modular, context-driven frameworks is inevitable. As Liu notes, the landscape will continue to evolve with each new model release, necessitating an agile approach to development that avoids overcomplicating systems. This invites a broader conversation about the future of AI in the workplace: How will teams leverage these advancements to enhance productivity and innovation? As we navigate this rapidly changing environment, the focus must remain on user outcomes and the transformative potential of AI. The question remains: Are organizations ready to adapt to these changes, or will they cling to traditional frameworks that no longer serve their needs?
In this era of rapid advancement, maintaining an adaptable and human-centered approach to technology will be vital for success.
The scaffolding layer that developers once needed to ship LLM applications — indexing layers, query engines, retrieval pipelines, carefully orchestrated agent loops — is collapsing. And according to Jerry Liu, co-founder and CEO of LlamaIndex, that's not a problem. It's the point.
“As a result, there's less of a need for frameworks to actually help users compose these deterministic workflows in a light and shallow manner,” Liu explains in a new episode of VentureBeat's Beyond the Pilot podcast.
Context is becoming the moat
Liu’s LlamaIndex is one of the foremost retrieval-augmented generation (RAG) frameworks connecting private, custom, and domain-specific data to LLMs. But even he acknowledges that these types of frameworks are becoming less relevant.
With every new release, models get incrementally better at reasoning over “massive amounts” of unstructured data, and they’re getting better at it than humans, he notes. They can be trusted to reason extensively, self-correct, and perform multi-step planning. Model Context Protocol (MCP) and Claude Agent Skills plug-ins allow models to discover and use tools without requiring a separate integration for each one.
Agent patterns have consolidated toward what Liu calls a "managed agent diagram" — a harness layer combined with tools, MCP connectors, and skills plug-ins, rather than custom-built orchestration for every workflow.
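To make the pattern concrete, here is a minimal sketch of a harness layer that owns the loop and a tool registry while the model decides what to call. Everything in it is an assumption for illustration: the `Harness` class, the `decide` callback standing in for a model, and the `word_count` tool are hypothetical names, not LlamaIndex, MCP, or Anthropic APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A tool is a named function the harness can call on the model's behalf.
Tool = Callable[[str], str]
# The decision function stands in for a model (hypothetical). It receives the
# transcript and the available tool names and returns either
# ("call", tool_name, argument) or ("final", answer, "").
Decide = Callable[[List[str], List[str]], Tuple[str, str, str]]


@dataclass
class Harness:
    """Hypothetical harness: owns the loop and the tools, not the workflow."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, fn: Tool) -> None:
        self.tools[name] = fn

    def run(self, task: str, decide: Decide, max_steps: int = 5) -> str:
        transcript: List[str] = [f"task: {task}"]
        for _ in range(max_steps):
            kind, first, second = decide(transcript, sorted(self.tools))
            if kind == "final":
                return first
            # Execute the requested tool and feed the result back to the model.
            if first in self.tools:
                result = self.tools[first](second)
            else:
                result = f"unknown tool {first}"
            transcript.append(f"{first}({second}) -> {result}")
        return "step limit reached"


def scripted_model(transcript: List[str], tool_names: List[str]) -> Tuple[str, str, str]:
    """Scripted stand-in for a model: call one tool, then report its result."""
    if len(transcript) == 1:
        return ("call", "word_count", "context is the moat")
    return ("final", transcript[-1].split(" -> ")[1], "")


harness = Harness()
harness.register("word_count", lambda text: str(len(text.split())))
answer = harness.run("count the words", scripted_model)
```

The harness contains no workflow-specific orchestration: adding a capability means registering another tool, and the sequence of calls is left to the model.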
Further, coding agents excel at writing code, meaning devs don’t need to rely on extensive libraries. In fact, about 95% of LlamaIndex code is generated by AI. “Engineers are not actually writing real code,” Liu said. “They're all typing in natural language.” This means the layer between programmers and non-programmers is collapsing, because “the new programming language is essentially English.”
Instead of manual coding or struggling to understand API and document integration, devs can just point Claude Code at it. “This type of stuff was either extremely inefficient or just would break the agent three years ago,” said Liu. “It's just way easier for people to build even relatively advanced retrieval with extremely simple primitives.”
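As an illustration of what "extremely simple primitives" could look like, the sketch below exposes three file operations (list, search, read a line range) that an agent could combine into retrieval without an index or embedding pipeline. The function names are hypothetical and chosen for this example; Liu does not name specific primitives.

```python
import re
from pathlib import Path
from typing import List, Tuple


def list_files(root: str) -> List[str]:
    """Return every file under root as a sorted list of relative paths."""
    base = Path(root)
    return sorted(str(p.relative_to(base)) for p in base.rglob("*") if p.is_file())


def grep(root: str, pattern: str) -> List[Tuple[str, int, str]]:
    """Case-insensitive regex search; returns (relative path, line number, line)."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits: List[Tuple[str, int, str]] = []
    for rel in list_files(root):
        text = (Path(root) / rel).read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((rel, number, line.strip()))
    return hits


def read_lines(root: str, rel: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive) of one file."""
    lines = (Path(root) / rel).read_text(encoding="utf-8", errors="ignore").splitlines()
    return "\n".join(lines[start - 1:end])
```

An agent given these three calls can search for a term, then read the surrounding lines of any hit, which is the kind of multi-step lookup that previously required a dedicated retrieval pipeline.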
So what’s the core differentiator when the stack collapses?
Context, Liu says. Agents need to be able to decipher file formats to extract the right information. Providing higher accuracy and cheaper parsing becomes key, and LlamaIndex is well-positioned here, he contends, because of its developments with agentic document processing via optical character recognition (OCR).
“We've really identified that there's a core set of data that has been locked up in all these file format containers,” he said. Ultimately, “whether you use OpenAI Codex or Claude Code doesn't really matter. The thing that they all need is context.”
Keeping stacks modular
There’s growing concern about builders like Anthropic locking in session data; in light of this, Liu emphasizes the importance of modularity and agnosticism. Builders shouldn’t bet on any one frontier model, or overbuild in a way that overcomplicates components of the stack.
Retrieval has evolved into “agent-plus-sandbox,” as he describes it, and enterprises must ensure that their code bases are free of tech debt and adaptable to changing patterns. They also have to accept that some parts of the stack will eventually need to be thrown away as a matter of course.
“Because with every new model release, there's always a different model that is kind of the winner,” Liu said. “You want to make sure you actually have some flexibility to take advantage of it.”
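One way to keep that flexibility is to hide the model behind a thin interface so that swapping providers is a one-line change. The sketch below is an assumption-laden illustration: `ModelRouter` and the `model_a`/`model_b` backends are hypothetical stand-ins, and a real backend would wrap a vendor SDK call behind the same `str -> str` signature.

```python
from typing import Callable, Dict, Optional

# Any backend is just "prompt in, text out"; vendor details stay behind it.
Completion = Callable[[str], str]


class ModelRouter:
    """Hypothetical router: application code calls complete(), never a vendor SDK."""

    def __init__(self) -> None:
        self._backends: Dict[str, Completion] = {}
        self._active: Optional[str] = None

    def register(self, name: str, backend: Completion) -> None:
        self._backends[name] = backend
        if self._active is None:
            self._active = name  # first registered backend is the default

    def use(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(f"no backend named {name!r}")
        self._active = name

    def complete(self, prompt: str) -> str:
        if self._active is None:
            raise RuntimeError("no backend registered")
        return self._backends[self._active](prompt)


router = ModelRouter()
router.register("model_a", lambda prompt: f"[a] {prompt}")
router.register("model_b", lambda prompt: f"[b] {prompt}")

first = router.complete("summarize the contract")
router.use("model_b")  # switch models without touching calling code
second = router.complete("summarize the contract")
```

The trade-off is that a lowest-common-denominator interface hides provider-specific features, so this fits best at the boundaries of the stack that are most likely to change.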
Listen to the podcast to hear more about:
LlamaIndex’s beginnings as a ‘toy project’ with initially only about 40% accuracy;
How SaaS companies can tap into complicated workflows that must be standardized and repeatable for average knowledge workers;
Why vertical AI companies are taking off and why ‘build versus buy’ is still a very valid question in the agent age.
You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
Related Articles
- Anthropic wants to own your agent's memory, evals, and orchestration — and that should make enterprises nervous
Just a few weeks after announcing Claude Managed Agents, Anthropic has updated the platform with three new capabilities that collapse infrastructure layers like memory, evaluation, and multi-agent orchestration into a single runtime. This move could threaten the standalone tools that many enterprises cobble together. The new capabilities ('Dreaming,' 'Outcomes,' and 'Multi-Agent Orchestration') aim to make agents inside Claude Managed Agents “more capable at handling complex tasks with minimal steering,” Anthropic said in a press release. Dreaming deals with memory, where agents “reflect” on their many sessions and curate memories so they learn and surface unknown patterns. Outcomes allows teams to define and set specific rubrics to measure an agent's success, while Multi-Agent Orchestration breaks jobs down so a lead agent can delegate to other agents. Claude Managed Agents ideally provides enterprises with a simpler path to deploy agents and embeds orchestration logic in the model layer. It’s an end-to-end platform to manage state, execution graphs, and routing. With the addition of Dreaming, Outcomes, and Multi-Agent Orchestration, Claude Managed Agents expands capabilities even further and directly competes with tools like LangGraph or CrewAI, as well as external evaluation frameworks, RAG memory architectures, and QA loops.
An integration threat
Enterprises must now ask: Should we ditch our flexible, modular system in favor of an agent platform that brings almost everything in-house? Anthropic designed Claude Managed Agents to share context, state, and traceability in one place. This means the platform sees every decision agents make, rather than enterprises having to wire separate systems together. It sounds practical to have one platform that does everything. But not all enterprises want a full-service system.
Claude Managed Agents already faces criticism that it encourages vendor lock-in because it owns most of the architecture and tools that govern agents. In the current paradigm, an organization may run Managed Agents but keep multi-agent orchestration, memory, or evaluations in a separate space, which ensures flexibility. The platform offers a fully hosted runtime, which means memory and orchestration run on infrastructure the enterprise does not own. This can become a compliance nightmare for some organizations that have to prove data residency. Another problem to consider is that enterprises already in the middle of large-scale AI transformations must cobble together workarounds to deal with the constraints of their tech stack. Not every workflow is easily replaceable by switching to Claude Managed Agents.
Dreaming and outcomes against current tools
Most enterprises have a fragmented approach to AI deployment. For example, they may use LangGraph or CrewAI for agent routing and workflow management, Pinecone as a vector database for long-term memory, DeepEval for external evaluation, and human-in-the-loop quality assurance to review some tasks. Anthropic hopes to do away with all of that. With Dreaming, Anthropic approaches memory by allowing users to actively rewrite it between sessions, so the agent essentially learns from its mistakes. Anthropic says this capability is useful for long-running states and orchestration. Current systems often handle memory persistence by storing embeddings, retrieving relevant context, and adding more state over time. Outcomes addresses the evaluation portion by detailing expectations for agents. Instead of external quality checks, which are often done by a team of humans, Anthropic is bringing evaluation into the orchestration layer rather than above it. But it’s the Multi-Agent Orchestration capability that pits Claude Managed Agents against orchestration frameworks from Microsoft, LangChain, CrewAI, and others.
Model providers like Anthropic and OpenAI have already begun pushing aggressively into this space, arguing that bringing this to the model layer gives teams better control.
Big decisions to make
Enterprises face a big decision, and this one could depend on where they are in agent maturity. If an organization is still experimenting with agents and has not deployed many in production, they may find moving to Claude Managed Agents and configuring Dreaming and Outcomes to their needs much easier. This is the stage of development where, even if enterprises are using a third-party orchestrator like LangChain, they’re still customizing it. But for those who are already further along in the process, the calculation becomes trickier. It’s now a matter of parallel evaluation and better understanding of their processes. Businesses, though, will face the same decision even if they don’t intend to use Claude Managed Agents. Anthropic has signaled that other model and platform providers will likely shift their product roadmaps to a similar model that keeps everything locked in the same system, because models may become interchangeable, but the tooling and orchestration infrastructure will not.
- Why AI breaks without context — and how to fix it
Presented by Zeta Global
The gap between what AI promises and what it delivers is not subtle. The same model can produce precise, useful output in one system and generic, irrelevant results in another. The issue is not the model. It's the context. Most enterprise systems were not built for how AI operates. Data is scattered across tools. Identity is inconsistent. Signals arrive late or not at all. Systems record events but fail to connect them into a continuous view. AI depends on that continuity. Without it, the model fills in the gaps so the result looks polished but lacks relevance. This is where most teams get stuck. A better model does not fix fragmented, stale, or commoditized data. Gartner estimates organizations lose an average of $12.9 million annually due to poor data quality. AI does not solve that problem; it surfaces it faster and at greater scale.
The mirror test
There is a fast diagnostic test for this. Give your AI a perfect, high-intent customer signal and see what comes back. If the output is generic or irrelevant, the model needs work. But if the model produces something sharp and useful on clean data, and then falls apart on real production data, the problem is the data. In practice, it is almost always the second scenario. AI functions like a magnifying glass, so strong data systems become dramatically more powerful, and the weak ones become dramatically more visible. Organizations that have been coasting on fragmented, poorly integrated customer data can no longer hide behind reporting lag and manual interpretation. The AI renders the problem in plain sight.
Context is the new identity layer
This is really where the next evolution gets interesting. Even after you solve the data quality problem, there is still a second shift underway in how customer profiles are built and used.
For years, enterprise data systems stored content: transactions in CRMs, demographics in data warehouses, campaign responses in marketing platforms. These records described what had already happened. They were useful for reporting but were not built for AI. AI requires context. Context is not a static record. It is a current view of the customer including recent behavior, cross-channel signals, and emerging intent. The thread that connects one interaction to the next. Identity tells you who someone is. Context tells you what they are doing and what they are likely to do next. Consider a simple example: ask an AI to recommend a beach vacation destination, and it might suggest Hawaii or Florida. Tell it you have three children, and it surfaces family-friendly options. Give it access to your recent search patterns, your affordability signals, and where you have been searching over the past year, and the recommendation changes entirely because the model is no longer working from demographic categories but from a live picture of who you are and what you are doing right now. Most enterprise systems were built to store state, not maintain context. They capture events, but they don’t maintain continuity between them. That’s the gap AI exposes. But for practitioners, the challenge is not conceptual; it is architectural. Context does not live in a single system. It is fragmented across event streams, product analytics tools, CRMs, data warehouses, and real-time pipelines. Stitching that into something an AI system can actually use requires moving from batch-oriented data models to streaming or near-real-time architectures, where signals are continuously ingested, resolved, and made available at inference time. This is where many AI initiatives stall. The model is ready, but the context layer is not operationalized. Systems are not designed to retrieve the right signals within milliseconds, or to resolve identity across channels in real time. 
Without that, “context” remains theoretical rather than actionable. Architectures like Model Context Protocol (MCP) are accelerating this shift by giving AI systems a way to pass memory about a user between applications, essentially threading a continuous line of context around an individual across different interactions. The result is a profile that becomes richer and more predictive over time, one that creates a line of continuity between what someone has done, what they are doing now, and what they are likely to do next. When that identity layer is strong, the same model produces better outcomes. When it is weak, no model can compensate.
The compounding advantage
Organizations that built first-party data systems and durable identity infrastructure before the AI wave are now benefiting from a compounding effect. Better data trains smarter models. Smarter models attract more consented users. More consented users generate richer behavioral signals. Competitors without that foundation cannot replicate this, regardless of which model they are running. The gap is structural, not algorithmic, and because identity systems improve incrementally over time, the organizations that started investing earlier have advantages that are genuinely hard to close.
What this means in practice
The practical implication is a shift in where AI investment goes. The organizations getting consistent results from AI are treating it as a processing layer for a living data system, not as a standalone capability to be bolted onto existing infrastructure. For builders and operators, this translates into a different set of priorities than the last two years of AI experimentation: First, instrument for real-time signals. Batch pipelines and nightly refreshes are not sufficient when AI systems are expected to respond to user intent as it happens. Teams need event-driven architectures that capture and surface behavioral signals in near real time. Second, make context retrievable at inference time.
It is not enough to store data in a warehouse. Systems must be designed so that relevant context can be resolved and injected into prompts or retrieved by agents within milliseconds. Third, invest in identity resolution as infrastructure. Connecting fragmented signals across devices and channels so the system understands real individuals rather than anonymous interactions is foundational, not optional. Fourth, treat governance and consent as part of system design. First-party data built on trust is not just safer; it is more durable and ultimately more valuable than third-party data that competitors can access. These investments are less visible than a new model launch and are also far harder to copy.
The real race
Models are now interchangeable. The difference will come from who can operationalize context at scale and treat the model as a processing layer, not the advantage. That advantage comes from years of investment in identity infrastructure, first-party data, and systems that keep customer context current. The organizations that win won’t be the ones with better prompts. They’ll be the ones whose systems understand the customer before the prompt is ever written.
Neej Gore is Chief Data Officer at Zeta Global.
Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact sales@venturebeat.com.