May 18, 2026•7 min read•from VentureBeat

Context architecture is replacing RAG as agentic AI pushes enterprise retrieval to its limits

Our take

As agentic AI pushes enterprise retrieval to new limits, Redis introduces Redis Iris, a context and memory platform designed to optimize data management for AI agents. Unlike traditional retrieval layers built for human-scale queries, Iris addresses the structural challenges posed by an explosion of data requests from agents. By enabling real-time data ingestion and auto-generating tools for efficient querying, Redis Iris transforms how enterprises approach their data infrastructure.

The introduction of Redis Iris marks a significant pivot in how enterprises approach AI data management, particularly as production AI agents generate orders of magnitude more data requests than human users. As agentic AI becomes more prevalent, existing retrieval architectures, designed with human-scale interactions in mind, are struggling to keep pace. This structural mismatch has prompted a reevaluation of how organizations think about data retrieval and context architecture, highlighting the need for more sophisticated solutions that can dynamically support these AI agents. It echoes broader trends in the industry, such as the recent acquisition of a dev tools startup by Anthropic, which underscores the necessity for robust infrastructure that can seamlessly integrate with the evolving landscape of AI technologies.

Redis Iris functions as a context and memory platform, bridging the gap between AI agents and the data necessary for them to operate effectively. The platform integrates real-time data ingestion with a semantic interface, enabling agents to pull relevant information at runtime rather than relying on pre-loaded data. This shift is crucial, as it allows for a more efficient data retrieval process, aligning with the operational needs of AI agents, which cannot write their own middleware. As Rowan Trollope, CEO of Redis, illustrates, this is akin to having a refrigerator stocked with food at home rather than needing to run to the grocery store every time one wants to make a sandwich. This analogy encapsulates the transition from a static, human-centric data architecture to a dynamic, agent-focused one.

The implications of this development extend beyond just the technological innovations offered by Redis. As enterprises increasingly recognize the limitations of their existing retrieval systems, there is a growing investment in optimizing data context and memory capabilities. According to the latest data from VentureBeat, buyer intent for hybrid retrieval solutions has surged, reflecting a fundamental shift in the market's priorities. With retrieval optimization overtaking evaluation as the top investment focus, organizations are beginning to understand that simply deploying AI agents is not enough; they require a robust context layer to ensure these agents operate efficiently. This sentiment is echoed in discussions surrounding the significance of context in AI systems, as highlighted in the ongoing dialogue about the need for data-intensive applications.

Looking ahead, the challenge will be not just in adopting these new context architectures but also in effectively governing them. As Stephanie Walter from HyperFRAME Research points out, the future of agentic AI hinges on creating context layers that are not only fast and efficient but also secure and manageable. The successful integration of these systems will require a disciplined approach to defining and maintaining data governance, ensuring that as organizations scale their AI workloads, they do not inadvertently create new risks or cost centers.

As we observe the market's transition from traditional RAG infrastructures to more context-focused architectures, one question looms large: How will organizations adapt their strategies to ensure they are not only keeping up with technological advancements but also fully leveraging the potential of their AI agents? The evolution of data management in the age of AI is not just an IT challenge; it is a strategic imperative that will define competitive advantage in the years to come.

Redis built its name as the caching layer that kept web applications from collapsing under load. The problem it is targeting now has the same structure but is harder to solve: production AI agents failing not because the models are wrong, but because the data underneath them is scattered, stale and structured for humans rather than machines. Retrieval pipelines built for single queries cannot absorb the volume agents generate.

The gap Redis is targeting is structural: agents make orders of magnitude more data requests than human users, but most retrieval layers were built for the human-scale problem. Redis Iris, launched Monday, is the company's answer: a context and memory platform that sits between an agent and the data it needs to act. The platform combines real-time data ingestion, a semantic interface that auto-generates MCP tools from business data models, and an agent memory server built on Redis Flex, a rewritten storage engine that runs 99% of data on flash at a tenth of the cost of in-memory storage alone.

The announcement lands as enterprise RAG infrastructure is in active transition. VentureBeat's Q1 2026 VB Pulse RAG Infrastructure Market Tracker found buyer intent to adopt hybrid retrieval tripling from 10.3% to 33.3% between January and March. Retrieval optimization surpassed evaluation as the top enterprise investment priority for the first time. Custom in-house retrieval stacks rose from 24.1% to 35.6% as enterprises outgrew off-the-shelf options. Redis is not the only infrastructure vendor reading those signals — several data platform providers have repositioned around agent context layers in recent weeks.

The scale mismatch is the structural argument behind the launch. "Companies will have orders of magnitude more agents than human beings," Rowan Trollope, CEO of Redis, told VentureBeat. "Orders of magnitude more agents than human beings means orders of magnitude more load on back end systems."

From cache to context

Trollope traces the parallel back to the mobile era: When legacy backends built for branch tellers suddenly had to serve a million smartphone users, Redis became the caching layer that absorbed the load without a full rebuild.

What is different this time is that agents cannot write their own middleware. In the mobile era, a developer would sit with a database administrator, identify the queries an application needed and hard-code the caching logic into a middleware layer. Agents cannot do that. They need to find the right data at runtime, through interfaces built for them in advance, or they stall.

"This is like the analogy of the grocery store in the fridge," he said. "If every time you have to go make your sandwich, you have to run to the grocery store to get the food, that's not very efficient. You put a fridge in every house, you store a little bit of food there. And that's kind of where we still tend to exist in the infrastructure stack."

What Redis Iris includes

Iris ships five components that together cover data ingestion, semantic access, memory and caching.

Redis Data Integration. Now in general availability. RDI uses change data capture pipelines to sync data from relational databases, warehouses and document stores into Redis continuously, with connectors for Oracle, Snowflake, Databricks and Postgres.
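
The change data capture pattern described above can be sketched in a few lines. This is a minimal illustration, not RDI's implementation: the `ChangeEvent` shape, the `apply_change` helper and the plain dictionary standing in for Redis are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ChangeEvent:
    """One row-level change captured from a source database (hypothetical shape)."""
    op: str                               # "insert", "update" or "delete"
    table: str
    key: str
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_change(cache: Dict[str, Dict[str, Any]], event: ChangeEvent) -> None:
    """Apply one change event to a dict that stands in for the cache."""
    cache_key = f"{event.table}:{event.key}"
    if event.op in ("insert", "update"):
        if event.row is None:
            raise ValueError("insert and update events need a row image")
        cache[cache_key] = dict(event.row)
    elif event.op == "delete":
        cache.pop(cache_key, None)
    else:
        raise ValueError(f"unknown operation: {event.op}")


# Replay a small stream of changes in the order the source emitted them.
cache: Dict[str, Dict[str, Any]] = {}
events = [
    ChangeEvent("insert", "customers", "42", {"name": "Ada", "tier": "silver"}),
    ChangeEvent("insert", "customers", "7", {"name": "Bob", "tier": "bronze"}),
    ChangeEvent("update", "customers", "42", {"name": "Ada", "tier": "gold"}),
    ChangeEvent("delete", "customers", "7"),
]
for event in events:
    apply_change(cache, event)

print(cache)
```

Because the cache is updated from the change stream rather than by periodic reloads, a reader sees the latest committed row without querying the system of record.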

Context Retriever. Now in preview. Developers define a semantic model of business data using pydantic models and Redis auto-generates MCP tools agents use to query it directly, with row-level access controls enforced server-side. Trollope describes the shift from classic RAG as a directional inversion. "It's just a flip to let the agent pull the data instead of presupposing and stuffing it into the pipeline," he said.
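
To make the "agent pulls the data" idea concrete, here is a rough sketch of generating query tools from a data model and filtering rows by caller. It uses standard-library dataclasses as a stand-in for pydantic to stay dependency-free, and everything in it (`Order`, `build_tools`, `same_region`, the tool naming scheme) is hypothetical rather than the Context Retriever API.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Order:
    """Hypothetical business entity; a real model would be a pydantic class."""
    order_id: str
    customer: str
    region: str
    total: float


ORDERS = [
    Order("o-1", "acme", "EU", 120.0),
    Order("o-2", "acme", "US", 75.5),
    Order("o-3", "globex", "EU", 310.0),
]


def same_region(caller: Dict[str, str], row: Order) -> bool:
    """Row-level rule (assumed): callers only see rows from their own region."""
    return caller["region"] == row.region


def build_tools(model: type, rows: List[Any],
                row_filter: Callable[[Dict[str, str], Any], bool]) -> Dict[str, Callable]:
    """Generate one lookup tool per model field, with the filter applied inside."""
    tools: Dict[str, Callable] = {}
    for field in fields(model):
        def tool(value: Any, caller: Dict[str, str], _name: str = field.name) -> List[Any]:
            # The access check runs inside the tool, so the agent cannot skip it.
            return [r for r in rows
                    if getattr(r, _name) == value and row_filter(caller, r)]
        tools[f"find_{model.__name__.lower()}_by_{field.name}"] = tool
    return tools


tools = build_tools(Order, ORDERS, same_region)
eu_agent = {"region": "EU"}
print(sorted(tools))
print(tools["find_order_by_customer"]("acme", eu_agent))
```

The point of the sketch is the direction of flow: the agent decides at runtime which tool to call, and the data layer decides which rows that caller may see.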

Agent Memory. Now in preview. Stores short and long-term state across sessions so agents carry context without re-deriving it on each turn.
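
One way to picture the split between short-term and long-term state is the sketch below. It is an illustration under assumptions, not the Agent Memory server: the `AgentMemory` class, its method names and the choice of a bounded per-session window are all invented for the example.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List


class AgentMemory:
    """Toy memory store: a bounded window per session plus durable facts per agent."""

    def __init__(self, short_term_limit: int = 3) -> None:
        # Short-term: the most recent messages of each session only.
        self._short = defaultdict(lambda: deque(maxlen=short_term_limit))
        # Long-term: facts keyed by agent that survive across sessions.
        self._long: Dict[str, Dict[str, Any]] = defaultdict(dict)

    def observe(self, session_id: str, message: str) -> None:
        self._short[session_id].append(message)

    def remember(self, agent_id: str, key: str, value: Any) -> None:
        self._long[agent_id][key] = value

    def context(self, agent_id: str, session_id: str) -> Dict[str, Any]:
        """Assemble what the agent should see at the start of a turn."""
        recent: List[str] = list(self._short[session_id])
        return {"facts": dict(self._long[agent_id]), "recent": recent}


memory = AgentMemory(short_term_limit=3)
for text in ["hello", "I need a refund", "order o-1", "it arrived broken"]:
    memory.observe("session-1", text)
memory.remember("support-agent", "preferred_language", "en")

first = memory.context("support-agent", "session-1")
second = memory.context("support-agent", "session-2")  # new session, same agent
print(first)
print(second)
```

In the second session the agent starts with an empty recent window but still has the stored fact, which is the "carry context without re-deriving it" behavior in miniature.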

Redis Flex. A rewritten storage engine that runs 99% of data on SSDs and 1% in RAM, delivering petabyte-scale retrieval at sub-millisecond latencies.
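
The 99%/1% split and the tenth-of-the-cost figure quoted earlier can be related with some back-of-the-envelope arithmetic. The sketch below assumes cost scales linearly with capacity in each tier and ignores everything else (compute, networking, replication), so the implied price ratio is illustrative rather than a Redis number; both function names are made up for the example.

```python
def blended_cost_ratio(ram_fraction: float, flash_price_ratio: float) -> float:
    """Cost of a tiered store relative to holding all data in RAM.

    flash_price_ratio is the per-GB price of flash divided by that of RAM.
    """
    return ram_fraction + (1.0 - ram_fraction) * flash_price_ratio


def implied_flash_price_ratio(ram_fraction: float, target_ratio: float) -> float:
    """Solve blended_cost_ratio(...) == target_ratio for the flash price ratio."""
    return (target_ratio - ram_fraction) / (1.0 - ram_fraction)


# Figures from the article: 1% of data in RAM, total cost one tenth of all-RAM.
implied = implied_flash_price_ratio(ram_fraction=0.01, target_ratio=0.10)
print(f"implied flash/RAM price ratio: {implied:.4f}")
print(f"check: {blended_cost_ratio(0.01, implied):.2f}")
```

Under those assumptions, flash would need to cost roughly one eleventh as much per gigabyte as RAM for the stated saving to hold.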

Redis Search and LangCache. The retrieval and semantic caching backbone underneath the platform. LangCache reduces redundant model calls by caching prompt responses.
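
Semantic caching, in general terms, means reusing a stored response when a new prompt is close enough in meaning to one already answered. The sketch below shows the idea with a toy bag-of-words embedding and cosine similarity; it is not LangCache's API, and `embed`, `SemanticCache`, `CountingModel` and the 0.8 threshold are assumptions chosen for the example.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the closest cached response if it clears the threshold."""
        query = embed(prompt)
        best = max(self._entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((embed(prompt), response))


class CountingModel:
    """Stand-in for an expensive model call; counts how often it is invoked."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"answer to: {prompt}"


def answer(cache: SemanticCache, model: CountingModel, prompt: str) -> str:
    cached = cache.get(prompt)
    if cached is not None:
        return cached            # cache hit: no model call
    response = model(prompt)
    cache.put(prompt, response)
    return response


cache, model = SemanticCache(), CountingModel()
a1 = answer(cache, model, "what is the refund policy")
a2 = answer(cache, model, "what is the refund policy please")  # near-duplicate
a3 = answer(cache, model, "how do I reset my password")        # unrelated
print(model.calls)
```

The threshold is the main tuning knob: set too low, the cache returns answers to questions that were not asked; set too high, it rarely hits.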

What analysts say

The data industry is generally heading in the same direction now. Every major database vendor is making a context layer argument.

Traditional database vendors including Oracle are integrating context and memory layers to bring relational databases into the agentic AI era. Purpose-built vector database vendors including Pinecone are doing the same, building out a new knowledge layer for agentic AI context. Standalone context layers like Hindsight are also part of the emerging landscape.

Trollope frames Redis's position as structurally different from that competition.

"For us to win, no one else has to lose," he said. Many Redis deployments already run MongoDB or Oracle as the backend system of record. Iris reflects and caches from those systems rather than displacing them. Redis is launching Iris in the Snowflake marketplace with native connectors.

Stephanie Walter, Practice Leader for AI Stack at HyperFRAME Research, puts the market context plainly. "The market is converging on the same conclusion: agents don't just need more tokens or better models. They need governed, current, low-latency context," Walter said.

Her read on Redis's differentiation focuses on where Redis already sits in the stack, which is close to runtime, latency-sensitive operational state, and real-time data.

"The pitch is not 'better RAG' as much as 'agents need live context, memory, and fast retrieval while they are actually working,'" she said.

Whether it's Redis or another vendor, every context layer technology will face a governance challenge to be successful.

"Agentic AI will not scale in the enterprise if every agent becomes a new cost center, a new data access risk, and a new governance exception," she said. "The winning context layers will be the ones that make agents faster, cheaper, and safer to run."

For real-time clinical AI, getting context wrong is not an option

Mangoes.ai is one company that has already had to answer those questions in production, under conditions where the cost of getting context wrong is measured in patient outcomes.

Amit Lamba, founder and CEO of Mangoes.ai, runs a real-time voice AI platform deployed across large healthcare facilities where patients and clinicians ask live questions about treatment, scheduling and case history. Mangoes.ai built its stack natively on Redis from the start.

"Retrieval, memory, and session state all run through Redis, so we're not stitching together separate tools and hoping they talk to each other," Lamba said.

The problem Iris's dynamic memory capability addresses is what happens across a complex session.

"Think about a one-hour group therapy session," Lamba said. "You need to know who said what, when, and be able to surface the right information to the therapist in the moment. That's not a simple retrieval problem."

The platform runs multiple specialized agents in parallel, one for entity identification, one for relationship reasoning and one for integrating case history. "The dynamic memory capability maps almost perfectly to the problem we're solving," Lamba said.

What this means for enterprises

For enterprises that built their AI stack around RAG, the retrieval layer that got them to production is no longer enough to keep them there.

The RAG era is giving way to context architecture. The classic RAG model pushed data into the agent before the model was called. Production deployments are flipping that: agents pull what they need at runtime through tool calls, treating the data layer as a live resource rather than a pre-loaded payload. Teams still optimizing RAG pipelines are solving last year's problem.

The semantic layer is now production infrastructure. The model that defines business entities, their relationships and the access rules between them needs to be built, versioned and maintained with the same discipline as a data pipeline. Most organizations have not staffed or structured for that work. The enterprises that define their context architecture now are the ones that will not have to rebuild it when agent workloads scale.

Budget is already moving. VB Pulse Q1 2026 data shows retrieval optimization investment rising from 19% to 28.9% across the quarter, overtaking evaluation spending for the first time. Organizations that spent the previous year measuring their retrieval quality are now spending to fix it. The context layer is an active procurement decision, not a roadmap item.

"The first buyer question should not be 'Do I need a vector database, long context, memory, or a context engine?' It should be 'What does this agent need to know, how fresh must that knowledge be, who is allowed to access it, and what does every retrieval cost?'" Walter said.

The retrieval rebuild: Why hybrid retrieval intent tripled as enterprise RAG programs hit the scale wall

Something shifted in enterprise RAG in Q1 2026. VB Pulse data spanning January through March tells a consistent story: the market stopped adding retrieval layers and started fixing the ones it already has. Call it the retrieval rebuild.

The survey covered three consecutive monthly waves from organizations with 100 or more employees, with between 45 and 58 qualified respondents per month across platform adoption, buyer intent, architecture outlook and evaluation criteria. The data should be treated as directional.

Enterprise intent to adopt hybrid retrieval tripled from 10.3% to 33.3% in a single quarter, even as 22% of qualified enterprise respondents reported having no production RAG systems at all. For data engineers and enterprise architects building agentic AI infrastructure, the data reveals a market in active transition: the RAG architecture most enterprises built to scale is not the one they expect to run by year-end.

Hybrid retrieval has become the consensus enterprise strategy. Unlike single-method RAG pipelines that rely on vector similarity alone, hybrid retrieval combines dense embeddings with sparse keyword search and reranking layers, trading simplicity for the retrieval accuracy and access control that production agentic workloads require.

The standalone vector database category is under pressure. Weaviate, Milvus, Pinecone and Qdrant each lost adoption share across the quarter in the VB Pulse data. Custom stacks and provider-native retrieval are absorbing their displaced share.

A growing minority of enterprises are stepping back from RAG altogether, a signal that the market's maturity narrative has meaningful exceptions.

Organizations that went wide on RAG in 2025 are hitting the same failure point: the architecture built for document retrieval does not hold at agentic scale.
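
As an illustration of how a dense ranking and a sparse keyword ranking can be combined, the sketch below uses reciprocal rank fusion, a common and simple fusion method. The survey does not say which fusion or reranking techniques respondents use, so the choice of RRF, the constant `k=60` and the document IDs are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, highest first; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two retrievers for the same query.
dense_results = ["doc_a", "doc_b", "doc_c"]    # vector similarity
sparse_results = ["doc_b", "doc_d", "doc_a"]   # keyword search

fused = reciprocal_rank_fusion([dense_results, sparse_results])
print(fused)
```

Documents found by both retrievers (`doc_a` and `doc_b` here) outrank those found by only one, which is the basic reason hybrid setups tend to be more robust than vector similarity alone. A production pipeline would typically pass the fused list to a separate reranking step.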

Enterprises that scaled RAG fast are now paying to rebuild it

The two largest intent movements in Q1 are directly connected: enterprises confronting retrieval quality problems at scale, and hybrid retrieval emerging as the consensus answer.

Investment priorities shifted in parallel. Evaluation and relevance testing led budget intent in January at 32.8% and fell to 15.6% by March. Retrieval optimization moved in the opposite direction, from 19.0% to 28.9%, overtaking evaluation as the top growth investment area for the first time.

Steven Dickens, vice president and practice lead at HyperFRAME Research, described the operational burden enterprise data teams are facing in a VentureBeat interview in March on Oracle's agentic AI data stack. "Data teams are exhausted by fragmentation fatigue," Dickens said. "Managing a separate vector store, graph database and relational system just to power one agent is a DevOps nightmare."

That fatigue shows directly in the platform data. The custom stack rise to 35.6% is not a rejection of managed retrieval; many organizations run both. It is a consolidation response from engineering teams that have hit the limits of assembling too many components.

Not every enterprise has made it that far. The VB Pulse data includes a signal that complicates the market's overall growth narrative: 22.2% of qualified respondents reported no production RAG by March, up from 8.6% in January. The report attributes this cohort to organizations that have "not yet committed to any retrieval infrastructure, or have paused programs," concentrated in Healthcare, Education and Government, the same sectors showing the highest rates of flat budgets.

Standalone vector databases are losing the adoption argument but winning the reliability one

Recent reporting by VentureBeat illustrates why the dedicated retrieval layer still matters in production. Two enterprises building on Qdrant show why purpose-built vector infrastructure still wins in production.

&AI builds patent litigation infrastructure and runs semantic search across hundreds of millions of documents. Grounding every result in a real source document is not optional: patent attorneys will not act on AI-generated text. That requirement makes the architectural choice clear. "The agent is the interface," Herbie Turner, &AI's founder and CTO, told VentureBeat in March. "The vector database is the ground truth."

GlassDollar, a startup that helps Siemens and Mahle evaluate startups, runs an agentic retrieval pattern across a corpus approaching 10 million indexed documents. A single user prompt fans out into multiple parallel queries, each retrieving candidates from a different angle before results are combined and re-ranked. That query volume and precision requirement is what drove the choice of purpose-built vector infrastructure. "We measure success by recall," Kamen Kanev, GlassDollar's head of product, told VentureBeat in March. "If the best companies aren't in the results, nothing else matters. The user loses trust."

The VB Pulse data shows that framing (retrieval as ground truth rather than feature) is gaining traction across the broader enterprise market, even as standalone vector database adoption declines.

Why enterprises say they need a dedicated vector layer shifted significantly across Q1. In January the top reasons were access control complexity (20.7%) and retrieval precision (19.0%). By March, operational reliability at scale had surged to 31.1%, more than doubling and overtaking everything else. Enterprises are no longer keeping vector infrastructure primarily for precision. They are keeping it because it is the part of the stack they can rely on when query volumes scale.

How enterprises are redefining what good retrieval means

How enterprises judge their retrieval systems shifted notably across Q1, and the direction of that shift points to a market getting more sophisticated about what good retrieval actually means.
In January, response correctness dominated evaluation criteria at 67.2%, far above anything else. By March, response correctness (53.3%), retrieval accuracy (53.3%) and answer relevance (53.3%) had converged exactly. Getting the right answer is no longer enough if it came from the wrong document or missed the context of the question.

Answer relevance was the only criterion that rose across the quarter, gaining five percentage points. It is also the hardest to measure: whether the retrieved context is actually the right context for that specific question requires purpose-built evaluation infrastructure, not just pass-or-fail correctness checks. Its rise signals that a meaningful share of enterprise buyers have moved past basic RAG testing entirely.

The market's verdict: RAG isn't dead. The original architecture is

The "RAG is dead" narrative had real momentum heading into 2026. It rested on two claims. The first: that long-context windows (models capable of processing hundreds of thousands of tokens in a single prompt) would make dedicated retrieval unnecessary. The second: that agentic memory systems, which store what an agent learns across sessions rather than retrieving it fresh each time, would absorb the knowledge access problem entirely.

The VB Pulse data is the enterprise market's answer to the first claim. The long-context-as-dominant-architecture position collapsed from 15.5% in January to 3.5% in February before partially recovering to 6.7% in March. January's sample was heavily weighted toward Technology and Software respondents, the segment most exposed to long-context model announcements in late 2025. As the sample diversified, the position evaporated.

On the memory question, Jonathan Frankle, chief AI scientist at Databricks, framed the architecture clearly in a March interview with VentureBeat: a vector database with millions of entries sits at the base of the agentic memory stack, too large to fit in context.
The LLM context window sits at the top. Between them, new caching and compression layers are emerging, but none of them replace the retrieval layer at the base.

New agentic memory systems like Hindsight, developed by Vectorize, and observational memory approaches like those in the Mastra framework address session continuity and agent context over time, a different problem than high-recall search across millions of changing enterprise documents.

The most consequential signal: the share of respondents not expecting large-scale RAG deployments by year-end grew from 3.4% to 15.6%, nearly 5x. That is not a verdict against retrieval. It is a verdict against the retrieval architecture most enterprises built first.

The retrieval rebuild is not optional

The retrieval rebuild is the cost of scaling RAG without first deciding what architecture could actually support it. If your organization is among the 43.1% that entered Q1 planning to expand RAG into more workflows, the VB Pulse data suggests that plan has already changed for many of your peers and may need to change for you.

Hybrid retrieval is the consensus destination. Custom stack growth to 35.6% reflects teams building retrieval infrastructure around requirements that off-the-shelf products do not fully address.

RAG is not dead. The architecture most enterprises used to implement it is. The data suggests the rebuild is not a future decision. For 33% of enterprises, the rebuild is already the stated priority.
