June 24, 2026•4 min read•from VentureBeat

How Shopify built an AI stack that doesn't care which models survive

Our take

Shopify has built a resilient LLM proxy that gives engineers access to multiple providers and automatically shifts workloads during outages or model updates. This strategy, detailed in a recent VentureBeat podcast, reduces risk and enables usage reporting, and it allowed a seamless switch from Claude Fable to Opus. Distillation into smaller, task-specific models, such as those serving the specialized subtasks of the Sidekick assistant, improves performance further: Shopify reports distilled models that are up to 30x cheaper and faster in some cases, with accuracy as an additional goal.

Shopify’s approach to building AI infrastructure, as detailed in a VentureBeat podcast interview, offers a compelling blueprint for enterprises navigating the rapidly evolving landscape of large language models (LLMs). The volatility of the current model ecosystem, exemplified by the abrupt shutdown of Claude Fable 5 [When Claude Fable 5 shut down], demands a more resilient and adaptable strategy than relying on a single provider. Shopify’s solution, an LLM proxy that fails over between multiple providers, avoids the panic and disruption experienced by organizations tethered to a specific model. This mirrors the strategic shift we’re seeing across industries, as evidenced by OpenAI’s unveiling of its first custom chip, Jalapeño [OpenAI unveils its first custom chip, built by Broadcom], signaling a move towards greater control and optimization of AI hardware and software. The willingness to prioritize infrastructure over immediate features, as Shopify emphasizes, is a crucial lesson for organizations building for the long term.

The concept of "distillation," which Shopify uses to create specialized models for assistants like Sidekick, is particularly noteworthy. Rather than relying solely on large, general-purpose models, distillation allows for the creation of smaller, more efficient models tailored to specific tasks. This approach, as Farhan Thawar, Shopify’s head of engineering, highlights, isn't just about cost savings, though the potential for 2x to 30x improvements in speed and cost is significant; it's also about achieving greater accuracy within narrow domains. This resonates with the broader trend of focusing on specialized AI applications, as opposed to striving for a single, omnipotent AI system. Consider the simplicity and focused functionality of Slate Auto’s electric truck [Slate Auto’s radically simple electric truck starts at $24,950], which prioritizes core features and user experience over unnecessary complexity, a parallel to Shopify’s distillation strategy. The ability to deploy these distilled models once the evaluation looks good, without a separate approval process, further accelerates innovation and responsiveness to changing business needs.

Shopify’s vision extends beyond simply managing model availability and cost; they’re actively pushing toward a future where AI becomes truly integrated into workflows – a move from “AI reflexivity” to “AI leverage.” The usage dashboard, providing insights into token consumption and model utilization, demonstrates a commitment to understanding *how* AI is being used, not just *that* it’s being used. The implementation of "circuit breakers" to prevent runaway token spending underscores a responsible approach to AI implementation, acknowledging the potential for unintended consequences. This focus on deep user understanding and proactive management of AI resources is critical for ensuring long-term sustainability and maximizing the value derived from these technologies. The dream of an AI pipeline that autonomously selects the optimal model based on real-time learnings – a self-optimizing distillation process – is a bold and ambitious goal that could fundamentally reshape how enterprises leverage AI.

Ultimately, Shopify’s experience underscores the importance of building a flexible, adaptable, and data-driven AI infrastructure. The rapid pace of innovation in the LLM space necessitates a proactive approach, one that prioritizes resilience, specialization, and responsible usage. The key question moving forward is: how can other organizations replicate Shopify’s success in building robust internal AI platforms that empower engineers and drive tangible business outcomes, without becoming overly reliant on any single vendor or technology?

Shopify built an LLM proxy that gives every engineer access to multiple AI providers — with automatic failover when any one of them goes down, changes, or disappears. When Claude Fable 5 shut down, Shopify's engineers didn't go into panic mode. The proxy shifted them to Claude Opus or GPT 5.5 automatically, without interrupting their workflows. “Fable looks amazing; we used it of course,” Farhan Thawar, Shopify’s head of engineering, says in a new VentureBeat Beyond the Pilot podcast. “When a model comes and then it goes, or it could be as innocuous as an update, the proxy allows us to spray across the different providers,” Thawar says.
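The failover behavior described here can be illustrated with a minimal sketch. This is not Shopify's proxy; the provider callables, the `ProviderUnavailable` exception, and the function name `complete_with_failover` are all hypothetical, and a real proxy would also handle authentication, streaming, and reporting. The sketch only shows the core idea of trying providers in order and moving on when one is gone.

```python
from typing import Callable, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error a provider raises when a model is down or withdrawn."""


# Hypothetical provider type: takes a prompt, returns a completion string.
Provider = Callable[[str], str]


def complete_with_failover(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def retired_model(prompt: str) -> str:
    # Stand-in for a model that has been shut down.
    raise ProviderUnavailable("model has been shut down")


def backup_model(prompt: str) -> str:
    # Stand-in for a healthy provider.
    return f"echo: {prompt}"


name, text = complete_with_failover(
    "hello", [("primary", retired_model), ("backup", backup_model)]
)
print(name, text)
```

Because callers only see the returned completion, swapping or removing a provider in the list does not change their code, which is the property the article attributes to the proxy.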

Shopify buys tokens in bulk, and all users connect to models through its proxy, Thawar says. This gives his team access to reporting and failover; when there’s an availability issue with one provider, users can be “automatically, seamlessly” transferred to another.

Enterprises can learn from this example and consider how a disruption might affect their business, Thawar says. At the very least, they should establish a solid backup plan. It’s important to have a system that allows for movement across models so enterprises are not “super tied” to a specific provider.

Distillation is another important strategy. With distillation, a student model learns from a teacher model and typically becomes specialized in a narrower task. These small language models (SLMs) can be more beneficial than generalized, off-the-shelf models in some circumstances. One example is Shopify’s flagship AI assistant, Sidekick, which performs numerous specialized subtasks for merchants so they can “remove toil” from their day-to-day.

Using smaller distilled models can be faster and cheaper than more generalized models, Thawar says. In some cases they have proven to be 2x cheaper and faster; in more extreme cases, 30x cheaper and faster, he says. But “it isn’t just about cost and latency, which are big; it’s about accuracy,” Thawar says.

Engineers feed the UDP (Shopify's distillation pipeline) a teacher model, training data, evals, and a target model, for example distilling Opus 4.8 down to Qwen 3.5. The pipeline runs for about a day, then returns an evaluation showing what the fine-tuned model actually achieved on speed, cost, and accuracy for that subtask. If the tradeoff looks good, the engineer deploys it, with no approval process required. Shopify's internal platform, Tangle, lets anyone visualize the pipeline as it runs.

Thawar says his “dream” is to eventually not give the distillation pipeline a target model at all. Instead, users could provide the teacher model with data and evals and the directive: ‘Based on your learnings over time, I want you to look at a different class of model, different sizes, different types, and you tell me what the right distillation target is.’

“Maybe we'll get surprised. Maybe it'll be such a small model it could run on a phone,” Thawar says. “Other times, maybe it comes back and says, ‘There isn't a way to distill this down to anything better than what we have at the frontier.’”
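The speed, cost, and accuracy tradeoff that the pipeline reports can be made concrete with a small sketch. Everything here is an assumption for illustration: the `EvalResult` fields, the `compare` function, the accuracy tolerance, and the numbers are invented and are not Shopify's evaluation format or results. In the article the engineer makes the deployment call; the `worth_deploying` flag below is only one possible rule of thumb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical per-model evaluation summary for one subtask."""
    accuracy: float              # fraction of eval cases passed, 0.0 to 1.0
    latency_ms: float            # mean latency per request
    cost_per_1k_requests: float  # dollars per thousand requests


def compare(teacher: EvalResult, student: EvalResult,
            max_accuracy_drop: float = 0.01) -> dict:
    """Summarize how the distilled student compares with the teacher."""
    speedup = teacher.latency_ms / student.latency_ms
    cost_ratio = teacher.cost_per_1k_requests / student.cost_per_1k_requests
    accuracy_delta = student.accuracy - teacher.accuracy
    return {
        "speedup": speedup,
        "cost_ratio": cost_ratio,
        "accuracy_delta": accuracy_delta,
        # Assumed rule: faster, cheaper, and accuracy within tolerance.
        "worth_deploying": (
            speedup > 1.0
            and cost_ratio > 1.0
            and accuracy_delta >= -max_accuracy_drop
        ),
    }


# Made-up numbers, chosen only to exercise the function.
teacher = EvalResult(accuracy=0.92, latency_ms=1200.0, cost_per_1k_requests=30.0)
student = EvalResult(accuracy=0.93, latency_ms=60.0, cost_per_1k_requests=1.0)
report = compare(teacher, student)
print(report)
```

Keeping accuracy in the same report as speed and cost reflects Thawar's point that distillation is not only a cost and latency exercise.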

Moving away from "AI reflexivity" to "AI leverage"

Shopify users can apply whatever harness they want: Claude Code, Codex, Cursor, GitHub Copilot for VS Code. “We expose everyone to the different harnesses so they can get a feel for what may or may not work in their workflow.”

But the company also implemented a usage dashboard; this allows Thawar’s team to ask interesting questions around not just token spend, but: Who’s using the most expensive tokens? Who's spending more time on reasoning? What types of models are being used, and by which disciplines and levels?

Regarding the "tokenmaxxing" question, Shopify does have “circuit breakers” in place. If a user has a model running for a long time (say, 10 hours) and it’s consuming a lot of tokens, they will get pinged: “Did you mean to spend this?” As Thawar explains, sometimes the reply is “Oh, absolutely.” Other times it’s: “Whoa, I didn't know that was running in the background. I totally forgot about it. I'd rather stop it now.”

The ultimate goal, as Thawar describes it, is to move from “AI reflexivity” to “AI leverage,” and get people to really think deeply about where they can benefit most from AI in their workflows.

Listen to the full podcast to hear more about:

Shopify’s philosophy of building infrastructure before features. As Thawar puts it: “We've always built more infra. We will continue to always build more infra.”
How Shopify’s internal AI agent, River, creates a “substrate of information” across the company.
How Thawar's OpenClaw agent figured out he was traveling from his calendar — and what that moment told him about where agents are actually headed.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
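As a closing illustration, the "circuit breaker" check described in the article can be sketched as a simple filter over running sessions. The `Session` fields, the function name, and the token threshold are assumptions; the article gives only the example of a model running for about 10 hours while consuming a lot of tokens, and it describes a ping to the user rather than an automatic stop.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Session:
    """Hypothetical record of one user's running model session."""
    user: str
    hours_running: float
    tokens_used: int


def sessions_to_ping(sessions: Iterable[Session],
                     max_hours: float = 10.0,
                     max_tokens: int = 5_000_000) -> List[Session]:
    """Return sessions that are both long-running and token-heavy."""
    # max_tokens is an invented threshold; the source does not state one.
    return [
        s for s in sessions
        if s.hours_running >= max_hours and s.tokens_used >= max_tokens
    ]


def ping_message(session: Session) -> str:
    """Build the confirmation prompt sent to the session's owner."""
    return (
        f"{session.user}: a model has been running for "
        f"{session.hours_running:.0f} hours and has used "
        f"{session.tokens_used:,} tokens. Did you mean to spend this?"
    )


sessions = [
    Session(user="alice", hours_running=11.0, tokens_used=8_000_000),
    Session(user="bob", hours_running=0.5, tokens_used=9_000_000),
    Session(user="carol", hours_running=12.0, tokens_used=20_000),
]
flagged = sessions_to_ping(sessions)
for s in flagged:
    print(ping_message(s))
```

Asking rather than stopping leaves room for the "Oh, absolutely" answer Thawar mentions, where the long-running job is intentional.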
