Anthropic just launched Claude Design, an AI tool that turns prompts into prototypes and challenges Figma

Anthropic today launched Claude Design, a new product from its Anthropic Labs division that allows users to create polished visual work — designs, interactive prototypes, slide decks, one-pagers, and marketing collateral — through conversational prompts and fine-grained editing controls. The release, available immediately in research preview to all paid Claude subscribers, is the company's most aggressive expansion beyond its core language model business and into the application layer that has historically belonged to companies like Figma, Adobe, and Canva.
Claude Design is powered by Claude Opus 4.7, Anthropic's most capable generally available vision model, which the company also released today. Anthropic says it is rolling access out gradually throughout the day to Claude Pro, Max, Team, and Enterprise subscribers.
The simultaneous launches mark a watershed for Anthropic, whose ambitions now visibly extend from foundation model provider to full-stack product company — one that wants to own the arc from a rough idea to a shipped product. The timing is also significant: Anthropic hit roughly $20 billion in annualized revenue in early March 2026, according to Bloomberg, up from $9 billion at the end of 2025 — and surpassed $30 billion by early April 2026. The company is in early talks with Goldman Sachs, JPMorgan, and Morgan Stanley about a potential IPO that could come as early as October 2026.
How Claude Design turns a text prompt into a working prototype
The product follows a workflow that Anthropic has designed to feel like a natural creative conversation. Users describe what they need, and Claude generates a first version. From there, refinement happens through a combination of channels: chat-based conversation, inline comments on specific elements, direct text editing, and custom adjustment sliders that Claude itself generates to let users tweak spacing, color, and layout in real time.
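Anthropic has not published a format for these generated controls, but each slider amounts to a named parameter bound to a single design property. A minimal sketch of the idea in Python, with every class and field name invented for illustration:

```python
# Hypothetical sketch of a model-generated adjustment control.
# None of these field or class names come from Anthropic; they
# illustrate the idea of a slider bound to one design property.
from dataclasses import dataclass

@dataclass
class AdjustmentSlider:
    label: str         # what the user sees, e.g. "Card spacing"
    css_property: str  # the design property the slider drives
    min_value: float
    max_value: float
    step: float
    unit: str

    def to_css(self, value: float) -> str:
        """Render the current slider position as a CSS declaration."""
        value = max(self.min_value, min(self.max_value, value))  # clamp
        return f"{self.css_property}: {value}{self.unit};"

# A slider Claude might generate for a card grid it just produced:
spacing = AdjustmentSlider("Card spacing", "gap", 4, 48, 2, "px")
print(spacing.to_css(16))  # -> gap: 16px;
```

In that mental model, Claude's job is to decide which handful of parameters are worth exposing for a given design, not to render a fixed toolbar.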
During onboarding, Claude reads a team's codebase and design files and builds a design system — colors, typography, and components — that it automatically applies to every subsequent project. Teams can refine the system over time and maintain more than one. The import surface is broad: users can start from a text prompt, upload images and documents in various formats, or point Claude at their codebase. A web capture tool grabs elements directly from a live website so prototypes look like the real product.
What distinguishes Claude Design from the wave of AI design experiments that have proliferated in the past year is the handoff mechanism. When a design is ready to build, Claude packages everything into a handoff bundle that can be passed to Claude Code with a single instruction. That creates a closed loop — exploration to prototype to production code — all within Anthropic's ecosystem. The export options acknowledge that not everyone's next step is Claude Code: users can also share designs as an internal URL within their organization, save as a folder, or export to Canva, PDF, PPTX, or standalone HTML files.
Anthropic points to Brilliant, the education technology company known for intricate interactive lessons, as an early proof point. The company's senior product designer reported that the most complex pages required 20 or more prompts to recreate in competing tools but needed only 2 in Claude Design. The Brilliant team then turned static mockups into interactive prototypes they could share and user-test without code review, and handed everything — including the design intent — to Claude Code for implementation. Datadog's product team described a similar shift, compressing what had been a week-long cycle of briefs, mockups, and review rounds into a single conversation.
Why Anthropic's chief product officer just resigned from Figma's board
The launch arrives against a backdrop that makes Anthropic's claim of complementarity with existing design tools difficult to take entirely at face value. Mike Krieger, Anthropic's chief product officer, resigned from the board of Figma on April 14 — the same day The Information reported Anthropic's next model would include design tools that could compete with Figma's primary offering.
Figma has collaborated closely with Anthropic to integrate the frontier lab's AI models into its products. Just two months ago, in February, Figma launched "Code to Canvas," a feature that converts code generated in AI tools like Claude Code into fully editable designs inside Figma — creating a bridge between AI coding tools and Figma's design process. The partnership felt like a mutual bet that AI would make design more essential, not less. Claude Design complicates that narrative significantly.
Anthropic's position, based on VentureBeat's background conversations with the company, is that Claude Design is built around interoperability and is meant to meet teams where they already work, not replace incumbent tools. The company points to the Canva export, PPTX and PDF support, and plans to make it easier for other tools to connect via the Model Context Protocol (MCP) as evidence of that philosophy. Anthropic is also making it possible for other tools to build integrations with Claude Design, a move clearly designed to preempt accusations of walled-garden ambitions.
But the market read the signals differently. The structural tension is clear: Figma commands an estimated 80 to 90% market share in UI and UX design, according to The Next Web. Both Figma and Adobe assume a trained designer is in the loop. Anthropic's tool does not. Claude Design is not merely another AI copilot embedded in an existing design application. It is a standalone product that generates complete, interactive prototypes from natural language — accessible to founders, product managers, and marketers who have never opened Figma. The expansion of the design user base to non-designers is the real competitive threat, even if the professional designer's workflow remains anchored in Figma for now.
Inside Claude Opus 4.7, the model Anthropic deliberately made less dangerous
The model powering Claude Design is itself a significant story. Claude Opus 4.7 is Anthropic's most capable generally available model, with notable improvements over its predecessor Opus 4.6 in software engineering, instruction following, and vision — but it is intentionally less capable than Anthropic's most powerful offering, Claude Mythos Preview, the model the company announced earlier this month as too dangerous for broad release due to its cybersecurity capabilities.
That dual-track approach — one model for the public, one model locked behind a vetted-access program — is unprecedented in the AI industry. Anthropic used Claude Mythos Preview to identify thousands of zero-day vulnerabilities in every major operating system and web browser, as reported by multiple outlets. The Project Glasswing initiative that houses Mythos brings together Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, Nvidia, and Palo Alto Networks as launch partners.
Opus 4.7 sits a deliberate step below Mythos. Anthropic stated in its release that it "experimented with efforts to differentially reduce" the new model's cyber capabilities during training and ships it with safeguards that automatically detect and block requests indicating prohibited or high-risk cybersecurity uses. What Anthropic learns from those real-world safeguards will inform the eventual goal of broader release for Mythos-class models. For security professionals with legitimate needs, the company has created a new Cyber Verification Program.
On benchmarks, the model posts strong numbers. Opus 4.7 reached 64.3% on SWE-bench Pro, and on Anthropic's internal 93-task coding benchmark, it delivered a 13% resolution improvement over Opus 4.6, including solving four tasks that neither Opus 4.6 nor Sonnet 4.6 could crack.
The vision improvements are substantial and directly relevant to Claude Design: Opus 4.7 can accept images up to 2,576 pixels on the long edge — roughly 3.75 megapixels, more than three times the resolution of prior Claude models. Early access partner XBOW, the autonomous penetration testing company, reported that the new model scored 98.5% on its visual-acuity benchmark versus 54.5% for Opus 4.6.
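The megapixel arithmetic is easy to sanity-check. The 16:9 frame and the roughly 1.15-megapixel ceiling commonly cited for earlier Claude vision models are our assumptions here, not figures from Anthropic's release:

```python
# Back-of-envelope check on the quoted resolution figures. The 16:9
# frame and the ~1.15 MP ceiling for earlier Claude vision models are
# assumptions for illustration, not numbers from Anthropic's release.
long_edge = 2576
short_edge = round(long_edge * 9 / 16)   # ~1449 px under a 16:9 frame
new_mp = long_edge * short_edge / 1e6    # ~3.73 megapixels
prior_mp = 1.15                          # assumed earlier-model ceiling
print(f"{new_mp:.2f} MP, {new_mp / prior_mp:.1f}x the assumed prior ceiling")
# -> 3.73 MP, 3.2x: consistent with "roughly 3.75 megapixels, more
#    than three times the resolution of prior Claude models"
```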
Meanwhile, Bloomberg reported that the White House is preparing to make a version of Mythos available to major federal agencies, with the Office of Management and Budget setting up protections for Cabinet departments — a sign that the government views the model's capabilities as too important to leave solely in private hands.
What enterprise buyers need to know about data privacy and pricing
For enterprise and regulated-industry buyers, the data handling architecture of Claude Design will be a critical evaluation criterion. Based on VentureBeat's exclusive background discussions with Anthropic, the system stores the design-system representation it generates — not the source files themselves. When users link a local copy of their code, it is not uploaded to or stored on Anthropic's servers. The company is also adding the ability to connect directly to GitHub. Anthropic states unequivocally that it does not train on this data. For Enterprise customers, Claude Design is off by default — administrators choose whether to enable it and control who has access.
On pricing, Claude Design is included at no additional cost with Pro, Max, Team, and Enterprise plans, using existing subscription limits with optional extra usage beyond those caps. Opus 4.7 holds the same API pricing as its predecessor: $5 per million input tokens and $25 per million output tokens. The pricing strategy mirrors the approach Anthropic took with Claude Code, which launched as a bundled feature and rapidly grew into a major revenue driver. Anthropic's reasoning is straightforward: the best way to learn what people will build with a new product category is to put it in their hands, then build monetization around demonstrated value.
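Those per-token rates make cost estimation straightforward. A sketch of the arithmetic; the token counts in the example are illustrative, not measured:

```python
# Cost of a single request at Opus 4.7's published API rates
# ($5 per million input tokens, $25 per million output tokens).
# The example token counts are illustrative, not from Anthropic.
INPUT_RATE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_RATE = 25.00 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the quoted Opus 4.7 rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a design-generation call: 20k tokens of context in, 8k out
print(f"${request_cost(20_000, 8_000):.2f}")  # -> $0.30
```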
Anthropic is also being transparent about the product's limitations. The design system import works best with a clean codebase; messy source code produces messy output. Collaboration is basic and not yet fully multiplayer. The editing experience has rough edges. There is no general availability date, and Anthropic says that is intentional — it will let the product and user feedback determine when Claude Design is ready for prime time.
Anthropic's bet that owning the full creative stack is worth the risk
Claude Design is the most visible expression of a trend that has been accelerating for months: the major AI labs are moving up the stack from model providers into full application builders, directly entering categories previously owned by established software companies. Anthropic now offers a coding agent (Claude Code), a knowledge-work assistant (Claude Cowork), desktop computer control, office integrations for Word, Excel, and PowerPoint, a browser agent in Chrome, and now a design tool. Each product reinforces the others. A designer can explore concepts in Claude Design, export a prototype, hand it to Claude Code for implementation, and have Claude Cowork manage the review cycle — all within Anthropic's platform.
The financial momentum behind this expansion is staggering. Anthropic has received investor offers valuing the company at approximately $800 billion, according to Reuters, more than doubling its $380 billion valuation from a funding round closed just two months ago. But building an application empire while simultaneously navigating an AI safety reputation, an impending IPO, growing public hostility toward the technology, and the diplomatic fallout of competing with your own partners is a balancing act that no technology company has attempted at this scale or speed.
When Figma launched Code to Canvas in February, the implicit promise was that AI coding tools and design tools would grow together, each making the other more valuable. Two months later, Anthropic's chief product officer has left Figma's board, and the company has shipped a product that lets anyone who can type a sentence create the kind of interactive prototype that once required years of design training and a Figma license. The partnership may survive. But the power dynamic just changed — and in the AI industry, that tends to be the only kind of change that matters.
Related Articles
- Mystery solved: Anthropic reveals changes to Claude's harnesses and operating instructions likely caused degradation

For several weeks, a growing chorus of developers and AI power users claimed that Anthropic's flagship models were losing their edge. Users across GitHub, X, and Reddit reported a phenomenon they described as "AI shrinkflation" — a perceived degradation where Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful with tokens. Critics pointed to a measurable shift in behavior, alleging that the model had moved from a "research-first" approach to a lazier, "edit-first" style that could no longer be trusted for complex engineering. While the company initially pushed back against claims of "nerfing" the model to manage demand, the mounting evidence from high-profile users and third-party benchmarks created a significant trust gap.

Today, Anthropic addressed these concerns directly, publishing a technical post-mortem that identified three separate product-layer changes responsible for the reported quality issues. "We take reports about degradation very seriously," reads Anthropic's blog post on the matter. "We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected." Anthropic claims it has resolved the issues by reverting the reasoning-effort change and the verbosity prompt, while fixing the caching bug in version v2.1.116.

The mounting evidence of degradation

The controversy gained momentum in early April 2026, fueled by detailed technical analyses from the developer community. Stella Laurenzo, a Senior Director in AMD's AI group, published on GitHub an exhaustive audit of 6,852 Claude Code session files and more than 234,000 tool calls, showing that performance had fallen relative to her earlier usage. Her findings suggested that Claude's reasoning depth had fallen sharply, leading to reasoning loops and a tendency to choose the "simplest fix" rather than the correct one.

This anecdotal frustration was seemingly validated by third-party benchmarks. BridgeMind reported that Claude Opus 4.6's accuracy had dropped from 83.3% to 68.3% in its tests, causing its ranking to plummet from No. 2 to No. 10. Although some researchers argued these specific benchmark comparisons were flawed due to inconsistent testing scopes, the narrative that Claude had become "dumber" became a viral talking point. Users also reported that usage limits were draining faster than expected, leading to suspicions that Anthropic was intentionally throttling performance to manage surging demand.

The causes

In its post-mortem blog post, Anthropic clarified that while the underlying model weights had not regressed, three specific changes to the "harness" surrounding the models had inadvertently hampered their performance (a schematic sketch of the caching bug follows this article):

- Default Reasoning Effort: On March 4, Anthropic changed the default reasoning effort from high to medium for Claude Code to address UI latency issues. This change was intended to prevent the interface from appearing "frozen" while the model thought, but it resulted in a noticeable drop in intelligence for complex tasks.
- A Caching Logic Bug: Shipped on March 26, a caching optimization meant to prune old "thinking" from idle sessions contained a critical bug. Instead of clearing the thinking history once after an hour of inactivity, it cleared it on every subsequent turn, causing the model to lose its "short-term memory" and become repetitive or forgetful.
- System Prompt Verbosity Limits: On April 16, Anthropic added instructions to the system prompt to keep text between tool calls under 25 words and final responses under 100 words. This attempt to reduce verbosity in Opus 4.7 backfired, causing a 3% drop in coding quality evaluations.

Impact and future safeguards

The quality issues extended beyond the Claude Code CLI, affecting the Claude Agent SDK and Claude Cowork, though the Claude API was not impacted. Anthropic admitted that these changes made the model appear to have "less intelligence," which it acknowledged was not the experience users should expect. To regain user trust and prevent future regressions, Anthropic is implementing several operational changes:

- Internal Dogfooding: A larger share of internal staff will be required to use the exact public builds of Claude Code to ensure they experience the product as users do.
- Enhanced Evaluation Suites: The company will now run a broader suite of per-model evaluations and "ablations" for every system prompt change to isolate the impact of specific instructions.
- Tighter Controls: New tooling has been built to make prompt changes easier to audit, and model-specific changes will be strictly gated to their intended targets.
- Subscriber Compensation: To account for the token waste and performance friction caused by these bugs, Anthropic has reset usage limits for all subscribers as of April 23.

The company intends to use its new @ClaudeDevs account on X and GitHub threads to provide deeper reasoning behind future product decisions and maintain a more transparent dialogue with its developer base.
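As referenced above, here is a schematic of how a clear-once pruning rule can degenerate into clear-every-turn. Anthropic has not published the offending code; the class and method names below are entirely invented, and only the one-hour idle threshold comes from the post-mortem:

```python
import time

IDLE_LIMIT = 3600  # one hour of inactivity, per the post-mortem

class SessionCache:
    """Toy stand-in for a session's accumulated 'thinking' history."""

    def __init__(self):
        self.thinking = []
        self.last_active = time.time()

    def turn_buggy(self, thought: str):
        # Bug pattern: the idle timestamp is never refreshed, so once
        # the session has ever crossed the threshold, *every* later
        # turn re-triggers the prune — the "short-term memory" loss.
        if time.time() - self.last_active > IDLE_LIMIT:
            self.thinking.clear()
        self.thinking.append(thought)

    def turn_fixed(self, thought: str):
        # Fix pattern: prune at most once per idle period, then mark
        # the session active again so accumulated thinking survives.
        if time.time() - self.last_active > IDLE_LIMIT:
            self.thinking.clear()
        self.last_active = time.time()
        self.thinking.append(thought)
```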
- Anthropic introduces "dreaming," a system that lets AI agents learn from their own mistakes

Anthropic on Tuesday unveiled a suite of updates to its Claude Managed Agents platform at its second annual Code with Claude developer conference in San Francisco, introducing a new capability called "dreaming" that lets AI agents learn from their own past sessions and improve over time — a step toward the kind of self-correcting, self-improving AI systems that enterprises have demanded before trusting agents with production workloads.

The company also moved two previously experimental features — outcomes and multi-agent orchestration — from research preview into public beta, making them broadly available to developers building on the Claude platform. Together, the three features address what Anthropic says are the hardest problems in running AI agents at scale: keeping them accurate, helping them learn, and preventing them from becoming bottlenecks on complex, multi-step work.

Early adopters are already reporting significant results. Legal AI company Harvey saw task completion rates increase roughly 6x after implementing dreaming. Medical document review company Wisedocs cut its document review time by 50% using outcomes. And Netflix is now processing logs from hundreds of builds simultaneously using multi-agent orchestration.

The announcements come at a moment of extraordinary momentum for Anthropic. CEO Dario Amodei disclosed during a fireside chat at the conference that the company's growth has outpaced even its own aggressive internal projections. In the first quarter of 2026, Anthropic saw what Amodei described as 80x annualized growth in revenue and usage — far exceeding the 10x annual growth the company had planned for. API volume on the Claude platform is up nearly 70x year over year, and the average developer using Claude Code now spends 20 hours per week working with the tool. "We tried to plan very well for a world of 10x growth per year," Amodei said. "And yet we saw 80x. And so that is the reason we have had difficulties with compute."

How Anthropic's dreaming feature teaches AI agents to learn from their own history

Dreaming is the most novel of the three features and the one Anthropic is most eager to distinguish from conventional memory systems. While the company launched agent memory earlier this year — allowing Claude to retain preferences and context within and across individual sessions — dreaming works at a higher level of abstraction. It is a scheduled process that reviews an agent's past sessions and memory stores, extracts patterns across them, and curates those memories so agents improve over time. It surfaces insights that no single agent session could see on its own: recurring mistakes, workflows that multiple agents converge on independently, and preferences shared across a team of agents.

Alex Albert, who leads research product management at Anthropic, explained the concept in an interview at the conference. He described dreaming as analogous to how people within organizations create skills after working through a task. "They might do a workflow with Claude, and at the end of that workflow, after they've iterated and zigzagged a little bit, they want to record that path from A to B," Albert said. "A very similar thing is happening with dreaming — instead of you manually creating the skill from your experience working with Claude, the model is doing it, so it has that same context for a future session." Crucially, dreaming does not modify the underlying model weights.
"We're not changing the model itself through dreaming — it's not doing updates to the weights or anything like that," Albert said. Instead, the agent writes learnings as plain-text notes and structured "playbooks" that future sessions can reference, making the entire process observable and auditable by humans. When asked about the trust implications of agents consolidating their own knowledge, Albert acknowledged that "there is a level of trust that you need to place" but noted that all memories are inspectable and that smarter models are getting progressively better at managing this process. "They're learning to write better notes for their future self," he said. A live demo showed AI agents improving overnight without human guidance During the keynote, the Anthropic team demonstrated all three features live on stage using a fictional aerospace startup called "Lumara" that needed to autonomously land drones on the moon for resource mining. The team configured a multi-agent system with three specialists — a commander agent responsible for overall mission success, a detector agent that identified high-quality landing sites, and a navigator agent that handled safe drone flight and landing — and defined a success rubric requiring soft landings, clear ground, and enough fuel reserves for a return trip to Earth. An initial simulation across six hypothetical landing sites produced strong but imperfect results. To improve, the presenters triggered a dreaming session directly from the Claude Developer Console. Overnight, the dreaming agent reviewed all past simulation sessions and wrote a detailed descent playbook — a comprehensive set of heuristics drawn from patterns across multiple mission runs. When the team ran a new simulation the following morning with the dreaming-derived playbook in memory, the results improved meaningfully on the sites that had previously underperformed. "All we had to do was just have Caitlin press a button," said Angela Jiang, Head of Product for the Claude Platform, referring to her colleague on stage. "All dreaming." The demo illustrated how the three features compose together in practice. Multi-agent orchestration split the complex task across specialists with independent context windows. Outcomes provided the rubric against which a separate grader agent evaluated each run. And dreaming extracted lessons across those runs to improve future performance — forming what Anthropic describes as a continuous improvement loop that requires no human intervention between iterations. Why Anthropic built a separate 'grader' agent to check Claude's own work The outcomes feature, now in public beta, gives developers a way to define what success looks like using a rubric — a structural framework, a presentation standard, a brand voice, or any other set of criteria — and then lets the agent iterate toward that standard autonomously. What makes outcomes architecturally distinctive is its separation of concerns. When an agent completes its work, a separate grader agent evaluates the output against the developer-defined rubric in its own independent context window. Because the grader operates in a fresh context, it is not influenced by the working agent's reasoning or accumulated biases from the session. When the grader identifies gaps between the output and the rubric, it pinpoints specifically what needs to change, and the working agent takes another pass. This loop continues until the rubric criteria are met — without a human needing to review each attempt. 
Albert described Anthropic's broader verification strategy as employing "more test time compute, more models thinking about a problem for longer, to check over the work of another." He acknowledged that having a model check its own work raises reasonable questions, but said a fresh context window reviewing completed work consistently outperforms asking the same long-running thread to identify its own bugs. "You will get higher success if you give that output to a fresh Claude and say, 'what bugs do you see?'" he said. "There is still something to the attention" that degrades over very long sessions — a limitation he said Anthropic is actively working to fix in future models.

The approach mirrors strategies already in use at GitHub. Mario Rodriguez, Chief Product Officer at GitHub, described during a separate talk at the conference how Copilot uses a similar advisor pattern with Claude models — pairing a smaller, cheaper model as an executor with a larger model as a mentor. When the smaller model encounters a problem beyond its capability, it calls the larger model for guidance, then continues executing on its own. Rodriguez said the approach delivers near-Opus-level intelligence at significantly lower cost, and that GitHub inserts critique models at three specific points in the coding workflow: after drafting a plan, after a complex implementation, and after writing tests but before running them.

Parallel AI agents can now tackle tasks too complex for a single model thread

Multi-agent orchestration, the third feature moving to public beta, allows a lead agent to decompose a large task into subtasks and delegate each one to a specialist agent — each with its own model, system prompt, tools, and independent context window. Every step in the process is traceable in the Claude Console, showing which agent did what, in what order, and why. The design gives each sub-agent an isolated context, which Anthropic says produces better results than having a single agent attempt to hold all the complexity in one thread. "Each sub-agent has its own independent thread and context window," the keynote presenters explained. "This is very intentional — we found that by splitting the work and then merging the results, we get better outcomes."

Albert offered his own heuristic for when multi-agent architectures make sense versus sticking with a single thread. "Parallel agents are better for investigation," he said — situations where there is a lot of context that will ultimately be discarded. "If you're trying to answer a specific question, you don't need all the search results from the areas where it didn't find the answer. You just need the answer." He described spinning up disposable sub-agents for specific retrieval tasks and bringing only the result back to the main thread. Increasingly, he said, the model itself will decide when to parallelize. "In the future, you won't really care if it's one agent or multi-agent or whatever's happening. You just have a Claude that you're talking to, and it will deploy the right architecture automatically."

Anthropic's bigger bet: closing the gap between AI capabilities and real-world adoption

The three features arrive as part of a broader platform push that Anthropic framed throughout the conference as closing "the gap between what AI can do and what it's actually doing for people."
Ami Vora, Anthropic's Chief Product Officer, set the theme in her opening keynote, noting that while model capabilities are advancing on an exponential curve, most organizations are still adopting AI on a linear path. Dianne Penn, who leads product for Anthropic's research team, described the company's measure of progress as "task horizon" — how long an AI agent can work autonomously while improving the quality of its deliverables. "This time last year, models could work for minutes," she said. "Now, most of us have agents running for hours on end. Tomorrow, we'll have agents that are proactive, always on, and know what to work on without losing the frame."

The event also included several infrastructure announcements designed to help developers keep pace. Anthropic said it is doubling its five-hour rate limits for Pro, Max, Team, and Enterprise plans, and raising API rate limits considerably. The company announced a partnership with SpaceX to use the full capacity of its Colossus data center to expand compute availability — a direct response to the demand crunch Amodei described.

All three features are built into Claude Managed Agents, which launched in public beta on April 8 as an opinionated harness that bundles best practices including memory, tool integration, and action handling. Anthropic says teams using Managed Agents have shipped 10x faster than those building their own agent infrastructure from scratch. Albert described the platform using an operating system analogy: "With managed agents, you don't need to think about all the technicalities of how you set up the surrounding system," he said. "You're building an application for Macs — you don't want to go have to re-implement every detail of macOS."

What dreaming, outcomes, and multi-agent orchestration mean for the future of enterprise AI

The competitive implications are significant. As AI agent platforms from OpenAI, Google, and others compete for developer adoption, Anthropic is betting that production reliability — not just raw model intelligence — will determine which platform wins enterprise budgets. The dreaming feature in particular stakes out new territory: while other platforms offer memory and tool use, the idea of agents systematically reviewing their own histories to extract reusable knowledge goes further toward the kind of continuously improving systems that enterprises need before delegating high-stakes work.

The conference showcased companies already operating at that scale. Mercado Libre, Latin America's largest e-commerce platform, has 23,000 engineers running Claude Code, has reviewed more than 500,000 pull requests with human oversight, and is aiming for 90% autonomous coding by the third quarter of this year. Shopify has deployed Claude Code across not just engineering but design, product, and data science teams.

But it was Dario Amodei who articulated the most expansive vision for where all of this leads. He described a progression from single agents to multiple agents to whole organizational intelligence — from "a team of smart people in a room" to what he called "a country of geniuses in the data center." And he reiterated a prediction he made roughly a year ago: that 2026 would see the first billion-dollar company run by a single person. "Hasn't quite happened yet," he said. "But we've got seven more months."

Dreaming is available now in research preview. Outcomes and multi-agent orchestration are in public beta and available to all developers on the Claude platform.
Whether seven months is enough time for a solo founder to build a billion-dollar business remains an open question — but after Tuesday, they have a few more tools to try.
- Anthropic says it hit a $30 billion revenue run rate after 'crazy' 80x growth

Dario Amodei is not the kind of CEO who talks loosely about numbers. The Anthropic co-founder and chief executive, a former VP of research at OpenAI with a PhD in computational neuroscience from Princeton, has built a reputation for measured public statements — particularly around the financial performance of a company that, until recently, disclosed almost nothing about its business. So when Amodei took the stage at Anthropic's Code with Claude developer conference on Wednesday and offered a genuinely striking piece of financial candor, the room paid attention.

"We tried to plan very well for a world of 10x growth per year," Amodei said during a fireside chat with Anthropic's chief product officer, Ami Vora. "And yet we saw 80x. And so that is the reason we have had difficulties with compute." Anthropic had planned for tenfold growth. But revenue and usage increased 80-fold in the first quarter on an annualized basis, a rate Amodei described as "just crazy" and "too hard to handle."

The number demands context. Annualized growth rates can overstate sustained performance — a single strong quarter, extrapolated across a full year, can paint a picture that doesn't hold. (An 80x annualized multiple works out to roughly threefold growth within the quarter itself, since 3^4 ≈ 81.) Amodei knows this. But the underlying trajectory is not a mirage. Anthropic has crossed a $30 billion annualized revenue run rate, up sharply from roughly $9 billion at the end of 2025, and that growth is being driven largely by enterprise demand. The company's revenue trajectory has been relentless: an $87 million run rate in January 2024, $1 billion by December 2024, $9 billion by the end of 2025, $14 billion in February 2026, $19 billion in March, and $30 billion in April. For context: Salesforce took about 20 years to reach $30 billion in annual revenue. Anthropic did it in under three years from a standing start.

Claude Code became the fastest-growing product in enterprise software history

The growth story at Anthropic is, to a remarkable degree, a single-product story. Claude Code, the company's agentic AI coding tool launched publicly in mid-2025, has become the fastest-growing product in the company's history — and, by several measures, one of the fastest-growing software products ever built. Claude Code hit $1 billion in annualized revenue within six months of launch, and the growth hasn't slowed down. By February 2026, the product was generating over $2.5 billion in run-rate revenue. The company also said Claude Code's weekly active users had doubled since January 1 and that business subscriptions had quadrupled since the start of 2026.

The mechanics of the product are straightforward. Claude Code is not a chatbot that suggests snippets. It reads a codebase, plans a sequence of actions, executes them using real development tools, evaluates the result, and adjusts its approach. The developer sets the objective and retains control over what gets committed, but the execution loop runs independently. The average developer using Claude Code now spends 20 hours per week working with the tool. At Anthropic itself, the majority of code is now written by Claude Code. Engineers focus on architecture, product thinking, and continuous orchestration: managing multiple agents in parallel, giving direction, and making the decisions that shape what gets built.
That last point may be the most revealing detail Amodei disclosed at the conference: this is the first year Anthropic's own internal pull requests have inflected upward due to Claude's work on the company's own codebase. The tool that Anthropic sells to developers is now a material contributor to Anthropic's own engineering output. That creates a feedback loop that is almost impossible for competitors without a comparable product to replicate — the company is using its own product to build the next version of its own product.

The enterprise numbers tell the same story. The company now counts over 1,000 enterprise customers spending more than $1 million per year on Claude services, a figure that has doubled since February. Much of this increase has been fueled by a wave of corporate customers including Uber and Netflix. Amodei framed the adoption curve in economic terms. "Software engineers are the ones who are fastest to adopt new technology," he said on stage. "It's a foreshadowing of how things are going to work across the economy, and how the economy is going to be transformed by AI."

Anthropic's 80x growth created a compute crisis it couldn't solve alone

Hypergrowth creates its own category of problem. When demand outstrips supply by an order of magnitude, the constraint is not go-to-market strategy or product-market fit. The constraint is physics. The company is growing so fast that its infrastructure has struggled to keep up, forcing Anthropic into what may be the most unexpected partnership in the current AI cycle.

Amodei's comments came hours after Anthropic announced a deal with Elon Musk's SpaceX to use all of the compute capacity at his company's Colossus 1 data center in Memphis, Tennessee. As part of the agreement, Anthropic will get access to more than 300 megawatts of capacity — over 220,000 Nvidia GPUs, including dense deployments of H100, H200, and next-generation GB200 accelerators.

The deal is remarkable for several reasons. Musk has been, until very recently, one of Anthropic's most vocal critics. He has said Anthropic is "doomed to become the opposite of its name" and wrote in February that "Anthropic hates Western Civilization." But on Wednesday, Musk changed his tune, saying he spent a lot of time with senior members of the Anthropic team over the past week and that he was "impressed." "Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector," Musk wrote.

The strategic logic on both sides is clear. xAI's Colossus 1 ended up with capacity that Grok's user base never grew into, while Anthropic needs compute immediately. Anthropic has been signing deals with Amazon, Google, Nvidia, and Microsoft for more compute capacity, but most of that isn't expected to come online until late 2026 or early 2027. The SpaceX deal gives Anthropic a significant boost now — the key word being "now." As one industry watcher summarized the alignment: "Elon's enemy is Sam. Dario's enemy is Sam. Enemy of my enemy is a compute partner."

Last month, Anthropic said demand for Claude has led to "inevitable strain on our infrastructure," which has impacted "reliability and performance" for its users, particularly during peak hours. The company admitted in a postmortem from late April that three bugs had affected Claude Code since March 4, and that internal tests hadn't caught them, leading to several weeks of degraded performance.
Amodei said at the Code with Claude conference that the company is "working as quickly as possible to provide more" capacity and will "pass that compute on to you as soon as we can."

A near-trillion-dollar valuation makes Anthropic's IPO the most anticipated debut in years

The growth figures arrive at a moment when Anthropic's valuation is itself becoming one of the defining financial stories of the AI era. Anthropic has begun weighing a fresh funding round that would value the company at more than $900 billion, according to people familiar with the matter, potentially leapfrogging its longtime rival OpenAI as the world's most valuable AI startup.

The velocity of the escalation is difficult to overstate: from $61.5 billion in March 2025, to $183 billion by its Series F in September, to $380 billion in February, to, if the current discussions proceed, more than $900 billion in May. Anthropic's shares were already trading at an implied $1 trillion valuation on secondary markets earlier this month. Instead of cashing out, many existing investors are waiting to potentially exit during Anthropic's anticipated IPO later this year. The company is raising what is likely to be its last private round before going public to fund its massive computing needs. Bloomberg has reported that the company is weighing an IPO as early as October 2026, with Goldman Sachs, JPMorgan, and Morgan Stanley already in early discussions.

Anthropic is also building out infrastructure on longer time horizons. Amazon has agreed to invest up to $25 billion in Anthropic, securing up to 5 gigawatts of compute capacity for training and deploying Claude models. Anthropic also secured 5 gigawatts of computing capacity as part of a separate deal with Google and Broadcom that will start to come online next year. The total commitment is staggering — tens of gigawatts of compute across three separate hardware ecosystems: Amazon's Trainium chips, Google's TPUs via Broadcom, and Nvidia GPUs through SpaceX and Microsoft Azure.

For perspective: Anthropic's $30 billion run rate exceeds the trailing twelve-month revenues of all but approximately 130 S&P 500 companies. A company that was essentially pre-revenue in early 2024 now out-earns most of the Fortune 500. That comparison comes with caveats. Private-market revenue run rate is not the same thing as audited GAAP revenue, gross margin, free cash flow, or public float. OpenAI has internally argued that Anthropic's $30 billion figure is overstated by roughly $8 billion, pointing to questions about whether revenues from AWS and Google Cloud should be reported at gross value or net of the partner's cut. The accounting question will ultimately be resolved when both companies file IPO prospectuses — but even on a net basis, Anthropic's growth rate is unlike anything in enterprise software history.

Dario Amodei's vision for AI extends far beyond coding — and he's given himself a deadline

The financial story — 80x growth, a near-trillion-dollar valuation, a scramble to secure enough GPUs to meet demand — is dramatic on its own terms. But Amodei used his time on stage to place it inside a larger thesis about where AI is headed. He described a progression from single agents to multiple agents to what he called whole organizational intelligence — from "a team of smart people in a room" to "a country of geniuses in the data center." The framing is deliberately expansive. What Anthropic is selling today is a coding tool.
What Amodei is describing is a future in which entire categories of knowledge work are performed by fleets of AI agents operating in parallel, supervised by humans who define objectives and review outputs. He reiterated a prediction he made roughly a year ago: that 2026 would see the first billion-dollar company run entirely by a single person. "Hasn't quite happened yet," he said. "But we've got seven more months."

The company has also been navigating political headwinds. The Pentagon declared Anthropic a supply chain risk in March, blacklisting it from work with the military. The company has warned the designation could result in billions in lost revenue, with over one hundred enterprise customers reportedly expressing doubts about continuing their relationships. And yet, as that scuffle makes its way through the legal system, Anthropic is only getting more popular. Amodei said this week he's eventually hoping for "more normal" expansion.

There is a temptation, when covering a company growing at this rate, to let the numbers speak for themselves. They shouldn't. Growth at 80x annualized is not a business plan — it's an emergency. It means demand has outrun infrastructure, that customers want something the company cannot yet reliably deliver at scale, and that every week of constrained capacity is a week during which competitors can close the gap. The investors funding Anthropic — including SoftBank, Amazon, Nvidia, Google, a16z, Lightspeed, and ICONIQ — are making a specific bet: that compute costs continue to fall per unit of intelligence, that revenue keeps compounding faster than burn, and that whoever owns the AI infrastructure layer in 2029 will generate returns that make the interim losses irrelevant.

Amodei's candor at Code with Claude was not a victory lap. It was a diagnostic — an admission that his company is running faster than it can steer. He planned for a world of 10x growth and got 80x instead. Now he has seven months to prove that the infrastructure, the organization, and the vision can catch up to the demand. The country of geniuses in the data center is getting crowded. The question is whether anyone remembered to build enough rooms.
- OpenAI's GPT-5.5 is here, and it's no potato: narrowly beats Anthropic's Claude Mythos Preview on Terminal-Bench 2.0

After months of rumors and reports that OpenAI was developing a new, more powerful AI large language model for use in ChatGPT and through its application programming interface (API), allegedly codenamed "Spud" internally, the company has today unveiled its latest offering under the more formal name GPT-5.5. And to likely no one's surprise, it's hardly a "potato" in the disparaging sense of the word: GPT-5.5 retakes the lead for OpenAI in generally available LLMs, coming ahead of rivals Anthropic's and Google's latest public offerings, and even beating the private Anthropic Claude Mythos Preview model narrowly on one benchmark (essentially a statistical tie).

"It's definitely our strongest model yet on coding, both measured by benchmarks and based on the feedback that we've gotten from trusted partners, as well as our own experience," explained Amelia "Mia" Glaese, VP of Research at OpenAI, in a video call with journalists ahead of the launch earlier today.

OpenAI positions GPT-5.5 as a fundamental redesign of how intelligence interacts with a computer's operating system and professional software stacks. "What is really special about this model is how much more it can do with less guidance," said OpenAI co-founder and president Greg Brockman on the same call. "It's way more intuitive to use. It can look at an unclear problem and figure out what needs to happen next." Brockman proceeded to emphasize the areas in which users can expect to see gains from using GPT-5.5 compared to OpenAI's prior state-of-the-art model, GPT-5.4, which remains available (for now) to users and enterprises at half the API cost of its new successor. "It's extremely good at coding," Brockman said of GPT-5.5. "It's also great at broader computer work, computer use, scientific research — these kinds of applications that are very intelligent bottlenecks."

OpenAI CEO and co-founder Sam Altman also weighed in on the launch and the company's philosophy in a post on X, writing, in part: "We want our users to have access to the best technology and for everyone to have equal opportunity."

The model is available in two variants: GPT-5.5 and GPT-5.5 Pro, distinguished by the latter offering enhanced precision and specialized logic for handling the most rigorous cognitive demands. While the standard version serves as the versatile flagship for general intelligence tasks, the Pro model is architected specifically for high-stakes environments such as legal research, data science, and advanced business analytics where accuracy is paramount. This premium tier provides noticeably more comprehensive and better-structured responses, supported by specialized latency optimizations that ensure high-quality performance during complex, multi-step workflows.

Unfortunately for third-party software developers, API access is not yet available for either GPT-5.5 or GPT-5.5 Pro; it will be coming "very soon," according to the company's announcement blog post. "API deployments require different safeguards and we are working closely with partners and customers on the safety and security requirements for serving it at scale," OpenAI writes. For the time being, GPT-5.5 is available only to paying subscribers on the ChatGPT Plus ($20 monthly), Pro ($100–$200 monthly), Business, and Enterprise tiers, with GPT-5.5 Pro access starting at the Pro tier and upwards.
A focus on agency

At the core of GPT-5.5 is a focus on "agentic" performance — specifically in coding, computer use, and scientific research. Unlike its predecessors, which often required granular, step-by-step prompting to avoid "hallucinating" a path forward, GPT-5.5 is designed to handle messy, multi-part tasks autonomously. It excels at researching online, debugging complex codebases, and moving between documents and spreadsheets without human intervention.

One of the most significant technical leaps is the model's efficiency. While larger models typically suffer from increased latency, GPT-5.5 matches the per-token latency of the previous GPT-5.4 while delivering a higher level of intelligence. This was achieved through a deep hardware-software co-design. OpenAI served GPT-5.5 on NVIDIA GB200 and GB300 NVL72 systems, utilizing custom heuristic algorithms — written by the AI itself — to partition and balance work across GPU cores. This optimization reportedly increased token generation speeds by over 20%.

For high-stakes reasoning, the "GPT-5.5 Thinking" mode in ChatGPT provides smarter, more concise answers by allowing the model more internal "compute time" to verify its own assumptions before responding. This capability is particularly visible in the model's performance on "Expert-SWE," an internal OpenAI benchmark for long-horizon coding tasks with a median human completion time of 20 hours. GPT-5.5 notably outperformed GPT-5.4 on this metric while using significantly fewer tokens.

Benchmarks show OpenAI has retaken the lead over Claude Opus 4.7 among publicly available LLMs (but the unreleased Mythos still outperforms it)

The market for leading U.S.-made frontier models has become an increasingly tight race between OpenAI, Anthropic, and Google. Exactly a week ago to the day, OpenAI rival Anthropic released Opus 4.7, its most powerful generally available model, to the public, taking over the leaderboard in terms of the number of third-party benchmark tests in which it has the lead. Yet today, GPT-5.5 has surpassed it — and even Anthropic's heavily restricted, more powerful model Claude Mythos Preview, albeit only on one benchmark, Terminal-Bench 2.0, which tests "a model's ability to navigate and complete tasks in a sandboxed terminal environment."

GPT-5.5 achieved 82.7% accuracy on Terminal-Bench 2.0, easily surpassing Opus 4.7 (69.4%) and narrowly beating the Mythos Preview (82.0%). However, in multidisciplinary reasoning without tools, the landscape is more competitive. On Humanity's Last Exam without tools, GPT-5.5 Pro scored 43.1%, trailing behind Opus 4.7 (46.9%) and Mythos Preview (56.8%).

| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 Pro | Mythos Preview* |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7 | 69.4 | 68.5 | 82.0 |
| Expert-SWE (Internal) | 73.1 | — | — | — |
| GDPval (wins or ties) | 84.9 | 80.3 | 67.3 | — |
| OSWorld-Verified | 78.7 | 78.0 | — | 79.6 |
| Toolathlon | 55.6 | — | 48.8 | — |
| BrowseComp | 84.4 | 79.3 | 85.9 | 86.9 |
| FrontierMath Tier 1–3 | 51.7 | 43.8 | 36.9 | — |
| FrontierMath Tier 4 | 35.4 | 22.9 | 16.7 | — |
| CyberGym | 81.8 | 73.1 | — | 83.1 |
| Tau2-bench Telecom (original prompts) | 98.0 | — | — | — |
| OfficeQA Pro | 54.1 | 43.6 | 18.1 | — |
| Investment Banking Modeling Tasks (Internal) | 88.5 | — | — | — |
| MMMU Pro (no tools) | 81.2 | — | 80.5 | — |
| MMMU Pro (with tools) | 83.2 | — | — | — |
| GeneBench | 25.0 | — | — | — |
| BixBench | 80.5 | — | — | — |
| Capture-the-Flag challenge tasks (Internal) | 88.1 | — | — | — |
| ARC-AGI-2 (Verified) | 85.0 | 75.8 | 77.1 | — |
| SWE-bench Pro (Public) | 58.6 | 64.3 | 54.2 | 77.8 |

This suggests that while OpenAI is winning on "computer use" and "agency," other models may still hold an edge in pure, zero-shot academic knowledge.
It is important to clarify that Mythos Preview is not a generally available product; Anthropic has classified it as a strategic defensive asset due to its high cybersecurity risks, restricting its access to a small, limited audience of trusted partners and government agencies. Because Mythos is excluded from broad commercial use, the primary market competition remains between GPT-5.5, Gemini 3.1 Pro, and Claude Opus 4.7.

So when it comes to models that the general public can access, GPT-5.5 has retaken the crown for OpenAI, achieving state-of-the-art results across 14 benchmarks, compared to 4 for Claude Opus 4.7 and 2 for Google Gemini 3.1 Pro. It dominates in agentic computer use, economic knowledge work (GDPval), specialized cybersecurity (CyberGym), and complex mathematics (FrontierMath). In comparison, Claude Opus 4.7 leads on software engineering and reasoning without tools, while Gemini 3.1 Pro leads in three categories, specifically excelling in academic reasoning and financial analysis.

Increased costs for users

The shift in intelligence comes with a significant price increase for API developers, according to material OpenAI shared ahead of the model's public release. OpenAI has effectively doubled the entry price for its flagship model compared to the previous generation, with a far steeper jump again for the most cutting-edge variant, GPT-5.5 Pro:

| Model | Input price (per 1M tokens) | Output price (per 1M tokens) |
|---|---|---|
| GPT-5.4 | $2.50 | $15.00 |
| GPT-5.5 | $5.00 | $30.00 |
| GPT-5.5 Pro | $30.00 | $180.00 |

To mitigate these costs, OpenAI emphasizes that GPT-5.5 is more "token efficient," meaning it uses fewer tokens to complete the same task compared to GPT-5.4. For users requiring speed over depth, OpenAI also introduced a Fast mode in Codex, which generates tokens 1.5x faster but at a 2.5x price premium. The "mini" and "nano" tiers seen in the GPT-5.4 era (priced at $0.75 and $0.20 per 1M input tokens, respectively) currently have no GPT-5.5 equivalent, though the company notes that GPT-5.5 is rolling out to all subscription tiers, including Plus, Pro, and Enterprise.

Licensing and the 'cyber-permissive' frontier

OpenAI's approach to safety and licensing for GPT-5.5 introduces a novel concept: Trusted Access for Cyber. Because the model is now capable of identifying and patching advanced security vulnerabilities, OpenAI has implemented stricter "cyber-risk classifiers" for general users. For legitimate security professionals, however, OpenAI is offering a specialized "cyber-permissive" license. This program allows verified defenders — those responsible for critical infrastructure like power grids or water supplies — to use models like GPT-5.4-Cyber or unrestricted versions of GPT-5.5 with fewer refusals for security-related prompts.

This dual-use framework acknowledges that while AI can accelerate cyber defense, it can also be weaponized. Under OpenAI's Preparedness Framework, GPT-5.5 is classified as "High" risk for biological and cybersecurity capabilities. To manage this, API deployments currently require different safeguards than the consumer-facing ChatGPT, and OpenAI is working with government partners to ensure these tools are used to strengthen — not undermine — digital resilience.

Initial reactions: losing access feels like having a 'limb amputated'

The early feedback from power users and engineers suggests that GPT-5.5 has crossed a psychological threshold in AI utility. For developers, the model's ability to maintain "conceptual clarity" across massive codebases is its standout feature.
"The first coding model I've used that has serious conceptual clarity," noted Dan Shipper, CEO of Every. Shipper tested the model by asking it to debug a complex system failure that had previously required a team of human engineers to rewrite; GPT-5.5 produced the same fix autonomously. Similarly, Pietro Schirano, CEO of MagicPath, described a "step change" in performance when the model successfully merged a branch with hundreds of refactor changes into a main branch in a single, 20-minute pass.Perhaps the most visceral reaction came from an anonymous engineer at NVIDIA, who had early access to the model: "Losing access to GPT-5.5 feels like I've had a limb amputated". This sentiment is echoed in the scientific community. Derya Unutmaz, a professor at the Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a dataset of 28,000 genes, producing a report in minutes that would have normally taken his team months. Brandon White, CEO of Axiom Bio, went further, stating that if OpenAI continues this pace, "the foundations of drug discovery will change by the end of the year". GPT-5.5 is more than an incremental update; it is a tool designed for a world where humans delegate entire workflows rather than single prompts. While the costs are higher and the safety guardrails tighter, the performance gains in agentic work suggest that AI is finally moving from the chat box and into the operating system. Perhaps most astonishingly of all, it's not even hearing the end of the scaling limits — whereupon models are trained on more and more GPUs — according to researchers at the company. "We actually still have headroom to train significantly smarter models than this," said OpenAI chief scientist Jakub Pachocki.