Anthropic Traces Six Weeks of Claude Code Quality Complaints to Three Overlapping Product Changes
Our take
Anthropic's recent postmortem traces six weeks of code quality complaints about Claude Code to three overlapping product changes: a reasoning effort downgrade, a caching bug that progressively erased the model's own thinking, and a system prompt verbosity limit that caused a 3% quality drop. Importantly, the API and model weights were unaffected, and all identified issues were resolved by April 20.
Anthropic's recent postmortem sheds light on six weeks of quality complaints about Claude Code, tracing them to three specific product-layer changes. This transparency is commendable, particularly in an industry that often keeps such failures opaque. The identified issues (a reasoning effort downgrade, a caching bug, and a system prompt verbosity limit) produced a measurable decline in quality, yet their swift resolution illustrates a commitment to maintaining high standards. Anthropic is not alone in publicly working through such problems: Pinterest, for example, has detailed its efforts to eliminate CPU starvation issues affecting machine learning training jobs. In a rapidly evolving landscape, such proactive disclosure is essential.
The significance of these findings extends beyond the immediate complaints. The episode highlights how product-layer changes can interact, with ripple effects on user experience. That a 3% quality drop can arise from seemingly minor adjustments underscores the delicate balance AI developers must strike: innovation is crucial, but it must be approached with caution. Anthropic's experience serves as a case study for other organizations, including those scaling social systems within software organizations, to weigh the implications of their product changes more holistically. Overlapping updates can produce unexpected consequences that erode user trust and satisfaction.
Moreover, resolving these issues without touching the API or model weights is a testament to Anthropic's technical resilience. Users rely on stability when integrating AI into their workflows, and any perceived decline in quality breeds hesitance to adopt or keep using a product. The situation also underscores the importance of feedback loops in AI development. Open communication channels, such as those highlighted in discussions around OpenAI's new API voice models, help bridge the gap between user expectations and product capabilities. When users feel comfortable voicing concerns, companies can navigate the complexities of AI technology more effectively and improve overall satisfaction.
Looking ahead, the question remains: how will companies like Anthropic evolve their product development processes to catch similar regressions earlier? The industry is at a pivotal juncture, with AI woven into everyday tools at an accelerating pace. As more organizations adopt AI-driven solutions, the pressure to deliver consistent, high-quality experiences will only intensify, and the challenge will be balancing rapid innovation against the reliability users demand. How Anthropic and its peers respond will help shape the acceptance of AI technology across sectors.
In conclusion, Anthropic's postmortem serves as a vital reminder of the complexity inherent in AI development. By learning from these experiences and maintaining a focus on user outcomes, companies can pave the way for more reliable AI tools. As these developments unfold, transparency and proactive problem-solving will define the next generation of AI products.


Anthropic published a postmortem tracing six weeks of Claude Code quality complaints to three overlapping product-layer changes: a reasoning effort downgrade, a caching bug that progressively erased the model's own thinking, and a system prompt verbosity limit that caused a 3% quality drop. The API and model weights were unaffected. All issues were resolved April 20.
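Of the three issues, the caching bug is the most instructive for engineers. A minimal sketch of that general failure mode is below; the function names and message structure are hypothetical illustrations of how a cache serialization step could silently filter out a model's "thinking" blocks on every round trip, eroding reasoning context turn by turn. This is not Anthropic's actual code.

```python
# Hypothetical sketch of the bug class described in the postmortem: a
# cache/serialization step that drops "thinking" content blocks each time
# the conversation history is round-tripped. All names are illustrative.

def serialize_for_cache(messages):
    """Buggy version: keeps only 'text' blocks, discarding 'thinking'."""
    return [
        {**m, "content": [b for b in m["content"] if b["type"] == "text"]}
        for m in messages
    ]

def serialize_for_cache_fixed(messages):
    """Fixed version: preserves every content block as-is."""
    return [dict(m) for m in messages]

history = [
    {"role": "assistant", "content": [
        {"type": "thinking", "text": "plan: refactor the parser first"},
        {"type": "text", "text": "I'll refactor the parser."},
    ]},
]

# A single round trip through the buggy path already loses the reasoning;
# each subsequent turn compounds the loss.
trimmed = serialize_for_cache(history)
assert all(b["type"] == "text" for m in trimmed for b in m["content"])
```

The insidious part of such a bug is that every individual response still looks well-formed, so the degradation only shows up as a gradual quality decline, which is consistent with weeks of diffuse user complaints rather than a hard failure.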
By Steef-Jan Wiggers