May 21, 2026•8 min read•from VentureBeat

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

Our take

Alibaba's Qwen3.7-Max marks a significant advancement in the AI landscape, boasting 35 hours of continuous autonomous operation. This proprietary model can execute complex tasks, positioning itself firmly in the emerging "agent era," where AI actively plans and adapts over extended periods. By integrating with external frameworks like Anthropic's Claude Code, Qwen3.7-Max offers enterprises a powerful tool for automation and innovation. However, its API-only access raises questions about accessibility, reflecting a shift from Alibaba's historically open approach.

Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

Alibaba's unveiling of Qwen3.7-Max marks a significant milestone in the AI landscape, heralding the true arrival of the "agent era." This new model's ability to autonomously execute tasks for approximately 35 hours without human intervention is not just a technical feat; it represents a transformative shift in how artificial intelligence can be utilized across various industries. As AI systems evolve from mere text generators to sophisticated agents capable of planning, executing, and refining complex tasks, the implications for businesses and developers are profound. The competitive dynamics are shifting, with Alibaba positioning itself as a formidable contender against established American giants like OpenAI and Google. As we explore these developments, it's essential to consider their broader significance within the context of AI evolution and user experience, especially when juxtaposed with ongoing narratives about technology, like those in articles such as Six search engines worth trying now that Google isn’t really Google anymore and Spotify and Universal Music strike deal allowing fan-made AI covers and remixes.

The technical prowess embedded in Qwen3.7-Max, particularly its ability to perform long-horizon reasoning and execute complex engineering tasks autonomously, demonstrates how far we’ve come in AI development. The model's design surpasses previous limitations often encountered by language models, such as memory degradation and logical loops during extended interactions. Instead, it was built to function as a "versatile agent foundation," capable of executing a staggering number of tool calls while continually refining its output. Such capabilities not only enhance productivity but also redefine the potential applications of AI in software development and enterprise automation. The fact that Qwen3.7-Max can be integrated into existing frameworks through its "cross-harness generalization" means it can seamlessly operate alongside various tools, fostering innovation while addressing the diverse needs of users.

However, the proprietary nature of Qwen3.7-Max raises questions about accessibility and the future of open-source AI. Historically, Alibaba's Qwen models have contributed significantly to the open-source community, allowing developers to run and adapt models on their own hardware. The shift to a strictly API-based model for Qwen3.7-Max could be seen as a retreat from the principles that have driven collaboration and innovation in the AI space. While this commercial strategy may be financially prudent for Alibaba, as it aligns with practices adopted by Western tech giants, it risks alienating users who value transparency and control over their AI solutions. This tension reflects a broader conversation within the tech community about the balance between corporate interests and the collaborative spirit that has fostered innovation in AI technologies.

Looking ahead, the emergence of Qwen3.7-Max prompts critical questions regarding the future landscape of AI. As we witness the continued evolution of AI capabilities, will there be a growing divide between proprietary and open-source solutions? How will this affect individual developers and smaller companies seeking to leverage advanced AI without compromising their data security or creative control? The industry is at a crossroads, facing the challenge of navigating these dynamics while fostering an environment that encourages exploration and innovation. Ultimately, the journey toward democratizing advanced AI technologies will likely continue to shape how we interact with and utilize these powerful tools in our daily lives. As Qwen3.7-Max exemplifies, the potential for AI to enhance user productivity and creativity is immense, yet the path forward requires thoughtful consideration of accessibility and ethical implications in the rapidly evolving technological landscape.

The AI industry has fully entered the "agent era," a paradigm where AI models do far more than generate text — they now actively plan, execute, and course-correct complex tasks over days rather than seconds.

Thus, it's perhaps unsurprising to see Chinese e-commerce giant Alibaba's famed Qwen Team of AI researchers release a model capable of performing autonomous agentic AI work over multiple days: that model has arrived in the form of Qwen3.7-Max which the company reports in a blog post achieved "~35 hours of continuous autonomous execution" — albeit, in a proprietary, not open source format, as prior Qwen Team releases were.

This is also to be expected — it's what many analysts and industry experts feared in the wake of the departure of several key Qwen Team leaders earlier this year. But it makes sense for Alibaba financially, at least in the short term: training AI models, especially ones as powerful as Qwen3.7-Max, is expensive, and giving them away essentially for free, as open source models are, does not immediately help recoup any costs.

In that sense, Alibaba is simply aligning its efforts with American AI giants like OpenAI and Google by offering the latest and greatest models only through paid APIs and subscription or paid web plan bundles, and slightly less performant ones through open source.

Still, the arrival of Qwen3.7-Max offers further optionality to enterprises and individual users, and more competition for American AI labs — rarely a bad thing for consumers at all budget levels. Yet, the fact that the model is only accessible from Chinese-based endpoints means it may be limited in its appeal to American and European enterprises seeking to maximize compliance and security posturing when fulfilling government contracts, or even just attempting to comply with all relevant state, local, and national data sovereignty regulations.

The marathon AI era

To understand why Qwen3.7-Max is a departure from previous models, one must look at how it was trained and how it operates in practice.

Language models typically degrade when forced to maintain a single train of thought over thousands of conversational turns; they forget instructions, hallucinate variables, or simply get stuck in logical loops. Qwen3.7-Max was specifically designed as a "versatile agent foundation" capable of "long-horizon reasoning" to overcome this exact bottleneck.

The starkest demonstration of this capability is an autonomous engineering task detailed by the Qwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU—a hardware architecture the model had never encountered during its training. Its task was to optimize an attention kernel.

Over the course of 35 straight hours, Qwen3.7-Max operated entirely autonomously. It executed 1,158 distinct tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved the code to achieve a 10.0x geometric mean speedup.

By comparison, Chinese competitor models like z.ai's GLM-5.1 and Moonshot's Kimi K2.6 capped out at 7.3x and 5.0x speedups respectively, often voluntarily terminating their sessions when they failed to make progress. However, both are available open source.

This endurance is achieved through what Alibaba calls "environment scaling". Just as early LLMs grew smarter by ingesting more diverse text, Qwen3.7-Max was trained across a vast, scaled array of dynamic agentic environments.

It is capable of simulating a one-year lifecycle of a startup in the "YC-Bench" evaluation, navigating hundreds of decision-making rounds encompassing personnel management and contract screening. In this simulation, the model managed to generate $2.08 million in virtual revenue, nearly doubling the performance of the prior generation, Qwen3.6-Plus.

Furthermore, the model has built-in reward-hacking self-monitoring, autonomously detecting when it attempts to cheat a training environment and adding heuristic rules to correct its own behavior.

A brain for any scaffold

From a product perspective, Qwen3.7-Max is designed to be the cognitive engine for modern software development and enterprise automation.

The model offers a massive 1-million-token context window and a 64K maximum output limit, providing immense overhead for processing sprawling codebases or lengthy technical documents.

One of its most compelling features is "cross-harness generalization". Rather than being hardcoded to work best within a specific proprietary interface, Qwen3.7-Max is built to act as a drop-in intelligence layer for diverse agent frameworks. It supports the Anthropic API protocol natively, allowing developers to plug it directly into existing tools like Claude Code or OpenClaw.

The benchmark data provided by Alibaba indicates that this generalized approach has paid massive dividends.

On the Apex Math Reasoning benchmark, Qwen3.7-Max scored 44.5, eclipsing Claude Opus-4.6 Max's score of 34.5 and DeepSeek V4-Pro Max's 38.3. It also posted dominant scores on Humanity's Last Exam (41.4) and the realistic coding agent benchmark MCP-Atlas (76.4).

This translates into tangible utility for end-users. Through open source Model Context Protocol (MCP) integrations, the model can operate as an autonomous office assistant, capable of reading university formatting specs and automatically reformatting a messy Word document via command-line tools without human intervention.

Running this level of intelligence comes at a distinct cost. Developers accessing the API via Alibaba Cloud Model Studio will pay $2.50 per 1 million input tokens and $7.50 per 1 million output tokens. The platform also features explicit cache creation and read pricing, as well as a $10 fee per 1,000 calls for integrated web searches, though code interpreter tools remain free for a limited time.

Qwen3.7-Max occupies a strategic middle ground in the current API economy. While it demands a notable premium over aggressively priced domestic rivals—costing nearly double DeepSeek V4 Pro ($5.22) and Z.ai's GLM-5.1 ($5.80)—it drastically undercuts the Western frontier giants it routinely matches on benchmarks.

For context, running heavy agentic workflows through OpenAI's GPT-5.4 or Anthropic's Claude Opus 4.7 will run developers $17.50 and $30.00 per million tokens, respectively. See VentureBeat's pricing chart below:

VentureBeat Frontier AI Model API Pricing Snapshot

Model	Input	Output	Total Cost	Source
MiMo-V2.5 Flash	$0.10	$0.30	$0.40	Xiaomi MiMo
MiniMax M2.7	$0.30	$1.20	$1.50	MiniMax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
MiMo-V2.5	$0.40	$2.00	$2.40	Xiaomi MiMo
Kimi-K2.6	$0.95	$4.00	$4.95	Moonshot/Kimi
GLM-5	$1.00	$3.20	$4.20	Z.ai
Grok 4.3 (low context)	$1.25	$2.50	$3.75	xAI
DeepSeek V4 Pro	$1.74	$3.48	$5.22	DeepSeek
GLM-5.1	$1.40	$4.40	$5.80	Z.ai
Claude Haiku 4.5	$1.00	$5.00	$6.00	Anthropic
Grok 4.3 (high context)	$2.50	$5.00	$7.50	xAI
Qwen3.7-Max	$2.50	$7.50	$10.00	Alibaba Cloud
Gemini 3.5 Flash	$1.50	$9.00	$10.50	Google
Gemini 3.1 Pro Preview (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.4	$2.50	$15.00	$17.50	OpenAI
Gemini 3.1 Pro Preview (>200K)	$4.00	$18.00	$22.00	Google
Claude Opus 4.7	$5.00	$25.00	$30.00	Anthropic
GPT-5.5	$5.00	$30.00	$35.00	OpenAI

By positioning Qwen3.7-Max just below Google's Gemini 3.5 Flash ($10.50) but well above budget-tier models, Alibaba is signaling that this isn't a commodity release; it’s a flagship reasoning engine priced to lure enterprise workloads away from Silicon Valley's most expensive offerings.

Licensing remains proprietary for now

For all its technical brilliance, the most controversial aspect of Qwen3.7-Max is how it is distributed. Qwen is billing the release as a "proprietary model". It is strictly API-only.

Historically, Alibaba’s Qwen has been a hero to the open-source and local LLM communities. Previous iterations, like Qwen 2.5 and Qwen 3.6, released their weights publicly. Open weights allow developers, researchers, and enterprises to download the model, run it on their own hardware, and fine-tune it for highly specific or data-sensitive use cases without sending proprietary information to a third-party server.

By locking Qwen3.7-Max behind an API, Alibaba is pivoting to the standard commercial playbook utilized by OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise users, this means utilizing Qwen3.7-Max requires trusting Alibaba Cloud with their data streams and relying entirely on internet connectivity to run their agentic workflows. For the open-source community, it means losing access to what is currently one of the most capable models on the planet.

Community reactions split between awe and disappointment

The reaction from the developer community has been swift, characterized by a mix of profound respect for the engineering achievement and frustration over the licensing model.

Prominent AI commentator Sudo su (@sudoingX) captured the prevailing sentiment on X (formerly Twitter). "qwen is unreal," they wrote. "they just dropped 3.7 max and it is beating opus 4.6 max on most of the benchmarks they ran".

The technical metrics, particularly the model's endurance, have left many in the field stunned. "the apex math number, 44.5 against opus 34.5, that is not a small gap," Sudo su noted. "the 35 hours straight on a kernel optimization task with 1000+ tool calls is the part i keep rereading. that is the agent era thing actually happening, not a slide".

The speed of Alibaba's iteration is also drawing notice. With Qwen 3.6 released just last month, the leap to 3.7-Max highlights a relentless development cadence. As Sudo su observed, "nobody else is moving like this".

Yet, the praise is heavily caveated by the shift to a closed ecosystem. The loss of the model weights is seen as a blow to the localized AI movement, which relies on state-of-the-art open models to push the boundaries of what can be done on consumer hardware or private enterprise clusters.

"one thing though, please open source this one too," Sudo su pleaded in their post. "3.6 dense made the entire local llm ecosystem better. the max tier going api only would close a door we have been keeping open. give us the weights eventually".

Qwen3.7-Max proves that the autonomous agent era is no longer a theoretical projection; it is a present reality capable of executing complex engineering feats while humans sleep. The only question now is whether this new frontier of AI will be a democratized resource you can download to your laptop, or an intelligence utility rented strictly from the cloud. For now, with Qwen3.7-Max, it is undeniably the latter.

Read on the original site

Open the publisher's page for the full experience

View original article →

Tagged with

#financial modeling #generative AI automation #workflow automation #enterprise data management #cognitive automation