Liquid AI's smallest model yet, LFM2.5-230M, beats models 4X its size at data extraction, can run 'anywhere'
Our take

Liquid AI's unveiling of LFM2.5-230M marks a significant shift in the AI landscape, particularly for enterprises grappling with data management. Founded by former MIT computer scientists, the company has built a remarkably compact 230-million-parameter foundation model that challenges the prevailing trend of brute-force scaling in AI. Its ability to run "anywhere," from smartphones to robotics, together with data extraction performance that exceeds models four times its size, presents a compelling alternative to the computationally expensive and latency-prone reliance on massive cloud-based models. This development arrives at a time when concerns about the responsible rollout of powerful AI models are escalating, as evidenced by the White House recently asking OpenAI to slow-roll the release of its new model over safety concerns. We’re also seeing product adjustments across the tech landscape prompted by rising costs, as Xbox recently demonstrated with its own price increases.
The brilliance of LFM2.5-230M lies not in its sheer size, but in its architectural ingenuity. Liquid AI’s LFM2 framework, a hybrid system of gated short-range convolutions and grouped-query attention, avoids the quadratic memory costs associated with traditional transformer architectures. This allows for impressive inference speeds and a remarkably small memory footprint, demonstrated by its performance on devices ranging from a Samsung Galaxy S25 Ultra to a Raspberry Pi 5. The implications for businesses are profound. Traditional Extract, Transform, Load (ETL) processes are often brittle and require constant maintenance, while relying on large language models for routine tasks like invoice parsing or telemetry routing is prohibitively expensive. LFM2.5-230M offers a compelling solution: a lightweight, on-device engine that automates data formatting and parsing at a fraction of the cost and latency, reducing reliance on constantly connected cloud APIs. Its performance relative to other small models—outpacing even Google’s Gemma 3 1B IT—further solidifies its value proposition.
The dual-use commercial license adopted by Liquid AI is a shrewd move, balancing accessibility with a degree of protection against corporate absorption. By offering free use for smaller entities, they foster grassroots adoption and innovation, while reserving the right to negotiate commercial agreements with larger corporations. This approach aligns with a broader trend of democratizing access to AI technology, moving away from the exclusive domain of a handful of tech giants. Comparing LFM2.5-230M to other "small" models like Weibo's VibeThinker-3B highlights its specialized focus. While models like VibeThinker excel at complex reasoning tasks, LFM2.5-230M shines in data extraction and tool calling—a crucial capability showcased through its successful integration with a Unitree G1 humanoid robot, translating complex instructions into structured action plans. This demonstrates the potential for on-device AI to power increasingly sophisticated robotics and autonomous systems.
Ultimately, Liquid AI’s LFM2.5-230M isn't about replacing the largest AI models; it’s about redefining the possibilities for edge computing and on-device intelligence. The company's focus on architectural efficiency over brute force scaling represents a refreshing and potentially transformative approach. As AI continues to permeate various industries, we’ll be watching closely to see if this shift towards smaller, more specialized models gains further traction and whether other companies will follow Liquid AI's lead in prioritizing efficiency and accessibility over sheer parameter count. Will this model spark a wider re-evaluation of the current AI scaling paradigm, or will the industry continue its relentless pursuit of ever-larger models?
Liquid AI, founded by former MIT computer scientists, today released its smallest AI language model yet, LFM2.5-230M, and enterprises would do well to consider it for data extraction and for local deployment on smartphones, laptops and robots.
This is a 230-million-parameter foundation model explicitly designed for on-device agentic workflows, and as Liquid states in its release blog post, that small size makes it possible to run nearly "anywhere." According to Liquid, it also outperforms models more than 4X its size on selected benchmarks, specifically doing better at data extraction than the 800-million-parameter Alibaba Qwen3.5-0.8B (Instruct) and the 1-billion-parameter Google Gemma 3 1B.
The model targets developers and engineers building lightweight data extraction pipelines and autonomous edge systems.
Operating under a dual-use commercial license, the model remains free for individuals and companies generating less than $10 million in annual revenue, while requiring a paid enterprise agreement for larger corporations.
This release distinguishes itself from other small AI models by utilizing the LFM2 architecture to achieve high inference speeds without the massive memory overhead typical of parameter-heavy transformers.
While major AI companies such as Anthropic, OpenAI, Google, Microsoft and Meta push parameter counts into the hundreds of billions or trillions to achieve frontier performance, a parallel race focuses entirely on edge and local deployments.
Liquid AI's launch of LFM2.5-230M signals a pivotal shift toward architectural efficiency over brute-force scaling. By squeezing 19 trillion tokens of pre-training into a 230-million-parameter footprint, the company demonstrates that edge devices do not need massive computational power or persistent cloud connections to execute complex, multi-step agentic workflows.
How LFM2.5-230M works
The LFM2.5-230M model diverges from standard transformer architectures, relying instead on the LFM2 framework. This architecture functions as a hybrid system, interleaving gated short-range convolutions with grouped-query attention to process information efficiently.
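To make the "gated short-range convolution" idea concrete, here is a minimal pure-Python sketch of a causal depthwise convolution with multiplicative gates before and after it. This is a generic illustration, not Liquid AI's implementation: the function name, the gate layout, and the kernel width are assumptions, and in a real layer the values and gates would come from learned linear projections of the input rather than being passed in directly.

```python
def gated_short_conv(values, in_gate, out_gate, kernel):
    """Causal depthwise convolution with a gate on each side.

    values, in_gate, out_gate: seq_len rows, each a list of d floats.
    kernel: k rows of d floats; kernel[-1] weights the current token.
    (Hypothetical layout chosen for illustration only.)
    """
    seq_len, d, k = len(values), len(values[0]), len(kernel)
    # Input gate: element-wise product before the convolution.
    gated = [[v * g for v, g in zip(v_row, g_row)]
             for v_row, g_row in zip(values, in_gate)]
    out = []
    for t in range(seq_len):
        row = []
        for ch in range(d):
            acc = 0.0
            for i in range(k):
                src = t - (k - 1) + i  # only the current and earlier positions
                if src >= 0:
                    acc += kernel[i][ch] * gated[src][ch]
            # Output gate: element-wise product after the convolution.
            row.append(out_gate[t][ch] * acc)
        out.append(row)
    return out
```

Each output position looks at only the last k inputs, so the per-token work and the state carried between tokens stay fixed as the sequence grows. That is the property that makes this kind of layer attractive on memory-constrained hardware.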
For those tracking the evolution of efficient architectures, Liquid’s approach shares their central goal: managing long contexts and sequential data effectively on edge hardware without the quadratic memory costs of pure attention mechanisms. The model supports a 32K context window, allowing it to ingest substantial documents or continuous streams of robotic telemetry.
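The "quadratic" cost is easy to see with back-of-the-envelope arithmetic. The sketch below counts attention score entries for a sequence, assuming "32K" means 32,768 tokens, and compares them with the taps touched by a short convolution. The kernel width of 3 is a hypothetical value for illustration, and the attention figure assumes the full score matrix is materialised for one head, which optimised kernels typically avoid.

```python
def attention_score_entries(seq_len):
    """Entries in one head's full token-by-token score matrix."""
    return seq_len * seq_len


def short_conv_taps(seq_len, kernel_width):
    """Input positions touched by a causal convolution of fixed width."""
    return seq_len * kernel_width


CONTEXT = 32 * 1024   # assumption: "32K" means 32,768 tokens
KERNEL_WIDTH = 3      # hypothetical short-convolution width

attn = attention_score_entries(CONTEXT)
conv = short_conv_taps(CONTEXT, KERNEL_WIDTH)
print(f"attention entries: {attn:,}")
print(f"short-conv taps:   {conv:,}")
print(f"ratio:             {attn / conv:,.0f}x")
```

Doubling the context quadruples the first number but only doubles the second, which is why hybrids that use attention in only some layers scale more gently with context length.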
When analyzing the performance charts provided in the release, the architectural efficiency becomes visually apparent. The model maintains a memory footprint of under 400MB while achieving prefill and decode speeds that outpace comparable models like Gemma 3 1B IT and Granite 4.0-H-350M.
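As a rough sanity check on the sub-400MB figure, weight storage alone can be estimated from the parameter count and the number of bits per weight. The article does not state which precision the reported footprint used, so the bit widths below are illustrative assumptions, and the estimate ignores activations, attention caches, and runtime overhead.

```python
def weight_megabytes(param_count, bits_per_weight):
    """Approximate weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return param_count * bits_per_weight / 8 / 1_000_000


PARAMS = 230_000_000  # parameter count from the article

for bits in (16, 8, 4):  # illustrative precisions, not confirmed by the source
    print(f"{bits:>2}-bit weights: {weight_megabytes(PARAMS, bits):.0f} MB")
```

Under these assumptions, 16-bit weights alone would exceed 400MB, while 8-bit or 4-bit weights would fit with room to spare.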
On a Samsung Galaxy S25 Ultra equipped with a Qualcomm Snapdragon Gen4 CPU, the model reaches a decode speed of 213 tokens per second. Even on a highly constrained Raspberry Pi 5, the model maintains a decode rate of 42 tokens per second. Furthermore, internal benchmarking shows the GPU inference stack delivers lower end-to-end latency than competing small models across all concurrency levels.
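To put those decode rates in practical terms, the sketch below converts tokens per second into wall-clock time for a single response. The 200-token output length is a hypothetical size for a small JSON record, and the estimate covers decoding only, ignoring prompt prefill and any application overhead.

```python
def decode_seconds(output_tokens, tokens_per_second):
    """Time spent generating output tokens at a steady decode rate."""
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 200  # hypothetical size of one structured JSON reply

# Decode rates as reported in the article.
for device, rate in (("Galaxy S25 Ultra", 213), ("Raspberry Pi 5", 42)):
    seconds = decode_seconds(OUTPUT_TOKENS, rate)
    print(f"{device}: {seconds:.2f} s")
```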
Why it matters for enterprises
To understand why a 230-million-parameter model is necessary, one must look at how enterprises currently manage data.
Organizations have traditionally relied on rigid, rule-based Extract, Transform, Load (ETL) scripts to move and process data. However, these legacy systems are notoriously brittle; a simple change in a document's layout or a schema update can break the entire pipeline.
To solve this, the industry is shifting toward "AI ETL," where machine learning infers mappings, detects schema drift, and adapts to changes automatically. In a modern lightweight data extraction pipeline, an AI model connects to unstructured sources—like PDFs, emails, or web forms—and structures the data into formats like JSON without requiring hardcoded rules.
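In a pipeline like this, the model's reply still needs to be checked before it reaches downstream systems. Below is a minimal sketch of that validation step using only the Python standard library. The invoice schema and the function name are hypothetical, and the model call itself is omitted, so `raw_output` simply stands in for whatever text the model returns.

```python
import json

# Hypothetical target schema for an invoice record.
REQUIRED_FIELDS = {"invoice_number": str, "total": float, "currency": str}


def parse_extraction(raw_output):
    """Validate a model's JSON reply; return the record or raise ValueError."""
    try:
        record = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        value = record[field]
        # Accept whole numbers where a float is expected, but not booleans.
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, expected):
            raise ValueError(f"field {field!r} should be {expected.__name__}")
        record[field] = value
    return record
```

A failed validation can trigger a retry or route the document to manual review, which keeps an occasional malformed reply from breaking the pipeline the way a layout change breaks a rule-based script.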
For enterprises, using a massive flagship model like Claude Opus 4.6 (which costs $5.00 per million input tokens) to parse routine invoices, format addresses, or route telemetry data is economically unviable.
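The economics are simple to sketch. The function below applies the $5.00-per-million input price quoted above to a hypothetical workload. The document volume and tokens-per-document figures are illustrative assumptions, and output-token charges are left out.

```python
PRICE_PER_MILLION_INPUT_USD = 5.00  # input price quoted in the article


def input_cost_usd(input_tokens, price_per_million=PRICE_PER_MILLION_INPUT_USD):
    """Input-token cost in US dollars; output tokens are not included."""
    return input_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 1,000,000 invoices a day at 1,500 input tokens each.
DOCS_PER_DAY = 1_000_000
TOKENS_PER_DOC = 1_500

daily = input_cost_usd(DOCS_PER_DAY * TOKENS_PER_DOC)
print(f"daily input cost: ${daily:,.2f}")
```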
This is where models like LFM2.5-230M become critical. Designed explicitly as a lightweight extraction engine, it allows companies to automate repetitive formatting and data parsing at a fraction of the compute cost and latency, running directly on local hardware rather than relying on expensive, continuous cloud API calls.
Small Model Benchmarks: LFM vs. The 3B Class
The AI industry in mid-2026 is seeing a renaissance in "small" models, but the definition of "small" varies wildly.
Recently, the open-weight community was stunned by Weibo's VibeThinker-3B, a 3-billion-parameter model built on a Qwen2-style backbone that achieved a massive 94.3 on the AIME 2026 math benchmark, rivaling 600-billion-parameter behemoths through aggressive data curation and reinforcement learning.
Similarly, Google's Gemma 4 family — which recently crossed 200 million downloads — pushes frontier AI to the edge, including the E2B (2 billion parameters) designed specifically for mobile and IoT deployments.
By contrast, Liquid AI's LFM2.5-230M operates in a completely different weight class. At just 230 million parameters, it is between roughly one-ninth and one-thirteenth the size of the 2-billion- and 3-billion-parameter models above.
Because of its microscopic footprint, LFM2.5-230M is not designed to compete on reasoning-heavy workloads like advanced math, coding, or creative writing—a constraint Liquid AI explicitly acknowledges.
However, in its intended domains of data extraction and tool calling, the model punches well above its weight class.
Benchmarks released by Liquid AI show LFM2.5-230M scoring 43.26 on the BFCLv3 tool-use benchmark, ahead of IBM's Granite 4.0-350M (39.58) and far ahead of larger 1-billion-parameter models like Google's Gemma 3 1B IT (16.61).
On CaseReportBench for data extraction, it scores 22.51, which Liquid AI reports is ahead of Qwen3.5-0.8B (Instruct).
LFM2.5-230M proves that while 3-billion-parameter models like VibeThinker are solving advanced calculus, a 230-million-parameter model is the superior, highly optimized choice for executing structured tool calls and keeping agentic pipelines running efficiently on constrained hardware.
Advanced research uses
Because it excels at tool calling, LFM2.5-230M functions primarily as a skill-selection layer. Liquid AI demonstrated this capability by deploying the model on a Unitree G1 humanoid robot.
Running entirely on-device via the robot's onboard NVIDIA Jetson Orin compute module, the model successfully processes complex environmental commands.
As noted in the company's technical blog, the model takes a free-form instruction like, *"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters,"* and automatically translates it into a structured multi-step plan calling on pre-trained low-level skills provided by NVIDIA's SONIC framework.
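One way to picture the "structured multi-step plan" is as a list of skill calls checked against a registry before anything is sent to the robot. The sketch below hand-writes such a plan for the instruction quoted above. The skill names, argument names, and plan format are all hypothetical and do not reflect the actual SONIC interface or the model's real output format.

```python
# Hypothetical skill registry: skill name -> required argument names.
SKILLS = {
    "hold_still": {"duration_s"},
    "walk": {"direction", "speed_mps", "distance_m"},
    "one_leg_kneel": {"duration_s"},
}


def validate_plan(plan):
    """Check every step against the registry; return the number of steps."""
    for index, step in enumerate(plan):
        name = step.get("skill")
        if name not in SKILLS:
            raise ValueError(f"step {index}: unknown skill {name!r}")
        args = step.get("args", {})
        if set(args) != SKILLS[name]:
            raise ValueError(f"step {index}: wrong arguments for {name!r}")
    return len(plan)


# Hand-written plan for the quoted instruction (not actual model output).
PLAN = [
    {"skill": "hold_still", "args": {"duration_s": 2}},
    {"skill": "walk", "args": {"direction": "forward", "speed_mps": 1.0, "distance_m": 3}},
    {"skill": "one_leg_kneel", "args": {"duration_s": 5}},
    {"skill": "walk", "args": {"direction": "backward", "speed_mps": 0.5, "distance_m": 3}},
]

print(validate_plan(PLAN), "steps validated")
```

Keeping the model's job limited to choosing skills and filling in arguments means the low-level motion control stays in the pre-trained skills, and a malformed step can be rejected before execution.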
The base and post-trained models are available immediately on Hugging Face, with native day-one support across the inference ecosystem for llama.cpp (GGUF), MLX, vLLM, SGLang, and ONNX.
Dual-use, custom LFM Open License
Liquid AI ships LFM2.5-230M under the LFM Open License v1.0. Despite the word "open" in the title, this is not an Open Source Initiative (OSI) compliant license; it operates as a restricted, dual-use commercial framework.
For independent developers, researchers, and early-stage startups, the license functions identically to open-source software.
Users receive a perpetual, worldwide, royalty-free license to reproduce, modify, and distribute the model, provided they retain original copyright notices and prominently state any modifications.
However, the license includes a strict "Commercial Use Limitation". Any legal entity generating $10 million or more in annual revenue loses the right to use the model commercially under this agreement.
Large enterprises crossing this financial threshold must negotiate a separate, paid commercial agreement with Liquid AI to deploy the model in production.
This strategy protects the company from having its intellectual property absorbed by major technology conglomerates for free, while still seeding the model at the grassroots developer level.