8 min read · from VentureBeat

Perceptron Mk1 shocks with highly performant video analysis AI model 80-90% cheaper than Anthropic, OpenAI & Google

Our take

Perceptron Inc. has launched its flagship video analysis AI model, Mk1, offering advanced capabilities at a fraction of the cost of competitors like Anthropic and OpenAI. Priced at just $0.15 per million input tokens, Mk1 lets enterprises harness real-time video insights for security, marketing, and research applications. With a focus on understanding object dynamics and physical interactions, the model sets a new standard for accessible, efficient AI. Explore how Perceptron Mk1 can transform your video workflows.

The recent launch of Perceptron’s Mk1 video analysis AI model has the potential to redefine how enterprises leverage video content and data analysis. Priced significantly below offerings from industry giants like Anthropic, OpenAI, and Google, Mk1 is not just a cost-effective alternative; it introduces advanced capabilities that could democratize access to sophisticated AI. By offering a model that can analyze live video feeds for applications ranging from security surveillance to marketing content optimization, Perceptron is positioning itself to disrupt a market dominated by legacy tools and high costs. This shift toward more accessible AI tools parallels trends in cybersecurity, as highlighted in Protect your enterprise now from the Shai-Hulud worm and npm vulnerability in 6 actionable steps and US bank discloses security lapse after sharing customer data with AI app, both of which emphasize the growing need for innovative yet practical solutions in a rapidly evolving tech landscape.

The capabilities of Mk1 extend beyond simple video analysis. Its ability to understand temporal continuity and physical reasoning allows it to effectively process complex visual information in real-time. This creates significant opportunities across various sectors, including marketing, where businesses can efficiently clip highlights for social media, and in research, where the model can assess body language and actions in a controlled study. Such applications can streamline workflows and enhance productivity, demonstrating a clear shift from traditional, manual methods toward more intelligent systems that intuitively understand their surroundings. This is a compelling example of how AI is transforming not just tasks but entire workflows, echoing the themes discussed in the article Kevin Hartz’s A* just closed its third fund with $450M, which reflects ongoing investment in innovative tech solutions.

What stands out is Perceptron’s commitment to the "Efficiency Frontier," which aims to deliver high performance at a fraction of the cost of leading competitors. By targeting this dual goal of affordability and capability, Perceptron is not only challenging existing models but also setting a precedent for future developments in the AI landscape. The emphasis on cost-effective, high-performance AI tools might encourage more organizations, particularly smaller enterprises, to adopt advanced technologies that have historically been out of reach. This could catalyze a significant shift in how AI is applied across industries, making it more pervasive and integrated into everyday tasks.

Looking ahead, the implications of Mk1's release are profound. As organizations increasingly recognize the value of data-driven decision-making, the demand for accessible AI tools that can analyze and interpret complex visual data will only grow. The ability of Mk1 to maintain object identity and understand physical interactions in video streams opens up new avenues for applications in robotics and smart technology. It raises essential questions about how we will integrate such technologies into our daily practices and what ethical considerations will arise as AI continues to evolve. As we navigate this new frontier, the focus should be on ensuring that these innovative solutions not only enhance productivity but also align with user needs and ethical standards in data management.


AI that can see and understand what's happening in a video — especially a live feed — is understandably an attractive product for many enterprises and organizations. Beyond acting as a security "watchdog" over sites and facilities, such a model could clip the most exciting parts of marketing videos for repurposing on social media, identify inconsistencies and gaffes in videos and flag them for removal, and read the body language and actions of participants in controlled studies or of candidates applying for new roles.

While a handful of AI models offer this type of functionality today, it's far from a mainstream capability. The two-year-old startup Perceptron Inc. is seeking to change that. Today, it announced the release of its flagship proprietary video analysis reasoning model, Mk1 (short for "Mark One"), at a cost — $0.15 per million input tokens / $1.50 per million output tokens through its application programming interface (API) — that comes in about 80-90% below leading proprietary rivals, namely Anthropic's Claude Sonnet 4.5, OpenAI's GPT-5, and Google's Gemini 3.1 Pro.

Led by Co-founder and CEO Armen Aghajanyan, formerly of Meta FAIR and Microsoft, the company spent 16 months developing a "multi-modal recipe" from the ground up to address the complexities of the physical world.

This launch signals a new era where models are expected to understand cause-and-effect, object dynamics, and the laws of physics with the same fluency they once applied to grammar.

Interested users and potential enterprise customers can try it out for themselves on Perceptron's public demo site.

Performance across spatial and video benchmarks

The model's performance is backed by a suite of industry-standard benchmarks focused on grounded understanding.

In spatial reasoning (ER Benchmarks), Mk1 achieved a score of 85.1 on EmbSpatialBench, surpassing Google’s Robotics-ER 1.5 (78.4) and Alibaba’s Q3.5-27B (approx. 84.5).

In the specialized RefSpatialBench, Mk1's score of 72.4 represents a massive leap over competitors like GPT-5m (9.0) and Sonnet 4.5 (2.2), highlighting a significant advantage in referring expression comprehension.

Video benchmarks show similar dominance; on the EgoSchema "Hard Subset"—where first-and-last-frame inference is insufficient—Mk1 scored 41.4, matching Alibaba’s Q3.5-27B and significantly beating Gemini 3.1 Flash-Lite (25.0).

On the VSI-Bench, Mk1 reached 88.5, the highest recorded score among the compared models, further validating its ability to handle actual temporal reasoning tasks.

Market positioning and the efficiency frontier

Perceptron has explicitly targeted the "Efficiency Frontier," a metric that plots mean scores across video and embodied reasoning benchmarks against the blended cost per million tokens.

Benchmarking data reveals that Mk1 occupies a unique position: it matches or exceeds the performance of "frontier" models like GPT-5 and Gemini 3.1 Pro while maintaining a cost profile closer to "Lite" or "Flash" versions.

Specifically, Perceptron Mk1 is priced at $0.15 per million input tokens and $1.50 per million output tokens. In comparison, the "Efficiency Frontier" chart shows GPT-5 at a significantly higher blended cost (near $2.00) and Gemini 3.1 Pro at approximately $3.00, while Mk1 sits at the $0.30 blended cost mark with superior reasoning scores.
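The article quotes a $0.30 blended cost but does not state the input/output token mix behind that figure. The sketch below shows one common way such a blend is computed — a weighted average of input and output prices — under an assumed 9:1 input-to-output ratio, which lands close to the cited ~$0.30 mark:

```python
def blended_cost(input_price: float, output_price: float,
                 input_share: float = 0.9) -> float:
    """Blended cost per million tokens as a weighted average of input
    and output prices.

    The 9:1 input:output mix is an assumption for illustration; the
    article does not state the ratio behind its blended-cost figures.
    """
    return input_price * input_share + output_price * (1 - input_share)

# Perceptron Mk1: $0.15 in / $1.50 out under a 9:1 mix
mk1 = blended_cost(0.15, 1.50)
print(f"${mk1:.3f} per million tokens")  # $0.285, near the ~$0.30 chart mark
```

Under the same assumed mix, the rival figures in the chart would imply proportionally higher output-token spend, which is what pushes their blended costs toward $2.00-$3.00.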

This aggressive pricing strategy is intended to make high-end physical AI accessible for large-scale industrial use rather than just experimental research.

Architecture and temporal continuity

The technical core of Perceptron Mk1 is its ability to process native video at up to 2 frames per second (FPS) across a significant 32K token context window.
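The article gives the 2 FPS sampling rate and 32K context window but not the per-frame token cost, so the footage budget has to be estimated. The sketch below does that arithmetic under assumed values (256 tokens per frame, 2,000 tokens reserved for the prompt and response) — both numbers are illustrative, not published figures:

```python
def max_seconds_of_video(context_tokens: int = 32_000, fps: int = 2,
                         tokens_per_frame: int = 256,
                         reserved_for_text: int = 2_000) -> float:
    """Rough footage budget for a fixed context window.

    tokens_per_frame and reserved_for_text are assumptions for
    illustration; Perceptron does not publish Mk1's per-frame token cost.
    """
    frame_budget = context_tokens - reserved_for_text
    max_frames = frame_budget // tokens_per_frame
    return max_frames / fps

print(max_seconds_of_video())  # seconds of 2 FPS video under these assumptions
```

The point of the exercise is that at 2 FPS the context window is the binding constraint on stream length, which is why the querying-by-timecode workflow described below matters for longer feeds.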

Unlike traditional vision-language models (VLMs) that often treat video as a disjointed sequence of still images, Mk1 is designed for temporal continuity.

This architecture allows the model to "watch" extended streams and maintain object identity even through occlusions, a critical requirement for robotics and surveillance applications.

Developers can query the model for specific moments in a long stream and receive structured time codes in return, streamlining the process of video clipping and event detection.
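The article says Mk1 returns "structured time codes" but does not document the schema. Assuming a hypothetical JSON shape with label/start/end fields, a client could turn such a response into clip ranges like this:

```python
import json

# Hypothetical response shape -- the actual schema is not documented
# in the article; only the field names below are invented for illustration.
raw = json.dumps({
    "events": [
        {"label": "goal", "start": "00:12:03.500", "end": "00:12:11.000"},
        {"label": "save", "start": "00:47:20.000", "end": "00:47:24.250"},
    ]
})

def to_seconds(tc: str) -> float:
    """Convert an HH:MM:SS.mmm timecode into seconds."""
    h, m, s = tc.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

# (label, start_s, end_s) tuples ready to hand to a video clipper
clips = [(e["label"], to_seconds(e["start"]), to_seconds(e["end"]))
         for e in json.loads(raw)["events"]]
print(clips)
```

Whatever the real schema looks like, the pattern is the same: the model does the event detection, and a thin client maps its timecodes onto cut points.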

Reasoning with the laws of physics

A primary differentiator for Mk1 is its "Physical Reasoning" capability. Perceptron defines this as a high-precision spatial awareness that allows the model to understand object dynamics and physical interactions in real-world settings.

For example, the model can analyze a scene to determine if a basketball shot was taken before or after a buzzer by jointly reasoning over the ball's position in the air and the readout on a shot clock.

This requires more than just pattern recognition; it requires an understanding of how objects move through space and time.
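The buzzer-beater case above can be reduced to a toy decision rule: find the first frame where the ball has left the shooter's hands and check whether the clock was still running. The per-frame records here are made-up stand-ins for what a video model like Mk1 would extract from pixels:

```python
def shot_before_buzzer(frames: list[dict]) -> bool:
    """Decide whether the ball left the shooter's hands before the clock
    hit zero, given per-frame observations.

    Each frame dict carries a shot-clock readout in seconds and whether
    the ball is still in hand -- toy data for illustration.
    """
    for f in frames:
        if not f["ball_in_hand"]:       # first frame after release
            return f["clock"] > 0.0     # was the clock still running?
    return False                        # ball never released

frames = [
    {"t": 0.0, "clock": 0.4, "ball_in_hand": True},
    {"t": 0.5, "clock": 0.1, "ball_in_hand": False},  # released before zero
    {"t": 1.0, "clock": 0.0, "ball_in_hand": False},
]
print(shot_before_buzzer(frames))  # True: released with time left
```

The hard part, of course, is not this rule but producing reliable `ball_in_hand` and `clock` observations from raw video — which is exactly the joint spatial-and-temporal perception the article credits to Mk1.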

The model is capable of "pixel-precise" pointing and counting into the hundreds within dense, complex scenes. It can also read analog gauges and clocks, which have historically been difficult for purely digital vision systems to interpret with high reliability.
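Once a vision model has localized a gauge's needle, reading the dial reduces to linear interpolation between the needle's angle and the printed scale. The dial geometry below (a 270-degree sweep over a 0-100 scale) is an assumed example, not a Perceptron specification:

```python
def gauge_value(needle_deg: float, min_deg: float = -135.0,
                max_deg: float = 135.0, min_val: float = 0.0,
                max_val: float = 100.0) -> float:
    """Map a detected needle angle to a dial reading by linear interpolation.

    The 270-degree sweep from -135 to +135 degrees over a 0-100 scale is
    an assumed geometry; a vision model would supply needle_deg.
    """
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

print(gauge_value(0.0))    # needle straight up -> 50.0 (mid-scale)
print(gauge_value(135.0))  # full deflection -> 100.0
```

The historically hard step for digital systems is the perception — finding the needle, the pivot, and the scale endpoints under glare and odd viewing angles — not this arithmetic.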

It also seems to have strong general world and historical knowledge. In my brief test, I uploaded a vintage public domain film of skyscraper construction in New York City, dated 1906, from the U.S. Library of Congress. Mk1 not only correctly described the contents of the footage — including odd, atypical sights such as workers suspended by ropes — but did so rapidly, and even correctly identified the rough date (early 1900s) from the look of the footage alone.

A developer platform for physical AI

Accompanying the model release is an expanded developer platform designed to turn these high-level perception capabilities into functional applications with minimal code.

The Perceptron SDK, available via Python, introduces several specialized functions such as "Focus," "Counting," and "In-Context Learning".

The Focus feature allows users to zoom and crop into specific regions of a frame automatically based on a natural language prompt, such as detecting and localizing personal protective equipment (PPE) on a construction site. The Counting function is optimized for dense scenes, such as identifying and pointing to every puppy in a group or individual items of produce.

Furthermore, the platform supports in-context learning, allowing developers to adapt Mk1 to specific tasks by providing just a few examples, such as showing an image of an apple and instructing the model to label every instance of Category 1 in a new scene.
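The article describes the few-shot pattern but not the SDK's actual call signatures, so the sketch below only builds a hypothetical request payload for an in-context counting task — every field name in it is invented for illustration, not taken from Perceptron's documentation:

```python
def build_counting_request(example_image_b64: str, example_label: str,
                           target_image_b64: str) -> dict:
    """Assemble a hypothetical few-shot counting request.

    All keys ("task", "examples", "target", "instruction") are assumed
    names for illustration; the real Perceptron SDK API is not shown
    in the article.
    """
    return {
        "task": "counting",
        "examples": [
            {"image": example_image_b64, "label": example_label},
        ],
        "target": {"image": target_image_b64},
        "instruction": f"Point to every instance of {example_label}.",
    }

req = build_counting_request("<apple image as base64>", "Category 1",
                             "<orchard image as base64>")
print(req["instruction"])
```

The shape mirrors the apple example in the text: one labeled exemplar, one target scene, and a natural-language instruction tying them together.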

Licensing strategies and the Isaac series

Perceptron is employing a dual-track strategy for its model weights and licensing. The flagship Perceptron Mk1 is a closed-source model accessed via API, designed for enterprise-grade performance and security.

However, the company is also maintaining its "Isaac" series, which kicked off with the launch of Isaac 0.1 in September 2025, as an open-weights alternative. Isaac 0.2-2b-preview, released in December 2025, is a 2-billion parameter vision-language model with reasoning capabilities that is available for edge and low-latency deployments.

While the weights for the Isaac models are open on the popular AI code sharing community Hugging Face, Perceptron offers commercial licenses for companies that require maximum control or on-premise deployment of the weights.

This approach allows the company to support both the open-source community and specialized industrial partners who need proprietary flexibility. The documentation notes that Isaac 0.2 models are specifically optimized for sub-200ms time-to-first-token, making them ideal for real-time edge devices.
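Time-to-first-token, the latency metric behind the sub-200ms claim, is straightforward to measure against any streaming response: start a timer, block until the first token arrives. The stub generator below stands in for a real model stream, since no Perceptron endpoint is shown in the article:

```python
import time

def time_to_first_token(stream):
    """Measure time-to-first-token (TTFT) for any token iterator."""
    start = time.perf_counter()
    first = next(iter(stream))
    return first, time.perf_counter() - start

def stub_stream(delay_s: float = 0.05):
    """Fake model stream: waits delay_s, then yields tokens.
    A stand-in for a real streaming response."""
    time.sleep(delay_s)
    yield "Hello"
    yield " world"

token, ttft = time_to_first_token(stub_stream())
print(f"first token {token!r} after {ttft * 1000:.0f} ms")
```

In production the measurement would wrap the SDK's streaming call instead of a stub, and the sub-200ms target would be evaluated over many requests, not one.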

Background on Perceptron founding and focus

Perceptron AI is a Bellevue, Washington-based physical AI startup founded by Aghajanyan and Akshat Shrivastava, both former research scientists at Meta’s Facebook AI Research (FAIR) lab.

The company’s public materials date its founding to November 2024, while a Washington corporate filing record for Perceptron.ai Inc. shows an earlier foreign registration filing on October 9, 2024, listing Shrivastava and Aghajanyan as governors.

In founder launch posts from late 2024, Aghajanyan said he had left Meta after nearly six years and “joined forces” with Shrivastava to build AI for the physical world, while Shrivastava said the company grew out of his work on efficiency, multimodality and new model architectures.

The founding appears to have followed directly from the pair’s work on multimodal foundation models at Meta. In May 2024, Meta researchers published Chameleon, a family of early-fusion models designed to understand and generate mixed sequences of text and images, work that Perceptron later described as part of the lineage behind its own models.

A July 2024 follow-on paper, MoMa, explored more efficient early-fusion training for mixed-modal models and listed both Shrivastava and Aghajanyan among the authors. Perceptron’s stated thesis extends that research direction into “physical AI”: models that can process real-world video and other sensory streams for use cases such as robotics, manufacturing, geospatial analysis, security and content moderation.

Partner ecosystems and future outlook

The real-world impact of Mk1 is already being demonstrated through Perceptron's partner network. Early adopters are using the model for diverse applications, such as auto-clipping highlights from live sports, which leverages the model's temporal understanding to identify key plays without human intervention.

In the robotics sector, partners are curating teleoperation episodes into training data, effectively automating the process of labeling and cleaning data for robotic arms and mobile units.

Other use cases include multimodal quality control agents on manufacturing lines, which can detect defects and verify assembly steps in real-time, and wearable assistants on smart glasses that provide context-aware help to users.

Aghajanyan stated that these releases are the culmination of research intended to make AI function best in the physical world, moving toward a future where "physical AI" is as ubiquitous as digital AI.

