Google’s new AI agent can draft your emails, monitor your inbox and eventually spend your money
Our take

Google's unveiling of Gemini Spark at Google I/O 2026 marks a significant leap in the evolution of personal AI agents, showcasing the company's ambition to transform its AI assistant from a reactive tool into an autonomous entity capable of executing complex tasks. This development is not just about enhancing user convenience; it signifies a pivotal shift in how we interact with technology. As major players like Microsoft, OpenAI, and Apple race to create AI systems that actively manage workflows, Google's introduction of Spark is both timely and strategically critical. For context, this announcement comes on the heels of Google redesigning the search box for the first time in 25 years, which emphasizes the company's broader commitment to innovation in user experience and functionality. Similarly, Google says its new Gemini 3.5 Flash model could save the heaviest enterprise users more than $1 billion a year in AI costs, a claim meant to reinforce its leadership in AI technology.
The architecture behind Spark, which allows it to run persistently on Google Cloud, underscores a fundamental shift towards a more integrated, cloud-based experience. Unlike traditional AI assistants that require user prompts to activate, Spark operates continuously, providing users with a seamless experience that blends various Google applications. This integration fosters a more intuitive and efficient workflow, which is crucial in today's fast-paced digital landscape. For users overwhelmed by the complexity of managing multiple tasks, Spark's ability to synthesize information from emails, documents, and calendars into cohesive outputs could significantly enhance productivity. However, this raises important questions about the trust we place in AI. As Google acknowledges, the ability for Spark to manage financial transactions autonomously introduces complexities surrounding user intent and potential misuse. It's a delicate balance between empowering users and ensuring robust safeguards.
Moreover, Google’s approach to spending safeguards, likened to giving a teenager their first debit card, illustrates the nuanced challenges of implementing AI systems that act on our behalf. While the promise of automation is enticing, it is crucial to address issues of reliability and privacy. AI models, regardless of their sophistication, can misinterpret instructions or make errors that could lead to serious consequences. Google’s insistence on explicit user approval for significant actions represents a cautious but necessary measure to mitigate risks. Yet, this could also limit the true autonomy that users may expect from such advanced technology.
Looking ahead, the launch of Gemini Spark prompts us to consider the broader implications of AI in our daily lives. As these technologies become more integrated into our routines, users will need to renegotiate their relationship with AI—from merely seeking information to placing trust in systems that handle critical tasks. The industry is at a crossroads, and Google’s bold move could set the pace for how AI assistants will evolve in the coming years. As we observe the rollout of Spark to testers and its eventual availability to subscribers, the question remains: will users embrace this level of automation, or will concerns over privacy and control temper their enthusiasm? The answer will shape the future of AI-driven productivity tools and how we interact with technology.
Google on Tuesday unveiled Gemini Spark, a personal AI agent designed to work around the clock — drafting emails, assembling documents, monitoring inboxes, and eventually making purchases — even when a user's laptop is closed and their phone is locked.
The announcement, made at Google I/O 2026, is the company's most ambitious attempt yet to transform its AI assistant from a tool that answers questions into one that autonomously completes tasks. It also arrives at a moment of extraordinary competition, as Microsoft, OpenAI, Anthropic, and Apple all race to build AI systems that don't merely converse but act — completing multi-step workflows with decreasing human supervision.
"We are in that part of the cycle where people want to see real value in the products they use on a day-to-day basis," Sundar Pichai, CEO of Google and Alphabet, said during a press briefing ahead of the keynote address. With Spark, he argued, that value comes from an agent that never stops working. It operates around the clock in Google's cloud, he said, so "you don't need to keep your laptop open to make sure it's running."
The product also raises urgent questions about trust, spending guardrails, and what happens when an artificial intelligence agent misinterprets a user's intent.
Spark will begin rolling out this week to a small group of trusted testers, with a beta planned for Google AI Ultra subscribers in the United States next week.
Inside the cloud architecture that lets Gemini Spark work while you sleep
Unlike conventional AI assistants that activate only when prompted, Gemini Spark is architecturally different. It runs persistently on Google Cloud infrastructure, powered by the company's new Gemini 3.5 Flash model and what Google calls the Antigravity agent harness — the same underlying system that powers the company's internal developer tools.
In practical terms, this means Spark can accept a complex instruction — "email my boss a status update pulling the latest figures from our shared spreadsheet and the project timeline in our Slides deck" — and then execute it across multiple Google applications without further input. The agent can pull context from emails, documents, and calendar entries, synthesize the information, and produce a finished output.
Josh Woodward, VP of Google Labs, Gemini App, and AI Studio, described the experience in visceral terms during the briefing: "When you use it, it almost feels like you're tossing things over your shoulder — Spark's catching them and gets the job done."
The cloud-based architecture is a deliberate design choice. Because Spark operates on remote servers rather than on a user's device, it can continue working through tasks after a user walks away. A student could ask Spark to build a study guide that updates itself as new assignments arrive from a professor. A small business owner could instruct it to monitor their inbox and flag potential customer inquiries. A parent could delegate the logistics of a neighborhood block party — tracking RSVPs, coordinating contributions, scouting venues. These are not hypothetical scenarios. Woodward said they reflect how early testers have actually been using the product.
Over the coming months, Google plans to expand Spark's capabilities significantly. The company will roll out MCP (Model Context Protocol) connections to more than 30 third-party partners, including Canva, OpenTable, and Instacart. Users will also be able to text and email Spark directly, create custom sub-agents for specialized tasks, and connect Spark to Chrome for web-based actions. Later this year, a new Android interface called Android Halo will provide live, at-a-glance visibility into what Spark is working on, displayed at the top of a user's phone screen.
Google compares its AI spending safeguards to giving a teenager their first debit card
For all its ambition, Spark confronts a fundamental challenge that has bedeviled every AI agent to date: How do you trust an autonomous system to act on your behalf — particularly when money is involved?
Google is acutely aware of the concern. When asked during the press briefing how Spark would avoid making unauthorized purchases, Woodward reached for an analogy that was striking in its candor. "On the team, we think a lot of it is like if you're giving a teenager their first debit card — there's sort of limits and sort of constraints around it, and that's how we'll be designing Spark as we go through the year," he said.
At launch, Spark will not autonomously make purchases. Users will be given explicit opportunities to review and approve any transaction before it goes through. But Google has built the infrastructure for a more autonomous future. Vidhya Srinivasan, who leads Google's ads and commerce teams, introduced the Agent Payments Protocol, or AP2 — a system designed to let AI agents make secure purchases within user-defined boundaries.
The concept works like this: a user tells their agent the specific brands, products, and spending limits they're comfortable with. If the criteria are met, the agent can automatically complete a purchase. AP2 creates what Google describes as a transparent, verifiable link between the user, the merchant, and payment processors, using privacy-preserving technology and tamper-proof digital mandates to ensure the agent is acting within its authorization. AP2 also generates a permanent digital paper trail, so that if a return is needed, the user and the merchant are looking at the same record. Google plans to bring AP2 to its products in the coming months, starting with Gemini Spark.
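The boundary check described here can be sketched in a few lines of Python. This is an illustrative model only, not the AP2 specification: the `Mandate` and `Offer` classes, their fields, and `within_mandate` are hypothetical names, and the signing and verification that tamper-proof mandates would require are omitted.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Mandate:
    """Hypothetical user-defined purchase boundaries (not the AP2 schema)."""
    allowed_brands: frozenset
    allowed_products: frozenset
    max_price: Decimal


@dataclass(frozen=True)
class Offer:
    """Hypothetical merchant offer the agent is considering."""
    brand: str
    product: str
    price: Decimal


def within_mandate(mandate: Mandate, offer: Offer) -> bool:
    """Return True only if every user-defined criterion is met."""
    return (
        offer.brand in mandate.allowed_brands
        and offer.product in mandate.allowed_products
        and offer.price <= mandate.max_price
    )


mandate = Mandate(
    allowed_brands=frozenset({"Acme"}),
    allowed_products=frozenset({"running shoes"}),
    max_price=Decimal("120.00"),
)

ok_offer = Offer("Acme", "running shoes", Decimal("99.50"))
too_expensive = Offer("Acme", "running shoes", Decimal("150.00"))

print(within_mandate(mandate, ok_offer))       # True
print(within_mandate(mandate, too_expensive))  # False
```

In this sketch an agent would complete a purchase automatically only when `within_mandate` returns `True`; anything else would go back to the user for review.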
The system is underpinned by the Universal Commerce Protocol (UCP), an open-source standard Google announced earlier this year that gives agents and commerce systems a common language across the entire shopping journey. The UCP Tech Council now includes Amazon, Meta, Microsoft, Salesforce, and Stripe — a remarkable coalition that underscores how seriously the industry takes the prospect of agent-driven commerce.
Google also announced the Universal Cart, an intelligent shopping cart that works across merchants and Google services. Users can add items while browsing Search, chatting with Gemini, watching YouTube, or reading Gmail. The cart then works in the background — tracking price drops, surfacing deals based on payment card perks, and even flagging product incompatibilities. The shopping infrastructure is rolling out in the U.S. this summer across Search and the Gemini app, with YouTube and Gmail to follow.
How Google, OpenAI, Microsoft, Anthropic, and Apple are racing to build the definitive AI agent
The announcement lands in the middle of the most intense competitive period in AI history. Google, Microsoft, OpenAI, Anthropic, and Apple are all racing to ship autonomous agents that can do real work — and each is placing a fundamentally different architectural bet on how to get there.
OpenAI recently unified its Operator and deep research capabilities into ChatGPT agent — a system that brings together website interaction, information synthesis, and conversational intelligence. It carries out tasks using its own virtual computer, shifting between reasoning and action to handle complex workflows. The company emphasizes that users remain in control, with ChatGPT requesting permission before taking consequential actions. But the product has faced scrutiny over reliability. OpenAI's Computer-Using Agent scores 38.1% on OSWorld, the industry benchmark for computer use tasks, while humans score over 72%.
Anthropic launched its Claude Computer Use Agent in research preview in March, giving Claude the ability to see, navigate, and control a user's desktop — clicking buttons, opening applications, filling spreadsheets, and completing multi-step workflows. Claude Cowork handles tasks autonomously — users give it a goal and Claude works on their computer, local files, and applications to return a finished deliverable. Anthropic has iterated aggressively, recently shipping ten pre-built financial agents and pursuing deep Microsoft 365 integration.
Microsoft introduced Copilot Cowork to move beyond chat and into execution — helping users delegate real tasks and have them completed. Cowork runs in the cloud, meaning users don't have to worry about closing their laptop. The system is grounded in Work IQ, Microsoft's intelligence layer that understands organizational data, tools, and structure. The shift moves Copilot from a sidebar helper to an orchestrator of autonomous agents.
Apple is also preparing a revamped Siri for WWDC 2026 that will act as an "always-on agent" capable of handling tasks across apps using personal data. Google's Gemini models will help power the upgraded Siri through a multi-year deal reportedly costing Apple around $1 billion per year.
The convergence is unmistakable: every major platform is moving from assistants that talk to agents that act. But each is approaching the problem differently. OpenAI's agent operates primarily through a browser. Anthropic's works directly on a user's desktop. Microsoft's is tightly bound to the Office 365 ecosystem. Apple's emphasizes on-device processing and privacy. Google's approach with Spark is distinctive in its bet on cloud persistence and deep integration with its own services.
Rather than controlling a user's screen pixel by pixel, Spark works through structured integrations — Google's own Workspace APIs, and increasingly, third-party connections through MCP. The advantage is reliability and speed: structured tool use is far more predictable than screen-reading. The disadvantage is that Spark, at least initially, can only act within the systems it's been connected to.
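A rough sketch of why structured tool use tends to be more predictable than screen-reading: each action is a named function with declared parameters, so a malformed or unconnected request fails fast instead of producing a stray click. The registry, the tool name, and the `dispatch` function below are hypothetical and are not Workspace or MCP APIs.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: tool name -> (required parameter names, handler).
TOOLS: Dict[str, Tuple[set, Callable[..., str]]] = {}


def register(name: str, required: set, handler: Callable[..., str]) -> None:
    """Connect a tool so the agent is allowed to call it."""
    TOOLS[name] = (required, handler)


def dispatch(name: str, **kwargs) -> str:
    """Run a connected tool, rejecting unknown tools and missing arguments."""
    if name not in TOOLS:
        raise LookupError(f"tool not connected: {name}")
    required, handler = TOOLS[name]
    missing = required - set(kwargs)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return handler(**kwargs)


# Stand-in handler; a real integration would call an external service.
register(
    "draft_email",
    {"to", "subject"},
    lambda to, subject: f"draft to {to}: {subject}",
)

print(dispatch("draft_email", to="boss@example.com", subject="Status update"))
```

The same design also shows the limitation: a call such as `dispatch("book_flight", ...)` raises `LookupError` until someone registers that tool.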
The AI model behind Spark processes trillions of tokens a day — and Google says it could save enterprises billions
Spark's capabilities are inseparable from the model that drives it. Gemini 3.5 Flash, also announced at I/O, is Google's new workhorse AI model, designed specifically for the demands of agentic workflows.
The performance claims are striking. Google says 3.5 Flash outperforms its previous frontier model, Gemini 3.1 Pro, across nearly all benchmarks, while running four times faster than comparable frontier models in terms of output tokens per second. An even more optimized version, available within Google's Antigravity development platform, runs twelve times faster.
Pichai framed the economics bluntly. Companies processing roughly one trillion tokens per day on Google Cloud — a figure he said top enterprise customers are hitting — could save over $1 billion annually by shifting 80% of their workloads to a mix of Flash and frontier models like 3.5 Pro. In a market where, as Pichai noted, CIOs are already "blowing through their annual token budgets and it's only May," the cost argument may matter as much as the capability argument.
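As a back-of-the-envelope check on what that claim implies, the snippet below converts the quoted figures into an average saving per million tokens. It assumes the 80% applies to token volume, that savings are spread evenly across the shifted tokens, and a 365-day year; none of these assumptions comes from Google.

```python
TOKENS_PER_DAY = 1_000_000_000_000   # "roughly one trillion tokens per day"
SHIFTED_SHARE = 0.80                 # assumed to mean 80% of token volume
ANNUAL_SAVINGS_USD = 1_000_000_000   # "over $1 billion annually" (lower bound)
DAYS_PER_YEAR = 365

# Tokens moved to cheaper models over a full year.
shifted_tokens_per_year = TOKENS_PER_DAY * SHIFTED_SHARE * DAYS_PER_YEAR

# Average saving implied for each million shifted tokens.
savings_per_million = ANNUAL_SAVINGS_USD / (shifted_tokens_per_year / 1_000_000)

print(f"Implied saving: ${savings_per_million:.2f} per million shifted tokens")
# Implied saving: $3.42 per million shifted tokens
```

Because the quoted saving is "over" $1 billion, the result is best read as a floor under these assumptions rather than a price quote.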
Internally, Google's own developers have been consuming Gemini 3.5 Flash at a staggering and rapidly accelerating pace. In March, Google was processing about half a trillion tokens per day internally. That figure has since grown to more than three trillion — doubling roughly every few weeks. Pichai described this as a "powerful feedback loop" that continually improves the model.
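The growth figures can be sanity-checked in a similar way. The sketch below counts how many doublings separate half a trillion from three trillion tokens per day, then derives a doubling time for an assumed elapsed period of nine weeks. The article gives only "March" as the starting point, so that interval is a guess.

```python
import math

START_TOKENS = 0.5e12   # about half a trillion tokens per day in March
END_TOKENS = 3.0e12     # "more than three trillion" (lower bound)
ELAPSED_WEEKS = 9       # assumption: the article does not give exact dates

# Number of times the daily volume doubled between the two figures.
doublings = math.log2(END_TOKENS / START_TOKENS)
doubling_time_weeks = ELAPSED_WEEKS / doublings

print(f"{doublings:.2f} doublings, one every {doubling_time_weeks:.1f} weeks")
# 2.58 doublings, one every 3.5 weeks
```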
Koray Kavukcuoglu, CTO of Google DeepMind and Chief AI Architect for Google, said the model's speed is what makes agentic use cases practical. "3.5 Flash is especially good when deploying multiple agents simultaneously and completing long-running tasks," he said during the briefing, adding that Google had successfully tested agents building "a working operating system entirely from scratch."
The 3.5 Pro model, the more powerful sibling, is currently being tested internally and will roll out next month.
What Gemini Spark costs and where it fits in Google's new subscription tiers
Gemini Spark will be available to Google AI Ultra subscribers. The company is simultaneously restructuring its subscription tiers to make the technology more accessible. A new Ultra plan at $100 per month provides a 5x higher usage limit than the Pro plan, along with priority access to Antigravity and 20TB of cloud storage. The top-tier Ultra plan drops from $250 to $200 per month, with a 20x higher usage limit and access to the full suite of capabilities.
Both tiers include Gemini Spark, the Daily Brief agent — a proactive morning digest that triages email, calendar, and tasks overnight — and access to the new Gemini Omni and 3.5 Flash models. The pricing positions Spark as a premium product — more expensive than Anthropic's Claude Pro at $20 per month, but comparable to the higher tiers of competing products like Claude Max ($100–$200/month) and OpenAI's ChatGPT Pro ($200/month).
Why privacy, reliability, and ecosystem lock-in could undermine Google's agent ambitions
The risks are real and multidimensional.
Reliability remains the industry's greatest challenge. Even the best AI models hallucinate, misinterpret instructions, and make errors that a human would never make. An agent that drafts an email to the wrong person, misreads a spreadsheet figure, or sends a payment to the wrong merchant could create consequences that are difficult to reverse. Google's approach of requiring explicit approval for high-stakes actions like spending money or sending emails is a sensible safeguard — but it also limits how autonomous the agent can actually be. An agent that asks for confirmation at every turn isn't much of an agent at all.
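The trade-off between safety and autonomy can be made concrete with a small approval gate. In this sketch, which is not Google's implementation, actions tagged as high-stakes are held until a user-supplied callback approves them, while everything else runs immediately. The `Action` class, the `HIGH_STAKES` set, and the `approve` callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: which action kinds count as high-stakes is a policy choice.
HIGH_STAKES = {"send_email", "make_payment"}


@dataclass
class Action:
    kind: str
    description: str


def run(action: Action, approve: Callable[[Action], bool]) -> str:
    """Execute low-risk actions directly; ask the user before risky ones."""
    if action.kind in HIGH_STAKES and not approve(action):
        return f"blocked: {action.description}"
    return f"done: {action.description}"


def always_decline(action: Action) -> bool:
    """Stand-in for a user who rejects every confirmation prompt."""
    return False


print(run(Action("summarize_doc", "summarize Q3 report"), always_decline))
# done: summarize Q3 report
print(run(Action("make_payment", "pay $40 to merchant"), always_decline))
# blocked: pay $40 to merchant
```

Shrinking `HIGH_STAKES` buys autonomy at the cost of more unreviewed actions; growing it does the reverse.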
Privacy is another concern. Spark's ability to synthesize information across a user's entire Gmail inbox, calendar, documents, and chat history means it has an extraordinarily deep view of a person's digital life. Google says Spark operates on a fully managed, secure runtime with isolated ephemeral virtual machines, encrypted credentials, and Data Loss Prevention policies. But the concentration of personal context in a single AI system — accessible through natural language — creates a surface area that will attract scrutiny from regulators, privacy advocates, and security researchers.
Market timing is uncertain, too. The consumer appetite for always-on AI agents is unproven at scale. Google says the Gemini app has 900 million monthly users, but it's unclear how many of those users are ready for the conceptual leap from "ask a question, get an answer" to "delegate a task, trust the outcome." The history of digital assistants — from Clippy to early Siri to Alexa — is littered with products that promised proactive intelligence and delivered frustration.
And then there is the question of ecosystem lock-in. Spark works best within Google's own services. While MCP connections to third-party apps will broaden its reach, the initial experience is one of deep Workspace integration. For the billions of people who live inside Google's ecosystem, this is a natural fit. For those who split their digital lives across Microsoft, Apple, and other platforms, Spark's utility will be more limited — at least initially.
Woodward acknowledged as much when asked whether Spark would remain confined to the Google ecosystem. "It's going to be cross-platform in two ways," he said — through MCP integrations with third-party apps, and through availability on the web, Android, and iOS, with tasks syncing across devices via the cloud.
The real test for Gemini Spark isn't whether it can do the work — it's whether people will let it
Google's bet with Gemini Spark is that the AI industry's center of gravity is shifting from models that think to systems that act — and that the company best positioned to win that transition is the one with the most comprehensive set of consumer services to act within. It is a bet backed by enormous infrastructure investment. Google expects to spend approximately $180 to $190 billion in capital expenditure this year — roughly six times what it spent in 2022 — much of it on the AI compute required to run agents like Spark at scale for hundreds of millions of users.
The technology, in other words, is arriving. The models are fast enough, the integrations deep enough, the payment rails secure enough. Google has built a system that can draft your emails, organize your calendar, monitor your inbox, and soon enough, spend your money — all while you sleep.
But the hardest problem in artificial intelligence has never been making a machine capable. It has been making a human comfortable. For two decades, Google's core promise has been ten blue links and a search box — a transaction built on the assumption that the user is in control. Gemini Spark asks users to renegotiate that relationship entirely, to hand a set of keys to a system that is brilliant, tireless, and still, by its maker's own admission, best compared to a teenager with a debit card.
Gemini Spark rolls out to trusted testers this week, with a broader beta for U.S. Google AI Ultra subscribers expected next week.
Related Articles
- Google and AWS split the AI agent stack between control and execution
  The era of enterprises stitching together prompt chains and shadow agents is nearing its end as more options for orchestrating complex multi-agent systems emerge. As organizations move AI agents into production, the question remains: "how will we manage them?" Google and Amazon Web Services offer fundamentally different answers, illustrating a split in the AI stack. Google’s approach is to run agentic management on the system layer, while AWS’s harness method sets up in the execution layer.
  The debate on how to manage and control agents gained new energy this past month as competing companies released or updated their agent builder platforms (Anthropic with the new Claude Managed Agents and OpenAI with enhancements to the Agents SDK), giving developer teams options for managing agents. AWS, with new capabilities added to Bedrock AgentCore, is optimizing for velocity, relying on harnesses to bring agents to product faster, while still offering identity and tool management. Meanwhile, Google’s Gemini Enterprise adopts a governance-focused approach using a Kubernetes-style control plane. Each method offers a glimpse into how agents move from short-burst task helpers to longer-running entities within a workflow.
  Upgrades and umbrellas
  To understand where each company stands, here’s what’s actually new. Google released a new version of Gemini Enterprise, bringing its enterprise AI agent offerings (Gemini Enterprise Platform and Gemini Enterprise Application) under one umbrella. The company has rebranded Vertex AI as Gemini Enterprise Platform, though it insists that, aside from the name change and new features, it’s still fundamentally the same interface.
  “We want to provide a platform and a front door for companies to have access to all the AI systems and tools that Google provides,” Maryam Gholami, senior director, product management for Gemini Enterprise, told VentureBeat in an interview. “The way you can think about it is that the Gemini Enterprise Application is built on top of the Gemini Enterprise Agent Platform, and the security and governance tools are all provided for free as part of Gemini Enterprise Application subscription.”
  On the other hand, AWS added a new managed agent harness to Bedrock AgentCore. The company said in a press release shared with VentureBeat that the harness “replaces upfront build with a config-based starting point powered by Strands Agents, AWS’s open source agent framework.” Users define what the agent does, the model it uses and the tools it calls, and AgentCore does the work to stitch all of that together to run the agent.
  Agents are now becoming systems
  The shift toward stateful, long-running autonomous agents has forced a rethink of how AI systems behave. As agents move from short-lived tasks to long-running workflows, a new class of failure is emerging: state drift. As agents continue operating, they accumulate state: memory, tool responses and evolving context. Over time, that state becomes outdated. Data sources change, or tools can return conflicting responses. As a result, the agent becomes more vulnerable to inconsistencies and less truthful. Agent reliability becomes a systems problem, and managing that drift may need more than faster execution; it may require visibility and control. It’s this failure point that platforms like Gemini Enterprise and AgentCore try to prevent.
  Though this shift is already happening, Gholami admitted that customers will dictate how they want to run and control any long-running agent. “We are going to learn a lot from customers where they would be using long-running agents, where they just assign a task to these autonomous agents to just go ahead and do,” Gholami said. “Of course, there are tricks and balances to get right and the agent may come back and ask for more input.”
  The new AI stack
  What’s becoming increasingly clear is that the AI stack is separating into distinct layers, solving different problems. AWS and, to a certain extent, Anthropic and OpenAI, optimize for faster deployment. Claude Managed Agents abstracts much of the backend work for standing up an agent, while the Agents SDK now includes support for sandboxes and a ready-made harness. These approaches aim to lower the barrier to getting agents up and running. Google offers a centralized control panel to manage identity, enforce policies and monitor long-running behaviors.
  Enterprises likely need both. As some practitioners see it, their businesses have to have a serious conversation on how much risk they are willing to take. “The main takeaway for enterprise technology leaders considering these technologies at the moment may be formulated this way: while the agent harness vs. runtime question is often perceived as build vs. buy, this is primarily a matter of risk management. If you can afford to run your agents through a third-party runtime because they do not affect your revenue streams, that is okay. On the contrary, in the context of more critical processes, the latter option will be the only one to consider from a business perspective,” Rafael Sarim Oezdemir, head of growth at EZContacts, told VentureBeat in an email.
  Iterating quickly lets teams experiment and discover what agents can do, while centralized control adds a layer of trust. What enterprises need is to ensure they are not locked into systems designed purely for a single way of executing agents.
- Claude, OpenClaw and the new reality: AI agents are here, and so is the chaos
  The age of agentic AI is upon us, whether we like it or not. What started with an innocent question-answer banter with ChatGPT back in 2022 has become an existential debate on job security and the rise of the machines. More recently, fears of reaching artificial general intelligence (AGI) have become more real with the advent of powerful autonomous agents like Claude Cowork and OpenClaw. Having played with these tools for some time, I offer a comparison here.
  First, we have OpenClaw (formerly known as Moltbot and Clawdbot). Surpassing 150,000 GitHub stars in days, OpenClaw is already being deployed on local machines with deep system access. This is like a robot “maid” (Irona for Richie Rich fans, for instance) that you give the keys to your house. It’s supposed to clean it, and you give it the necessary autonomy to take actions and manage your belongings (files and data) as it pleases. The whole purpose is to perform the task at hand: inbox triaging, auto-replies, content curation, travel planning, and more.
  Next we have Google’s Antigravity, a coding agent with an IDE that accelerates the path from prompt to production. You can interactively create complete application projects and modify specific details over individual prompts. This is like having a junior developer that can not only code, but build, test, integrate, and fix issues. In the real world, this is like hiring an electrician: They are really good at a specific job and you only need to give them access to a specific item (your electric junction box).
  Finally, we have the mighty Claude. The release of Anthropic's Cowork, which featured AI agents for automating legal tasks like contract review and NDA triage, caused a sharp sell-off in legal-tech and software-as-a-service (SaaS) stocks (referred to as the SaaSpocalypse). Claude was already the go-to chatbot; now with Cowork, it has domain knowledge for specific industries like legal and finance. This is like hiring an accountant. They know the domain inside-out and can complete taxes and manage invoices. Users provide specific access to highly-sensitive financial details.
  Making these tools work for you
  The key to making these tools more impactful is giving them more power, but that increases the risk of misuse. Users must trust providers like Anthropic and Google to ensure that agent prompts will not cause harm, leak data, or provide an unfair (illegal) advantage to certain vendors. OpenClaw is open-source, which complicates things, as there is no central governing authority.
  While these technological advancements are amazing and meant for the greater good, all it takes is one or two adverse events to cause panic. Imagine the agentic electrician frying all your house circuits by connecting the wrong wire. In an agent scenario, this could be injecting incorrect code, breaking down a bigger system or adding hidden flaws that may not be immediately evident. Cowork could miss major saving opportunities when doing a user's taxes; on the flip side, it could include illegal writeoffs. Claude can do unimaginable damage when it has more control and authority.
  But in the middle of this chaos, there is an opportunity to really take advantage. With the right guardrails in place, agents can focus on specific actions and avoid making random, unaccounted-for decisions. Principles of responsible AI (accountability, transparency, reproducibility, security, privacy) are extremely important. Logging agent steps and human confirmation are absolutely critical.
  Also, when agents deal with so many diverse systems, it's important they speak the same language. Ontology becomes very important so that events can be tracked, monitored, and accounted for. A shared domain-specific ontology can define a “code of conduct.” These ethics can help control the chaos. When tied together with a shared trust and distributed identity framework, we can build systems that enable agents to do truly useful work.
  When done right, an agentic ecosystem can greatly offload the human “cognitive load” and enable our workforce to perform high-value tasks. Humans will benefit when agents handle the mundane.
  Dattaraj Rao is innovation and R&D architect at Persistent Systems.
- Claude, OpenClaw and the new reality: AI agents are here — and so is the chaosThe age of agentic AI is upon us — whether we like it or not. What started with an innocent question-answer banter with ChatGPT back in 2022 has become an existential debate on job security and the rise of the machines. More recently, fears of reaching artificial general intelligence (AGI) have become more real with the advent of powerful autonomous agents like Claude Cowork and OpenClaw. Having played with these tools for some time, here is a comparison. First, we have OpenClaw (formerly known as Moltbot and Clawdbot). Surpassing 150,000 GitHub stars in days, OpenClaw is already being deployed on local machines with deep system access. This is like a robot “maid” (Irona for Richie Rich fans, for instance) that you give the keys to your house. It’s supposed to clean it, and you give it the necessary autonomy to take actions and manage your belongings (files and data) as it pleases. The whole purpose is to perform the task at hand — inbox triaging, auto-replies, content curation, travel planning, and more. Next we have Google’s Antigravity, a coding agent with an IDE that accelerates the path from prompt to production. You can interactively create complete application projects and modify specific details over individual prompts. This is like having a junior developer that can not only code, but build, test, integrate, and fix issues. In the realworld, this is like hiring an electrician: They are really good at a specific job and you only need to give them access to a specific item (your electric junction box). Finally, we have the mighty Claude. The release of Anthropic's Cowork, which featured AI agents for automating legal tasks like contract review and NDA triage, caused a sharp sell-off in legal-tech and software-as-a-service (SaaS) stocks (referred to as the SaaSpocalypse). 
Claude was already the go-to chatbot; now with Cowork, it has domain knowledge for specific industries like legal and finance. This is like hiring an accountant. They know the domain inside-out and can complete taxes and manage invoices. Users provide specific access to highly sensitive financial details.
Making these tools work for you
The key to making these tools more impactful is giving them more power, but that increases the risk of misuse. Users must trust providers like Anthropic and Google to ensure that agent prompts will not cause harm, leak data, or provide unfair (illegal) advantage to certain vendors. OpenClaw is open-source, which complicates things, as there is no central governing authority. While these technological advancements are amazing and meant for the greater good, all it takes is one or two adverse events to cause panic. Imagine the agentic electrician frying all your house circuits by connecting the wrong wire. In an agent scenario, this could be injecting incorrect code, breaking down a bigger system or adding hidden flaws that may not be immediately evident. Cowork could miss major saving opportunities when doing a user's taxes; on the flip side, it could include illegal writeoffs. Claude can do unimaginable damage when it has more control and authority. But in the middle of this chaos, there is an opportunity to really take advantage. With the right guardrails in place, agents can focus on specific actions and avoid making random, unaccounted-for decisions. Principles of responsible AI (accountability, transparency, reproducibility, security, privacy) are extremely important. Logging agent steps and requiring human confirmation are absolutely critical. Also, when agents deal with so many diverse systems, it's important they speak the same language. Ontology becomes very important so that events can be tracked, monitored, and accounted for. A shared domain-specific ontology can define a “code of conduct.”
These ethics can help control the chaos. When tied together with a shared trust and distributed identity framework, we can build systems that enable agents to do truly useful work. When done right, an agentic ecosystem can greatly offload the human “cognitive load” and enable our workforce to perform high-value tasks. Humans will benefit when agents handle the mundane. Dattaraj Rao is innovation and R&D architect at Persistent Systems.
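The two safeguards the article singles out, logging agent steps and requiring human confirmation, can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the `GuardedAgent` class, the action names in `risky_actions`, and the `confirm` callback are assumptions, not part of Claude Cowork, OpenClaw, or Antigravity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GuardedAgent:
    """Hypothetical wrapper: records every step and gates risky ones on a human."""
    confirm: Callable[[str], bool]  # asks a human; returns True to approve
    # Assumed list of actions that should never run without approval.
    risky_actions: frozenset = frozenset({"send_email", "delete_file", "make_payment"})
    audit_log: List[dict] = field(default_factory=list)

    def run_step(self, action: str, execute: Callable[[], str]) -> str:
        needs_ok = action in self.risky_actions
        approved = self.confirm(action) if needs_ok else True
        # Only call the underlying tool if the step was allowed.
        result = execute() if approved else "blocked"
        # Every step is logged, whether it ran or was blocked.
        self.audit_log.append({
            "ts": time.time(),
            "action": action,
            "needed_confirmation": needs_ok,
            "approved": approved,
            "result": result,
        })
        return result
```

With a `confirm` callback that always declines, a low-risk step such as reading a calendar still runs, while a payment step is blocked, and both leave an entry in `audit_log` that can be reviewed later.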
- Google says Gemini 3.5 Flash can slash enterprise AI costs by more than $1 billion a year
Google unveiled Gemini 3.5 Flash at its annual I/O developer conference on Tuesday, a new artificial intelligence model that the company says shatters what had become a seemingly iron law of the AI industry: that the smartest models must also be the slowest and most expensive to run. The model sits at the center of a sweeping set of announcements, from a video-generating "world model" called Gemini Omni to a 24/7 personal AI agent called Gemini Spark, but 3.5 Flash carries perhaps the most immediate consequence for the enterprises pouring billions of dollars into AI infrastructure. Sundar Pichai, Google's chief executive, told reporters during a press briefing Monday that companies running roughly one trillion tokens per day on Google Cloud could save more than $1 billion annually by shifting 80 percent of their workloads to a mix of Flash and other frontier models. "You've probably heard anecdotes from other CIOs that companies are already blowing through their annual token budgets, and it's only May," Pichai said, framing the model not just as a technical achievement but as a financial lifeline for organizations struggling with the runaway costs of deploying AI at scale. The claim, if it holds, would be one of the most significant shifts in the economics of enterprise AI since large language models entered corporate computing.
Why enterprises have been forced to choose between AI quality and AI speed
For the past three years, organizations adopting generative AI have faced a painful trade-off. The most capable models (the ones that can reason through complex multistep problems, write reliable code, and parse dense financial documents) tend to be large, slow, and expensive to query. Faster, cheaper models sacrifice accuracy.
Chief information officers have been forced into a kind of AI portfolio management: routing simple queries to lightweight models and reserving the heavy-duty reasoning engines for high-stakes tasks. It is a complex, brittle system that adds engineering overhead and often delivers inconsistent user experiences. Gemini 3.5 Flash attacks that trade-off directly. According to Google's internal benchmarks and a third-party analysis from Artificial Analysis, the model outperforms Google's own Gemini 3.1 Pro — a model the company positioned as its top-tier flagship just four to five months ago — on nearly every major benchmark. It scores 76.2 percent on Terminal-Bench 2.1, reaches 1656 Elo on GDPval-AA, hits 83.6 percent on MCP Atlas, and leads in multimodal understanding with 84.2 percent on CharXiv Reasoning. Yet it does all of this while generating output tokens at four times the speed of comparable frontier models from competitors. Koray Kavukcuoglu, chief technology officer of Google DeepMind and chief AI architect for Google, told reporters the team has pushed even further: "We have developed an even more optimized version of Flash, not just four times, but actually 12 times faster with the same quality." That turbo variant is available starting Tuesday inside Antigravity, Google's agentic development platform. Pichai put the performance gap in blunt terms: "3.5 Flash is better than 3.1 Pro, which was just four months ago, and it's at the almost, I would say, 90% of the performance of frontier models, 4x faster, much faster in Antigravity, maybe 12x, and about 1/3 to one half the cost." Landing in what Artificial Analysis calls the "top-right quadrant" of its intelligence-versus-speed index — the only model to do so — Flash occupies a position no competitor currently holds. 
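The "AI portfolio management" described above, where simple queries go to a lightweight model and high-stakes work goes to a heavier one, might look something like the following Python sketch. The model labels, the keyword list, and the word-count threshold are all hypothetical, chosen only to show why such hand-tuned routing tends to be brittle.

```python
# Hypothetical keywords that mark a request as high-stakes.
HIGH_STAKES_KEYWORDS = ("contract", "audit", "refactor", "financial")


def choose_model(prompt: str, max_light_words: int = 50) -> str:
    """Pick a model tier using simple hand-written heuristics."""
    text = prompt.lower()
    # Rule 1: anything that looks high-stakes goes to the expensive model.
    if any(keyword in text for keyword in HIGH_STAKES_KEYWORDS):
        return "frontier-model"
    # Rule 2: long prompts are assumed to need deeper reasoning.
    if len(text.split()) > max_light_words:
        return "frontier-model"
    # Everything else is treated as cheap and simple.
    return "light-model"
```

Each rule is a guess about difficulty that has to be maintained by hand, which is the engineering overhead the article refers to; a single model that is both fast and capable would make most of this logic unnecessary.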
The trillion-token math behind Google's $1 billion savings claim
To understand why Flash matters so much to enterprise buyers, you need to understand the economics of tokens, the fundamental units of data that AI models process. Every query a customer service chatbot answers, every legal document an AI summarizes, every line of code an agent writes, consumes tokens. And at frontier-model pricing, those tokens add up fast. Google says its model APIs now process around 19 billion tokens per minute. Across all of Google's own surfaces (Search, the Gemini app, Workspace, and more) the company processes over 3.2 quadrillion tokens per month, a figure that has jumped seven-fold in the past year alone. Two years ago, at I/O 2024, the number was 9.7 trillion per month. The explosion in token consumption is not unique to Google. Enterprises across industries are discovering that the more capable their AI deployments become, the more tokens they burn. Agentic workflows, where AI systems autonomously execute multistep tasks, call tools, write and run code, and iterate on their own output, are particularly token-hungry. A single agentic coding session can consume orders of magnitude more tokens than a simple question-and-answer exchange. This is where Flash's cost advantage becomes transformative. The model delivers what Google describes as frontier-level capabilities at less than half the price, in some cases almost a third the price, of comparable frontier models. For a hypothetical enterprise processing one trillion tokens per day on Google Cloud (a scale Pichai said top customers are already reaching), the savings from shifting 80 percent of workloads to a Flash-and-frontier blend would exceed $1 billion per year. That is not a rounding error.
It is the kind of number that reshapes procurement decisions, accelerates deployment timelines, and fundamentally alters the return-on-investment calculus for AI initiatives that many boards of directors have been scrutinizing with increasing impatience.
How Google's own engineers created a data flywheel that rivals cannot easily copy
Perhaps the most strategically significant detail Google shared Tuesday was not a benchmark score or a price point. It was a chart showing the company's own internal token consumption on Antigravity 2.0, its reimagined agentic development platform. In March 2026, Google's developers were processing roughly half a trillion tokens per day inside Antigravity. By the time of the I/O press briefing in mid-May, that figure had surged past three trillion, a six-fold increase in approximately ten weeks, with usage doubling "literally every few weeks," according to Pichai. This internal usage creates what AI researchers call a data flywheel: the more Google's own engineers use 3.5 Flash to build products, the more real-world signal the model team collects on where the model excels and where it stumbles. That signal feeds back into model improvement, which makes the model more useful, which drives more usage, which generates more signal. It is a virtuous cycle, and it is one that competing AI labs, which rely primarily on external developer usage and synthetic benchmarks, cannot easily replicate at the same speed or fidelity. "That scale creates a powerful feedback loop, and that is what has allowed us to keep improving the 3.5 series of models," Pichai said. When pressed during the Q&A about the competitive frontier, particularly in light of recent advances from rival labs, Pichai acknowledged the landscape is "very dynamic" and "moving fast" but expressed confidence in Google's breadth.
He added that the company's focus with the 3.5 series has been on "taking the model intelligence, making sure tool use, instruction following, long horizon use cases, agent decoding all work well." Kavukcuoglu reinforced the agentic emphasis, noting that 3.5 Flash "can now handle multi-hour autonomous sessions" and "can independently execute complex coding pipelines or manage iterative research projects entirely by itself." The team, he said, even tested the model by having agents build a working operating system entirely from scratch.
Antigravity 2.0 transforms Google's code editor into an agent command center
The arrival of 3.5 Flash is tightly coupled with the launch of Antigravity 2.0, a significant expansion of the agentic development platform Google first introduced six months ago. What began as a coding environment has evolved into what Google describes as a full platform for developing and managing teams of autonomous AI agents, and the company says millions of developers are already building with it. Antigravity 2.0 ships as a new standalone desktop application that serves as a central hub for orchestrating multiple agents simultaneously. Google offered the example of running one agent to code a website, a second to generate brand assets, and a third to plan product architecture, all in parallel and all managed from a single interface. For developers who prefer command-line workflows, there is Antigravity CLI. And for those building programmatic integrations, the new Antigravity SDK provides direct access to the same agent harness powering Google's own first-party products. The co-development of 3.5 Flash and Antigravity 2.0 is no accident. "We have co-developed 3.5 Flash together with Google Antigravity, our agentic development platform," Kavukcuoglu said. This tight integration means Flash's strengths (speed, tool use, long-context reasoning, and code generation) are specifically tuned for the kinds of workloads developers execute inside the platform.
Google is also launching Managed Agents in the Gemini API, allowing developers to spin up an agent with a single API call that reasons, uses tools, and executes code in an isolated Linux environment. And it introduced CodeMender, an AI security agent that uses Gemini's advanced reasoning to automatically find and fix critical code vulnerabilities, a capability Kavukcuoglu described as essential as agentic systems write an increasing share of the world's code.
Google's $190 billion infrastructure bet and the custom silicon powering cheaper AI
The models and platforms sit atop a staggering infrastructure investment that Pichai revealed during the briefing: Google expects capital expenditures of approximately $180 billion to $190 billion in 2026, roughly six times the $31 billion the company spent in 2022, just four years ago. A key component of that spending is custom silicon. The company recently unveiled its eighth generation of Tensor Processing Units, adopting for the first time a dual-chip architecture with specialized designs for training (TPU 8o) and inference (TPU 8i). Google says it can now distribute model training across multiple data center sites using a system called Pathways, scaling beyond one million TPUs globally, a setup the company claims constitutes the largest training cluster in the world. "This means training larger, more capable models in weeks, rather than months," Pichai said. The infrastructure advantage matters enormously for Flash's economics. Custom silicon optimized for inference means Google can run Flash at lower cost per token than competitors relying on general-purpose GPUs, and the savings get passed along, at least partially, to customers. The capex figure also signals something strategic about Google's long-term posture. While some investors have grown nervous about the astronomical sums cloud providers are spending on AI infrastructure, Google is framing the spending as a competitive moat.
The more infrastructure it builds, the cheaper it can run inference, the more attractive its models become, and the more usage it captures to improve the next generation. It is the flywheel logic again, extended from software all the way down to silicon.
Gemini Omni, Spark, and the consumer products Flash now powers at massive scale
While the enterprise cost story dominates the Flash narrative, Google also made sweeping moves on the consumer side that put the model to work across products reaching billions of people. Flash is now the default model powering the Gemini app (which has surpassed 900 million monthly active users, more than doubling from 400 million a year ago) and AI Mode in Google Search, which has crossed one billion monthly users in its first year. Google introduced Gemini Spark, a 24/7 personal AI agent that runs on dedicated virtual machines in Google Cloud and operates in the background even when a user's device is off. Powered by 3.5 Flash with the full Antigravity harness, Spark integrates with Gmail, Docs, Sheets, and Slides. Josh Woodward, who leads Google Labs and the Gemini app, described the experience vividly: "When you use it, it almost feels like you're tossing things over your shoulder, Spark's catching them and gets the job done." On the safety front, Spark requires explicit user approval before high-stakes actions. Google also announced the Agent Payments Protocol, which lets users set strict guardrails (approved brands, spending caps, specific merchants) before an agent can spend money on their behalf. Woodward compared the design to "giving a teenager their first debit card, there's sort of limits and sort of constraints around it." Alongside Flash, Google unveiled Gemini Omni, a model capable of generating any output from any input, starting with video. Kavukcuoglu drew a sharp distinction from Google's existing Veo model: "Veo is a text-to-video model. Omni is a true and true multi-model input, multi-model output model."
All Omni-generated content carries Google's SynthID watermark, and the company announced that OpenAI, Kakao, and ElevenLabs are adopting SynthID as well. The company also reimagined its search box for the first time in over 25 years, introduced information agents that monitor the web around the clock for user-defined conditions, and launched the Universal Cart, an AI-powered cross-merchant shopping cart built on Google Wallet. Liz Reid, who leads Google Search, called the new search box "the biggest upgrade to our iconic search box since its debut."
What Google's six-month model cadence means for the enterprise AI cost curve
Google signaled that 3.5 Flash is just the opening act of the 3.5 series. Gemini 3.5 Pro is currently in internal testing and will roll out to everyone next month. Kavukcuoglu indicated the company has been operating on roughly a six-month cadence for major model updates (Gemini 3 in November, 3.5 in May) and expects that rhythm to continue. When a reporter from The New York Times asked how Google determines whether a release warrants a full numerical jump or a half-step increment, Kavukcuoglu said the numbering reflects the magnitude of research progress: "What defines the numbering update is really the progress that we see in our research and how it is reflected in the models and the impact that they have." For enterprise buyers, that cadence carries an important implication: the cost-performance curve is not just improving; it is improving on a predictable schedule. A model that outperforms the previous flagship at a third the cost every six months fundamentally changes the planning horizon for AI investments. It means the token budgets that companies are blowing through today may look quaint by the end of the year. Google's announcements arrive at a moment of intense competition. OpenAI, Anthropic, Meta, and a constellation of smaller labs are all racing to deliver models that balance capability with cost.
Microsoft has been aggressively integrating OpenAI's models into Azure and Copilot. But Google benefits from a structural advantage that is easy to overlook: distribution. With 13 products serving more than a billion users each — five of which exceed three billion — Google can deploy Flash to an audience no pure-play AI lab can match. Every improvement immediately benefits Search, Gmail, Docs, Maps, and YouTube. And the usage data flowing back from those billions of interactions feeds the very flywheel that makes the next model better. The question now is whether the $1 billion savings figure — an eye-catching projection based on a specific workload mix — will survive contact with the messy reality of corporate AI deployments, where legacy systems, compliance requirements, and organizational inertia have a way of blunting even the most compelling cost curves. But if Google's own internal usage is any guide — three trillion tokens a day and climbing, doubling every few weeks, with no sign of slowing — the company is not just selling the bet. It is making the bet itself, with its own engineers, on its own infrastructure, at a scale no customer has yet attempted. In the AI cost wars, the most persuasive pitch may simply be: we did it first.
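The two headline figures in this article, the $1 billion savings claim and the internal usage growth on Antigravity, can be sanity-checked with simple arithmetic. The Python sketch below is a back-of-envelope reading of the quoted numbers, not Google's own calculation. It assumes the savings apply only to the 80 percent of tokens that are shifted, treats "more than $1 billion" as a floor, and assumes smooth exponential growth between the March and mid-May usage figures; Google did not publish the underlying prices or workload mix.

```python
import math

# Figures quoted in the article.
TOKENS_PER_DAY = 1e12       # "one trillion tokens per day"
SHIFTED_SHARE = 0.80        # "80 percent of their workloads"
ANNUAL_SAVINGS_USD = 1e9    # "more than $1 billion annually" (used as a floor)

# Tokens per year that move to the cheaper blend (assumption: 365 days of steady load).
shifted_tokens_per_year = TOKENS_PER_DAY * 365 * SHIFTED_SHARE

# Minimum average saving per million shifted tokens implied by the claim.
implied_saving_per_million = ANNUAL_SAVINGS_USD / (shifted_tokens_per_year / 1e6)


def doubling_time_weeks(start: float, end: float, weeks: float) -> float:
    """Doubling time under an assumed constant exponential growth rate."""
    return weeks * math.log(2) / math.log(end / start)


# Antigravity usage: about 0.5 trillion to about 3 trillion tokens per day in about ten weeks.
antigravity_doubling = doubling_time_weeks(0.5e12, 3e12, 10)

print(f"Implied saving: ${implied_saving_per_million:.2f} per million shifted tokens")
print(f"Implied doubling time: {antigravity_doubling:.1f} weeks")
```

Under these assumptions the claim implies an average saving of roughly $3.42 per million shifted tokens, and the six-fold increase corresponds to usage doubling about every 3.9 weeks, which is consistent with Pichai's "every few weeks" description.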