Google Made Its Fastest AI Model Free. Here's What It's Really Selling.

Sophie Larsen
2 days ago
10 min read

On May 19, Google set Gemini 3.5 Flash as the default model for 900 million users, at no extra charge. The model outperforms Google's own Pro-tier flagship on five separate benchmarks. No subscription change, no waitlist, no migration required. If you opened Google Search or the Gemini app that morning, you were already running it.

The timing sharpens the question. OpenAI's GPT-5.5 costs developers $5 per million input tokens. Anthropic's Claude Opus 4.7 costs the same. Gemini 3.5 Flash costs $1.50 per million input tokens, and at the consumer level google gemini is free by default. For a company that just shipped a model competitive with the most expensive offerings on the market, the decision to give it away demands an explanation.

The explanation is not new. In 2008, Google made Android free for phone manufacturers and Chrome free for users. Neither decision was about generosity. Both were moves to control the platform layer of an emerging market while monetizing everything above it. Nokia and BlackBerry built the best closed mobile operating systems of their era. Google gave away the open one and captured the market underneath both of them. Gemini Flash is the third time Google has run this play. This time, the companies in the Nokia position are OpenAI and Anthropic.

The Model That Arrived With 900 Million Users Already Installed

Most AI model launches follow a predictable arc: announcement, API access, developer adoption, gradual consumer rollout over weeks or months. Google skipped all of it. Gemini 3.5 Flash became generally available the same day it was announced at Google I/O 2026, and on that same day became the default model inside the Gemini app, AI Mode in Google Search, Antigravity 2.0, Vertex AI, Android Studio, AI Studio, and GitHub Copilot simultaneously.

What that means at scale: the model did not need to find users. It inherited them. Gemini had 900 million monthly active users on launch day, up from 400 million at I/O 2025. Google's AI Overviews, the AI-generated summaries that appear above traditional search results, has 2.5 billion monthly active users. AI Mode, Google's fully conversational search interface, has 1 billion. When Flash became the default, it immediately ran at the base of the largest AI delivery infrastructure in the world.

Token volumes make the scale concrete. At I/O 2025, Google processed 480 trillion tokens per month. By May 19, 2026, Sundar Pichai reported that number had grown to 3.2 quadrillion, a sevenfold increase in twelve months. More than 8.5 million developers build with Gemini APIs each month. At least 375 enterprise customers each process over one trillion tokens annually. Flash became the engine underneath all of it on day one.

The contrast with how competitors release models is operational, not cosmetic. When OpenAI shipped GPT-5.5 in April 2026, the model launched to an existing user base that had to actively switch to it, update integrations, or pay for upgraded access. When Google releases a model and sets it as the platform default, nothing is required of users. The model simply becomes the thing they are using.

Jeff Dean, Google's Chief Scientist, described the design intent in explicit agentic terms. Flash was built to "deploy sub-agents that collaborate, run high-frequency iterative loops, and solve real-world problems at scale." As a demonstration, Google showed Flash generating a functioning operating system in approximately 12 hours, using agent coordination across a sustained, complex task that would typically require a sustained engineering effort.

Google's Flash Model Beats Its Own Pro and Most Frontier Models from OpenAI and Anthropic

The word "Flash" in a model name has historically signaled a trade-off: faster and cheaper, but measurably less capable than the flagship tier it sits below. Gemini 3.5 Flash broke that convention.

On five benchmarks tied to real-world agentic and coding workflows, Flash 3.5 outperforms Gemini 3.1 Pro, a model in a higher pricing tier, and matches or leads competing models from OpenAI and Anthropic. On Terminal-Bench 2.1, which tests sustained multi-step command-line task completion, Flash 3.5 scores 76.2% against Pro's 70.3%. On MCP Atlas (multi-agent tool orchestration, the benchmark most directly tied to production agent pipelines), Flash 3.5 hits 83.6% versus Pro's 78.2%. On Finance Agent v2, it scores 57.9% against Pro's 43.0%. On GDPval-AA, an Elo-based measure of practical dialogue performance, Flash 3.5 records 1,656 against Pro's 1,314. On MMMU-Pro, which evaluates multimodal reasoning across image, video, and text, Flash 3.5 reaches 84%, the highest score recorded by any model at the time of the launch.

Speed extends the gap further. Flash 3.5 runs at 289 output tokens per second, according to the on-stage figure Sundar Pichai cited, compared to GPT-5.5's 71 tokens per second. The ratio is roughly four to one. Artificial Analysis, which independently benchmarks model performance, placed Flash 3.5 at 55 on its Intelligence Index, up 9 points from Gemini 3 Flash, driven primarily by agentic performance gains and a 31 percentage-point reduction in hallucination rate.

Flash 3.5 does not win every category. On Humanity's Last Exam, which tests deep academic knowledge across specialized domains, it trails Gemini 3.1 Pro. On ARC-AGI-2, designed to test novel abstract reasoning that cannot be pattern-matched from training data, GPT-5.5 holds a clear lead at 84.6% against Flash 3.5's 72.1%. For workflows that require sustained long-context retrieval or dense deductive reasoning, the Pro tier still has the edge.

But the benchmarks Flash 3.5 wins are the categories where AI adoption is accelerating fastest in 2026: agent tool use, finance automation, multimodal data processing, high-frequency iterative workflows. The model that leads those categories at Flash pricing is the model that gets built into production systems. That is the competitive position Google is trying to establish.

OpenAI and Anthropic Charge $5 Per Million Tokens. Google Charges $1.50, Then Gives It Away Free

The pricing gap between Google and its two primary AI competitors is structural, not marginal.

GPT-5.5 costs developers $5 per million input tokens and $30 per million output. Claude Opus 4.7 is priced at $5 input and $25 output. Gemini 3.5 Flash costs $1.50 input and $9 output, roughly one-third the input price and one-third the output price of the alternatives. For applications making millions of API calls, saving $3.50 per million input tokens is not a rounding error. It is a budget-level decision.

Sundar Pichai put a figure on it at I/O 2026: enterprises switching from competing AI services to Gemini could save more than $1 billion annually. That figure reflects Google's own modeling and should be read accordingly, but the directional arithmetic is straightforward at scale.

Consumer access goes further. For anyone using Google Search, the Gemini app, or AI Mode, Gemini 3.5 Flash is simply the model they are using, with no payment required. ChatGPT Plus costs $20 per month. Claude Pro costs $20 per month. Making google gemini free at the consumer layer, while matching or beating those products on agent benchmarks, applies pressure at both ends simultaneously.

The reason Google can structure access this way is that it does not need Flash to be a revenue center. OpenAI's revenue comes almost entirely from API fees and subscriptions. The model is the product. Anthropic generates approximately 80% of its revenue from enterprise contracts, making its pricing directly dependent on what customers will pay for model access. When Google sets gemini flash pricing at $1.50 per million tokens, it does not just make its own offering cheaper. It compresses the addressable margin for both companies.

Google's returns from Flash sit one level above the model. Every developer building on the Flash API pays for Google Cloud compute. Every consumer using AI Mode in Search generates advertising inventory inside Google's ecosystem. Every query handled by Flash rather than routed to a competitor keeps behavioral data inside Google's infrastructure. The model is the acquisition cost. Cloud, Search, and data are the returns.

The structural irony is that this dynamic already applies to Google's competitors directly. Google has invested $40 billion in Anthropic and operates the cloud infrastructure on which Anthropic runs its models. Every Claude inference processed on Google Cloud generates margin for Google Cloud. OpenAI has run training workloads on Google's TPU infrastructure. Google does not need to displace either company in the model market. It is positioned to profit from both outcomes regardless.

OpenAI's current financial position makes the pressure asymmetric. OpenAI reportedly fell short of Q1 revenue targets and is projected to burn approximately $25 billion in cash this year. For a company with that burn rate and no platform layer generating parallel revenue, competing with Google on pricing is a fundamentally different problem.

Google Has Done This Before. Android Was Free Too.

In 2008, the dominant smartphone platforms were Nokia's Symbian and BlackBerry OS. Both were licensed under proprietary terms. Both generated licensing revenue. Both were technically capable products for the devices of that era.

Google launched Android as a free, open-source operating system that any hardware manufacturer could adopt without licensing fees. The decision did not reflect a principle that mobile operating systems should be free. It reflected a business analysis: Google's revenue came from web search and advertising, and the fastest way to protect that revenue was to ensure every mobile device shipped with a default pathway into Google's services. Nokia and BlackBerry were not positioned to compete with a free alternative. Within four years, Android held the majority of the global smartphone market.

Chrome followed the same logic the same year. Internet Explorer held approximately 65% of the browser market. Google's advertising revenue depended on users spending time on web pages, and a slow, fragmented browser experience was a structural threat to that revenue. Chrome was built, given away, and became the dominant browser. Microsoft's browser market share did not recover.

The pattern is consistent: identify the infrastructure layer that controls access to users, provide it for free, and monetize at the layer above. For Android, the layers above were Search, advertising, and the Play Store. For Chrome, they were Search and ad inventory. For Gemini Flash, they are Google Cloud compute revenue, Google Search advertising, and the behavioral data generated by 900 million Gemini users at near-zero marginal distribution cost.

Google DeepMind documents the compute infrastructure that makes this possible: custom TPU chips lower training costs, Google Cloud monetizes that capacity externally, Gemini models run on the same infrastructure, and consumer products distribute AI output to billions of users at near-zero marginal cost. Each layer reinforces the others. OpenAI and Anthropic exist inside this structure as paying customers. They fund the compute that powers the platform Google is deploying against them.

This is the part of the Gemini Flash story that benchmark comparisons do not capture. The advantage is not primarily that Flash is faster or cheaper than GPT-5.5. The advantage is that Google can keep google gemini free at the consumer layer indefinitely, because the returns accumulate at layers where OpenAI and Anthropic have no presence.

The Catch: Flash 3.5 Costs Three Times More Per Token Than Its Predecessor

The "cheap" framing for Gemini 3.5 Flash holds relative to competitor pricing. It is less accurate relative to Google's own previous generation.

Gemini 3 Flash was priced at $0.50 per million input tokens and $3 per million output. Gemini 3.5 Flash is priced at $1.50 and $9, a tripling of the per-token rate for every developer who has been building on the Flash tier. TechTimes documented this directly on launch day, noting that despite the agent performance improvements, the 3x per-token cost increase over the previous Flash generation was not receiving equivalent attention. For a developer running high-volume inference on Gemini 3 Flash, migrating to 3.5 triples the API bill, regardless of how gemini flash pricing compares to GPT-5.5.

Flash 3.5 also has bounded capability limits that are material in specific workflows. On Humanity's Last Exam, Flash 3.5 scores below Gemini 3.1 Pro on dense, specialized academic knowledge. On ARC-AGI-2, which tests novel abstract reasoning, GPT-5.5 scores 84.6% against Flash 3.5's 72.1%. For teams whose AI workflows depend on long-context dense retrieval or abstract reasoning rather than high-frequency agent execution, the Flash-beats-Pro narrative does not hold in those categories.

Market sentiment reflects the limits. On Polymarket prediction markets, Anthropic holds a 96% probability of being rated the best AI model provider through the end of May 2026. Google sits at 1%. According to analysis tracking AI momentum scores across nine dimensions of competitive position, Google scores 3 out of 10 against OpenAI's 10 and Anthropic's 8. The primary driver of that gap is that coding-based use cases have become the dominant vector of AI adoption in 2026, and in that conversation, Google is not the default choice.

Wall Street has flagged the cost of the pricing strategy as well. Google's aggressive discounts across the Gemini product line, including a 20% reduction to the AI Ultra subscription, compress Google Cloud margins at a point when the company is spending aggressively on AI infrastructure. The strategy is coherent. It is not free.

Three Signals That Will Show Whether Google's Gamble Pays Off

Google's distribution play is in place. Whether it produces durable competitive advantage depends on three events over the next 90 days.

The first is Gemini 3.5 Pro, expected in June 2026. Google confirmed at I/O that a Pro-tier model in the 3.5 family is in development. Flash 3.5 wins on agentic and multimodal benchmarks. Pro is where dense reasoning and long-context recall currently favor Gemini 3.1 Pro and Claude Opus 4.7. If 3.5 Pro closes those gaps while maintaining Flash's pricing and speed advantage across the lineup, Google will simultaneously hold quality and distribution. That combination is structurally unavailable to any company without Google's infrastructure depth, and it would validate the Flash strategy at a different scale of impact.

The second is OpenAI and Anthropic's pricing response. Large model launches historically produce competitive moves within two to four weeks. If either company reduces per-token API pricing, it confirms that the Flash launch has created genuine margin pressure on the API monetization model. If neither responds, it signals that enterprise customers' preference for model quality currently overrides price sensitivity, and the strategy's effectiveness at the enterprise tier remains unproven.

The third is GitHub Copilot's Flash adoption rate. Google confirmed that Gemini 3.5 Flash is rolling out inside GitHub Copilot. The distance between "rolling out" and "set as default" is where the developer adoption story resolves. Consumer distribution at 900 million users creates scale. Adoption at the daily coding tool layer creates habit. If GitHub moves Flash to the primary inference engine in Copilot, it signals that Google has entered the workflow layer where Anthropic currently has its strongest ground.

For knowledge workers navigating an AI-first information environment, building systems that work independently of any single model's defaults is becoming a practical concern. Understanding how personal AI knowledge systems interact with an increasingly embedded AI search layer is one place to start. The default model in Google Search is no longer a neutral information layer. It is a model Google built, controls, and is deploying as infrastructure.

Google's bet is that by the time the question of which company has the best model is settled, the question of which company is the default infrastructure will already have been answered. If the two previous versions of this playbook are any guide, the second question is the one that determines the outcome.

The shift from AI as a product to AI as embedded infrastructure is not a future scenario. It is the state of Google Search, the Gemini app, and GitHub Copilot as of this week. For developers, the question is whether $1.50-per-million-token access to an agent-capable model changes the economics of what they are building. For enterprise teams, the claimed $1 billion in annual savings is a number worth running against actual workloads. And for anyone whose work depends on how information is discovered and processed online, the more fundamental question is what it means when the search layer and the AI layer are now the same product, controlled by the same company, and optimizing for the same objectives. That is the product Google is actually selling with Gemini Flash. The model itself is just how they get you inside it.