This post is part of my 2¢ series - my raw thoughts about recent topics in AI. Not always practical thoughts, but always thought-provoking. Some of my previous ones covered the new wave of conversational AI, economies of scale for foundation AI models, and the consolidation in the AI space.
This post captures my takeaways from attending Google’s flagship event, I/O 2025. It’s not a comprehensive announcement round-up. Instead, I’ve focused on the launches that matter most to anyone building or working with AI, and I share my perspective on what these moves mean for the broader AI ecosystem: founders, developers, and researchers alike.
Since 2018, when Google unveiled the groundbreaking Duplex demo at its biggest event of the year, Google I/O, I've been captivated by the company's AI advancements. For me, it was the first truly practical, consumer-facing use of AI—a clear example of how AI could take over routine tasks like booking appointments. But more importantly, it marked a key step toward a future where AI helps people express themselves in ways that were previously out of reach.
In recent years, the AI community has often viewed Google as trailing behind leaders like OpenAI and Anthropic. However, this year's Google I/O conference felt different—everything finally clicked. Google moved from research to reality, capitalizing on its massive distribution channels and deep technological prowess: the perfect combination of state-of-the-art technology and access to real-world usage through Search, Google Workspace (Gmail, Sheets, Docs, etc.), and Android (phones, smart TVs, glasses).
And it wasn’t only me. The same sentiment echoed across the press tent at I/O last Tuesday, capturing an energy reminiscent of OpenAI’s inaugural DevDay.
The winning combination, as defined by Google in this week’s I/O, manifests across three principles:
Powerful - deploying best-in-class models to support real-time, reliable experiences
Personal - tailoring AI to understand and cater to individual user preferences and needs
Proactive - developing AI that anticipates user needs and acts accordingly without being too intrusive or eager
Out of these three, the one I found most promising is Personal.
Google's unparalleled access to user data gives it a powerful edge over competitors like OpenAI and Apple. It understands my interests through the searches I make (Search), the places I go (Maps), the music I listen to (YouTube), my payment habits (Google Pay), and even my work life (Gmail, Calendar, Docs). This breadth of insight uniquely positions Google to deliver truly personalized AI experiences.
Google didn’t just launch new products at I/O; it made deliberate moves into markets long held by OpenAI, Meta, Perplexity, Anthropic, and even Shopify and Stripe. Each announcement, from Jules to Gemini Live, stepped directly into competitive territory. If you’re working on dev tools, agent platforms, creative apps, e-commerce flows, or voice interfaces, these updates are worth reading closely. I’ve included a breakdown of the most directly affected companies and industries at the end of this post—worth reviewing if you want to stay ahead of what’s coming.
The real story isn’t about how many features Google shipped, though. It’s about the strategy taking shape. Google is doubling down on vertical integration and deeply contextual AI. That’s the new game. In Ben Thompson (Stratechery) terms, it’s Aggregation Theory with agency. Google owns the user interface, the distribution (Android, Chrome, Search), and now, increasingly, the intelligence layer.
In this post, I'll outline a selected subset of announcements I found most promising and share my 2¢ on why this event marks a turning point in AI's evolution.
Google AI Studio, Jules, and Stitch
Perhaps one of the most significant announcements at Google I/O was the unveiling of the upgraded Google AI Studio, with a whole new Build section—an integrated development environment explicitly designed for building AI-driven applications.
Positioned directly against tools like Cursor, Windsurf, Lovable, and Bolt, Google AI Studio unifies Google's flagship multimodal Gemini models into one streamlined interface. Developers can now build with natural language and deploy their creations to Google Cloud with a single click, reinforcing Google's strategic advantage through infrastructure integration.
Jules, a particularly intriguing release, is Google's take on the autonomous coding agent, similar to the likes of Devin and Factory. Quietly entering public beta at jules.google, Jules represents Google's ambitions to dominate the software development lifecycle: from writing documentation and deploying applications to autonomously submitting pull requests. Though overshadowed by flashier announcements, Jules may well emerge as a sleeper hit among developers seeking highly efficient, AI-augmented development workflows.
Stitch, another groundbreaking tool revealed at I/O, could radically simplify UI design processes. Through natural language prompts, designers can describe interfaces, which Stitch then generates and exports directly into Figma.
Together, Google AI Studio, Jules, and Stitch exemplify Google's strategy of leveraging its state-of-the-art models and infrastructure to deliver highly integrated, practical, and transformative tools for developers and designers alike.
Powerful models
Gemini 2.5 took center stage at I/O, topping nearly every major AI benchmark: from coding and web development to complex reasoning and video understanding. Compared to leading commercial models, it stands out with a January 2025 knowledge cutoff, a 1 million-token context window, and a price of roughly a quarter of OpenAI’s GPT-4o.

Key improvements include:
Deep Think - an advanced reasoning capability that achieves state-of-the-art results on complex mathematical and programming tasks, in exchange for increased cost and latency.
Enhanced function calling and structured outputs - until now, the real-time Gemini models haven’t been usable for anyone needing function calling or structured output. That gap is finally closed (see the sketch below).
Gemini Diffusion - an experimental research model that generates text 5x faster than the leading Flash Lite model. It builds on recent research applying diffusion models to text generation, marking a significant leap forward in efficiency and responsiveness.
![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bb7bf3a-34f1-4e0d-a245-1bcd2dffaa4b_800x450.gif)
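To make the structured output point concrete, here is a minimal sketch of requesting schema-constrained JSON from a Gemini model, assuming the google-genai Python SDK; the model id, schema, and prompt are illustrative placeholders rather than anything Google showed at I/O. Function calling follows a similar pattern, passing tool declarations (or plain Python functions) through the same config object.

```python
# Minimal sketch of structured output with the Gemini API, assuming the
# google-genai Python SDK (`pip install google-genai`). The model id,
# schema, and prompt below are illustrative placeholders.
from pydantic import BaseModel
from google import genai
from google.genai import types


class Launch(BaseModel):
    product: str
    one_line_summary: str


client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model id
    contents="List three developer-facing launches from Google I/O 2025.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=list[Launch],  # constrain the output to this schema
    ),
)

print(response.parsed)  # a list of Launch objects instead of raw text
```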
Search & AI Mode
Google has been experimenting with a new way to search over the last few weeks, dubbed “AI Mode”. The new mode became generally available in the US last Tuesday. Powered by the Gemini 2.5 Pro model, AI Mode allows users to engage in multi-turn dialogues, enabling more complex and nuanced information retrieval.
Highlights from the new search experience:
Personal Context - for an even more customized experience, AI Mode will offer personalized suggestions based on your connected Google apps, starting with Gmail, to bring in more of your personal context. For example, if you’re searching for “things to do in Nashville this weekend with friends, we're big foodies who like music” ahead of an upcoming trip, AI Mode can show you restaurants with outdoor seating based on your past restaurant bookings and searches.
Agentic Checkout - streamlining the purchasing process by allowing users to complete transactions directly within Search, bypassing the need to navigate to third-party websites. For example, when searching for concert tickets, AI Mode will find the best options and facilitate the purchase through Google Pay, all within the same interface. This seamless integration has the potential to disrupt traditional e-commerce models and reshape how users interact with online marketplaces. I wrote a whole series on the new agentic internet!
Try It On - enhancing the virtual shopping experience, the "Try It On" feature uses Google’s strong image-generation diffusion models to let users visualize clothing items on themselves. Users can upload their picture via Google Photos and see how different garments would look on their own bodies. Google’s generative AI capabilities meet distribution (Google Photos).
Deep Search - by synthesizing information from multiple sources, AI Mode can provide comprehensive answers to multifaceted questions, making Google Search relevant again in the face of competing tools such as OpenAI’s and Perplexity’s Deep Research.
The revenue gamble
While Google's AI Mode represents a significant leap forward in search capabilities, it also reveals a fundamental tension at the heart of the company's strategy. Google is essentially betting against its own golden goose: the advertising-driven search model that has generated over 50% of its revenues for over two decades.
The math is straightforward but concerning: if AI Mode provides comprehensive answers directly within search results, users will click through to fewer websites. Independent studies already suggest this trend with AI Overviews, and AI Mode's conversational interface offers even fewer opportunities for traditional paid link placements. Google's executives at I/O spoke confidently about the technical capabilities of their new search experience, but when it came to discussing how this translates into sustainable revenue streams, the answers were notably vague.
This isn't just a minor product pivot; it's a fundamental reimagining of how Google makes money. The company appears to be racing toward a future where AI assistants and conversational interfaces replace link-based search, and while there are certainly ways to imagine business models around personalized AI assistants and agentic workflows, Google hasn't articulated what those might look like or how they'll replace the massive cash flows from traditional search advertising.
Project Mariner
Project Mariner is Google's step toward giving AI true agency across your devices. It’s their answer to OpenAI's Operator and Anthropic's Computer Use: an infrastructure-level system for teaching AI to interact with your digital environment just like a human would.
At its core, Mariner is about "teach and repeat": show Gemini how to perform a task (filling out a form, generating a weekly status report, uploading data to a dashboard), and it can replicate that workflow again and again.
Mariner will be released as part of the Gemini API later this summer, which means developers can build agents that don’t just reason and plan, but act: navigating apps, automating browser actions, and manipulating on-screen interfaces.
Whether it’s booking a flight, copying events into a spreadsheet, or handling repetitive workflows across company tools, Mariner helps AI move beyond suggestions and into action.
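Mariner's developer surface isn't public yet, so any code here is speculative. Purely as a thought experiment, "teach and repeat" reduces to recording a demonstrated sequence of UI actions once and replaying it with new parameters; everything in the sketch below is invented for illustration and is not the real Mariner or Gemini API.

```python
# Purely hypothetical illustration of "teach and repeat": record a demonstrated
# sequence of UI actions once, then replay it with new parameters. None of these
# classes or methods belong to the real Mariner or Gemini API.
from dataclasses import dataclass


@dataclass
class Action:
    kind: str       # e.g. "click", "type", "submit"
    target: str     # a UI element the agent identified during the demo
    value: str = ""


# A workflow captured while the user demonstrated "file my weekly status report".
demonstration = [
    Action("click", "New report"),
    Action("type", "Title field", "Weekly status: {week}"),
    Action("type", "Summary field", "{summary}"),
    Action("submit", "Report form"),
]


def replay(workflow: list[Action], **params: str) -> None:
    """Replay a recorded workflow, filling in per-run parameters."""
    for step in workflow:
        value = step.value.format(**params) if step.value else ""
        # A real agent would drive the browser or OS here; we just log the step.
        print(f"{step.kind:>6} -> {step.target} {value}".rstrip())


replay(demonstration, week="May 26", summary="Shipped the I/O recap post.")
```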
Gemini app and Gemini Live
With the new Gemini app and its Live feature, Google is officially entering the race for the “everything AI assistant”, a direct challenger to ChatGPT, Meta AI, and Apple Intelligence.
The Gemini app is no longer just a chatbot. It’s a real-time, context-aware assistant that lives across your devices and ties directly into Google’s ecosystem: Gmail, Calendar, Keep, Docs, Maps, and even YouTube. Thanks to its tight OS-level integration (powered by Project Mariner), Gemini can also take actions on your phone.
But what really sets Gemini apart isn’t just input, it’s output:
Search Live and Project Astra - building on the capabilities of AI Mode, Google introduced Search Live, a feature that combines real-time camera input with search functionality. Users can point their device's camera at an object or scene and receive immediate information (similar to OpenAI’s Advanced Voice Mode), effectively turning their environment into an interactive search field. This feature is powered by Project Astra, Google's multimodal AI assistant that integrates visual and auditory data to provide contextually relevant responses.
Canvas is Google’s answer to tools like OpenAI’s Canvas and Anthropic’s Artifacts. Ask Gemini to summarize an article and it will build an interactive webpage, infographic, quiz, or even a lightweight app.
Deep Research now supports uploaded personal files, synthesizing them into study guides, plans, or insights, connecting directly to your Drive and Gmail, offering context-rich reasoning grounded in your data.
Agent Mode enables task automation across Gmail, Calendar, and partner services like Zillow. Unlike a basic plugin system, this builds on Mariner’s deeper Android-level control and Google's new MCP support, enabling multi-step reasoning and actions (see the MCP sketch after this list).
Quiz and video generation tap into Veo (text-to-video) and Lyria (music generation), turning documents into test-prep material and short videos.
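The MCP support is the piece most relevant to builders: any service that exposes an MCP server can, in principle, become a tool that Gemini's agents call. As a rough illustration, here is a minimal server sketch assuming the official MCP Python SDK's FastMCP helper; the listings tool and its data are invented and have nothing to do with Google's actual Zillow integration.

```python
# Minimal, hypothetical MCP server sketch, assuming the official MCP Python SDK
# (`pip install mcp`). The tool and its data are invented for illustration.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("listings")  # hypothetical server name


@mcp.tool()
def search_listings(city: str, max_price: int) -> list[dict]:
    """Return apartment listings at or under max_price in the given city."""
    sample = [
        {"city": "Austin", "price": 1900, "bedrooms": 2},
        {"city": "Austin", "price": 2600, "bedrooms": 3},
    ]
    return [x for x in sample if x["city"] == city and x["price"] <= max_price]


if __name__ == "__main__":
    mcp.run()  # serves over stdio so an MCP-aware agent can connect to it
```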
Generative models for creatives
Google’s generative media stack is finally starting to feel competitive.
Veo 3 is their new text-to-video model - high-quality, photorealistic footage, now with native audio generation. Think Pika or Runway, but with better motion, longer clips, and built-in sound.
Imagen 4 improves on its predecessor with sharper details and better text rendering, and is now integrated into Gemini.
Lyria 2 is Google’s music generation model. Based on the demo, Lyria is still in its infancy and far from the quality of Suno and Udio.
Flow is a new AI-powered video editor. Type a prompt, get an 8-second clip. Stitch clips together, tweak scenes with natural language. It’s Google’s answer to creative environments like Adobe Premiere, but for AI-native workflows.
Taken together, this is Google’s most serious push yet into generative video, music, and imagery, accessible via Google AI Studio and the Gemini API.
Google AI Glasses
Twelve years after the original Google Glass flop, Google’s trying again, and this time, it looks promising.
Google unveiled a new pair of smart glasses powered by Android XR and deeply integrated with the Gemini model family. They come equipped with microphones, speakers, a camera, and an in-lens display, offering a level of interactivity that goes beyond Meta’s Ray-Ban glasses, which lack a display. Google is going a step further: your real world now comes with real-time captions, directions, translations, and a personal assistant whispering relevant information.
And that’s the key difference: Google has the phone and app distribution; Meta, and OpenAI with its ChatGPT consumer app, do not. That means Google can natively integrate with Gmail, Calendar, Maps, Docs, Translate, and YouTube—capabilities that come pre-installed on Android and are used by billions. Need to translate a live conversation? Snap a photo and auto-organize it? Navigate to a meeting while rescheduling the next one? All of that is now on your face.
To get there, Google is partnering with Gentle Monster and Warby Parker on the eyewear itself, echoing the Meta + Ray-Ban strategy.
If you’re thinking this sounds like something Ben Thompson would write a thousand-word piece about, you're not wrong. This is exactly the kind of vertical integration that makes Apple and others sweat: powerful native models, fused with real-time inputs (voice, vision), and paired with a ubiquitous OS.
The world was not ready for wearable AI in 2013. But in 2025, with AI-native operating systems and mainstream model adoption, and after Meta has proven market traction, Google may have found the perfect moment for a comeback.
Industry impact
So what does all this mean if you're not Google? Below is a breakdown of the major announcements from I/O and the companies most likely to feel the heat.