The Summary AI
Posts
🔥 Google Launches Gemma 3

🔥 Google Launches Gemma 3

PLUS: Gemini Native Image Editing

The Summary AI
March 14, 2025

Welcome back!

Google dropped a powerful lineup at its Developer Day, unveiling Gemma 3, a multimodal AI that runs on a single GPU, native image editing in Gemini that genuinely feels like magic, and new robotics models with spatial reasoning. Let’s break it down…

Today’s Summary:

🚀 Google launches Gemma 3
✏️ Gemini 2.0 Flash gets impressive image editing
🦾 Google's AI models take control of robots
🗣️ Sesame open-sources realistic voice AI
📝 Sam Altman teases creative writing AI
🤖 Manus steps up task automation
🛠️ 2 new tools

TOP STORY

Google unveils open-source multimodal Gemma 3

The Summary: Google has released Gemma 3, a family of optimized multimodal AIs that run on a single GPU while offering high performance. The models come in four sizes (1B, 4B, 12B, and 27B) and can handle text, images, and short videos. Gemma 3 supports over 140 languages and includes tools for function calling and structured output.

Source: Google

Key details:

Handles inputs from multiple modalities : text, images, and videos
Ranked #9 on LM Arena leaderboard, above DeepSeek V3
128K token context window (16× larger than previous)
Can be easily fine-tuned for specific needs
Available on HuggingFace under an open weight license

Source: Google

Why it matters: Google released an impressive open model to challenge DeepSeek-V3, pushing the frontier of local models. Early testers say it also matches Gemini 1.5 Flash in real world performance, while running on a single GPU with a large 128K context window. Developers can use it to build multimodal, multilingual AI applications for text, image, and video, without relying on remote cloud infrastructure.

Technical report

GOOGLE

Google adds native image editing to Gemini

The Summary: Google has rolled out native image generation capabilities in its experimental Gemini 2.0 Flash model, available to developers through Google AI Studio and the Gemini API. This is the first time a major AI company has integrated image output directly into a single multimodal LLM rather than connecting two separate text and image models. The new system enables consistent style generation and conversational image editing.

Source: Google DeepMind

Key details:

Users can modify uploaded or generated images with simple natural language instructions like "add chocolate drizzle" or "make the curtains light green", no image editing tools required
The model maintains character consistency across image sequences
Beats OpenAI to market, even though GPT-4o demonstrated similar native image capabilities nearly a year ago but never released them
Internal benchmarks show superior text rendering performance

Why it matters: This integration represents a shift in how AI generates images, as instead of translating text-to-image, the model is able to directly “think” visually, enabling impressive interactive editing of existing images. Google's move puts pressure on OpenAI, which demonstrated similar capabilities with GPT-4o last May but hasn't released them so far.

Try it in AI Studio

GOOGLE

Google DeepMind releases AI Robot Control Systems

The Summary: Google DeepMind has launched two new AI models that extend Gemini into robotics. Gemini Robotics acts as a “vision-language-action” system that directly controls robots with twice the performance of current leaders. Its companion, Gemini Robotics-ER, adds advanced spatial reasoning. Both models demonstrate advanced abilities to handle unfamiliar objects, adapt to changing environments, and perform complex manual tasks like origami folding.

Key details:

Gemini Robotics maintains continuous awareness of its surroundings, instantly adapting when objects move or slip from its grasp
Gemini Robotics-ER achieves 2-3x higher success rates than standard Gemini 2.0 models in robot control tasks
Can control various robot types including ALOHA 2, Franka arms, and Apptronik's humanoid Apollo

Why it matters: As these models advance and partner with hardware specialists like Boston Dynamics, we're witnessing the foundation for general-purpose robots that can truly assist in everyday environments.

Playlist of 20 video demonstrations

QUICK NEWS

Quick news

Sam Altman teased a new OpenAI model specialized in creative writing
Sesame open-sourced its CSM-1B realistic voice
Manus generated buzz for its advanced task automation, which was later confirmed to be powered by Claude

TOOLS

🥇 New tools

Wispr Flow - 3x faster dictation on PC & Mac
Bolt x Figma - Turn Figma designs into apps

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/