🔥 Google Launches Gemma 3

PLUS: Gemini Native Image Editing

Welcome back!

Google dropped a powerful lineup at its Developer Day, unveiling Gemma 3, a multimodal AI that runs on a single GPU, native image editing in Gemini that genuinely feels like magic, and new robotics models with spatial reasoning. Let’s break it down…

Today’s Summary:

  • 🚀 Google launches Gemma 3

  • ✏️ Gemini 2.0 Flash gets impressive image editing

  • 🦾 Google's AI models take control of robots

  • 🗣️ Sesame open-sources realistic voice AI

  • 📝 Sam Altman teases creative writing AI

  • 🤖 Manus steps up task automation

  • 🛠️ 2 new tools

TOP STORY

Google unveils open-weight multimodal Gemma 3

The Summary: Google has released Gemma 3, a family of multimodal models optimized to run on a single GPU while offering high performance. The models come in four sizes (1B, 4B, 12B, and 27B) and can handle text, images, and short videos. Gemma 3 supports over 140 languages, function calling, and structured output.

Key details:

  • Handles inputs from multiple modalities: text, images, and short videos

  • Ranked #9 on the LM Arena leaderboard, above DeepSeek-V3

  • 128K-token context window (16× larger than the previous generation)

  • Can be easily fine-tuned for specific needs

  • Available on Hugging Face under an open-weight license

Why it matters: Google released an impressive open model to challenge DeepSeek-V3, pushing the frontier of local models. Early testers say it also matches Gemini 1.5 Flash in real-world performance while running on a single GPU with a 128K-token context window. Developers can use it to build multimodal, multilingual AI applications for text, image, and video without relying on remote cloud infrastructure.
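For a rough sense of what "runs on a single GPU" means for each size, here is a back-of-the-envelope sketch of weights-only memory. The parameter counts are the nominal sizes from the model names, and the precisions (bf16, int8, int4) are illustrative assumptions rather than anything Google specifies here. Real usage will be higher once the context cache and activations are included.

```python
# Rough weights-only memory estimate for the four Gemma 3 sizes.
# Assumptions: nominal parameter counts taken from the model names,
# and illustrative precisions. Ignores KV cache, activations, overhead.

SIZES_B = {"1B": 1, "4B": 4, "12B": 12, "27B": 27}  # billions of parameters (nominal)
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion, bytes_per_param):
    """Weights-only footprint in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def estimate_table():
    """Return {model size: {precision: GB}} for every combination."""
    return {
        name: {prec: weight_gb(n, b) for prec, b in BYTES_PER_PARAM.items()}
        for name, n in SIZES_B.items()
    }


if __name__ == "__main__":
    for name, row in estimate_table().items():
        cells = "  ".join(f"{prec}={gb:.1f} GB" for prec, gb in row.items())
        print(f"{name:>4}: {cells}")
```

Under these assumptions, the largest model needs about 54 GB for weights at bf16 and about 13.5 GB at 4-bit, which is why quantization matters for fitting it on one card.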

GOOGLE

Google adds native image editing to Gemini

The Summary: Google has rolled out native image generation capabilities in its experimental Gemini 2.0 Flash model, available to developers through Google AI Studio and the Gemini API. This is the first time a major AI company has shipped image output built directly into a single multimodal LLM rather than connecting two separate text and image models. The new system enables consistent style generation and conversational image editing.

Key details:

  • Users can modify uploaded or generated images with simple natural language instructions like "add chocolate drizzle" or "make the curtains light green", with no image-editing tools required

  • The model maintains character consistency across image sequences

  • Beats OpenAI to market: OpenAI demonstrated similar native image capabilities in GPT-4o nearly a year ago but never released them

  • Google says its internal benchmarks show stronger text rendering than competing models

Why it matters: This integration represents a shift in how AI generates images. Instead of handing a text prompt to a separate image model, a single model can “think” visually, which enables impressive interactive editing of existing images. Google's move puts pressure on OpenAI, which demonstrated similar capabilities with GPT-4o in May 2024 but has yet to release them.

GOOGLE

Google DeepMind releases AI robot control systems

The Summary: Google DeepMind has launched two new AI models that extend Gemini into robotics. Gemini Robotics is a “vision-language-action” system that controls robots directly and, according to Google DeepMind, delivers twice the performance of current leading models. Its companion, Gemini Robotics-ER, adds advanced spatial reasoning. Both models can handle unfamiliar objects, adapt to changing environments, and perform complex manual tasks like origami folding.

Key details:

  • Gemini Robotics maintains continuous awareness of its surroundings, instantly adapting when objects move or slip from its grasp

  • Gemini Robotics-ER achieves 2–3× higher success rates than standard Gemini 2.0 models in robot control tasks

  • Can control various robot types including ALOHA 2, Franka arms, and Apptronik's humanoid Apollo

Why it matters: As these models advance and Google DeepMind works with hardware specialists like Boston Dynamics, we're witnessing the foundation for general-purpose robots that can truly assist in everyday environments.

QUICK NEWS

Quick news

TOOLS

🥇 New tools

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/