🚀 Google Launches Gemini 3

PLUS: Grok 4.1 Briefly Took the Crown

Welcome back!

Google just fired the next shot in the model wars with Gemini 3, built to reason, plan, and create with more precision than ever. In the same 24 hours, xAI’s Grok 4.1 briefly topped global charts, only to be overtaken again by Google. Let’s unpack…

Today’s Summary:

  • 🚀 Google launches Gemini 3

  • ❤️ xAI’s Grok 4.1 raises emotional IQ bar

  • 💻 Google Antigravity enters agent IDE race

  • 🎥 Veo 3.1 adds Multi-Image Control

  • 👥 ChatGPT tests Group Chats in Asia

  • 🏗️ Bezos starts $6.2B AI Project Prometheus

  • 🛠️ 2 new tools

TOP STORY

Google launches Gemini 3

The Summary: Google has released Gemini 3, topping the LMArena leaderboard with a 1501 Elo score, far ahead of GPT-5.1 and Claude 4.5. Built for deep reasoning and multimodal understanding, Gemini 3 interprets long inputs and intent with high precision. Early testers report clear improvements in depth and coding reliability.

Key details:

  • Tops all major AI leaderboards: 45.1% on ARC-AGI-2 (vs GPT-5.1's 17.6%) and 41% on Humanity’s Last Exam (vs GPT-5.1's 26.5%)

  • On WebDev Arena, Gemini 3 jumps to 1487 Elo, a +92-point gain over GPT-5.1 that translates into fewer coding dead ends

  • Supports a 1M-token context window and multimodal input

  • Runs faster than GPT-5.1 while using fewer tokens

  • Rolls out today in the Gemini app, Google AI Mode, Google AI Studio, and developer tools including Vertex AI
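For context on what those leaderboard numbers mean: an Elo gap maps directly to an expected head-to-head win rate. A minimal sketch of that conversion using the standard Elo formula and the scores cited above (illustrative arithmetic only, not official LMArena code):

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A beats model B in a head-to-head vote,
    per the standard Elo formula with a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Gemini 3 (1501) vs Grok 4.1 Thinking (1483) on LMArena:
p = elo_win_probability(1501, 1483)
print(f"{p:.3f}")  # an 18-point Elo gap is only a slim expected edge
```

Even an 18-point lead implies winning only about 52.6% of matchups, which is why the top spot can change hands within hours.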

Why it matters: Gemini 3 shows that the next advances in AI will come from systems that can reason deeply about structure and intent. A model that can plan a year of vending-machine strategy, write a sci-fi 3D world, analyze a pickleball video, and operate a browser through an autonomous agent is no longer just “completing text”. It looks closer to Software 2.0: software that understands workflows and acts within them with purpose. For users and developers, that means fewer retries, fewer hallucinations, and fewer moments where the model needs babysitting.

XAI

Grok 4.1 raises the bar on emotional AI

The Summary: xAI has released Grok 4.1, now live on Grok.com, X, and mobile apps. The model brings a significant performance jump and briefly reached #1 in the global AI rankings before Gemini 3 reclaimed the lead hours later. It also delivers major gains in emotional awareness, writing skill, and factual accuracy. However, xAI’s own report notes that these gains come with behavior marked by excessive flattery and reduced honesty.

Key details:

  • Grok 4.1 Thinking reached 1483 Elo on LMArena and held the top spot for a few hours before Gemini 3 dethroned it with 1501

  • Hallucination rate dropped from 12.09% to 4.22%

  • Emotional intelligence benchmark (EQ-Bench3) ranked Grok 4.1 first

  • Flattery nearly tripled, suggesting a friendlier yet less self-critical model

Why it matters: xAI’s short-lived lead shows how fast the top of the AI field is moving, and how thin the margin is between dominance and obsolescence. Grok 4.1’s design choices reveal a hard tradeoff: boosting empathy and personality risks eroding intellectual rigor.

GOOGLE

Google Antigravity enters the IDE race

The Summary: Google released Antigravity, a new development environment built around Gemini 3 and designed for agent-first coding. It lets developers run multiple agents across the editor, terminal, and browser, then watch their actions in real time through artifacts.

Key details:

  • Supports Gemini 3 Pro, Claude Sonnet 4.5, and GPT-OSS 120B; model switching happens inside the IDE

  • Includes a standalone Agent Manager that runs multiple workers in parallel and shows each task as an artifact

  • Built-in Chrome control lets agents open pages, test apps, and record screenshots for verification

  • Based on Google’s Windsurf codebase acquisition

  • Available on macOS, Windows, and Linux as a public preview, with a free plan whose rate limits reset every five hours

Why it matters: Antigravity lands in a crowded field where every vendor now ships an AI IDE. Google’s version pushes a multi-agent workflow verified through artifacts, and the browser integration adds a feedback loop where models can check their own output. Early testers find it a bit unstable, with a free tier that runs out quickly, yet still full of ideas.

QUICK NEWS

Quick news

TOOLS

🥇 New tools

  • Emma - AI food scanner (iOS)

  • Willow - AI voice dictation (iOS)

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/