🚀 Gemini 3.1 Pro Reclaims #1

PLUS: Claude Sonnet 4.6

Welcome back!

Google just fired another shot in the model wars. Gemini 3.1 Pro more than doubled its reasoning score, reclaimed benchmark leadership, and runs at less than half the cost of key rivals. Let’s unpack…

Today’s Summary:

  • 🚀 Gemini 3.1 Pro doubles reasoning

  • 🎵 Lyria 3 turns photos into songs

  • 💻 Claude Sonnet 4.6 boosts automation

  • 🦞 OpenAI acquires OpenClaw for agents

  • 🆓 Alibaba’s Qwen3.5 challenges leaders

  • 🤖 OpenAI launches Frontier agent platform

  • 🛠️ 2 new tools

TOP STORY

Gemini 3.1 Pro doubles its reasoning score

The Summary: Google released Gemini 3.1 Pro, scoring 77.1% on ARC-AGI-2, more than doubling its predecessor’s score and outperforming competitors across reasoning and agent benchmarks. A new three-tier thinking system (low, medium, high) lets the same model scale from instant answers to Deep Think reasoning. It's rolling out now in preview across the Gemini app and API.

Key details:

  • The Arena.ai human-preference leaderboard puts 3.1 Pro in a tie for #1 in Text (score: 1500); Artificial Analysis also ranks it #1 while finding it 50% cheaper than Claude and GPT-5.2

  • On BrowseComp (agentic web search), 3.1 Pro hit 85.9%, up from 59.2%, on a test that matters most for autonomous AI agents

  • Runs at less than half the cost of Claude Opus

  • Knowledge cutoff is still January 2025, unchanged from Gemini 3

  • The new "medium" thinking tier is essentially what "high" used to be. The new "high" mode is like a lightweight Deep Think

Why it matters: The cost gap between Gemini and Anthropic's flagship is now so wide that any team paying Claude Opus prices may need to benchmark Gemini 3.1 Pro. The Gemini 3.0 launch in November triggered a wave of competitor releases, and 3.1 Pro now takes the lead back. This cycle is about to repeat, and the pace between releases is now measured in weeks.
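For readers who want to sanity-check the benchmark jump, here is a minimal sketch that separates the percentage-point gain from the relative gain, using only the BrowseComp scores quoted above. The helper name `benchmark_gain` is our own, purely for illustration.

```python
def benchmark_gain(old_score: float, new_score: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    point_gain = new_score - old_score
    relative_gain = point_gain / old_score * 100
    return point_gain, relative_gain


# BrowseComp scores quoted in this issue: 59.2% before, 85.9% for Gemini 3.1 Pro.
points, relative = benchmark_gain(59.2, 85.9)
print(f"BrowseComp: +{points:.1f} points, {relative:.1f}% relative improvement")
```

In other words, the jump is about 26.7 points in absolute terms, or roughly 45% better than the earlier score. The two figures answer different questions, so it helps to know which one a headline is quoting.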

FROM OUR PARTNERS

Voice Prompts That Work

Dictate prompts and tag files automatically

Stop typing reproductions and start vibing code. Wispr Flow captures your spoken debugging flow and turns it into structured bug reports, acceptance tests, and PR descriptions. Say a file name or variable out loud and Flow preserves it exactly, tags the correct file, and keeps inline code readable. Use voice to create Cursor and Warp prompts, call out a variable like user_id, and get copy you can paste straight into an issue or PR. The result is faster triage and fewer context gaps between engineers and QA. Learn how developers use voice-first workflows in our Vibe Coding article at wisprflow.ai. Try Wispr Flow for engineers.

GOOGLE

Lyria 3 brings custom AI music generation to Gemini

The Summary: Google DeepMind has launched its Lyria 3 generative music model in the Gemini app, letting users describe a mood or memory and get back a 30-second track complete with vocals, lyrics, and cover art. The model works from text prompts or uploaded photos and videos. Every track carries a SynthID watermark, and Gemini can detect it in uploaded audio files to verify AI origin.

Key details:

  • Available to Gemini users in 8 languages on desktop, with mobile rolling out over the next few days. Paid tiers get higher limits

  • Outputs 48kHz stereo tracks with auto-generated lyrics, custom cover art, and a built-in share link

  • When you name a specific artist, Gemini treats it as loose stylistic inspiration rather than direct mimicry

  • OpenAI is reportedly building a competing music generator. Udio, acquired by Universal Music late last year, has slowed development, leaving Suno as Lyria 3's only serious rival at launch

Why it matters: The photo-to-song input is the most interesting feature here. It means your camera roll may become a songwriting tool, which changes how people might naturally document and share moments. Embedding this at zero cost inside the Gemini app, already installed on hundreds of millions of devices, may make fully produced AI music the default expectation.

FROM OUR PARTNERS

A Smarter Way to Read the News

Tired of news that feels like noise?

Every day, 4.5 million readers turn to 1440 for their factual news fix. We sift through 100+ sources to bring you a complete summary of politics, global events, business, and culture — all in a brief 5-minute email. No spin. No slant. Just clarity.

ANTHROPIC

Anthropic releases Claude Sonnet 4.6

The Summary: Anthropic released Claude Sonnet 4.6, a mid-tier model that matches or beats its previous flagship Opus 4.5 on coding, computer use, and financial analysis. Developers in early tests preferred it over the pricier Opus 4.5. The model's “computer use” score on OSWorld jumped from 14.9% in October 2024 to 72.5% today, a roughly 5x increase in 16 months.

Key details:

  • The free tier now also defaults to Sonnet 4.6

  • A new 1M token context window ships in beta

  • A new Dynamic Filtering technique for web search cuts irrelevant content before it hits the context window, reducing costs and increasing accuracy

  • The developer community's read: the cost floor keeps dropping while capability rises, doubling the intelligence per dollar every 6-9 months

  • Those running multi-agent workflows note that “babysitting requirements” have dropped enough to allow running parallel agents with confidence

Why it matters: Computer use reaching 72.5% on OSWorld means the "legacy software automation" problem affecting hospitals, insurers, and government agencies still running pre-API systems just became significantly more solvable. The competitive pressure this puts on OpenAI is immediate, as GPT-5.2 only scores 38.2% on the same computer use benchmark.
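The OSWorld numbers above are easy to check by hand. The short sketch below works out the fold increase and, assuming smooth compound growth over the 16 months (our simplification, not something Anthropic reports), the implied monthly growth rate. The function names are hypothetical.

```python
def fold_increase(start: float, end: float) -> float:
    """How many times larger the end score is than the start score."""
    return end / start


def monthly_growth(start: float, end: float, months: int) -> float:
    """Implied compound monthly growth rate (assumes smooth growth)."""
    return (end / start) ** (1 / months) - 1


# OSWorld scores quoted in this issue.
fold = fold_increase(14.9, 72.5)        # October 2024 score vs. today's score
rate = monthly_growth(14.9, 72.5, 16)   # over the 16 months cited
lead = fold_increase(38.2, 72.5)        # Sonnet 4.6 vs. GPT-5.2

print(f"Increase over 16 months: {fold:.2f}x")
print(f"Implied monthly growth: {rate:.1%}")
print(f"Lead over GPT-5.2: {lead:.2f}x")
```

The increase comes out just under 5x, which works out to roughly 10% compounded per month, and Sonnet 4.6 scores a little under twice what GPT-5.2 does on the same benchmark.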

QUICK NEWS

Quick news

TOOLS

🥇 New tools

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/