🚀 ChatGPT o3 Sets New Standard

PLUS: AI Decodes Dolphin Language

The Summary AI
April 24, 2025

Welcome back!

ChatGPT o3 is redefining what we expect from AI. It outperforms PhD experts on specialized tasks, handles very long context, and can now guess where a photo was taken with surprising precision. Despite an increase in hallucinations, it appears to be the new state-of-the-art model. Let’s unpack…

Today’s Summary:

🚀 ChatGPT o3 sets new standard
⚡ Gemini 2.5 Flash is pushing speed limits
🐬 Google trains AI on dolphin sounds
👁️ xAI launches Grok Vision
🧠 Reasoning AIs are not smarter, just more efficient
🔍 Cohere launches Embed 4 for enterprise search
🛠️ 2 new tools

TOP STORY

ChatGPT o3 sets new standard

The Summary: The recently released ChatGPT o3 model is setting new records in specialized tasks, from perfectly analyzing 96,000-word texts to solving wet-lab virology problems better than PhDs. A stunning new capability is its ability to guess the geographic location of photos, reasoning through small visual details.

Source: Ethan Mollick

Key details:

Scored 57% on ARC-AGI v1 reasoning benchmark
Outperformed 94% of PhD virologists on lab tasks in a study
Can guess geolocations of uploaded photos with high accuracy
Achieves near-perfect performance on long context benchmarks
Hallucinates more than its predecessors in factual answers

Why it matters: o3 seems to operate on a new level, outscoring PhD experts and unlocking surprising capabilities like photo-based geolocation. With tools and web access built in, it acts as a flexible, general-purpose problem solver. But it also hallucinates more than its predecessors, so users will need to double-check answers, especially on factual tasks.

o3 System Card

GOOGLE

Gemini 2.5 Flash is pushing speed limits

The Summary: Google has released Gemini 2.5 Flash, a fast, low-cost model offering advanced reasoning, math, and even image segmentation. It’s optimized for app development, with high speed and fine control while keeping costs very low.

Source: Google

Key details:

Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam, beating Claude 3.7 Sonnet (8.9%) and DeepSeek-R1 (8.6%)
Ranks nearly at the same level as GPT-4.5 in LMArena
The “thinking” phase is optional; devs can disable it for max speed
Image segmentation now costs less than 1/10th of a cent using Gemini 2.5 Flash, outputting JSON-encoded PNG masks in real time
Speeds of up to 1,000 tokens/sec in early developer tests
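
For developers curious what "JSON-encoded PNG masks" might look like in practice, here is a minimal Python sketch of unpacking such a response. The field names (label, box_2d, mask) and the data-URI prefix are assumptions for illustration, not something confirmed in Google's announcement, and the sample response is synthetic (a PNG signature plus filler bytes, not a real image).

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumed prefix on the mask string; stripped if present.
DATA_URI_PREFIX = "data:image/png;base64,"


def decode_masks(response_text: str) -> list[dict]:
    """Parse a JSON list of segmentation items and decode each base64 mask."""
    items = json.loads(response_text)
    decoded = []
    for item in items:
        mask_field = item["mask"]  # hypothetical field name
        if mask_field.startswith(DATA_URI_PREFIX):
            mask_field = mask_field[len(DATA_URI_PREFIX):]
        png_bytes = base64.b64decode(mask_field)
        # Sanity check: every PNG file starts with the same 8-byte signature.
        if not png_bytes.startswith(PNG_SIGNATURE):
            raise ValueError(f"mask for {item.get('label')!r} is not a PNG")
        decoded.append(
            {"label": item["label"], "box_2d": item["box_2d"], "png_bytes": png_bytes}
        )
    return decoded


# Synthetic stand-in for a model response (not real model output).
fake_png = PNG_SIGNATURE + b"placeholder"
sample_response = json.dumps(
    [
        {
            "label": "dolphin",
            "box_2d": [100, 200, 600, 800],
            "mask": DATA_URI_PREFIX + base64.b64encode(fake_png).decode("ascii"),
        }
    ]
)

masks = decode_masks(sample_response)
print(masks[0]["label"], len(masks[0]["png_bytes"]))
```

In a real app, the decoded bytes would be handed to an image library to overlay the mask on the original photo.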

Why it matters: Google continues to offer some of the best value in the market with Gemini 2.5 Flash, turning foundation models into app infrastructure, optimized for speed, control and deployment at scale. It’s a model you can meter, shape, and plug into real systems like a cloud tool.

Try it for free in Google AI Studio

FROM OUR PARTNERS

Automate the repetitive tasks

Use AI as Your Personal Assistant

Ready to save precious time and let AI do the heavy lifting?

Save time and simplify your unique workflow with HubSpot’s highly anticipated AI Playbook—your guide to smarter processes and effortless productivity.

Download the free guide today.

GOOGLE

Google builds AI model to understand dolphin sounds

The Summary: Google and Georgia Tech have teamed up to train DolphinGemma, an AI model built to analyze dolphin vocalizations. Using over 40 years of underwater sound data, the model can identify patterns in dolphin communication and generate realistic sound sequences. Researchers hope this may open a new path to meaningful interaction with wild dolphins.

Key details:

DolphinGemma is a 400M-parameter audio model running on Pixel phones, trained on Atlantic spotted dolphin sounds
The Wild Dolphin Project holds the longest continuous dataset of wild dolphins, including behavior-linked sounds across generations
Dolphins produce specific signature whistles and burst-pulse “squawks”
The CHAT system, now running on Pixel 9, enables real-time mimic detection and could allow dolphins to request objects using AI sounds

Why it matters: DolphinGemma is an experiment in whether machines can start to understand animal minds. If the model can reliably find structure in dolphin vocalizations, it would suggest dolphins use communication systems complex enough to be modeled like human language, and could mark the beginning of interspecies dialogue.

QUICK NEWS

Quick news

xAI introduces Grok Vision with realtime search
Reasoning models aren’t smarter, just more efficient
Cohere introduces Embed 4 for enterprise multimodal search

TOOLS

🥇 New tools

Checklist - AI-driven checklist management tool
Dia - Realistic open-source text-to-speech model (demo)

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/