🚀 ChatGPT o3 Sets New Standard
PLUS: AI Decodes Dolphin Language

Welcome back!
ChatGPT o3 is redefining what we expect from AI. It outperforms PhDs, handles very long context, and can now even guess where a photo was taken with great precision. Despite some increase in hallucinations, it appears to be the new state-of-the-art model. Let’s unpack…
Today’s Summary:
🚀 ChatGPT o3 sets new standard
⚡ Gemini 2.5 Flash is pushing speed limits
🐬 Google trains AI on dolphin sounds
👁️ xAI launches Grok Vision
🧠 Reasoning AIs are not smarter, just more efficient
🔍 Cohere launches Embed 4 for enterprise search
🛠️ 2 new tools

TOP STORY
ChatGPT o3 sets new standard
The Summary: The recently released ChatGPT o3 model is setting new records in specialized tasks, from perfectly analyzing 96,000-word texts to solving wet-lab virology problems better than PhDs. A stunning new capability is its ability to guess the geographic location of photos, reasoning through small visual details.
Key details:
Scored 57% on ARC-AGI v1 reasoning benchmark
Outperformed 94% of PhD virologists on lab tasks in a study
Can guess geolocations of uploaded photos with high accuracy
Achieves near-perfect performance on long context benchmarks
Hallucinates more than its predecessors in factual answers
Why it matters: o3 seems to operate on a new level, outscoring PhD experts and unlocking surprising capabilities like photo-based geolocation. With tools and web access built in, it acts as a flexible, general-purpose problem solver. But it also hallucinates more than its predecessors, so users will need to double-check answers, especially on factual tasks.

Gemini 2.5 Flash is pushing speed limits
The Summary: Google has released Gemini 2.5 Flash, a fast, low-cost model offering advanced reasoning, math, and even image segmentation. It’s optimized for app development, with high speed and fine control while keeping costs very low.
Key details:
Gemini 2.5 Flash scored 12.1% on Humanity’s Last Exam, beating Claude 3.7 Sonnet (8.9%) and DeepSeek-R1 (8.6%)
Ranks nearly at the same level as GPT-4.5 in LMArena
“Thinking” phase is optional; devs can disable it for max speed
Image segmentation now costs less than 1/10th of a cent using Gemini 2.5 Flash, outputting JSON-encoded PNG masks in real time
Speeds of up to 1,000 tokens/sec in early developer tests
Why it matters: Google continues to offer some of the best value in the market with Gemini 2.5 Flash, turning foundation models into app infrastructure, optimized for speed, control and deployment at scale. It’s a model you can meter, shape, and plug into real systems like a cloud tool.
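For developers curious what “JSON-encoded PNG masks” might look like in practice, here is a minimal Python sketch of unpacking them. It assumes the model returns a JSON list in which each entry carries a `label` and a base64-encoded PNG under `mask`, optionally prefixed with a data URI. Those field names and the `decode_masks` helper are illustrative assumptions, not a confirmed response schema, so check the official API reference before relying on them.

```python
import base64
import json

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_masks(response_text):
    """Return (label, png_bytes) pairs from a JSON list of mask entries.

    Assumed (hypothetical) entry shape: {"label": str, "mask": base64 PNG}.
    """
    results = []
    for entry in json.loads(response_text):
        encoded = entry["mask"]
        # Strip an optional data-URI prefix such as "data:image/png;base64,".
        if encoded.startswith("data:"):
            encoded = encoded.split(",", 1)[1]
        png_bytes = base64.b64decode(encoded)
        if not png_bytes.startswith(PNG_SIGNATURE):
            raise ValueError(f"mask for {entry.get('label')!r} is not a PNG")
        results.append((entry.get("label"), png_bytes))
    return results


# Hand-made stand-in payload; a real response would contain a full PNG image.
fake_png = PNG_SIGNATURE + b"placeholder"
sample = json.dumps([
    {
        "label": "dolphin",
        "mask": "data:image/png;base64,"
        + base64.b64encode(fake_png).decode("ascii"),
    }
])

masks = decode_masks(sample)
```

Each decoded mask is just PNG bytes, so it can be written to disk or handed to an image library for overlaying on the original photo.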

Google builds AI model to understand dolphin sounds
The Summary: Google and Georgia Tech have teamed up to train DolphinGemma, an AI model built to analyze dolphin vocalizations. Using over 40 years of underwater sound data, the model can identify patterns in dolphin communication and generate realistic sound sequences. Researchers hope this may open a new path to meaningful interaction with wild dolphins.
Key details:
DolphinGemma is a 400M-parameter audio model running on Pixel phones, trained on Atlantic spotted dolphin sounds
The Wild Dolphin Project holds the longest continuous dataset of wild dolphins, including behavior-linked sounds across generations
Dolphins produce specific signature whistles and burst-pulse “squawks”
The CHAT system, now running on Pixel 9, enables real-time mimic detection and could allow dolphins to request objects using AI sounds
Why it matters: DolphinGemma is an experiment in whether machines can start to understand animal minds. If the model can reliably find structure in dolphin vocalizations, it would suggest dolphins use communication systems complex enough to be modeled like human language, and could mark the beginning of interspecies dialogue.

QUICK NEWS
xAI introduces Grok Vision with realtime search
Cohere introduces Embed 4 for enterprise multimodal search

TOOLS
🥇 New tools

That’s all for today!
If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/