🚀 Gemini 2.5 Pro Hits #1
PLUS: ChatGPT Gets Image Editing Superpowers

Welcome back!
Gemini 2.5 Pro just rocketed to the top of the LLM leaderboard, outscoring GPT-4.5 and Grok 3 by a whopping +40 Elo points. This is Google’s clearest frontier AI play yet. Let’s unpack…
Today’s Summary:
🚀 Google debuts Gemini 2.5 Pro
🤯 ChatGPT adds native image gen
💻 DeepSeek releases frontier open model
🧠 ARC-AGI-2 reasoning benchmark
🎥 Open-Sora 2.0 rivals top video AIs
🖼️ Reve Image sets new image SOTA
🗣️ ChatGPT voice gets fresh updates
🛠️ 2 new tools

TOP STORY
Gemini 2.5 pushes Google back into the frontier
The Summary: Google just launched Gemini 2.5 Pro Experimental (code-named “Nebula”), its most advanced model yet and now the top-ranked LLM in the world. It debuts at #1 on LMArena with a massive 40-point Elo jump over GPT-4.5 and Grok 3, and posts top scores in reasoning, coding, and multimodal performance. With a built-in “thinking” ability, a 1M-token context window (2M soon), and elite results in math, science, and software tasks, Gemini 2.5 shows Google is back in the AI arms race.
Key details:
Now live for Gemini Advanced users in the Gemini app (mobile + desktop), and available to developers via Google AI Studio, with API access coming soon
Hits #1 on LMArena with a historic +40 Elo points leap, beating GPT-4.5, Claude 3.7 Sonnet, and Grok-3 across instruction-following, creative writing, and longform queries
Scored 18.8% (no tool use) on Humanity’s Last Exam, a brutal, expert-designed test of knowledge and reasoning meant to stump elite models
Remains behind Claude 3.7 Sonnet on agentic coding
Early users report strong multimodal capabilities such as transcriptions with mixed-language audio and accurate visual bounding box detection
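For a sense of what a +40 Elo gap means head to head, the standard Elo expected-score formula converts a rating difference into a win probability. This is a quick illustration of that formula, not LMArena's exact methodology:

```python
def elo_expected_score(diff: float) -> float:
    """Expected win probability for the higher-rated model,
    given its Elo advantage `diff`."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

# A +40 Elo lead corresponds to roughly a 55.7% head-to-head win rate.
print(round(elo_expected_score(40), 3))  # 0.557
```

So a 40-point jump is a clear but not crushing edge per matchup; its significance comes from how rarely leaderboard leaders separate by that much at once.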
Why it matters: Gemini 2.5 is Google’s first real move back into the ring with frontier AI models. After years of sluggish AI rollouts, they’re shipping a frontier-grade, reasoning-first model directly into user hands. For Google, this isn’t just about matching OpenAI anymore; it’s about showing they’re playing to win.

OPENAI
ChatGPT adds native image generation
The Summary: Ten months after first demonstrating it, OpenAI has rolled out native image generation in GPT‑4o, making it available in the ChatGPT and Sora apps. The model can now create and refine images directly, without handing off to the separate DALL-E model, closing the gap between text and visuals and greatly improving visual understanding and prompt accuracy. The update comes just 10 days after Google’s Gemini 2.0 Flash quietly became the first multimodal LLM to ship with native image generation.
Key details:
Users can prompt GPT‑4o to transform uploaded images, say by adding a grizzly bear beside you in a selfie or restyling it as anime, as well as generate logos, diagrams, and photorealistic scenes
GPT-4o now handles native image generation with 10–20 distinct objects per scene, maintains visual consistency across iterations, and can render legible text, something DALL-E 3 struggled with
All generated images include C2PA metadata for provenance. OpenAI has also updated its policy to allow realistic generation of adult public figures, a shift from DALL-E’s blanket ban
GPT‑4o is becoming the default image generator across Free, Plus, Pro, and Team tiers, replacing DALL-E 3. It’s also accessible via Sora.
Why it matters: GPT-4o now lets users move from idea to image inside a single chat loop, without switching tools or models. It remembers visual context, responds to edits, and binds objects and text with precision. That turns image generation from a one-shot gimmick into a fluid medium for visual thinking, design, and communication.

DEEPSEEK
DeepSeek releases V3-0324, a frontier-scale open-source model
The Summary: DeepSeek has quietly released DeepSeek V3-0324, a 641GB open-weights model that sets a new benchmark for non-reasoning AI systems. It can run at 20+ tokens/sec on a Mac Studio with maxed-out RAM. The model leapfrogs Claude 3.7 Sonnet and Gemini 2.0 Pro on key benchmarks, ships under a permissive MIT license, and uses an MoE architecture that activates just 37B of its 685B parameters per token.
Key details:
V3-0324 appears to beat all proprietary non-reasoning models in the Artificial Analysis Intelligence Index, passing GPT-4.5 and Sonnet 3.7
Can run locally at 20+ tokens/sec on a 4-bit quantized version using a $9,499 Mac Studio (M3 Ultra, 512GB RAM)—no datacenter, no API
Uses 37B active parameters out of 685B total, thanks to MoE routing
Open weight, fully MIT-licensed release
Why it matters: DeepSeek’s new release cuts through the noise: frontier-scale performance, full transparency, and real local inference with no login required. It challenges not just the capabilities of closed models, but the assumptions around how and where AI will run.
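The MoE trick behind the 37B-of-685B figure can be sketched in a few lines: a gating network scores all experts for each token, and only the top-k actually run. This is a toy softmax top-k router for illustration, not DeepSeek's actual gating code:

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k experts with the highest gate scores and
    renormalize their softmax weights; the rest stay inactive."""
    probs = [math.exp(g) for g in gate_logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Indices of the k highest-scoring experts for this token.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Only experts 2 and 0 receive this token; the other three sit idle.
weights = top_k_route([1.2, -0.3, 2.5, 0.1, -1.0], k=2)
```

Because most experts sit idle for any given token, a 685B-parameter model pays roughly the compute cost of its 37B active parameters per forward pass.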

QUICK NEWS
ARC-AGI-2 hard benchmark launches to test AI reasoning
Open-Sora 2.0, an open-source video AI, lands on par with HunyuanVideo 11B
Reve Image sets a new state of the art for image models


That’s all for today!
If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/