🚀 ChatGPT Images 2.0 Sets Record

PLUS: Gemini Robotics Goes Industrial

Welcome back!

ChatGPT Images 2.0 claims first place across all Image Arena leaderboards with a 242-point lead over Google Nano Banana 2, the widest margin ever recorded. The breakthrough is rendering dense multilingual text inside complex infographics and charts. Let’s unpack…

Today’s Summary:

  • 🚀 ChatGPT Images 2.0 dominates benchmarks

  • ⚙️ Google launches deep research agents

  • 🧠 Gemini Robotics-ER powers industrial vision

  • ⚡ SpaceX eyes Cursor acquisition deal

  • 🛡️ OpenAI unveils GPT-5.4-Cyber defense model

  • 💻 Kimi K2.6 challenges coding giants

  • 🛠️ 2 new tools

TOP STORY

ChatGPT Images 2.0 crushes benchmarks

The Summary: OpenAI launched ChatGPT Images 2.0, an image generation model that claims first place across all Image Arena leaderboards with a massive 242-point lead over Google Nano Banana 2, the widest margin ever recorded. The breakthrough is its ability to render complex infographics with flawless typography, free of the gibberish text that plagued previous image generators.

Key details:

  • Dominates all Image Arena categories with unprecedented margins

  • Reliably generates Korean, Japanese, Chinese, Hindi, and Bengali text within complex layouts like educational diagrams

  • Produces up to 8 consistent images from a single prompt

  • Supports aspect ratios from 3:1 ultrawide to 1:3 vertical and up to 2K resolution (4K in API beta)

  • Available now to all ChatGPT, Codex, and API users
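The size limits above can be made concrete with a small helper. This is a hedged sketch, not OpenAI code: it computes output pixel dimensions for a requested aspect ratio, capping the long edge at 2048 px as a stand-in for the "2K" limit mentioned in the story.

```python
def image_dimensions(aspect_w: int, aspect_h: int, long_edge: int = 2048) -> tuple[int, int]:
    """Compute (width, height) for an aspect ratio, capping the longer side.

    The 2048 px long edge is an assumption standing in for the "2K"
    resolution mentioned above; it is not a documented API limit.
    """
    if aspect_w >= aspect_h:
        width = long_edge
        height = round(long_edge * aspect_h / aspect_w)
    else:
        height = long_edge
        width = round(long_edge * aspect_w / aspect_h)
    return width, height

# The supported range runs from 3:1 ultrawide to 1:3 vertical.
print(image_dimensions(3, 1))  # ultrawide
print(image_dimensions(1, 3))  # vertical
print(image_dimensions(1, 1))  # square
```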

Why it matters: Google spent months positioning Nano Banana as the thinking model for images; OpenAI has now matched the feature and crushed the benchmark. The lead is so wide it suggests OpenAI solved something architectural, placing the model in a league of its own. The typography fix removes the last point of friction in professional workflows, and the model is also remarkably good at interpreting intent and producing complex charts and diagrams.

FROM OUR PARTNERS

The Builders Conference

Built for builders. Not buzzwords. San José 2026

500+ speakers. 18 content tracks. Workshops, masterclasses, and the people actually shipping the tools you use every day. WeAreDevelopers World Congress — September 23–25. Use code GITPUSH26 for 10% off.

GOOGLE

Google ships research agents that can use public and private data

The Summary: Google launched Deep Research and Deep Research Max for developers, two autonomous agents built on Gemini 3.1 Pro that can use both web search and private data. The standard version optimizes for speed, while Max uses extended compute time for deep background analysis. Both agents can generate charts and connect to external data sources via Model Context Protocol, targeting finance, life sciences, and market research workflows.

Key details:

  • Scores 93.3% on DeepSearchQA (up from 66.1% in December)

  • Model Context Protocol supports querying private databases, internal repositories, and third-party data

  • Generates HTML charts and infographics inline within reports

  • Powers research features in Gemini App and NotebookLM

Why it matters: Until now, Google Deep Research only searched the public web. Developers can now connect it to private company data, like internal financial records, customer databases, or proprietary market research, and build research tools that pull from both sources at once. Google is betting that deep research agents will become a new category of enterprise software.

FROM OUR PARTNERS

AI Agents for Market Research

Accio Work: the AI Agent team that runs your business

Meet Accio Work—the agentic workspace for business owners and solopreneurs. Our smart agents handle sourcing, supplier negotiation, store management, and marketing on autopilot. Powered by Alibaba.com data, we turn ideas into action instantly. No setup, no hassle—just seamless execution while you stay in control and focus on growing your business.

GOOGLE

Google teaches robots to see

The Summary: Google DeepMind released Gemini Robotics-ER 1.6, a model that gives robots industrial-grade perception and planning abilities. The system reads pressure gauges, counts objects with precision, and determines when tasks are complete. The model works by zooming into images, executing code to measure proportions, and applying world knowledge to interpret readings.

Key details:

  • Achieves 93% accuracy reading industrial instruments

  • Handles multi-camera reasoning to determine task completion across different viewpoints simultaneously

  • Functions as a high-level planner that calls tools like Google Search, vision-language-action models, or custom functions

  • First users report it "thinks" much faster than Gemini 3.1 Pro

  • Available through Gemini API and Google AI Studio
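The "high-level planner" role described above amounts to routing each step of a plan to a registered tool and collecting the results. A toy dispatch loop, with every tool name, stub response, and plan step invented for illustration:

```python
from typing import Callable

# Hypothetical tool registry; a real planner would expose Google Search,
# a vision-language-action model, and custom functions here.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_gauge": lambda arg: f"gauge '{arg}' reads 42 psi",     # stub reply
    "check_done": lambda arg: f"task '{arg}' appears complete",  # stub reply
}

def run_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Execute a plan of (tool_name, argument) steps and collect results."""
    results = []
    for tool_name, arg in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            results.append(f"unknown tool: {tool_name}")
            continue
        results.append(tool(arg))
    return results

# In practice the model would emit this plan; here it is hard-coded.
print(run_plan([("read_gauge", "boiler-3"), ("check_done", "inspection")]))
```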

Why it matters: Industrial facilities generate massive amounts of visual data from gauges, meters, and indicators that go unexamined. Gemini Robotics-ER 1.6 answers questions like "is this gauge reading normal?" without needing explicit rules for each gauge type, helping turn that dormant information into actionable maintenance alerts.

TOOLS

🥇 New tools

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/