🚀 All OpenAI DevDay Releases

PLUS: Microsoft Copilot Upgrade

Welcome back!

OpenAI launched the Realtime API, bringing voice mode to developers. It lets developers build applications that speak naturally and respond instantly. With low-latency interactions and six distinct voices, it’s a big move for the next generation of AI tools. Let’s explore…

Today’s Summary:

  • 🎙️ OpenAI launches real-time voice API

  • 👁️ GPT-4o adds Vision fine-tuning

  • 💻 Microsoft upgrades Copilot

  • 🔊 OpenAI extends voice mode to more users

  • 🚪 OpenAI co-founder Durk Kingma joins Anthropic

  • 🎥 Pika 1.5 video generator with special effects

  • 🛠️ 2 new tools

TOP STORY

OpenAI launches Realtime API for voice apps

The Summary: At DevDay, OpenAI introduced the Realtime API, enabling developers to integrate advanced voice mode into their apps. The API supports natural conversations with six preset voices and offers low-latency, multimodal interactions, simplifying the development of advanced voice-based apps for language learning, customer support, and more.

Key details:

  • Supports natural speech-to-speech conversations with 6 voices

  • Handles text + audio inputs and outputs in a single API call

  • Uses WebSocket for persistent, low-latency communication

  • Pricing: $0.06 per minute of audio input, $0.24 per minute of audio output
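Back-of-envelope math on those rates, as a minimal sketch (the helper function is ours for illustration, not part of the OpenAI SDK):

```python
# Rough cost estimate for Realtime API audio, using the published
# per-minute rates: $0.06/min of audio input, $0.24/min of audio output.
INPUT_RATE_PER_MIN = 0.06   # USD per minute of audio input
OUTPUT_RATE_PER_MIN = 0.24  # USD per minute of audio output

def estimate_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Return the estimated USD cost of a Realtime API session."""
    return (input_minutes * INPUT_RATE_PER_MIN
            + output_minutes * OUTPUT_RATE_PER_MIN)

# Example: a 10-minute call where the user speaks for 6 minutes
# and the model responds for 4 minutes.
cost = estimate_audio_cost(6, 4)
print(f"${cost:.2f}")  # → $1.32
```

At these rates, output audio dominates the bill, so chattier assistants cost noticeably more per session.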

Why it matters: The Realtime API gives developers the tools to create advanced voice experiences in their apps. It unlocks new possibilities for interactive AI, which could soon lead to broader adoption of AI voice in everyday applications.

OPENAI

OpenAI adds vision to GPT-4o fine-tuning

The Summary: OpenAI introduced several developer-focused updates, including vision fine-tuning for GPT-4o, cost-saving features like Prompt Caching and Model Distillation, and broader access to the o1 API.

Key details:

  • Vision fine-tuning allows customization of GPT-4o’s vision capabilities with as few as 100 image examples

  • Prompt Caching offers a 50% discount on tokens repeated across prompts

  • Model Distillation suite helps create efficient mini models from o1 or GPT-4o outputs

  • o1 API access now available to Tier 3 developers

  • New Playground features auto-generate prompts and schemas
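To give a sense of what a vision fine-tuning example looks like: training data uses the same chat-style JSONL format as text fine-tuning, with images included as `image_url` content parts. The sketch below is illustrative; the URL, prompt, and labels are placeholders, not from OpenAI’s docs.

```python
import json

# One training example in the chat-style fine-tuning format.
# Images are passed as image_url content parts alongside text.
example = {
    "messages": [
        {"role": "system",
         "content": "You are an assistant that identifies traffic signs."},
        {"role": "user",
         "content": [
             {"type": "text", "text": "What sign is shown in this image?"},
             {"type": "image_url",
              "image_url": {"url": "https://example.com/sign_001.jpg"}},
         ]},
        {"role": "assistant", "content": "This is a stop sign."},
    ]
}

# Fine-tuning data is uploaded as JSONL: one JSON object per line.
line = json.dumps(example)
print(line[:60])
```

With as few as 100 such examples, the model’s visual recognition can reportedly be steered toward a specific domain.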

Why it matters: These updates give developers better tools for efficient model training and deployment. This will make it easier to develop more advanced and cost-effective AI solutions.

MICROSOFT

Microsoft upgrades Copilot with Voice, Vision and Deep Thinking

The Summary: Microsoft has started rolling out an updated Copilot with Voice and Vision, powered by OpenAI’s latest models. New Copilot features include voice interaction, image analysis, and more sophisticated reasoning powered by o1. Windows 11 is also getting AI-powered search, photo editing tools, and enhanced accessibility options.

Key details:

  • Copilot Voice enables natural language conversations with 4 voices

  • Copilot Vision analyzes web content and images to answer questions

  • New "Think Deeper" feature supports more complex reasoning

  • Windows 11 introduces AI-powered search to find photos and files using natural language

Why it matters: Microsoft’s aggressive AI integration positions Windows PCs to compete with offerings like Apple Intelligence and Google Gemini. The reintroduction of the Recall feature, now opt-in with enhanced security, shows Microsoft’s caution after previous privacy missteps.

QUICK NEWS

Quick news

TOOLS

🥇 New tools

  • Rows - AI spreadsheet, without scripts or add-ons

  • Graphite - AI code review companion

That’s all for today!

If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/