🚀 All OpenAI DevDay Releases
PLUS: Microsoft Copilot Upgrade
Welcome back!
OpenAI launched the Realtime API, bringing voice mode to developers. It lets developers build applications that speak naturally and respond instantly. With low-latency interactions and six distinct voices, it’s a big move toward the next generation of AI tools. Let’s explore…
Today’s Summary:
🎙️ OpenAI launches real-time voice API
👁️ GPT-4o adds Vision fine-tuning
💻 Microsoft upgrades Copilot
🔊 OpenAI extends voice mode to more users
🚪 OpenAI co-founder Durk Kingma joins Anthropic
🎥 Pika 1.5 video generator with special effects
🛠️ 2 new tools
TOP STORY
OpenAI launches Realtime API for voice apps
The Summary: At DevDay, OpenAI introduced the Realtime API, enabling developers to integrate advanced voice mode into their apps. The API supports natural conversations with six preset voices and offers low-latency, multimodal interactions, simplifying the development of advanced voice-based apps for language learning, customer support, and more.
Key details:
Supports natural speech-to-speech conversations with six preset voices
Handles text + audio inputs and outputs in a single API call
Uses WebSocket for persistent, low-latency communication
Pricing: $0.06 per minute of audio input, $0.24 per minute of audio output
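To make those per-minute rates concrete, here is a minimal sketch of a cost estimator based on the pricing listed above (the function name and the example call durations are our own, purely illustrative):

```python
def realtime_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate Realtime API audio cost in USD from the listed per-minute rates."""
    INPUT_RATE = 0.06   # $ per minute of audio input
    OUTPUT_RATE = 0.24  # $ per minute of audio output
    return input_minutes * INPUT_RATE + output_minutes * OUTPUT_RATE

# A 10-minute call in which the assistant speaks for 4 minutes:
print(round(realtime_audio_cost(10, 4), 2))  # → 1.56
```

Since output audio costs four times as much as input audio, how talkative the assistant is dominates the bill for long sessions.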
Why it matters: The Realtime API gives developers the tools to create advanced voice experiences in their apps. It unlocks new possibilities for interactive AI, which could soon lead to broader adoption of AI voice in everyday applications.
OPENAI
OpenAI adds vision to GPT-4o fine-tuning
The Summary: OpenAI introduced several developer-focused updates, including vision fine-tuning for GPT-4o, cost-saving features like Prompt Caching and Model Distillation, and broader access to the o1 API.
Key details:
Vision fine-tuning allows customization of GPT-4o vision capabilities, with as few as 100 image examples
Prompt Caching offers 50% discount on tokens repeated across prompts
Model Distillation suite helps create efficient mini models from o1 or 4o outputs
o1 API access now available to Tier 3 developers
New Playground features auto-generate prompts and schemas
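As a rough illustration of what a vision fine-tuning sample might look like, here is a sketch assuming the chat-style JSONL training format with `image_url` content parts; the task, text, and URL are hypothetical placeholders:

```python
import json

# Hypothetical training record: a chat-format sample whose user message
# mixes text with an image reference. Fine-tuning files are JSONL, one
# record like this per line.
record = {
    "messages": [
        {"role": "system", "content": "Classify the defect shown in the photo."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What defect does this part have?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/part-001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "Hairline crack near the weld seam."},
    ]
}

line = json.dumps(record)  # one line of the .jsonl training file
print(line[:40])
```

With as few as 100 such examples, a dataset like this fits in a single small file.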
Why it matters: These updates give developers better tools for efficient model training and deployment, making it easier to build more advanced and cost-effective AI solutions.
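The Prompt Caching savings above are easy to sketch in code: repeated (cached) tokens are billed at half the normal input rate, so the blended cost depends on how much of each prompt is reused. The function name and the example rate are illustrative, not from the announcement:

```python
def prompt_cost(total_tokens: int, cached_tokens: int, rate_per_1m: float) -> float:
    """Blended input cost in USD when cached tokens get a 50% discount.

    rate_per_1m is the model's normal input price per 1M tokens (example value).
    """
    uncached = total_tokens - cached_tokens
    return (uncached + 0.5 * cached_tokens) * rate_per_1m / 1_000_000

# 1M input tokens, half of them repeated across prompts, at $2.50 per 1M:
print(prompt_cost(1_000_000, 500_000, 2.50))  # → 1.875
```

A prompt whose long system preamble repeats on every call approaches a 50% discount on that shared prefix, which is where the savings come from.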
MICROSOFT
Microsoft upgrades Copilot with Voice, Vision and Deep Thinking
The Summary: Microsoft has started rolling out an updated Copilot with Voice and Vision, powered by OpenAI’s latest models. New Copilot features include voice interaction, image analysis, and more sophisticated reasoning via o1. Windows 11 is also getting AI-powered search, photo editing tools, and enhanced accessibility options.
Key details:
Copilot Voice enables natural language conversations with 4 voices
Copilot Vision analyzes web content and images to answer questions
New "Think Deeper" feature supports more complex reasoning
Windows 11 introduces AI-powered search to find photos and files using natural language
Why it matters: Microsoft’s aggressive AI integration positions Windows PCs to compete with offerings like Apple Intelligence and Google Gemini. The reintroduction of the Recall feature, now opt-in with enhanced security, shows Microsoft’s caution after previous privacy missteps.
QUICK NEWS
Quick news
OpenAI extends voice mode access to more users
OpenAI co-founder Durk Kingma, inventor of VAE, joins Anthropic
Pika 1.5 video generator launches with enhanced special effects
TOOLS
🥇 New tools
That’s all for today!
If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/