🚀 All OpenAI DevDay Releases
PLUS: Microsoft Copilot Upgrade
Welcome back!
OpenAI launched the Realtime API, bringing voice mode to developers. It lets developers build applications that speak naturally and respond instantly. With low-latency interactions and six distinct voices, it’s a big move toward the next generation of AI tools. Let’s explore…
Today’s Summary:
🎙️ OpenAI launches real-time voice API
👁️ GPT-4o adds Vision fine-tuning
💻 Microsoft upgrades Copilot
🔊 OpenAI extends voice mode to more users
🚪 OpenAI co-founder Durk Kingma joins Anthropic
🎥 Pika 1.5 video generator with special effects
🛠️ 2 new tools
TOP STORY
OpenAI launches Realtime API for voice apps
The Summary: At DevDay, OpenAI introduced the Realtime API, enabling developers to integrate advanced voice mode into their apps. The API supports natural conversations with six preset voices and offers low-latency, multimodal interactions, simplifying the development of advanced voice-based apps for language learning, customer support, and more.
Key details:
Supports natural speech-to-speech conversations with six preset voices
Handles text + audio inputs and outputs in a single API call
Uses WebSocket for persistent, low-latency communication
Pricing: $0.06 per minute of audio input, $0.24 per minute of audio output
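To make those per-minute rates concrete, here is a minimal sketch of a cost estimator based on the pricing listed above (the function name and the example call durations are our own, purely illustrative):

```python
def realtime_audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimate Realtime API audio cost in USD from the listed per-minute rates."""
    INPUT_RATE = 0.06   # $ per minute of audio input
    OUTPUT_RATE = 0.24  # $ per minute of audio output
    return input_minutes * INPUT_RATE + output_minutes * OUTPUT_RATE

# A 10-minute call in which the assistant speaks for 4 minutes:
print(round(realtime_audio_cost(10, 4), 2))  # → 1.56
```

Since output audio costs four times as much as input audio, how talkative the assistant is dominates the bill for long sessions.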
Why it matters: The Realtime API gives developers the tools to create advanced voice experiences in their apps. It unlocks new possibilities for interactive AI, which could soon lead to broader adoption of AI voice in everyday applications.
OPENAI
OpenAI adds vision to GPT-4o fine-tuning
The Summary: OpenAI introduced several developer-focused updates, including vision fine-tuning for GPT-4o, cost-saving features like Prompt Caching and Model Distillation, and broader access to the o1 API.
Key details:
Vision fine-tuning allows customization of GPT-4o vision capabilities, with as few as 100 image examples
Prompt Caching offers 50% discount on tokens repeated across prompts
Model Distillation suite helps create efficient mini models from o1 or 4o outputs
o1 API access now available to Tier 3 developers
New Playground features auto-generate prompts and schemas
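As a rough illustration of what a vision fine-tuning sample might look like, here is a sketch assuming the chat-style JSONL training format with `image_url` content parts; the task, text, and URL are hypothetical placeholders:

```python
import json

# Hypothetical training record: a chat-format sample whose user message
# mixes text with an image reference. Fine-tuning files are JSONL, one
# record like this per line.
record = {
    "messages": [
        {"role": "system", "content": "Classify the defect shown in the photo."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What defect does this part have?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/part-001.jpg"}},
            ],
        },
        {"role": "assistant", "content": "Hairline crack near the weld seam."},
    ]
}

line = json.dumps(record)  # one line of the .jsonl training file
print(line[:40])
```

With as few as 100 such examples, a dataset like this fits in a single small file.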
Why it matters: These updates give developers better tools for efficient model training and deployment, making it easier to build more advanced and cost-effective AI solutions.
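The Prompt Caching savings above are easy to sketch in code: repeated (cached) tokens are billed at half the normal input rate, so the blended cost depends on how much of each prompt is reused. The function name and the example rate are illustrative, not from the announcement:

```python
def prompt_cost(total_tokens: int, cached_tokens: int, rate_per_1m: float) -> float:
    """Blended input cost in USD when cached tokens get a 50% discount.

    rate_per_1m is the model's normal input price per 1M tokens (example value).
    """
    uncached = total_tokens - cached_tokens
    return (uncached + 0.5 * cached_tokens) * rate_per_1m / 1_000_000

# 1M input tokens, half of them repeated across prompts, at $2.50 per 1M:
print(prompt_cost(1_000_000, 500_000, 2.50))  # → 1.875
```

A prompt whose long system preamble repeats on every call approaches a 50% discount on that shared prefix, which is where the savings come from.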
MICROSOFT
Microsoft upgrades Copilot with Voice, Vision and Deep Thinking
The Summary: Microsoft has started rolling out an updated Copilot with Voice and Vision, powered by OpenAI’s latest models. New Copilot features include voice interaction, image analysis, and more sophisticated reasoning via o1. Windows 11 is also getting AI-powered search, photo editing tools, and enhanced accessibility options.
Key details:
Copilot Voice enables natural language conversations with 4 voices
Copilot Vision analyzes web content and images to answer questions
New "Think Deeper" feature supports more complex reasoning
Windows 11 introduces AI-powered search to find photos and files using natural language
Why it matters: Microsoft’s aggressive AI integration positions Windows PCs to compete with offerings like Apple Intelligence and Google Gemini. The reintroduction of the Recall feature, now opt-in with enhanced security, shows Microsoft’s caution after previous privacy missteps.
QUICK NEWS
Quick news
OpenAI extends voice mode access to more users
OpenAI co-founder Durk Kingma, inventor of VAE, joins Anthropic
Pika 1.5 video generator launches with enhanced special effects
TOOLS
🥇 New tools
That’s all for today!
If you liked the newsletter, share it with your friends and colleagues by sending them this link: https://thesummary.ai/