OpenAI has introduced three new real-time audio models to its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. These models are now accessible in the Realtime API and Playground, allowing developers to incorporate them into existing applications via Codex.
The new tools expand voice functionalities from basic turn-based interactions to include real-time reasoning, multi-language translation, and live streaming transcription.
OpenAI’s New Realtime Audio Models: GPT-Realtime-2, Translate, and Whisper
GPT-Realtime-2 is OpenAI's first live voice model with reasoning capabilities comparable to GPT-5. It is designed to handle complex requests, call tools, and recover from interruptions during ongoing conversations. Key updates over GPT-Realtime-1.5 include an adjustable reasoning-effort setting with five levels (minimal, low, medium, high, and very high); low is the default.
Its context window has been expanded from 32,000 to 128,000 tokens, supporting longer workflows. The model can call multiple tools in parallel, providing audible status updates such as "checking your calendar" or "looking that up now." It also includes preambles that allow it to say short phrases like "let me check that" before completing a request.
Improvements have been made to its understanding of domain-specific vocabulary, including proper nouns and healthcare terminology. Additionally, the model offers more controllable tone and delivery.
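The settings above can be sketched as a session-configuration payload. This is a minimal illustration, not the documented wire format: the `session.update` event type mirrors OpenAI's existing Realtime API convention, but the `reasoning_effort` and `max_context_tokens` field names are assumptions based on the features described in the announcement.

```python
import json

# Hypothetical event shape: field names "reasoning_effort" and
# "max_context_tokens" are assumptions, not confirmed API parameters.
def build_session_update(effort="low", max_context_tokens=128_000):
    """Build a session.update payload selecting a reasoning-effort level."""
    allowed = {"minimal", "low", "medium", "high", "very_high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",       # model name from the announcement
            "reasoning_effort": effort,      # "low" is the default per the article
            "max_context_tokens": max_context_tokens,
        },
    }

event = build_session_update("high")
print(json.dumps(event, indent=2))
```

In a real client, a payload like this would be serialized and sent over the Realtime API's WebSocket connection when the session is opened.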
GPT-Realtime-Translate offers live translation from over 70 input languages into 13 output languages, keeping pace with the speaker. It is intended for use in cross-border customer support, live events, education platforms, and creator tools serving global audiences. Deutsche Telekom is testing the model for multilingual customer support, while Vimeo is experimenting with translating product education videos in real time as they are played.
GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. It transcribes audio as it is spoken, making it suitable for applications such as live captioning, meeting notes that update during conversations, voice assistants that require ongoing understanding, and post-call workflows in sectors like customer support, healthcare, and sales.
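A client consuming such a stream typically accumulates incremental text events into a running transcript. The sketch below shows that pattern; the event names (`transcript.delta`, `transcript.done`) are illustrative assumptions, not GPT-Realtime-Whisper's actual event schema.

```python
# Illustrative consumer for a streaming speech-to-text feed.
# Event names here are assumptions, not the documented format.
class TranscriptAccumulator:
    """Accumulate streamed text deltas into a running transcript."""

    def __init__(self):
        self.parts = []
        self.finalized = False

    def handle_event(self, event):
        if event["type"] == "transcript.delta":
            self.parts.append(event["text"])      # partial text as it is spoken
        elif event["type"] == "transcript.done":
            self.finalized = True                 # utterance is complete
        return self.text

    @property
    def text(self):
        return "".join(self.parts)

acc = TranscriptAccumulator()
for ev in [
    {"type": "transcript.delta", "text": "Hello, "},
    {"type": "transcript.delta", "text": "world."},
    {"type": "transcript.done"},
]:
    acc.handle_event(ev)
print(acc.text)
```

A live-captioning UI would re-render after every delta, while meeting-notes or post-call workflows would wait for the finalized transcript.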
Pricing, Safety, and Compliance for OpenAI’s Realtime Audio API
Pricing differs by model:
For GPT-Realtime-2, the cost is $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens.
GPT-Realtime-Translate costs $0.034 per minute.
GPT-Realtime-Whisper costs $0.017 per minute.
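The listed prices make cost estimation straightforward. The helper below uses the rates from the article; the token counts in the example are hypothetical values chosen for illustration only.

```python
# Prices as listed in the article.
PRICES = {
    "gpt-realtime-2": {            # USD per million tokens
        "audio_input": 32.00,
        "cached_input": 0.40,
        "audio_output": 64.00,
    },
    "gpt-realtime-translate": 0.034,  # USD per minute
    "gpt-realtime-whisper": 0.017,    # USD per minute
}

def realtime2_cost(input_tokens, cached_tokens, output_tokens):
    """Estimate GPT-Realtime-2 cost in USD from token counts."""
    p = PRICES["gpt-realtime-2"]
    return (input_tokens * p["audio_input"]
            + cached_tokens * p["cached_input"]
            + output_tokens * p["audio_output"]) / 1_000_000

def per_minute_cost(model, minutes):
    """Estimate cost in USD for the per-minute models."""
    return PRICES[model] * minutes

# Hypothetical call: 150k audio input, 50k cached, 100k output tokens.
print(f"${realtime2_cost(150_000, 50_000, 100_000):.2f}")
print(f"${per_minute_cost('gpt-realtime-translate', 60):.2f}")
```

Note that the per-minute models bill on wall-clock audio duration, while GPT-Realtime-2 bills on tokens, so comparing them requires an assumption about tokens consumed per minute of audio.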
The Realtime API features active classifiers that can stop conversations that violate OpenAI's content policies. Developers can enhance safety by adding extra guardrails using the Agents SDK. The API also supports EU Data Residency for applications based in the EU and complies with OpenAI's enterprise privacy standards.
According to OpenAI's usage policies, developers are required to inform users when they are interacting with AI, unless the context clearly indicates this.
The post OpenAI Releases Three New Realtime Voice Models for the API With GPT-5-Class Reasoning appeared first on gHacks.