OpenAI has introduced three new real-time audio models to its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. These models are now accessible in the Realtime API and Playground, allowing developers to incorporate them into existing applications via Codex.
The new tools expand voice functionalities from basic turn-based interactions to include real-time reasoning, multi-language translation, and live streaming transcription.
OpenAI’s New Realtime Audio Models: GPT-Realtime-2, Translate, and Whisper
GPT-Realtime-2 is OpenAI's first live voice model with reasoning capabilities comparable to GPT-5. It is designed to handle complex requests, call tools, and recover from interruptions during ongoing conversations. Key updates over GPT-Realtime-1.5 include an adjustable reasoning-effort setting with five levels (minimal, low, medium, high, and very high); low is the default.
Its context window has been expanded from 32,000 to 128,000 tokens, supporting longer workflows. The model can call multiple tools in parallel, providing audible status updates such as "checking your calendar" or "looking that up now." It also includes preambles that allow it to say short phrases like "let me check that" before completing a request.
Improvements have been made to its understanding of domain-specific vocabulary, including proper nouns and healthcare terminology. Additionally, the model offers more controllable tone and delivery.
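The settings above can be sketched as a session-configuration payload. This is a minimal illustration, not the documented wire format: the `session.update` event type mirrors OpenAI's existing Realtime API convention, but the `reasoning_effort` and `max_context_tokens` field names are assumptions based on the features described in the announcement.

```python
import json

# Hypothetical event shape: field names "reasoning_effort" and
# "max_context_tokens" are assumptions, not confirmed API parameters.
def build_session_update(effort="low", max_context_tokens=128_000):
    """Build a session.update payload selecting a reasoning-effort level."""
    allowed = {"minimal", "low", "medium", "high", "very_high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-2",       # model name from the announcement
            "reasoning_effort": effort,      # "low" is the default per the article
            "max_context_tokens": max_context_tokens,
        },
    }

event = build_session_update("high")
print(json.dumps(event, indent=2))
```

In a real client, a payload like this would be serialized and sent over the Realtime API's WebSocket connection when the session is opened.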
GPT-Realtime-Translate offers live translation from over 70 input languages into 13 output languages, keeping pace with the speaker. It is intended for use in cross-border customer support, live events, education platforms, and creator tools serving global audiences. Deutsche Telekom is testing the model for multilingual customer support, while Vimeo is experimenting with translating product education videos in real time as they are played.
GPT-Realtime-Whisper is a streaming speech-to-text model designed for low-latency transcription. It transcribes audio as it is spoken, making it suitable for applications such as live captioning, meeting notes that update during conversations, voice assistants that require ongoing understanding, and post-call workflows in sectors like customer support, healthcare, and sales.
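A client consuming such a stream typically accumulates incremental text events into a running transcript. The sketch below shows that pattern; the event names (`transcript.delta`, `transcript.done`) are illustrative assumptions, not GPT-Realtime-Whisper's actual event schema.

```python
# Illustrative consumer for a streaming speech-to-text feed.
# Event names here are assumptions, not the documented format.
class TranscriptAccumulator:
    """Accumulate streamed text deltas into a running transcript."""

    def __init__(self):
        self.parts = []
        self.finalized = False

    def handle_event(self, event):
        if event["type"] == "transcript.delta":
            self.parts.append(event["text"])      # partial text as it is spoken
        elif event["type"] == "transcript.done":
            self.finalized = True                 # utterance is complete
        return self.text

    @property
    def text(self):
        return "".join(self.parts)

acc = TranscriptAccumulator()
for ev in [
    {"type": "transcript.delta", "text": "Hello, "},
    {"type": "transcript.delta", "text": "world."},
    {"type": "transcript.done"},
]:
    acc.handle_event(ev)
print(acc.text)
```

A live-captioning UI would re-render after every delta, while meeting-notes or post-call workflows would wait for the finalized transcript.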
Pricing, Safety, and Compliance for OpenAI’s Realtime Audio API
Pricing differs by model:
For GPT-Realtime-2, the cost is $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens.
GPT-Realtime-Translate costs $0.034 per minute.
GPT-Realtime-Whisper costs $0.017 per minute.
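The listed prices make cost estimation straightforward. The helper below uses the rates from the article; the token counts in the example are hypothetical values chosen for illustration only.

```python
# Prices as listed in the article.
PRICES = {
    "gpt-realtime-2": {            # USD per million tokens
        "audio_input": 32.00,
        "cached_input": 0.40,
        "audio_output": 64.00,
    },
    "gpt-realtime-translate": 0.034,  # USD per minute
    "gpt-realtime-whisper": 0.017,    # USD per minute
}

def realtime2_cost(input_tokens, cached_tokens, output_tokens):
    """Estimate GPT-Realtime-2 cost in USD from token counts."""
    p = PRICES["gpt-realtime-2"]
    return (input_tokens * p["audio_input"]
            + cached_tokens * p["cached_input"]
            + output_tokens * p["audio_output"]) / 1_000_000

def per_minute_cost(model, minutes):
    """Estimate cost in USD for the per-minute models."""
    return PRICES[model] * minutes

# Hypothetical call: 150k audio input, 50k cached, 100k output tokens.
print(f"${realtime2_cost(150_000, 50_000, 100_000):.2f}")
print(f"${per_minute_cost('gpt-realtime-translate', 60):.2f}")
```

Note that the per-minute models bill on wall-clock audio duration, while GPT-Realtime-2 bills on tokens, so comparing them requires an assumption about tokens consumed per minute of audio.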
The Realtime API features active classifiers that can stop conversations that violate OpenAI's content policies. Developers can enhance safety by adding extra guardrails using the Agents SDK. The API also supports EU Data Residency for applications based in the EU and complies with OpenAI's enterprise privacy standards.
According to OpenAI's usage policies, developers are required to inform users when they are interacting with AI, unless the context clearly indicates this.
The post OpenAI Releases Three New Realtime Voice Models for the API With GPT-5-Class Reasoning appeared first on gHacks.