OpenAI launches advanced speech AI models for developers

By Elets News Network - 21 March 2025

OpenAI has introduced new speech-to-text and text-to-speech models in its API, equipping developers with enhanced tools to create sophisticated voice agents. These latest models improve transcription accuracy, introduce greater customisation options for speech generation, and pave the way for advanced real-time applications.

The newly launched gpt-4o-transcribe and gpt-4o-mini-transcribe models outperform previous Whisper models in word error rate and language recognition. OpenAI attributes these improvements to reinforcement learning and training on diverse audio datasets. The models are designed to provide reliable transcriptions even in noisy environments, across different speech speeds, and for various accents.
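
For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The short Python sketch below illustrates that standard definition. The function name and the simple lowercase, whitespace-based tokenisation are assumptions made for illustration, not details of OpenAI's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real benchmarks usually normalise punctuation,
    numerals, and casing more carefully before scoring.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A lower value means a more accurate transcript. Because insertions are counted, WER can exceed 1.0 when the transcript contains many extra words.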


Developers can now exercise greater control over how the text-to-speech model generates speech. The gpt-4o-mini-tts model enables users to direct the AI to adopt specific speaking styles, such as simulating a customer service agent, opening up new possibilities in customer interactions and digital storytelling. However, OpenAI clarified that voice outputs are limited to synthetic preset options.
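
As a rough illustration of how such style direction might be passed to the model, the Python sketch below builds (but does not send) a request to OpenAI's speech endpoint using only the standard library. The endpoint path, the "instructions" field, and the preset voice name "coral" reflect our understanding of OpenAI's public API and should be checked against the current documentation; the helper name build_speech_request is hypothetical.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's text-to-speech API.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(api_key, text, instructions, voice="coral"):
    """Prepare a text-to-speech request without sending it.

    `voice` must be one of the preset synthetic voices; "coral" is
    assumed here to be a valid preset name.
    """
    payload = {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,                 # the words to be spoken
        "instructions": instructions,  # how the words should be spoken
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen with a valid API key would submit the request, and the response body should then contain the generated audio. Keeping the spoken text and the style instructions in separate fields is what lets a developer reuse the same script with different tones.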

The company credits advancements in its audio models to extensive pretraining, advanced distillation techniques, and reinforcement learning. These innovations allow smaller models to maintain high conversational quality while reducing computational demands.

The models are available through OpenAI's API and are integrated with the Agents SDK, which simplifies the development of interactive AI applications. For real-time, low-latency speech processing, OpenAI recommends using its Realtime API.

Looking ahead, OpenAI plans to further refine the intelligence and accuracy of its audio models while exploring custom voice options. The company is also engaging with policymakers, researchers, and developers to address the broader implications of synthetic voices. Additionally, OpenAI aims to expand into video technology, advancing multimodal AI experiences.