Published By: Admin

OpenAI Unveils Voice Cloning Tool: The Implications for the Spoken Audio Market

Move over, text generation, video and imagery: OpenAI is bringing Artificial Intelligence (AI) models to audio, and more specifically to voice cloning.

OpenAI recently ventured into voice technology by revealing Voice Engine, a tool that can replicate an individual's voice with great accuracy and that could change human-computer interaction in the future. Most surprisingly, this technology requires only 15 seconds of that person's recorded speech.

In this article, we discuss the potential implications of this technology for the spoken audio market.

The Working Principle of Voice Engine:

To start with, a person needs to record their own voice for at least 15 seconds using a computer microphone or even a phone. From text input, OpenAI’s Voice Engine will then generate “natural-sounding speech that closely resembles the original speaker.” The resulting voice can be kept for future use.

Potential Benefits for the Spoken Audio Market:

This technology clearly has major implications for voice-over artists, podcasters, advertising narrators, customer service agents, salespeople, audiobook narrators, streamers, gamers, and others.

The company also highlighted Voice Engine's ability to support non-verbal individuals. This could benefit educational programs and people with speech impairments. Companies working with the technology include:

Age of Learning: This edtech company will use Voice Engine and GPT-4 to create pre-scripted and real-time customised voice content, expanding reading assistance and interactivity for a diverse range of students.

HeyGen: This AI visual storytelling platform allows creators and businesses to translate their original content into different languages. It also creates customised human-like avatars with multilingual voices while preserving the original speaker’s accent.

Dimagi: This software company is set to develop tools for community health workers, using Voice Engine and GPT-4 for interactive feedback in different languages.

Livox: This AI-powered app for Augmentative and Alternative Communication (AAC) devices helps individuals with speech and hearing difficulties. It is set to incorporate Voice Engine to give them non-robotic voices in different languages.

Safety Concerns:

Voice-cloning startups already exist, but what sets OpenAI apart is its emphasis on ethical considerations. Amid controversy, the company has stressed responsible use and ethical guidelines.

“We have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used,” OpenAI said.

“We recognise that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year. We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” OpenAI added.

Disinformation researchers fear the misuse of AI-powered applications in a year when both India and the USA will be holding elections.

Acknowledging these problems, OpenAI said it was “taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.”

The launch of Voice Engine comes shortly after OpenAI filed a trademark application for the name, indicating the company's plan to revolutionize voice-related technologies.

Despite the tool's remarkable potential, OpenAI has decided to limit the release of Voice Engine to a select group of testers. This will help the company understand the potential for misuse and the resulting risks.