OpenAI Unveils Audio Tool That Recreates Human Voices
First, OpenAI offered a tool that allowed people to create digital images simply by describing what they wanted to see. Then, it built similar technology that generated full-motion video like something from a Hollywood movie.
Now, it has unveiled technology that can recreate someone’s voice.
The high-profile A.I. start-up said on Friday that a small group of businesses was testing a new OpenAI system, Voice Engine, that can recreate a person’s voice from a 15-second recording. If you upload a recording of yourself and a paragraph of text, it can read the text using a synthetic voice that sounds like yours.
The text does not have to be in your native language. If you are an English speaker, for example, it can recreate your voice in Spanish, French, Chinese or many other languages.
OpenAI is not sharing the technology more widely because it is still trying to understand its potential dangers. Like image and video generators, a voice generator could help spread disinformation across social media. It could also allow criminals to impersonate people online or during phone calls.
The company said it was particularly worried that this kind of technology could be used to break voice authenticators that control access to online banking accounts and other personal applications.
“This is a sensitive thing, and it is important to get it right,” an OpenAI product manager, Jeff Harris, said in an interview.
The company is exploring ways of watermarking synthetic voices or adding controls that prevent people from using the technology with the voices of politicians or other prominent figures.
Last month, OpenAI took a similar approach when it unveiled its video generator, Sora. It showed off the technology but did not publicly release it.
OpenAI is among the many companies that have developed a new breed of A.I. technology that can quickly and easily generate synthetic voices. They include tech giants like Google as well as start-ups like the New York-based ElevenLabs. (The New York Times has sued OpenAI and its partner, Microsoft, on claims of copyright infringement involving artificial intelligence systems that generate text.)
Businesses can use these technologies to generate audiobooks, give voice to online chatbots or even build an automated radio station DJ. Since last year, OpenAI has used its technology to power a version of ChatGPT that speaks. And it has long offered businesses an array of voices that can be used for similar applications. All of them were built from clips provided by voice actors.
But the company has not yet offered a public tool that would allow individuals and businesses to recreate voices from a short clip as Voice Engine does. The ability to recreate any voice in this way, Mr. Harris said, is what makes the technology dangerous. The technology could be particularly dangerous in an election year, he said.
In January, New Hampshire residents received robocall messages discouraging them from voting in the state primary. The voice in those calls was most likely artificially generated to sound like President Biden. The Federal Communications Commission later outlawed such calls.
Mr. Harris said OpenAI had no immediate plans to make money from the technology. He said the tool could be particularly useful to people who lost their voices through illness or accident.
He demonstrated how the technology had been used to recreate a woman’s voice after brain cancer damaged it. She could now speak, he said, after providing a brief recording of a presentation she had once made as a high schooler.