Google’s Next Magic Trick Is Making AI Sound for Your Generative Videos
Content creators might never run short of soundtracks and sound effects again, thanks to Google DeepMind’s latest AI tool.
Google’s AI laboratory showed off its video-to-audio tech, shortened to V2A. Like generative video, V2A uses AI, in this case to create audio that matches what’s happening in a video. In the first few demos, the new tech delivers convincing audio, like steady footsteps or precise drum strikes that line up with the video’s timing.
The idea would theoretically fill a gap in generative video as we know it. OpenAI’s Sora, the more recent Dream Machine from Luma AI, and Google’s own Veo can generate impressive visuals, but they still lack audio.
Unlocking AI-Made Audio
V2A is still under development, but the first few samples that Google DeepMind presented show a lot of promise. As seen in the demos, V2A can add anything from dramatic background music to realistic sound effects. Google DeepMind said that V2A can even generate soundtracks for older video samples like archival footage or silent films.
Impressively, Google DeepMind said that V2A can “generate an unlimited number of soundtracks for any video input,” meaning you get plenty of audio options to play around with until you land on one that perfectly suits your video. Beyond that, you can tweak your initial prompt, telling V2A whether it’s getting warmer or colder relative to what you were looking for.
There are downsides, though. Google DeepMind said that V2A works by interpreting the raw pixels of the source video, so if your video has artifacts or distortion, the generated audio may suffer quality issues as well. V2A also struggles with lip sync: when it’s given a transcript for someone speaking in a video, the generated speech doesn’t always line up with the mouth movements. In one of DeepMind’s demo clips, the video and audio don’t match up, which shatters the illusion of the AI-generated footage.
Not Ready For Release Yet
As impressive as these demos look, Google DeepMind said it’s not ready to release this technology to the masses yet. Before any official release, the AI lab said it would conduct “rigorous safety assessments and testing,” but didn’t detail the exact extent of those tests.
Considering the pace at which generative AI is evolving, it makes sense for Google DeepMind to be cautious about releasing powerful new tools like V2A. We’ve already seen bad actors find ways around the safeguards put in place for generative AI tools, and V2A could open up another can of worms. On the other hand, it could also be another AI game-changer for content creators.