Google takes on Sora with new AI video generator Veo

May 14, 2024

Since OpenAI unveiled its Sora generative AI video model earlier this year, nothing has come close to it in sheer realism and quality of AI-generated motion visuals. Until now.

Amid the flurry of announcements at its annual I/O developer conference, Google today unveiled Veo, a new generative AI video model developed by researchers at its famed DeepMind AI division.

Google Veo is a generative AI video model capable of creating “high-quality, 1080p clips that can go beyond 60 seconds,” Google posted from its DeepMind account on the social network X. “From photorealism to surrealism and animation, it can tackle a range of cinematic styles.”

On its product page, Google says its goal with Veo is to “help create tools that make video production accessible to everyone. Whether you’re a seasoned filmmaker, aspiring creator, or educator looking to share knowledge, Veo unlocks new possibilities for storytelling, education and more.” The model supports text-to-video, video-to-video, and image-to-video transformations.

Google partnered with polymath artist Donald Glover, a.k.a. Childish Gambino, creator of the hit FX series Atlanta and a film and TV star to boot, to test some of Veo's new capabilities through his creative studio, Gilga.

As further evidence of what Veo's underlying model can produce, DeepMind posted a number of sample videos, along with the prompts used to make them, on its YouTube page and X account, including a neon city and realistic jellyfish swimming in the ocean…

✍️ Prompt: “Many spotted jellyfish pulsating under water. Their bodies are transparent and glowing in deep ocean.” pic.twitter.com/y9SmNd8NK0

— Google DeepMind (@GoogleDeepMind) May 14, 2024

Cowboys riding horses, spaceships traversing the void, and lifelike human scenes…

✍️ Prompt: “A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors.” pic.twitter.com/D8uKDZVWto

— Google DeepMind (@GoogleDeepMind) May 14, 2024

✍️ Prompt: “A woman sitting alone in a dimly lit cafe, a half-finished novel open in front of her. Film noir aesthetic, mysterious atmosphere. Black and white.” pic.twitter.com/vFVXr4Cvxi

— Google DeepMind (@GoogleDeepMind) May 14, 2024

The results are nearly indistinguishable from live-action footage or skillfully made computer-generated animation, yet all were created from text prompts.

According to a blog post by Eli Collins, Google's VP of Product Management, and Douglas Eck, Senior Research Director, Veo "provides an unprecedented level of creative control, and understands cinematic terms like 'timelapse' or 'aerial shots of a landscape.'"

In addition, Veo can quickly make high-quality edits from text prompts, both to AI-generated videos and to a user's uploaded clips, including pre-recorded live-action footage, according to Google's Veo product page.

“When given both an input video and editing command, like adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video,” the company writes.

Google also says Veo maintains consistency across video frames, avoiding some of the bizarre and unsettling transformations and artifacts seen even in Sora. According to the company, Veo does this by relying on "cutting-edge latent diffusion transformers," which "reduce the appearance of these inconsistencies, keeping characters, objects and styles in place, as they would in real life."

To improve the results, Google says it "added more details to the captions of each video in its training data." The company continues: "And to further improve performance, the model uses high-quality, compressed representations of video (also known as latents) so it's more efficient too. These steps improve overall quality and reduce the time it takes to generate videos."

Google says all Veo videos are embedded with SynthID, its digital watermarking technology, so that they can be identified as AI-generated.

The model is said to be the culmination of years of research at DeepMind, building on earlier advances including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere.

Unfortunately, Google is not making Veo public just yet. Instead, following the mold set by OpenAI with Sora (which also remains unreleased to the public), Google wrote that Veo is "available to select creators in private preview in VideoFX by joining our waitlist. In the future, we'll also bring some of Veo's capabilities to YouTube Shorts and other products."