Veo 2.0 comes with 4K resolution, improved camera controls, and far greater overall quality.
Merely days after OpenAI rolled out Sora to the public, Google responded by dropping its latest and most advanced AI video model yet, Veo 2.0. The new version is packed with upgrades, including 4K resolution, improved camera controls, and far better overall quality than its predecessor.
The timing of the release raises an obvious question: is Veo 2.0 better than Sora?
If this is the first time you’ve heard of Veo, it’s Google’s AI video model, capable of generating videos from text descriptions. The first version of Veo was introduced in May 2024 but was never made publicly available. Now, Google has unveiled Veo 2.0 with significant enhancements and broader functionality.
Google introduces three new features in Veo 2.0.
To demonstrate the capabilities of Veo 2.0, Google conducted human evaluations against other leading video generation models like Meta’s Movie Gen, Kling v1.5, Minimax, and Sora Turbo. Evaluators viewed 1,003 video samples created using prompts from Meta’s MovieGenBench dataset. Videos were compared at a resolution of 720p with varying durations: Veo’s samples were 8 seconds long, Movie Gen’s samples were 10 seconds, and the other models produced 5-second outputs.
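For context, a common way to turn side-by-side human votes like these into the win-rate numbers shown in such tables is a simple tally, with ties split between the two models. The sketch below is purely illustrative: the vote counts are made up and only stand in for the kind of data an evaluation like this produces, not Google’s actual results.

```python
# Illustrative only: aggregating side-by-side human-preference votes into
# per-model win rates. The vote counts are invented placeholders, NOT
# figures from Google's evaluation.
from collections import defaultdict

# Each record: (model_a, model_b, votes_for_a, votes_for_b, ties)
pairwise_votes = [
    ("Veo 2", "Sora Turbo", 62, 30, 8),
    ("Veo 2", "Kling v1.5", 58, 34, 8),
    ("Veo 2", "Movie Gen",  55, 37, 8),
]

wins = defaultdict(float)
comparisons = defaultdict(int)

for model_a, model_b, a_votes, b_votes, ties in pairwise_votes:
    total = a_votes + b_votes + ties
    # A common convention: a tie counts as half a win for each side.
    wins[model_a] += a_votes + ties / 2
    wins[model_b] += b_votes + ties / 2
    comparisons[model_a] += total
    comparisons[model_b] += total

for model in sorted(wins, key=lambda m: wins[m] / comparisons[m], reverse=True):
    print(f"{model:<12} win rate: {wins[model] / comparisons[model]:.1%}")
```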
Looking at the tables above, you can see that Veo 2 ranks highest both for overall preference and for how accurately it follows prompts.

Of course, given Google’s not-so-pretty track record with product announcements, you should take these benchmarks with a grain of salt. It’s always worth getting hands-on with these AI video generators before drawing any conclusions.

X user Blaine Brown ran a nice experiment where he asked various video models to generate a video of a chef’s hand slicing a steak. This is a very challenging test for AI models: hands, the physics and movement of consecutive slices, the interpretation of “steak done perfectly”, steam, juices, and so on. Here’s the prompt and the final results.
You can see from these results that only Veo 2.0 was able to render a convincing meat-slicing video.

Key Features of Veo 2.0

Let’s take a closer look at the new features, starting with the enhanced realism and fidelity. According to Google, Veo 2.0 takes a huge step forward in detail, realism, and artifact reduction. The model can generate videos with highly accurate textures, natural movements, and a more cinematic quality than its predecessor.

Prompt: An extreme close-up shot focuses on the face of a female DJ, her beautiful, voluminous black curly hair framing her features as she becomes completely absorbed in the music. Her eyes are closed, lost in the rhythm, and a slight smile plays on her lips. The camera captures the subtle movements of her head as she nods and sways to the beat, her body instinctively responding to the music pulsating through her headphones and out into the crowd. The shallow depth of field blurs the background. She’s surrounded by vibrant neon colors. The close-up emphasizes her captivating presence and the power of music to transport and transcend.
I am honestly floored by the quality of this video. At first glance, you wouldn’t even think this is an AI-generated video. The skin texture is detailed, the head movement is fluid, and even subtle camera shakes add to the realism of the scene.

This realism extends to textures and materials as well. Take, for instance, a video of a transparent rock created by Veo 2.0.
The AI accurately simulates how light bounces and refracts through the translucent surface. This is something many video models still struggle to achieve.

Okay, now let’s take a look at Veo 2’s advanced motion capabilities. According to Google’s blog, the new model excels at understanding physics and at following detailed instructions.
The resulting video feels natural, with the knife slicing through the tomato seamlessly. The physics of motion, from the way the tomato shifts slightly on impact to the way the knife moves, is handled with surprising accuracy. Using the same prompt, here’s how OpenAI’s Sora interprets it:
As you can see, Sora still struggles to represent real-world physics.

This is going to disrupt 3D animation. Look at the subject’s hair: each strand behaves as it would in a real-world scenario, reacting naturally to the character’s movements.

Finally, Veo 2 has a brand-new camera control feature that lets it interpret instructions precisely to create a wide range of shot styles, angles, movements, and combinations of all of these. Here’s an interesting example shared by Jerrod Lew on X, where he showed Veo making prompt-driven cuts within the scene to give a more cinematic output.
Notice how the camera shifts between scenes? This is incredibly useful if you want to generate multiple scenes in one prompt. Such capability is not available in other AI video generator tools, not even OpenAI’s Sora.

For filmmakers, marketers, and content creators, these tools open the door to more sophisticated AI-generated storytelling. Instead of stitching together separate scenes, Veo can now handle complex, multi-angle video production with a single prompt.

How To Generate Videos with Veo 2.0

Head over to Google Labs and select “VideoFX” from the list of available AI tools.
If you’re one of the lucky users to get early access to Veo 2.0 via VideoFX, you should see a prompt box on the left side where you describe the video you want to generate. When you click on the “Create videos” button, VideoFX will generate four variations at a time. You can regenerate to get more variations or download the videos to your local disk.
Some users have also observed a “Text to Image to Video” feature, which lets you generate an image with Imagen 3 and turn that image into video using Veo 2.0.
Things You Should Know About Veo 2
According to Google DeepMind’s VP of product, Eli Collins, there’s still work to be done despite the promising results:

“Veo can consistently adhere to a prompt for a couple minutes, but [it can’t] adhere to complex prompts over long horizons. Similarly, character consistency can be a challenge. There’s also room to improve in generating intricate details, fast and complex motions, and continuing to push the boundaries of realism.” — Eli Collins

How to Access Veo 2.0?

Google is slowly rolling it out via VideoFX, YouTube, and Vertex AI. For now, you can join the waitlist by going into VideoFX and clicking on the “Join the waitlist” button.
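For developers, the Vertex AI route will presumably follow Google’s usual publisher-model pattern once it opens up. The sketch below is only an assumption of what that could look like: the model ID, endpoint path, and request fields are hypothetical, not a documented API, so check the Vertex AI docs for the real contract when access becomes available.

```python
# Hypothetical sketch of calling Veo 2.0 on Vertex AI. Model ID, endpoint,
# and request fields are assumptions -- verify against Google's docs.
import requests
import google.auth
from google.auth.transport.requests import Request

PROJECT_ID = "your-gcp-project"    # assumption: your own GCP project
LOCATION = "us-central1"           # assumption: a supported region
MODEL_ID = "veo-2.0-generate-001"  # assumption: eventual model ID

def generate_video(prompt: str) -> dict:
    """Submit a text-to-video request and return the raw JSON response."""
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(Request())

    url = (
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
        f"projects/{PROJECT_ID}/locations/{LOCATION}/"
        f"publishers/google/models/{MODEL_ID}:predictLongRunning"
    )
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"aspectRatio": "16:9", "durationSeconds": 8},
    }
    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {credentials.token}"},
        json=body,
        timeout=60,
    )
    resp.raise_for_status()
    # Video generation is long-running, so the response would be an
    # operation to poll rather than the finished video itself.
    return resp.json()

if __name__ == "__main__":
    print(generate_video("A knife slices cleanly through a ripe tomato"))
```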
Once the waitlist comes through, you should be notified via email. Unfortunately, it’s not clear how long that takes or how Google selects the users who get access to Veo 2.0.

Final Thoughts

I honestly thought OpenAI would crush Google with their 12 days of Christmas rollout, but the messy Sora launch gave Google the perfect moment to take the spotlight with Veo 2.0. The level of realism here is seriously impressive. The physics and consistency are miles ahead, and the fact that it can generate 4K videos up to a minute long is already a huge achievement.

I’m really glad Google released this model. I’ve been waiting for more options beyond Kling and Runway for months now. Competition like this is exactly what we need. That said, neither Google nor DeepMind has mentioned pricing yet. I really hope they don’t pull an OpenAI and charge $200 a month for the best settings.

If I had a wishlist for Google, it’d be this: bundle Veo with a Gemini subscription, add more creative controls like varying aspect ratios, resolutions, and video lengths, and throw in a commercial license. That would be perfect.