July 31, 2024

Chinese AI Video Generator, Vidu, Competes with OpenAI's Sora and Kling AI

Vidu is another AI video generator that challenges Sora, Kling, and Gen-3 Alpha.

by Jim Clyde Monge

Three months after its initial preview, Vidu by Shengshu Technology is now live and accessible to the public. This new AI video generator aims to compete with OpenAI’s popular but unreleased Sora.

In case you missed it, another Chinese AI video tool, Kling, became publicly accessible a few days ago. You can sign up with an email address and get free credits upon every login. Watch my review of Kling here.

What is Vidu?

Vidu is an AI-powered tool that can generate videos from text descriptions or existing images. Announced on April 27, 2024, Vidu is designed to generate high-definition, 4-second videos in less than 30 seconds. It can render videos in both anime and realistic styles.

The Vidu AI model is built on a proprietary architecture called the Universal Vision Transformer (U-ViT), which combines two approaches used in AI video generation: diffusion models and the Transformer. This architecture enables the creation of high-quality videos with dynamic camera movements, intricate facial expressions, and authentic lighting and shadow effects.

This is what the website’s dashboard looks like:

The Vidu website dashboard. Image by Jim Clyde Monge

On sign-up, users get 80 free credits per month. The free version produces good-quality output, albeit at a slightly lower resolution. Each generation is limited to 4 seconds (the paid version allows 8 seconds).

How it Works

Head over to the Vidu website and sign up through email. On the top navigation bar, click on the “Create Video” button.

The Vidu AI video generation dashboard. Image by Jim Clyde Monge

Here’s an example:

Prompt: A Chinese man sitting at a table, eating noodles with chopsticks

Vidu with the prompt from the example above. Image by Jim Clyde Monge

Below the output video file, you can choose to upscale or reuse the prompt by clicking on the ‘ConfigCopy’ button. Here’s the final result:

This video is a 4-second, 688 × 384 file. Because of the small size, the generation took less than a minute. Note that other AI video tools that generate 1080p resolution files take at least 2–3 minutes per video. Each generation costs 4 credits.

The settings page is quite simple. You can change the video style between general and animation. Note that the video style applies to text-to-video only, and the 8-second duration option is exclusive to paying customers.

Vidu AI settings page — Image by Jim Clyde Monge

Let’s try this prompt in animated style:

Prompt: In a softly lit bathroom, a teddy bear styled like an American animated character is taking a bath. The bear, partially submerged in a bubble-filled bathtub, holds a phone to its ear with one paw while scrubbing itself with the other. The ambient lighting is gentle and refreshing, casting a warm and inviting glow over the scene. The bathroom tiles are a soothing pastel color, complementing the cozy and whimsical atmosphere. The teddy bear’s expressive face shows concentration as it multitasks, combining the mundane act of bathing with the casual activity of a phone conversation.

Vidu AI output for the teddy bear prompt above. GIF by Jim Clyde Monge

Oh wow. I was very impressed with the quality of the output video. It looks like it came out of an animated film from Studio Ghibli. However, you may notice that the AI model struggles with prompt adherence. In the prompt, the bear is supposed to be holding a phone to its ear with one paw while scrubbing itself with the other, and the generated video does not fully follow that instruction.

Image to Video

Now let’s see how the image-to-video feature performs. After you upload the image, specify whether you want it to be used as the first frame or character reference of the video.

Vidu AI sample prompt — Image by Jim Clyde Monge

Here’s the reference image from Midjourney:

An image of a runner in Paris, generated with Midjourney. Image by Jim Clyde Monge

Prompt: Triumphant Marathon Runner Approaching the Finish Line, Eiffel Tower in Festive Atmosphere

This looks super cool. I was surprised to see Vidu add more subjects to the scene and render legible text on the runner’s bib.

Text Rendering

One area where most of the AI video generators struggle is text rendering. Let’s see how Vidu handles this prompt:

Prompt: A wall with a graffiti that says “Vidu is cool”

Vidu AI output for the graffiti prompt above. GIF by Jim Clyde Monge

The text is not accurate, but the letters are legible. Judging by these results, Vidu seems to be better than Kling at generating text in videos.

How Much Does it Cost?

Here’s a summary of the subscription plans:

Free: 80 credits monthly, generate 4-second video, upscale resolution, no commercial use, 1 task at a time.
Standard: $9.99 per month (50% off, usually $19.99), 320 credits monthly, generate 4-second and 8-second video, upscale resolution, commercial use, remove watermark after upscaling, 2 tasks at a time.
Advanced: $29.99 per month (50% off, usually $59.99), 880 credits monthly, generate 4-second and 8-second video, upscale resolution, commercial use, remove watermark after upscaling, 3 tasks at a time, priority for new features.
Premium: $99.99 per month (50% off, usually $199.99), 2960 credits monthly, generate 4-second and 8-second video, upscale resolution, commercial use, remove watermark after upscaling, 4 tasks at a time, priority for new features.

Users can also opt for annual subscriptions and get a 20% discount.
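To compare the plans on value, here is a rough Python sketch that turns the monthly prices and credit allowances listed above into an approximate cost per clip. It assumes every generation costs 4 credits, which is what I observed for a 4-second clip; the cost of an 8-second clip or of upscaling is not covered here and may differ. The prices are the promotional monthly ones, and the function names are only illustrative.

```python
# Plan data copied from the pricing summary above (promotional monthly prices, USD).
PLANS = {
    "Free": {"price": 0.00, "credits": 80},
    "Standard": {"price": 9.99, "credits": 320},
    "Advanced": {"price": 29.99, "credits": 880},
    "Premium": {"price": 99.99, "credits": 2960},
}

# Assumption: each generation costs 4 credits (observed for a 4-second clip only).
CREDITS_PER_CLIP = 4


def clips_per_month(credits, credits_per_clip=CREDITS_PER_CLIP):
    """Number of whole clips a monthly credit allowance covers."""
    return credits // credits_per_clip


def cost_per_clip(price, credits, credits_per_clip=CREDITS_PER_CLIP):
    """Effective USD price of one clip if the whole allowance is used."""
    clips = clips_per_month(credits, credits_per_clip)
    return price / clips if clips else 0.0


def summarize(plans=PLANS):
    """Return (plan name, clips per month, cost per clip) for each plan."""
    return [
        (name, clips_per_month(p["credits"]), cost_per_clip(p["price"], p["credits"]))
        for name, p in plans.items()
    ]


if __name__ == "__main__":
    for name, clips, cost in summarize():
        print(f"{name:<9} {clips:>4} clips/month  ${cost:.3f} per clip")
```

Under these assumptions, the free tier covers 20 four-second clips a month, and the Standard plan works out to roughly 12 cents per clip.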

Before I end this piece: Kling just launched subscription plans ranging from $5 to $46 per month.

The pro tier gives you the following benefits:

Exclusive Professional Mode for Members:
Evaluated to have better instruction following, higher image quality, and stronger dynamic consistency—a significant overall improvement in text-to-video and image-to-video generation compared to the Standard Mode.
Other membership features:
Watermark removal, camera control, extended video length, and advanced shot composition tools.

For free users like me, the daily login bonus of 66 credits is still in place. Visit klingai.com for more information about the subscription plans.

Final Thoughts

Overall, Vidu is a great addition to the short list of publicly available AI video generators. In terms of quality, it is ahead of Runway Gen-3 Alpha but a bit behind OpenAI’s Sora. I appreciate that free users get free monthly credits, although it would be better if they were provided daily.

Also, text rendering and adherence to the prompt are still some of the hardest problems to solve in AI video. While Vidu still struggles with both, the results are already far better than what was possible a few years ago.

I am glad that video generation is finally catching up with text and image generation in 2024. In the coming months, we could see more AI video generators released with improved quality and cheaper subscriptions.

