Meta AI announces its first AI model that generates video from a text description.
In the fast-paced world of AI, text-to-image models have been advancing rapidly, while text-to-video tools have lagged behind. There are a few video generator tools available, but only Runway’s Gen-2 and Pika Labs have managed to produce truly compelling results.
Today, Meta AI announced its own AI video generator, called Emu Video, and it looks amazing.
Emu Video, an extension of the Emu model known for image generation, brings an innovative approach to text-to-video generation. It leverages diffusion models in a way that’s both simple and highly effective.
The model is trained on 10 million synthesized samples, each consisting of an input image, a task description, and a target output image, making it the largest dataset of its kind to date.
Here's an example:
What do you think of these videos? I love how smooth the transitions between frames are. Meta has done an excellent job with this model.
Generating videos involves two steps:
1. Generate an image conditioned on the text prompt.
2. Generate a video conditioned on both the text prompt and the generated image.
According to Meta AI, this “factorized” or split approach to video generation lets them train video generation models efficiently.
The result is a 4-second, 16-fps, 512x512-pixel video.
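To make that two-step idea concrete, here is a minimal sketch in Python. Emu Video’s code isn’t public, so the function names and signatures below are my own assumptions rather than Meta’s API; the sketch only illustrates the factorized structure and the output size described above.

```python
# A minimal sketch of the factorized text-to-video idea described above.
# Emu Video is not publicly released, so `text_to_image` and
# `image_and_text_to_video` are hypothetical diffusion models, not Meta's API.

def generate_video(text_to_image, image_and_text_to_video, prompt,
                   seconds=4, fps=16, resolution=512):
    """Two-step ('factorized') text-to-video generation."""
    # Step 1: generate a single image conditioned only on the text prompt.
    first_frame = text_to_image(prompt, resolution)

    # Step 2: generate the full clip conditioned on both the prompt and
    # the image from step 1.
    num_frames = seconds * fps  # 4 s at 16 fps -> 64 frames of 512x512 pixels
    return image_and_text_to_video(prompt, first_frame, num_frames, resolution)
```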
The researchers noted, however, that it is possible to extend a video beyond that length and still get a decent result: they demonstrated a model that generates plausible continuations of the original videos, conditioned on new prompts.
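Purely as an illustration (the announcement doesn’t spell out the exact conditioning scheme, so the assumption that the continuation is seeded from the clip’s last frame is mine), the extension step could look something like this:

```python
# Hypothetical sketch of continuing an existing clip under a new prompt,
# reusing the `image_and_text_to_video` model from the sketch above.
# Assumption (mine): the continuation is conditioned on the clip's last frame.

def extend_video(image_and_text_to_video, existing_frames, new_prompt,
                 seconds=4, fps=16, resolution=512):
    """Generate a plausible continuation of `existing_frames` for `new_prompt`."""
    last_frame = existing_frames[-1]
    continuation = image_and_text_to_video(new_prompt, last_frame,
                                           seconds * fps, resolution)
    # Append the continuation to the original clip (frames stored as a list).
    return list(existing_frames) + list(continuation)
```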
Meta researchers used human raters to compare Emu Video’s results against state-of-the-art text-to-video generation models across a variety of prompts, judging each result on quality and on faithfulness to the text prompt.
Emu Video performed well according to Meta’s own evaluation, showcasing their progress in text-to-video generation. However, this is only based on their internal testing; I can’t fully attest to these results or draw any definitive conclusions about Emu Video’s capabilities until I get hands-on experience with the tool myself.
Right now, Emu Video is fundamental research and not a real product yet. Meta has released a demo website where you can check out a collection of videos generated by Emu Video.
Don’t get me wrong—the tech behind Emu Video looks seriously impressive. But as eager as I am to try these new AI tools from Meta, I know that real-world use doesn’t always live up to lab tests. I hope they release a publicly accessible tool soon.
Still, I’m thrilled Meta is pushing boundaries in AI innovation. We need companies thinking big to keep technology moving forward. At the same time, I hope Meta considers open-sourcing these tools.
Software engineer, writer, solopreneur