Google’s new approach to text-to-video AI models.
Lumiere is a new AI model from Google that generates beautiful, smooth, high-quality video.
It uses this particular architecture they call a Space-Time U-Net:
Okay, so what??
Alright, chill. Let me explain, simply.
In simple terms, previous AI video generation approaches first generated a handful of images showing what the video should look like at certain points in time. A second model then created the transition from one image to the next by generating the frames in between.
For example, the AI can create the image at the start of the video and another image at the end of the video. Then another AI will fill in the gap in between to complete the video.
It’s like one of those flip books!
A lot of times, this results in rough and awkward transitions.
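If you want to see why the two-stage approach can go wrong, here is a tiny toy sketch. To be clear, none of this is real model code: `true_position`, `keyframes`, and `fill_gaps` are names I made up, the "video" is just the position of one swinging object, and plain linear blending stands in for the second model. It only shows the general failure mode, where motion that happens between two keyframes can get lost.

```python
import math


def true_position(t):
    """Hypothetical ground-truth motion: an object swinging back and forth."""
    return math.sin(2 * math.pi * t / 16)


def keyframes(num_frames, stride):
    """Stage 1 (stand-in for the keyframe model): keep every `stride`-th frame."""
    return {t: true_position(t) for t in range(0, num_frames, stride)}


def fill_gaps(keys, num_frames):
    """Stage 2 (stand-in for the in-between model): blend neighbouring keyframes."""
    times = sorted(keys)
    frames = []
    for t in range(num_frames):
        prev = max(k for k in times if k <= t)
        nxt = min((k for k in times if k >= t), default=prev)
        if nxt == prev:
            frames.append(keys[prev])
        else:
            w = (t - prev) / (nxt - prev)
            frames.append((1 - w) * keys[prev] + w * keys[nxt])
    return frames


num_frames = 17
frames = fill_gaps(keyframes(num_frames, stride=8), num_frames)

# Compare the filled-in video with the motion that actually happened.
worst = max(abs(frames[t] - true_position(t)) for t in range(num_frames))
print(f"worst error: {worst:.2f}")
```

The keyframes at frames 0, 8, and 16 all catch the object in its resting position, so the blended video shows it standing still and the worst error comes out close to 1, the full height of the swing. A learned in-between model is much smarter than a linear blend, but it still has to guess what happened in the gaps.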
Lumiere, on the other hand, does it differently — coz if it did it the same way, what’s the point of this article, yeah?
The biggest change is this: instead of creating images then filling in the gap in between them,
Lumiere creates all of the frames all at once.
This creates a noticeably smoother and higher quality video.
Check out the demo video below to see what I mean.
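For the curious, here is a rough feel for the "space-time" part of the name. As I understand it, the network shrinks the video along the time axis as well as the height and width, so it can work on the whole clip in a compact form. The sketch below is only an illustration of that idea using plain averaging on nested lists. The helper names (`make_video`, `shape`, `space_time_pool`) and the factor of 2 are my own assumptions, and the real model uses learned layers rather than simple averaging.

```python
def make_video(frames, height, width):
    """Build a dummy video as nested lists indexed [time][row][column]."""
    return [
        [[float(t + y + x) for x in range(width)] for y in range(height)]
        for t in range(frames)
    ]


def shape(video):
    """Return (frames, height, width) of a nested-list video."""
    return (len(video), len(video[0]), len(video[0][0]))


def space_time_pool(video, ft=2, fs=2):
    """Average over blocks of `ft` frames and `fs` x `fs` pixels."""
    total_frames, height, width = shape(video)
    pooled = []
    for t in range(0, total_frames - ft + 1, ft):
        frame = []
        for y in range(0, height - fs + 1, fs):
            row = []
            for x in range(0, width - fs + 1, fs):
                block = [
                    video[t + dt][y + dy][x + dx]
                    for dt in range(ft)
                    for dy in range(fs)
                    for dx in range(fs)
                ]
                row.append(sum(block) / len(block))
            frame.append(row)
        pooled.append(frame)
    return pooled


video = make_video(frames=8, height=4, width=4)
pooled = space_time_pool(video)
print(shape(video), "->", shape(pooled))  # (8, 4, 4) -> (4, 2, 2)
```

Notice that the number of frames is halved along with the height and width. A purely spatial version would leave the 8 frames untouched and only shrink each image.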
This new approach works great not only on text-to-video, but also:
Based on their user testing (I'm not sure how far this can be trusted, though), the new approach beats the baseline AI models on video quality in both text-to-video and image-to-video modes, and on text alignment as well.
Currently it can only generate 5 seconds of video, but that is long enough for most single shots.
Honestly, this approach is pretty interesting. I’m just a tad bit disappointed that they did not release the model itself because I would LOVE to play around with it. I’ve already been burned by video-to-video AI that had rough and awkward transitions 🥲
They also use a pre-trained text-to-image model as part of their process, so I’m thinking maybe they can try out the more compute-efficient Hourglass architecture.
GOOGLE, PLEASE RELEASE THE PRETRAINED MODEL ALREADY 🥲
Data/AI Engineer