VideoGigaGAN is a novel generative model for video super-resolution (VSR)
You know, it always excites me when a huge tech company announces a brand-new AI tool. I can't help anticipating something fun, a jaw-dropping tool to play with, or even something to build a product on.
Today is no different: Adobe has dropped a research preview of VideoGigaGAN, which can transform a blurry video into highly detailed, super-resolution output.
VideoGigaGAN is a novel generative model for video super-resolution (VSR) that aims to upsample low-resolution videos to high resolution while maintaining both high-frequency details and temporal consistency across frames.
Check out the before and after examples below:
The example video above is 512x512, upscaled to 1024x1024. Aside from quadrupling the pixel count, the final video picks up detail-rich improvements. The results are truly impressive, showcasing the power of AI in enhancing video quality.
Just look at those skin textures and fine brow details. It’s mind-blowing.
VideoGigaGAN extends the asymmetric U-Net architecture of the image-based GigaGAN upsampler to handle video data. The model incorporates several key components to enforce temporal consistency across video frames.
Firstly, the image upsampler is inflated into a video upsampler by integrating temporal attention layers within the decoder blocks. This enables the model to capture and propagate temporal information effectively.
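To make that a bit more concrete, here is a rough PyTorch sketch of what "inflating" a 2D decoder block with temporal attention could look like. This is not Adobe's code, and the names (TemporalAttention, InflatedDecoderBlock) are my own placeholders: frames are folded into the batch dimension for the 2D convolutions, then reshaped so self-attention runs across the time axis independently at every spatial position.

```python
import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Self-attention across frames, applied independently at each pixel."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        # x: (batch * frames, channels, height, width)
        bt, c, h, w = x.shape
        b = bt // num_frames
        # Turn every (batch, pixel) pair into a sequence over time: (b*h*w, t, c).
        seq = x.view(b, num_frames, c, h * w).permute(0, 3, 1, 2).reshape(b * h * w, num_frames, c)
        attn_out, _ = self.attn(self.norm(seq), self.norm(seq), self.norm(seq))
        seq = seq + attn_out  # residual connection
        # Restore the (batch * frames, channels, height, width) layout.
        return seq.reshape(b, h * w, num_frames, c).permute(0, 2, 3, 1).reshape(bt, c, h, w)


class InflatedDecoderBlock(nn.Module):
    """A 2D decoder block followed by temporal attention over the frame axis."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
        )
        self.temporal = TemporalAttention(channels)

    def forward(self, x: torch.Tensor, num_frames: int) -> torch.Tensor:
        return self.temporal(self.spatial(x), num_frames)
```

The nice thing about this kind of inflation is that the spatial layers can keep the pretrained image-upsampler weights, while only the temporal layers are new.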
Secondly, a flow-guided propagation module is introduced to further enhance temporal consistency. This module provides temporally-aware features to the inflated GigaGAN by employing optical flow estimation and a recurrent neural network to align and propagate features across frames.
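Again, just a rough sketch of the idea rather than the paper's implementation: features from the previous frame are warped to the current frame using a precomputed optical flow field (from an estimator such as RAFT), fused with the current frame's features, and carried forward recurrently. The flow_warp and FlowGuidedPropagation names are my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def flow_warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp features (b, c, h, w) with a backward flow field (b, 2, h, w)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device),
        torch.arange(w, device=feat.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float() + flow  # absolute sampling positions
    # Normalize to [-1, 1] as expected by grid_sample.
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (b, h, w, 2)
    return F.grid_sample(feat, grid, align_corners=True)


class FlowGuidedPropagation(nn.Module):
    """Fuse each frame's features with flow-aligned features from the past."""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, feats: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
        # feats: (b, t, c, h, w); flows[:, i] maps frame i back to frame i-1.
        b, t, c, h, w = feats.shape
        state = feats[:, 0]
        outputs = [state]
        for i in range(1, t):
            warped = flow_warp(state, flows[:, i])
            state = self.fuse(torch.cat([feats[:, i], warped], dim=1))
            outputs.append(state)
        return torch.stack(outputs, dim=1)
```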
Thirdly, to mitigate aliasing artifacts resulting from the downsampling operations in the encoder, anti-aliasing blocks are utilized in place of standard downsampling layers. These blocks apply a low-pass filter followed by subsampling, which helps suppress aliasing and reduce temporal flickering in the output video.
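This is essentially the BlurPool idea from Zhang's "Making Convolutional Networks Shift-Invariant Again." A minimal sketch of such a layer (my own approximation, not the paper's code) could look like this: a fixed binomial low-pass filter applied depthwise, followed by stride-2 subsampling, instead of a plain strided downsample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlurPool2d(nn.Module):
    """Low-pass filter (binomial kernel) followed by stride-2 subsampling."""

    def __init__(self, channels: int):
        super().__init__()
        kernel_1d = torch.tensor([1.0, 2.0, 1.0])
        kernel = torch.outer(kernel_1d, kernel_1d)
        kernel = kernel / kernel.sum()  # normalize so the filter sums to 1
        # One identical filter per channel (depthwise convolution).
        self.register_buffer("kernel", kernel.repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.pad(x, (1, 1, 1, 1), mode="reflect")
        return F.conv2d(x, self.kernel, stride=2, groups=self.channels)
```

Because the signal is blurred before being subsampled, small shifts in the input no longer cause the output to jump between pixels, which is exactly the kind of flicker you want to avoid in video.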
Lastly, to compensate for the loss of high-frequency details caused by the anti-aliasing operation, a high-frequency feature shuttle is employed. This mechanism directly transfers high-frequency features from the encoder to the decoder via skip connections, bypassing the BlurPool process in the anti-aliasing blocks. This ensures that the output video retains sharp details and textures while benefiting from improved temporal consistency.
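Here is how I picture that shuttle, reusing the BlurPool2d sketch from above; the class name and the exact low-pass kernel are assumptions on my part. The encoder features are split into a low-pass part that continues down the anti-aliased path and a high-frequency residual that skips straight to the matching decoder block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighFrequencyShuttle(nn.Module):
    """Split encoder features into an anti-aliased downsample and an HF skip."""

    def __init__(self, channels: int):
        super().__init__()
        self.blurpool = BlurPool2d(channels)  # anti-aliased downsampling, as sketched above
        blur = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
        self.register_buffer("blur", blur.repeat(channels, 1, 1, 1))
        self.channels = channels

    def forward(self, x: torch.Tensor):
        # Low-pass version of the encoder features at full resolution.
        low = F.conv2d(F.pad(x, (1, 1, 1, 1), mode="reflect"),
                       self.blur, groups=self.channels)
        high_freq = x - low            # residual detail that bypasses BlurPool
        downsampled = self.blurpool(x)  # continues down the encoder
        return downsampled, high_freq   # high_freq is added back in the decoder skip
```

In this reading, the decoder gets the best of both worlds: the main path stays temporally stable thanks to the anti-aliasing, while the skipped high-frequency residual restores the sharp textures at the output.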
If you want to learn more about how it works, check out the whitepaper here.
As for limitations, the paper notes that VideoGigaGAN still struggles with extremely long videos (on the order of hundreds of frames) and with very small objects such as text and characters, which tend to come out distorted.
Here are more examples of 128x128 videos upscaled to 512x512.
From low-resolution, pixelated footage to crisp, high-definition videos, it’s fascinating to see how the AI model can infer and generate missing details.
It looks like video is finally getting more love from tech companies. In past years, AI-based video generation and manipulation didn't get nearly as much attention, and progress wasn't exactly mind-blowing.
Soon, these AI-powered video generators and editing tools may be efficient enough to run on your smartphone, and cameras may no longer need ultra-high-resolution hardware to capture high-quality video.
Things are changing, though, and we are seeing more and more AI in the video field. Take Sora from OpenAI, for example: it generated a lot of buzz when it was announced. Another example is Microsoft's recently announced VASA-1, which can turn a single image into a talking or singing video in real time.
The advent of AI in the video domain is an exciting development that promises to transform the way we create, edit, and consume video content.
Software engineer, writer, solopreneur