AI news
April 24, 2024

Adobe Introduces VideoGigaGAN: Upscale Videos Up to 8X With Richer Details

VideoGigaGAN is a novel generative model for video super-resolution (VSR)

by Jim Clyde Monge

You know, it always excites me when a huge tech company announces a brand-new AI tool. I can't help anticipating something fun to play with, a jaw-dropping demo, or even something to build a product on.

Today is no different as Adobe dropped a research preview of VideoGigaGAN, which can transform a blurry video into a highly detailed super-resolution output.

What is VideoGigaGAN?

VideoGigaGAN is a novel generative model for video super-resolution (VSR) that aims to upsample low-resolution videos to high resolution while maintaining both high-frequency details and temporal consistency across frames.
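
For contrast, here is what the naive alternative looks like: a minimal per-frame bicubic baseline in PyTorch (my own sketch, not part of VideoGigaGAN). It upsamples each frame independently, so it recovers neither fine detail nor frame-to-frame consistency, the two gaps VSR models aim to close.

```python
import torch
import torch.nn.functional as F

def bicubic_baseline(video: torch.Tensor, scale: int = 8) -> torch.Tensor:
    """Upsample a (T, C, H, W) clip frame by frame with bicubic interpolation.

    Each frame is treated independently, so nothing enforces detail
    recovery or temporal consistency -- exactly what VSR adds.
    """
    return F.interpolate(video, scale_factor=scale, mode="bicubic",
                         align_corners=False)

# Example: a 16-frame 128x128 clip upscaled 8x to 1024x1024.
clip = torch.rand(16, 3, 128, 128)
print(bicubic_baseline(clip).shape)  # torch.Size([16, 3, 1024, 1024])
```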

Check out the before and after examples below:

Before (left) and after (right) video upscaling

The example video above is 512x512, upscaled to 1024x1024. Beyond quadrupling the pixel count (2x per side), the final video gains detail-rich improvements. The results are truly impressive, showcasing the power of AI in enhancing video quality.

Just look at those skin textures and fine brow details. It’s mind-blowing.

How does it work?

VideoGigaGAN extends the asymmetric U-Net architecture of the image-based GigaGAN upsampler to handle video data. The model incorporates several key components to enforce temporal consistency across video frames.

Firstly, the image upsampler is inflated into a video upsampler by integrating temporal attention layers within the decoder blocks. This enables the model to capture and propagate temporal information effectively.
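
Here is a minimal sketch of what such an inflation can look like in PyTorch; the module layout, shapes, and names are illustrative assumptions, not Adobe's actual implementation. Each spatial location attends across the T frames of the clip:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over the time axis, applied independently per pixel."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C, H, W) -- B clips of T frames each.
        b, t, c, h, w = x.shape
        # Fold the spatial grid into the batch so attention runs over T only.
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        q = self.norm(tokens)
        out, _ = self.attn(q, q, q)
        out = out.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)
        return x + out  # residual: the pretrained 2D features stay intact
```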

Secondly, a flow-guided propagation module is introduced to further enhance temporal consistency. This module provides temporally-aware features to the inflated GigaGAN by employing optical flow estimation and a recurrent neural network to align and propagate features across frames.
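
A hedged sketch of flow-guided propagation, again in PyTorch: `flow_net` stands in for any off-the-shelf optical-flow estimator (e.g., RAFT), and the fusion convolution is an illustrative placeholder. The idea is to warp a recurrent hidden state into each new frame's coordinates before fusing it with that frame's features:

```python
import torch
import torch.nn.functional as F

def warp(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp (B, C, H, W) features by a (B, 2, H, W) flow in pixels.

    flow[:, 0] is assumed to be the x displacement, flow[:, 1] the y.
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=feat.device),
                            torch.arange(w, device=feat.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    gx = 2.0 * grid[..., 0] / (w - 1) - 1.0
    gy = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def propagate(frame_feats, flow_net, fuse):
    """frame_feats: list of (B, C, H, W) per-frame features, in time order.
    flow_net(prev, cur) -> (B, 2, H, W); fuse: a conv mapping 2C -> C."""
    state = torch.zeros_like(frame_feats[0])
    prev, outputs = None, []
    for feat in frame_feats:
        if prev is not None:
            # Align the running state with the current frame before fusing.
            state = warp(state, flow_net(prev, feat))
        state = fuse(torch.cat([feat, state], dim=1))
        outputs.append(state)
        prev = feat
    return outputs
```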

Thirdly, to mitigate aliasing artifacts resulting from the downsampling operations in the encoder, anti-aliasing blocks are utilized in place of standard downsampling layers. These blocks apply a low-pass filter followed by subsampling, which helps suppress aliasing and reduce temporal flickering in the output video.
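
This low-pass-then-subsample idea is essentially BlurPool from Zhang's "Making Convolutional Networks Shift-Invariant Again". A minimal PyTorch version (my sketch, with an assumed 3x3 binomial kernel) looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Low-pass filter, then subsample -- anti-aliased downsampling."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        # Fixed 3x3 binomial kernel ([1, 2, 1] outer product), one per channel.
        k = torch.outer(torch.tensor([1.0, 2.0, 1.0]),
                        torch.tensor([1.0, 2.0, 1.0]))
        k = (k / k.sum()).expand(channels, 1, 3, 3)
        self.register_buffer("kernel", k.contiguous())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Blur first so the subsampling below cannot alias high frequencies.
        x = F.conv2d(x, self.kernel, padding=1, groups=x.shape[1])
        return x[:, :, ::self.stride, ::self.stride]
```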

Lastly, to compensate for the loss of high-frequency details caused by the anti-aliasing operation, a high-frequency feature shuttle is employed. This mechanism directly transfers high-frequency features from the encoder to the decoder via skip connections, bypassing the BlurPool process in the anti-aliasing blocks. This ensures that the output video retains sharp details and textures while benefiting from improved temporal consistency.
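
A sketch of how such a shuttle might be wired, assuming PyTorch; the frequency split below (a blur for the low band, its residual for the high band) is my illustrative reading, not Adobe's exact wiring:

```python
import torch
import torch.nn.functional as F

def low_pass(feat: torch.Tensor) -> torch.Tensor:
    """Depthwise 3x3 binomial blur, stride 1 (no subsampling)."""
    c = feat.shape[1]
    k = torch.outer(torch.tensor([1.0, 2.0, 1.0]),
                    torch.tensor([1.0, 2.0, 1.0]))
    k = (k / k.sum()).to(feat).expand(c, 1, 3, 3).contiguous()
    return F.conv2d(feat, k, padding=1, groups=c)

def split_frequencies(feat: torch.Tensor):
    low = low_pass(feat)
    high = feat - low  # the residual holds edges and fine texture
    return low, high

# Illustrative wiring inside the U-Net:
#   low, high = split_frequencies(enc_feat)  # encoder side
#   ... only `low` continues down the anti-aliased (BlurPool) path ...
#   dec_feat = dec_feat + high               # decoder skip restores detail
```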

If you want to learn more about how it works, check out the VideoGigaGAN whitepaper.

Limitations

Here are the main limitations of VideoGigaGAN:

  1. Handling extremely long videos: VideoGigaGAN may encounter difficulties when processing videos with a very large number of frames, such as those exceeding 200 frames.
  2. Poor performance on small objects: VideoGigaGAN struggles to effectively super-resolve small objects within the video frames, particularly those containing intricate details like text or fine patterns.
  3. Large model size: Compared to previous VSR approaches, VideoGigaGAN has a notably larger model size due to its incorporation of additional components such as the flow-guided propagation module and the expanded U-Net architecture.
  4. Dependence on optical flow accuracy: The effectiveness of VideoGigaGAN’s flow-guided propagation module relies heavily on the accuracy of the estimated optical flow between frames. Where the flow estimate is inaccurate, such as under large motions, occlusions, or complex scene dynamics, temporal consistency may break down, producing artifacts in the super-resolved output. A forward-backward consistency check, sketched after this list, is one common way to flag such unreliable flow.
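
On that last point, a standard way to flag unreliable flow is a forward-backward consistency check: following the forward flow and then the backward flow should return each pixel to its origin wherever the flow is valid. A hedged PyTorch sketch, reusing the `warp` helper from earlier:

```python
import torch

def flow_consistency_mask(flow_fwd: torch.Tensor, flow_bwd: torch.Tensor,
                          tol: float = 1.0) -> torch.Tensor:
    """flow_fwd, flow_bwd: (B, 2, H, W) flows between a pair of frames.
    Returns a (B, 1, H, W) mask that is 1 where the flow looks trustworthy."""
    # Bring the backward flow into the first frame's coordinates; the
    # round trip fwd + warped-bwd should be near zero wherever flow is valid.
    bwd_in_first = warp(flow_bwd, flow_fwd)
    round_trip = (flow_fwd + bwd_in_first).norm(dim=1, keepdim=True)
    return (round_trip < tol).float()
```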

More examples

Here are more examples of 128x128 videos upscaled to 512x512.

From low-resolution, pixelated footage to crisp, high-definition videos, it’s fascinating to see how the AI model can infer and generate missing details.

Final Thoughts

It looks like video is finally getting more love from tech companies. In past years, AI-based video generation and manipulation saw neither much attention nor mind-blowing progress.

Soon, these AI-powered video generators and editing tools may be efficient enough to run on your smartphone. Cameras would no longer need ultra-high-resolution hardware to capture high-quality videos.

Things are already changing, and we are seeing more and more AI in the video field. Take Sora from OpenAI, for example: it made a lot of buzz when it was announced. Another example is Microsoft’s recently announced VASA-1, which can turn a single image into a talking or singing video in real time.

The advent of AI in the video domain is an exciting development that promises to transform the way we create, edit, and consume video content.