HDiT creates HD images in a compute-efficient manner.
Image generative tasks like text-to-image are either slow or resource-intensive.
HDiT attempts to solve this issue.
The folks at Stability AI, along with some other researchers, developed a variation on the vision transformer architecture, which they call the Hourglass Diffusion Transformer (⏳ HDiT). Not sure about you, but to me it does kinda look like an hourglass lying on its side.
With this new architecture, compute scales linearly with pixel count instead of quadratically.
What does that mean for us mere mortals?
It means we get to generate high-resolution images with less time and less compute.
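To get a feel for why that matters, here's a rough back-of-the-envelope sketch in plain Python (my own illustrative numbers, not the paper's exact configuration): with global self-attention, cost grows with the square of the token count, so going from 256px to 1024px images blows the cost up by ~256×, while a linear-scaling architecture only grows it ~16×.

```python
# Back-of-the-envelope: how attention cost grows with image resolution.
# PATCH is an assumed patch size for illustration, not the paper's setting.

PATCH = 4  # pixels per patch side (assumption)

def tokens(resolution: int) -> int:
    """Number of patch tokens for a square image of the given resolution."""
    return (resolution // PATCH) ** 2

def quadratic_cost(resolution: int) -> int:
    """Global self-attention: cost grows ~ tokens^2."""
    n = tokens(resolution)
    return n * n

def linear_cost(resolution: int) -> int:
    """Linear-scaling architecture: cost grows ~ tokens."""
    return tokens(resolution)

for res in (256, 512, 1024):
    print(f"{res}px: {tokens(res)} tokens, "
          f"quadratic cost x{quadratic_cost(res) // quadratic_cost(256)}, "
          f"linear cost x{linear_cost(res) // linear_cost(256)}")
```

Running it shows the gap widening fast: at 1024px the quadratic model pays 256× the 256px cost, the linear one only 16×.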
[Sample images generated with the new HDiT models; see the project's GitHub repo.]
I’ve heard some people say that transformers are a dead end for generative AI given their computational complexity, so it’s interesting to see researchers trying out hybrid takes on the transformer architecture, or doing away with it altogether, as RWKV and Mamba did.
And then eventually, some day, maybe, we’d be able to run tasks like text-to-image, super-resolution (low-res to high-res), or even text-to-video quickly on our laptops, or even on our phones!
Definitely looking forward to that!