SDXL Turbo can generate a 512x512 image in 207 milliseconds on an A100 GPU.
After days of teasing by Stability AI’s CEO, Emad, the company has finally unveiled SDXL Turbo—an AI model that can generate images from simple text descriptions. As the name suggests, its main focus is speed, as it’s able to generate images in real-time.
SDXL Turbo is built on top of Stability AI’s SDXL model, which is already one of the most powerful image generation models available.
It achieves state-of-the-art performance with a new distillation technique, enabling single-step image generation with unprecedented quality and reducing the required step count from 50 to just one.
In short, SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.
The technical details under the hood are complex, but essentially, SDXL Turbo uses a novel distillation technique called Adversarial Diffusion Distillation (ADD) that enables the model to synthesize high-quality image outputs in a single step, significantly reducing the computational time required compared to traditional diffusion models.
On an A100 GPU, SDXL Turbo generates a 512x512 image in 207 milliseconds. That is incredibly fast compared to other image generation models.
The ADD student is trained as a denoiser that receives diffused input images x_s and outputs samples x̂_θ(x_s, s), optimizing two objectives: (a) an adversarial loss, where the model aims to fool a discriminator trained to distinguish the generated samples x̂_θ from real images x_0; and (b) a distillation loss, where the model is trained to match the denoised targets x̂_ψ of a frozen diffusion model (DM) teacher.
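The two objectives above can be sketched numerically. This is a toy illustration only, using plain floats in place of images and networks; the function names and the distillation weight are stand-ins I chose for the example, not Stability AI's actual implementation.

```python
# Toy sketch of the two ADD training objectives. Scalars stand in for
# images; the "networks" are hypothetical stand-in functions.

def adversarial_loss(discriminator_score: float) -> float:
    # The student wants the discriminator to score its sample as "real" (1.0).
    return (1.0 - discriminator_score) ** 2

def distillation_loss(student_sample: float, teacher_sample: float) -> float:
    # The student matches the frozen teacher's denoised output (squared error).
    return (student_sample - teacher_sample) ** 2

def add_loss(student_sample: float,
             teacher_sample: float,
             discriminator_score: float,
             distill_weight: float = 2.5) -> float:
    # Total training loss: adversarial term plus weighted distillation term.
    return (adversarial_loss(discriminator_score)
            + distill_weight * distillation_loss(student_sample, teacher_sample))

# Example: student output 0.8, teacher target 0.9, discriminator score 0.6
print(round(add_loss(0.8, 0.9, 0.6), 4))  # 0.185
```

The adversarial term pushes outputs toward the real-image distribution in a single step, while the distillation term keeps the student anchored to the teacher's behavior, which is why quality survives the collapse from 50 steps to one.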
If you want to delve into the details of how ADD works, check out this paper.
To determine how well SDXL Turbo compares to other diffusion models, Stability AI employed human evaluators to assess the quality of images generated by each model.
They used two factors to evaluate the images: how closely the generated image matched the prompt provided and the overall quality of the image.
Overall, these experiments demonstrate SDXL Turbo’s capabilities as a powerful and versatile diffusion model suitable for a wide range of tasks, particularly those requiring both high prompt accuracy and image quality.
This combination of speed and quality is unprecedented. Of course, it remains to be seen how its capabilities hold up to rigorous real-world testing across diverse use cases, but the initial results are very promising.
Here are some examples of images released by Stability AI in their announcement.
The images below are images I generated myself using ClipDrop.
The example images show impressive prompt accuracy and capture intricate details and convincing textures, especially those released by the company itself.
However, some flaws were visible when I tried it myself, suggesting there is still room for improvement. But again, the key selling point here is blazing speed while retaining decent quality.
Since SDXL Turbo is open source, there are several ways to try it.
If you want to quickly try it out, I recommend going to ClipDrop, selecting the Stable Diffusion XL Turbo tool, and starting to type your prompt.
While SDXL Turbo represents a significant step forward in real-time AI image generation, it is important to acknowledge its limitations.
Can you use it commercially? Unfortunately, no.
Stability AI has shared SDXL Turbo's code and model on Hugging Face and GitHub. However, there are restrictions: it can only be used for non-commercial purposes right now. Researchers and hobbyists can experiment with it freely, but companies can't use it to sell products or services.
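For a local setup, the published weights can be loaded through Hugging Face's diffusers library. This is a minimal sketch assuming diffusers is installed and a CUDA GPU is available; the single-step settings follow the published model card.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights published on Hugging Face.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Single-step generation: SDXL Turbo is trained without classifier-free
# guidance, so guidance_scale is set to 0.0.
image = pipe(
    prompt="a photo of a corgi wearing sunglasses",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("output.png")
```

Note that `guidance_scale=0.0` matters: reusing the defaults from a regular SDXL pipeline will degrade the output, since the model was distilled to operate without guidance.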
Overall, I am impressed by its performance. The image quality is not on par with the full SDXL model, though. Again, the key selling point here is blazing speed while retaining decent quality.
Real-time AI image generation seemed like a distant dream just a couple of months ago. Now, models like SDXL Turbo are making it a practical reality. Let's see what kind of creative possibilities SDXL Turbo unlocks.