SDXL Turbo can generate a 512x512 image in 207 milliseconds on an A100 GPU.
After days of teasing by Stability AI’s CEO, Emad, the company has finally unveiled SDXL Turbo—an AI model that can generate images from simple text descriptions. As the name suggests, its main focus is speed, as it’s able to generate images in real-time.
SDXL Turbo is built on top of Stability AI’s SDXL model, which is already one of the most powerful image generation models available.
It achieves state-of-the-art performance with a new distillation technique, enabling single-step image generation with unprecedented quality and reducing the required step count from 50 to just one.
In short, SDXL-Turbo is a distilled version of SDXL 1.0, trained for real-time synthesis.
The technical details under the hood are complex, but essentially, SDXL Turbo uses a novel distillation technique called Adversarial Diffusion Distillation (ADD) that enables the model to synthesize high-quality image outputs in a single step, significantly reducing the computational time required compared to traditional diffusion models.
On an A100 GPU, SDXL Turbo generates a 512x512 image in 207 milliseconds. That is incredibly fast compared to other image generation models.
The ADD student is trained as a denoiser that receives diffused input images x_s and outputs samples x̂_θ(x_s, s), optimizing two objectives: (a) an adversarial loss, where the model aims to fool a discriminator trained to distinguish the generated samples x̂_θ from real images x_0; and (b) a distillation loss, where the model is trained to match the denoised targets x̂_ψ of a frozen diffusion model (DM) teacher.
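The two objectives above can be sketched numerically. This is a toy illustration only, using plain floats in place of images and networks; the function names and the distillation weight are stand-ins I chose for the example, not Stability AI's actual implementation.

```python
# Toy sketch of the two ADD training objectives. Scalars stand in for
# images; the "networks" are hypothetical stand-in functions.

def adversarial_loss(discriminator_score: float) -> float:
    # The student wants the discriminator to score its sample as "real" (1.0).
    return (1.0 - discriminator_score) ** 2

def distillation_loss(student_sample: float, teacher_sample: float) -> float:
    # The student matches the frozen teacher's denoised output (squared error).
    return (student_sample - teacher_sample) ** 2

def add_loss(student_sample: float,
             teacher_sample: float,
             discriminator_score: float,
             distill_weight: float = 2.5) -> float:
    # Total training loss: adversarial term plus weighted distillation term.
    return (adversarial_loss(discriminator_score)
            + distill_weight * distillation_loss(student_sample, teacher_sample))

# Example: student output 0.8, teacher target 0.9, discriminator score 0.6
print(round(add_loss(0.8, 0.9, 0.6), 4))  # 0.185
```

The adversarial term pushes outputs toward the real-image distribution in a single step, while the distillation term keeps the student anchored to the teacher's behavior, which is why quality survives the collapse from 50 steps to one.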
If you want to delve into the details of how ADD works, check out this paper.
To determine how well SDXL Turbo compares to other diffusion models, Stability AI employed human evaluators to assess the quality of images generated by each model.
They used two factors to evaluate the images: how closely the generated image matched the prompt provided and the overall quality of the image.
Overall, these experiments demonstrate SDXL Turbo’s capabilities as a powerful and versatile diffusion model suitable for a wide range of tasks, particularly those requiring both high prompt accuracy and image quality.
This combination of speed and quality is unprecedented. Of course, it remains to be seen how its capabilities hold up to rigorous real-world testing across diverse use cases, but the initial results are very promising.
Here are some examples of images released by Stability AI in their announcement.
The images below are images I generated myself using ClipDrop.
The example images show impressive prompt accuracy and capture intricate details and convincing textures, especially those released by the company itself.
However, some flaws were visible when I tried it myself, suggesting there is still room for improvement. But again, the key selling point here is blazing speed while retaining decent quality.
Since SDXL Turbo is open source, there are several ways to try it.
If you want to quickly try it out, I recommend going to ClipDrop, selecting the Stable Diffusion XL Turbo tool, and starting to type your prompt.
While SDXL Turbo represents a significant step forward in real-time AI image generation, it is important to acknowledge its limitations.
Can you use it commercially? Unfortunately, no.
Stability AI has shared SDXL Turbo's code and model on Hugging Face and GitHub. However, there are restrictions: it can only be used for non-commercial purposes right now. Researchers and hobbyists can experiment with it freely, but companies can't use it to sell products or services.
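For a local setup, the published weights can be loaded through Hugging Face's diffusers library. This is a minimal sketch assuming diffusers is installed and a CUDA GPU is available; the single-step settings follow the published model card.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights published on Hugging Face.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Single-step generation: SDXL Turbo is trained without classifier-free
# guidance, so guidance_scale is set to 0.0.
image = pipe(
    prompt="a photo of a corgi wearing sunglasses",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("output.png")
```

Note that `guidance_scale=0.0` matters: reusing the defaults from a regular SDXL pipeline will degrade the output, since the model was distilled to operate without guidance.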
Overall, I am impressed by its performance. The image quality is not on par with the full SDXL model, though. Again, the key selling point here is blazing speed while retaining decent quality.
Real-time AI image generation seemed like a distant dream just a couple of months ago. Now, models like SDXL Turbo are making it a practical reality. Let's see what kind of creative possibilities SDXL Turbo unlocks.