The wait is over. Stability AI has released the weights of Stable Diffusion 3, its most advanced text-to-image open model to date. SD3 Medium is a 2-billion-parameter model specifically designed to excel in areas where previous models struggled.
To be clear, there will be two versions of SD3: one with 2 billion parameters and another with 8 billion parameters.
It is not yet clear when Stability AI will release the larger model. Surprisingly, the smaller model is already impressive.
Stable Diffusion 3 Medium is designed to run efficiently on consumer GPUs without sacrificing performance, thanks to its low VRAM footprint. It is also highly customizable and can be fine-tuned to capture intricate details from small datasets.
Stability AI collaborated with NVIDIA to optimize all Stable Diffusion models, including Stable Diffusion 3 Medium, for NVIDIA RTX GPUs using TensorRT. This collaboration resulted in a 50% performance boost on RTX hardware.
Take a look at these example images generated with the new SD3 Medium model:
The first thing I noticed is the improved text rendering and photorealism. Look at the details on the ground, the trees, and the beard in this sample image.
These are some of the most notable new features:
Stable Diffusion 3 utilizes Rectified Flow, a generative model that connects data and noise in a straight line. This approach improves upon traditional diffusion models by simplifying the forward process and potentially increasing sampling efficiency.
The model employs new noise samplers that emphasize perceptually relevant scales, leading to superior performance over traditional diffusion methods.
Additionally, it supports varying resolutions and aspect ratios through adaptable positional encodings.
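To make the straight-line idea behind Rectified Flow concrete, here is a minimal toy sketch (my own illustration, not Stability AI's code). It assumes the standard rectified-flow setup: a sample at time t is a linear interpolation between data and noise, the model predicts a velocity, and sampling integrates that velocity from pure noise back to data with a simple Euler loop. The function names and the 1-D example values are hypothetical.

```python
def forward(x0, eps, t):
    # Rectified-flow interpolation: a straight line between data x0 (t=0)
    # and noise eps (t=1).
    return (1 - t) * x0 + t * eps

def euler_sample(velocity_fn, x_noise, steps=10):
    # Integrate dx/dt = v(x, t) from t=1 (pure noise) back to t=0 (data)
    # with plain Euler steps.
    x, dt = x_noise, 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_fn(x, t)
    return x

# Toy 1-D example: an "ideal" model that outputs the true constant
# velocity eps - x0 along the straight path.
x0, eps = 2.0, -0.5
ideal_velocity = lambda x, t: eps - x0
recovered = euler_sample(ideal_velocity, eps, steps=4)
print(recovered)  # recovers x0 exactly, because the path is straight
```

Because the path from noise to data is a straight line, an ideal velocity model lets the Euler sampler land on the data in very few steps; this is the intuition behind the sampling-efficiency gains mentioned above.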
The largest models, with up to 8 billion parameters, outperform state-of-the-art competitors like SDXL and DALL-E 3 in both automatic evaluations and human preference ratings.
For more information about the SD3 Medium architecture, check out the whitepaper here.
The Hugging Face demo is not accessible at the moment, but you can try the model on Stability AI's Stable Assistant. It is not free, though. Here's the pricing info:
You can also download the models and run them on your local PC via ComfyUI and StableSwarmUI workflows.
The model sizes vary from 4 GB to 11 GB.
If you’re interested in a step-by-step guide on the SD3 Medium installation, let me know in the comments.
Here are some example images:
Prompt: an old rusted robot wearing pants and a jacket riding skis in a supermarket.
Prompt: A crab made of cheese on a plate
Prompt: Dystopia of thousands of workers picking cherries and feeding them into a machine that runs on steam and is as large as a skyscraper. Written on the side of the machine: "SD3 Paper"
Before you start thinking about using SD3 for commercial purposes, note that SD3 Medium weights and code will be available for non-commercial use only.
If you would like to discuss a self-hosting license for the commercial use of Stable Diffusion 3 Medium, contact Stability AI here.
Large-scale commercial users and enterprises can contact Stability AI to obtain an Enterprise License.
Overall, it is great to see Stability AI release the SD3 model for free despite the recent internal turmoil the company has faced.
The initial sample images that Stability AI released look very good and almost comparable to Google’s Imagen 3 and Midjourney V6. Text rendering, in particular, is one of the most difficult features to solve in AI image generators.
It’s interesting that Stability AI did not release both the 2B and 8B models on the same day. According to a Stability AI staff member, the 8B model needs more training.
[SD3 8B model] needs a lot more training still — the current 2B pending release looks better than the 8B Beta on the initial API does in some direct comparisons, which means the 8B has [to] be trained a lot more to actually look way better before it's worth it.
It will be interesting to see how well the larger model performs against the smaller one and how the results compare to Midjourney and DALL-E 3.
------
This post was originally published here.