The wait is over. Stability AI has released the weights of Stable Diffusion 3, its most advanced text-to-image open model to date. SD3 Medium is a 2-billion-parameter model specifically designed to excel in areas where previous models struggled.
To be clear, there will be two versions of SD3: one with 2 billion parameters and another with 8 billion parameters.
It is not yet clear when Stability AI will release the larger model. Surprisingly, the smaller model is already impressive.
Stable Diffusion 3 Medium is designed to run efficiently on consumer GPUs without sacrificing performance, thanks to its low VRAM footprint. It is also highly customizable and can be fine-tuned to capture intricate details from small datasets.
Stability AI collaborated with NVIDIA to optimize all Stable Diffusion models, including Stable Diffusion 3 Medium, for NVIDIA RTX GPUs using TensorRT. This collaboration resulted in a 50% performance boost on RTX hardware.
Take a look at these example images generated with the new SD3 Medium model:
The first thing I noticed is the improved text rendering and photorealism. Look at the details on the ground, the trees, and the beard in this sample image.
These are some of the most notable new features:
Stable Diffusion 3 utilizes Rectified Flow, a generative model that connects data and noise in a straight line. This approach improves upon traditional diffusion models by simplifying the forward process and potentially increasing sampling efficiency.
The model employs new noise samplers that emphasize perceptually relevant scales, leading to superior performance over traditional diffusion methods.
Additionally, it supports varying resolutions and aspect ratios through adaptable positional encodings.
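To make the straight-line idea behind Rectified Flow concrete, here is a minimal toy sketch (my own illustration, not Stability AI's code). It assumes the standard rectified-flow setup: a sample at time t is a linear interpolation between data and noise, the model predicts a velocity, and sampling integrates that velocity from pure noise back to data with a simple Euler loop. The function names and the 1-D example values are hypothetical.

```python
def forward(x0, eps, t):
    # Rectified-flow interpolation: a straight line between data x0 (t=0)
    # and noise eps (t=1).
    return (1 - t) * x0 + t * eps

def euler_sample(velocity_fn, x_noise, steps=10):
    # Integrate dx/dt = v(x, t) from t=1 (pure noise) back to t=0 (data)
    # with plain Euler steps.
    x, dt = x_noise, 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        x = x - dt * velocity_fn(x, t)
    return x

# Toy 1-D example: an "ideal" model that outputs the true constant
# velocity eps - x0 along the straight path.
x0, eps = 2.0, -0.5
ideal_velocity = lambda x, t: eps - x0
recovered = euler_sample(ideal_velocity, eps, steps=4)
print(recovered)  # recovers x0 exactly, because the path is straight
```

Because the path from noise to data is a straight line, an ideal velocity model lets the Euler sampler land on the data in very few steps; this is the intuition behind the sampling-efficiency gains mentioned above.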
The largest models, with up to 8 billion parameters, outperform state-of-the-art competitors like SDXL and DALL-E 3 in both automatic evaluations and human preference ratings.
For more information about the SD3 Medium architecture, check out the whitepaper here.
The Hugging Face demo is not accessible at the moment, but you can try the model on Stability AI's Stable Assistant. It is not free, though. Here's the pricing info:
You can also download the models and run them on your local PC via ComfyUI and StableSwarmUI workflows.
The model sizes vary from 4 GB to 11 GB.
If you’re interested in a step-by-step guide on the SD3 Medium installation, let me know in the comments.
Here are some example images:
Prompt: an old rusted robot wearing pants and a jacket riding skis in a supermarket.
Prompt: A crab made of cheese on a plate
Prompt: Dystopia of thousands of workers picking cherries and feeding them into a machine that runs on steam and is as large as a skyscraper. Written on the side of the machine: "SD3 Paper"
Before you start thinking about using SD3 for commercial purposes, note that SD3 Medium weights and code will be available for non-commercial use only.
If you would like to discuss a self-hosting license for the commercial use of Stable Diffusion 3 Medium, contact Stability AI here.
Large-scale commercial users and enterprises can contact Stability AI to obtain an Enterprise License.
Overall, it is great to see Stability AI release the SD3 model for free despite the recent internal turmoil the company has faced.
The initial sample images that Stability AI released look very good and almost comparable to Google’s Imagen 3 and Midjourney V6. Text rendering, in particular, is one of the most difficult features to solve in AI image generators.
It’s interesting that Stability AI did not release both the 2B and 8B models on the same day. According to a Stability AI staff member, the 8B model needs more training.
[SD3 8B model] needs a lot more training still — the current 2B pending release looks better than the 8B Beta on the initial API does in some direct comparisons, which means the 8B has [to] be trained a lot more to actually look way better before it's worth it.
It will be interesting to see how well the larger model performs against the smaller one and how the results compare to Midjourney and DALL-E 3.
------
This post was originally published here.