Stability AI introduces a novel approach to generating 3D objects in under a second.
Stability AI just announced Stable Fast 3D (SF3D), a new model that generates textured, UV-unwrapped 3D assets from a single image in under a second. That is incredibly fast: the last AI 3D generators I tried, like Meta AI’s 3D Gen (text-to-3D) or Unique3D (image-to-3D) from Tsinghua University, take 50–60 seconds to render a 3D asset.
In this article, we’ll look at what this new Stable Fast 3D generator is, how it works, and how you can try it out.
Stable Fast 3D (SF3D) is a new method for rapid and high-quality textured object mesh reconstruction from a single image in just 0.5 seconds.
Unlike most existing approaches, SF3D is explicitly trained for mesh generation and incorporates a fast UV unwrapping technique that enables swift texture generation, avoiding reliance on vertex colors. This method also predicts material parameters and normal maps to enhance the visual quality of the reconstructed 3D meshes.
Additionally, SF3D integrates a delighting step to effectively remove low-frequency illumination effects, ensuring that the reconstructed meshes can be easily used in various lighting conditions.
If you want to know more details about Stable Fast 3D, check out the white paper.
You start by uploading a single image of an object. Stable Fast 3D then rapidly generates a complete 3D asset, including:

- a UV-unwrapped mesh
- material parameters (roughness and metallic)
- albedo colors with low-frequency illumination effects removed
- tangent-space normal maps for surface detail
To generate a 3D object using SF3D, the process begins with an input image processed through a DINO v2 encoder, which generates image tokens representing the features of the image.
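For illustration, here is a minimal sketch of extracting DINOv2 image tokens with the Hugging Face transformers library. The checkpoint name and preprocessing are my assumptions, not SF3D’s actual configuration:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, Dinov2Model

# Checkpoint chosen for illustration; SF3D's exact encoder setup may differ.
processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
encoder = Dinov2Model.from_pretrained("facebook/dinov2-base")

image = Image.open("chair.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# One feature vector per image patch (plus a CLS token at index 0).
image_tokens = outputs.last_hidden_state
print(image_tokens.shape)  # e.g. (1, 257, 768) for a 224x224 crop
```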
These image tokens, along with camera tokens, are fed into a large transformer model to predict a triplane volumetric representation that encodes the 3D structure and appearance of the scene.
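To make the triplane idea concrete, here is an illustrative PyTorch sketch of querying such a representation. The shapes and the sum-based feature combination are assumptions for the example; SF3D’s actual decoder differs:

```python
import torch
import torch.nn.functional as F

def query_triplane(triplanes: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
    """Sample features for 3D points from three axis-aligned feature planes.

    triplanes: (3, C, H, W) -- the XY, XZ, and YZ planes.
    points:    (N, 3) with coordinates in [-1, 1].
    Returns:   (N, C) summed per-plane features.
    """
    # Project each 3D point onto the three planes.
    projections = [points[:, [0, 1]], points[:, [0, 2]], points[:, [1, 2]]]
    features = []
    for plane, uv in zip(triplanes, projections):
        grid = uv.view(1, -1, 1, 2)                 # (1, N, 1, 2)
        sampled = F.grid_sample(plane[None], grid,  # (1, C, N, 1)
                                align_corners=False)
        features.append(sampled[0, :, :, 0].T)      # (N, C)
    return torch.stack(features).sum(dim=0)

triplanes = torch.randn(3, 32, 64, 64)
pts = torch.rand(1000, 3) * 2 - 1
print(query_triplane(triplanes, pts).shape)  # torch.Size([1000, 32])
```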
Instead of relying on differentiable volumetric rendering, SF3D employs a differentiable mesh renderer and mesh representation. The mesh is extracted from the predicted density field, and vertex offsets are added to produce a smoother and more accurate geometry.
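A hedged sketch of that extraction step, using scikit-image’s marching cubes on a dummy density grid. The offset step is stubbed out here; in the real pipeline the offsets are predicted by the network, not hand-set:

```python
import numpy as np
from skimage import measure

# Dummy density field: a sphere of radius 0.35 inside a unit cube grid.
res = 64
coords = np.linspace(-0.5, 0.5, res)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
density = 0.35 - np.sqrt(x**2 + y**2 + z**2)  # positive inside the sphere

# Extract the zero iso-surface as a triangle mesh.
verts, faces, normals, _ = measure.marching_cubes(density, level=0.0)

# Stand-in for the learned vertex-offset head: nudge vertices along normals.
predicted_offsets = 0.1 * normals  # a real model predicts these per vertex
verts = verts + predicted_offsets

print(verts.shape, faces.shape)  # (V, 3) vertices, (F, 3) triangle indices
```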
Next, the method extracts the albedo color, which is the intrinsic color of the object’s surface without lighting, and the tangent-space normals, which provide surface details and textures. This ensures the surface appears visually smooth and detailed. An additional network processes the input image to predict material parameters such as roughness and metallic properties, crucial for realistic rendering and providing the object with appropriate reflective properties.
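As a rough illustration of what such a network could look like, here is a toy PyTorch head that pools image features into two scalars. The architecture is entirely my assumption; SF3D’s actual material estimator is described in the paper:

```python
import torch
import torch.nn as nn

class MaterialHead(nn.Module):
    """Toy network: image -> global roughness and metallic values in [0, 1]."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool
        )
        self.head = nn.Linear(64, 2)  # [roughness, metallic]

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image).flatten(1)
        return torch.sigmoid(self.head(features))  # constrain to [0, 1]

params = MaterialHead()(torch.randn(1, 3, 256, 256))
roughness, metallic = params[0]
print(float(roughness), float(metallic))
```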
During training, the model also predicts the scene’s illumination as a set of Spherical Gaussians, extracted from the triplanes, which contain the necessary 3D information.
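A Spherical Gaussian is just an exponential lobe on the sphere. This NumPy sketch evaluates a small set of lobes for a query direction; the two-lobe environment is made up for illustration:

```python
import numpy as np

def eval_spherical_gaussians(direction, lobe_axes, sharpness, amplitudes):
    """Evaluate G(v) = sum_i a_i * exp(lambda_i * (dot(v, mu_i) - 1)).

    direction:  (3,) unit query direction v.
    lobe_axes:  (K, 3) unit lobe axes mu_i.
    sharpness:  (K,) lobe sharpness lambda_i.
    amplitudes: (K, 3) RGB amplitudes a_i.
    Returns:    (3,) RGB radiance from the K-lobe environment.
    """
    cosines = lobe_axes @ direction                # (K,)
    weights = np.exp(sharpness * (cosines - 1.0))  # (K,)
    return weights @ amplitudes                    # (3,)

# Made-up 2-lobe environment: a warm key light and a dim blue fill.
axes = np.array([[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
lam = np.array([8.0, 2.0])
amp = np.array([[2.0, 1.8, 1.5], [0.1, 0.1, 0.3]])

print(eval_spherical_gaussians(np.array([0.0, 1.0, 0.0]), axes, lam, amp))
```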
The final image is rendered differentiably, combining the mesh, albedo color, tangent-space normals, and material parameters to ensure all components work together to produce a high-fidelity 3D asset.
Stable Fast 3D offers multiple applications in both gaming and movie production.
Currently, there are three ways you can try Stable Fast 3D.
Let me show you the Gradio demo on Hugging Face. Simply upload a sample image to the drop zone and adjust the foreground ratio to control the size of the foreground object.
After you upload the test input image, the background will be automatically removed. If the image already has an alpha channel, you can skip the background removal step. Click on the “Run” button.
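If you would rather script it than click through the UI, the gradio_client library can call the same Space programmatically. The sketch below is an assumption about the Space’s endpoint, so check `view_api()` for the real signature:

```python
from gradio_client import Client, handle_file

# Space name taken from the public demo; the endpoint details are assumptions.
client = Client("stabilityai/stable-fast-3d")
client.view_api()  # prints the actual endpoints and parameter names

# Hypothetical call -- adjust arguments to whatever view_api() reports.
result = client.predict(
    handle_file("chair.png"),  # input image
    0.85,                      # foreground ratio
    api_name="/run",
)
print(result)  # typically a local path to the generated GLB file
```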
Either way, true to its promise, processing is incredibly fast: after well under a second, the result is displayed on the right side of the dashboard.
Great! The result is a high-quality 3D asset that you can download as a GLB file. Here are more examples:
I noticed the AI struggles to render objects with finer details. Look at the example below:
In such cases, you may need to lower the foreground ratio value to fix the issue.
Note that Stable Fast 3D is released under a Stability AI Community License that allows non-commercial use and commercial use for individuals or organizations with up to $1 million in annual revenue.
Overall, Stable Fast 3D is a novel approach to 3D asset generation. The speed, in particular, is what impresses me most. In terms of quality, it may already be good enough for simply shaped objects, but it still struggles with complex objects that have intricate details.
Nevertheless, this is good progress in 3D technology, and as Dr. Károly Zsolnai-Fehér says, let’s not look at where we are now but where we’ll be two papers down the line.
Despite the issues the company has faced recently, particularly the rocky SD3 model release and its internal management problems, I am still glad to see it announcing new and genuinely good products.