Trellis is a tool for artists, developers, and designers to produce amazing 3D content efficiently.
A few weeks ago, Microsoft unveiled Trellis, a novel method for versatile, high-quality 3D asset creation. The model uses a unified Structured LATent (SLAT) representation that integrates sparse 3D grids with multiview visual features and can be decoded into various output formats, such as Radiance Fields, 3D Gaussians, and meshes.
Okay, that sounds like a mouthful, but in simple terms: Trellis is really good at creating high-quality 3D models that look realistic and match the descriptions or pictures you provide.
I’ve talked about AI-powered 3D object generators in the past, but this one is particularly impressive in terms of speed and quality.
The method uses rectified flow transformers as its backbone, achieves superior results compared to existing approaches, and offers flexible editing capabilities.
The model is trained on a large 3D asset dataset (500K objects) and surpasses existing methods in quality and versatility, as demonstrated through extensive experiments and user studies.
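If you’re curious what “rectified flow” means in practice: the model learns a velocity field that transports random noise toward data along near-straight paths, so sampling is just integrating that field over a handful of steps. Here’s a toy Euler sampler to illustrate the idea (my own sketch with a stand-in velocity_model, not TRELLIS code):

import torch

def rectified_flow_sample(velocity_model, shape, steps=12):
    # Integrate dx/dt = v(x, t) from t=0 (pure noise) to t=1 (data) with Euler steps.
    x = torch.randn(shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)  # current time for each sample in the batch
        x = x + velocity_model(x, t) * dt    # follow the learned velocity field
    return x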
The 3D object generation in Trellis is a two-stage process built around a representation called Structured LATent (SLAT).
Here’s how it works:
1. First, a flow model generates the sparse structure: a coarse 3D grid marking which voxels the object occupies.
2. Then, a second flow model generates the structured latents: local feature vectors attached to each active voxel, capturing the object’s geometry and appearance details.
This two-stage process allows Trellis to create high-quality 3D models efficiently. It leverages the power of artificial intelligence and computer vision to understand and recreate complex 3D objects from text descriptions or pictures.
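To make the pipeline concrete, here’s a conceptual sketch of the two stages in Python. The model and decoder objects here are illustrative stand-ins for exposition, not the real TRELLIS API:

def generate_3d(condition, structure_model, slat_model, decoders):
    # Stage 1: a flow transformer generates the sparse structure -- a coarse
    # occupancy grid that decides WHICH voxels the object occupies.
    active_voxels = structure_model.sample(condition)

    # Stage 2: a second flow transformer attaches a latent feature vector to
    # each active voxel; together these form the Structured LATent (SLAT).
    slat = slat_model.sample(condition, active_voxels)

    # One SLAT, many outputs: format-specific decoders turn it into
    # 3D Gaussians, radiance fields, or meshes.
    return {name: decode(slat) for name, decode in decoders.items()}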
Trellis can convert this SLAT representation into various 3D model formats, like:
- 3D Gaussians (splats)
- Radiance Fields
- Meshes
Check out these high-quality examples:
The way Trellis compresses the structure and adds details is reminiscent of how professional 3D artists work—starting with a base mesh and then layering details.
However, unlike human artists, Trellis does it in a fraction of the time.
You can try Trellis on Hugging Face.
Upload an image and click “Generate” to create a 3D asset. If the image has an alpha channel, it will be used as the mask. You can play with the generation settings or the GLB extraction settings or leave them as default.
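If your image doesn’t have an alpha channel, you can create one by removing the background first, for example with the third-party rembg library (my suggestion, not part of Trellis):

from rembg import remove
from PIL import Image

# Cut out the subject so the alpha channel can serve as the mask.
img = Image.open("input.png")
out = remove(img)  # returns an RGBA image with the background removed
out.save("input_rgba.png")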
Here’s the sample 3D output:
If you feel satisfied with the 3D asset, click “Extract GLB” to extract the GLB file and download it. You can also view the 3D asset in online tools like GLTF Viewer.
Note: the Gaussian file can be very large (~50 MB), so it may take a while to display and download.
Here are more examples:
Prompt: Spherical robot with gold and silver design.
The result looks pretty decent overall. The gold and silver textures add a nice touch, and from a distance, it looks great. But if you zoom in, you’ll notice it’s still a bit low-poly. The details aren’t as refined as they could be, with some edges looking rough. That said, for something generated this quickly, it’s hard to complain. It’s a solid result if you’re after speed and good enough for most use cases.
Here’s another example, this time using an image as the input.
I really like how close the 3D model gets to the original reference image. The overall shape and structure feel spot-on, which is super impressive. But when you focus on the smaller details, like the ropes or the intricate textures on the sides, they’re not perfect. Even so, from a regular viewing distance, it still looks pretty good. For something generated in seconds, it’s honestly better than I expected. If you’re okay with minor imperfections, this is a fantastic starting point.
Trellis is also great at creating multiple variants of a single 3D object based on text prompts, which makes it easy to iterate on designs quickly.
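If you’re running the open-source image pipeline (shown in full later in this post), the simplest way to get variants yourself is to rerun generation with different seeds:

from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline

# Same checkpoint as in the full example below.
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

image = Image.open("assets/example_image/T.png")

# Each seed produces a different take on the same object.
for seed in (1, 2, 3):
    outputs = pipeline.run(image, seed=seed)
    outputs['gaussian'][0].save_ply(f"variant_{seed}.ply")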
It doesn’t stop there because Trellis is also crazy good at composing complex and vibrant 3D art designs.
I am very impressed by the quality of this 3D scene. Microsoft is setting a new standard in 3D generation with this high-quality, scalable model. In fact, some users have already 3D-printed models created with Trellis, which is super cool!
You can find Trellis’ code on GitHub and run it locally by following the steps below:
1. Clone the repository:
git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git
cd TRELLIS
2. Create a new conda environment named trellis and install the dependencies with setup.sh. The script’s options are listed below, followed by an example command:
Usage: setup.sh [OPTIONS]
Options:
-h, --help Display this help message
--new-env Create a new conda environment
--basic Install basic dependencies
--xformers Install xformers
--flash-attn Install flash-attn
--diffoctreerast Install diffoctreerast
--vox2seq Install vox2seq
--spconv Install spconv
--mipgaussian Install mip-splatting
--kaolin Install kaolin
--nvdiffrast Install nvdiffrast
--demo Install all dependencies for demo
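For example, to create a fresh environment and install the full set of dependencies, you can combine the flags above into one command (adjust the flags to your setup):

. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast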
Here is an example of how to use the pretrained models for 3D asset generation.
import os
# os.environ['ATTN_BACKEND'] = 'xformers'   # Can be 'flash-attn' or 'xformers', default is 'flash-attn'
os.environ['SPCONV_ALGO'] = 'native'        # Can be 'native' or 'auto', default is 'auto'.
                                            # 'auto' is faster but will do benchmarking at the beginning.
                                            # Recommended to set to 'native' if run only once.

import imageio
from PIL import Image
from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import render_utils, postprocessing_utils

# Load a pipeline from a model folder or a Hugging Face model hub.
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Load an image
image = Image.open("assets/example_image/T.png")

# Run the pipeline
outputs = pipeline.run(
    image,
    seed=1,
    # Optional parameters
    # sparse_structure_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 7.5,
    # },
    # slat_sampler_params={
    #     "steps": 12,
    #     "cfg_strength": 3,
    # },
)
# outputs is a dictionary containing generated 3D assets in different formats:
# - outputs['gaussian']: a list of 3D Gaussians
# - outputs['radiance_field']: a list of radiance fields
# - outputs['mesh']: a list of meshes

# Render the outputs
video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("sample_gs.mp4", video, fps=30)
video = render_utils.render_video(outputs['radiance_field'][0])['color']
imageio.mimsave("sample_rf.mp4", video, fps=30)
video = render_utils.render_video(outputs['mesh'][0])['normal']
imageio.mimsave("sample_mesh.mp4", video, fps=30)

# GLB files can be extracted from the outputs
glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    # Optional parameters
    simplify=0.95,          # Ratio of triangles to remove in the simplification process
    texture_size=1024,      # Size of the texture used for the GLB
)
glb.export("sample.glb")

# Save Gaussians as PLY files
outputs['gaussian'][0].save_ply("sample.ply")
Here’s a list of what you’ll get as a result:
- sample_gs.mp4: a turntable render of the 3D Gaussians
- sample_rf.mp4: a render of the radiance field
- sample_mesh.mp4: a normal-map render of the extracted mesh
- sample.glb: the textured GLB mesh
- sample.ply: the 3D Gaussians saved as a PLY file
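To sanity-check the exported GLB from Python rather than an online viewer, you can load it with the third-party trimesh library (my snippet, assuming trimesh is installed; not part of Trellis):

import trimesh

# Load the exported GLB and inspect what's inside.
scene = trimesh.load("sample.glb")
print(scene.geometry)  # named meshes contained in the scene
scene.show()           # opens an interactive preview window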
I’m impressed not just by the reasonable quality but also by the speed at which you can generate a splat. Doing all of this in seconds is a big step forward and shows huge promise for future iterations.
It’s not perfect, though. Complex models, especially those involving human features, can trip it up. But for people who don’t know 3D modeling or don’t want to learn it, it’s still an amazing tool.
As a developer, I can’t wait to get API access. The ability to quickly create 3D assets opens up so many possibilities. Game developers and animators are going to find this incredibly useful, and I’m excited to see what the community creates with it.