| | --- |
| | license: mit |
| | tags: |
| | - inference |
| | - gpu |
| | - a100 |
| | datasets: |
| | - your-dataset-name |
| | base_model: |
| | - JeffreyXiang/TRELLIS-image-large |
| | library_name: custom |
| | --- |
| | |
| | <img src="assets/logo.webp" width="100%" align="center"> |
| | <h1 align="center">Structured 3D Latents<br>for Scalable and Versatile 3D Generation</h1> |
| | <p align="center"><a href="https://arxiv.org/abs/2412.01506"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a> |
| | <a href='https://trellis3d.github.io'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a> |
| | <a href='https://huggingface.co/spaces/JeffreyXiang/TRELLIS'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Live_Demo-blue'></a> |
| | </p> |
| | <p align="center"><img src="assets/teaser.png" width="100%"></p> |
| |
|
| | <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is a large 3D asset generation model. It takes in text or image prompts and generates high-quality 3D assets in various formats, such as Radiance Fields, 3D Gaussians, and meshes. <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> is built on a unified Structured LATent (<span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span>) representation that can be decoded to different output formats, with Rectified Flow Transformers tailored for <span style="font-size: 16px; font-weight: 600;">SL</span><span style="font-size: 12px; font-weight: 700;">AT</span> as its backbones. We provide large-scale pre-trained models with up to 2 billion parameters, trained on a large 3D asset dataset of 500K diverse objects. <span style="font-size: 16px; font-weight: 600;">T</span><span style="font-size: 12px; font-weight: 700;">RELLIS</span> significantly surpasses existing methods, including recent ones at similar scales, and showcases flexible output format selection and local 3D editing capabilities that were not offered by previous models. |
| |
|
| | ***Check out our [Project Page](https://trellis3d.github.io) for more videos and interactive demos!*** |
| |
|
| | <!-- Features --> |
| | ## 🌟 Features |
| | - **High Quality**: It produces diverse 3D assets at high quality with intricate shape and texture details. |
| | - **Versatility**: It takes text or image prompts and can generate various final 3D representations including but not limited to *Radiance Fields*, *3D Gaussians*, and *meshes*, accommodating diverse downstream requirements. |
| | - **Flexible Editing**: It allows easy editing of generated 3D assets, such as generating variants of the same object or locally editing a 3D asset. |
| |
|
| | <!-- Updates --> |
| | ## ⏩ Updates |
| |
|
| | **12/26/2024** |
| | - Release [**TRELLIS-500K**](https://github.com/microsoft/TRELLIS#-dataset) dataset and toolkits for data preparation. |
| |
|
| | **12/18/2024** |
| | - Implement multi-image conditioning for the TRELLIS-image model ([#7](https://github.com/microsoft/TRELLIS/issues/7)). This is based on a tuning-free algorithm rather than a specially trained model, so it may not give the best results for all input images. |
| | - Add Gaussian export in `app.py` and `example.py`. ([#40](https://github.com/microsoft/TRELLIS/issues/40)) |
| |
|
| | <!-- TODO List --> |
| | ## 🚧 TODO List |
| | - [x] Release inference code and TRELLIS-image-large model |
| | - [x] Release dataset and dataset toolkits |
| | - [ ] Release TRELLIS-text model series |
| | - [ ] Release training code |
| |
|
| | <!-- Installation --> |
| | ## 📦 Installation |
| |
|
| | ### Prerequisites |
| | - **System**: The code is currently tested only on **Linux**. For Windows setup, you may refer to [#3](https://github.com/microsoft/TRELLIS/issues/3) (not fully tested). |
| | - **Hardware**: An NVIDIA GPU with at least 16GB of memory is necessary. The code has been verified on NVIDIA A100 and A6000 GPUs. |
| | - **Software**: |
| | - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain submodules. The code has been tested with CUDA versions 11.8 and 12.2. |
| | - [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended for managing dependencies. |
| | - Python version 3.8 or higher is required. |
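The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the repository: `check_prerequisites` and `parse_nvcc_release` are hypothetical helper names, and the script only reports the Python version and the CUDA release that the `nvcc` on your `PATH` reports. It does not check GPU memory.

```python
import re
import shutil
import subprocess
import sys


def parse_nvcc_release(version_text):
    """Extract the CUDA release (e.g. '11.8') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def check_prerequisites(min_python=(3, 8)):
    """Collect basic facts about the local setup; nothing is installed or changed."""
    nvcc = shutil.which("nvcc")  # None when no CUDA Toolkit is on PATH
    cuda_release = None
    if nvcc is not None:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        cuda_release = parse_nvcc_release(result.stdout)
    return {
        "python_ok": sys.version_info >= min_python,
        "nvcc_path": nvcc,
        "cuda_release": cuda_release,
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If `cuda_release` is not the version you intend to compile against, adjust `PATH` so that the intended CUDA Toolkit comes first.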
| |
|
| | ### Installation Steps |
| | 1. Clone the repo: |
| | ```sh |
| | git clone --recurse-submodules https://github.com/microsoft/TRELLIS.git |
| | cd TRELLIS |
| | ``` |
| | |
| | 2. Install the dependencies: |
| | |
| | **Before running the following command, there are some things to note:** |
| | - By adding `--new-env`, a new conda environment named `trellis` will be created. If you want to use an existing conda environment, please remove this flag. |
| | - By default, the `trellis` environment uses PyTorch 2.4.0 with CUDA 11.8. If you want to use a different version of CUDA (e.g., if you have CUDA Toolkit 12.2 installed and do not want to install another 11.8 version for submodule compilation), you can remove the `--new-env` flag and manually install the required dependencies. Refer to [PyTorch](https://pytorch.org/get-started/previous-versions/) for the installation command. |
| | - If you have multiple CUDA Toolkit versions installed, `PATH` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 11.8 and 12.2 installed, you should run `export PATH=/usr/local/cuda-11.8/bin:$PATH` before running the command. |
| | - By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can remove the `--flash-attn` flag to install `xformers` only and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the [Minimal Example](#minimal-example) for more details. |
| | - The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time. |
| | - If you encounter any issues during the installation, feel free to open an issue or contact us. |
| | |
| | Create a new conda environment named `trellis` and install the dependencies: |
| | ```sh |
| | . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast |
| | ``` |
| | The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`. |
| | ```sh |
| | Usage: setup.sh [OPTIONS] |
| | Options: |
| |     -h, --help              Display this help message |
| |     --new-env               Create a new conda environment |
| |     --basic                 Install basic dependencies |
| |     --xformers              Install xformers |
| |     --flash-attn            Install flash-attn |
| |     --diffoctreerast        Install diffoctreerast |
| |     --vox2seq               Install vox2seq |
| |     --spconv                Install spconv |
| |     --mipgaussian           Install mip-splatting |
| |     --kaolin                Install kaolin |
| |     --nvdiffrast            Install nvdiffrast |
| |     --demo                  Install all dependencies for demo |
| | ``` |
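As a rough illustration of the installation notes, the sketch below assembles the install command with or without `--flash-attn` and picks the matching `ATTN_BACKEND` value. The helper names are hypothetical, and the rule that `flash-attn` needs compute capability 8.0 or higher (Ampere or newer) is an assumption about current `flash-attn` releases, so check the `flash-attn` documentation for your GPU.

```python
import os

# Flags copied from the install command in this README; --flash-attn is added conditionally.
BASE_FLAGS = ["--basic", "--xformers", "--diffoctreerast", "--spconv",
              "--mipgaussian", "--kaolin", "--nvdiffrast"]


def choose_attn_backend(compute_capability_major):
    """Pick an attention backend from the GPU's major compute capability.

    Assumption: flash-attn requires compute capability 8.x or newer.
    With PyTorch installed, torch.cuda.get_device_capability()[0] gives this value.
    """
    return "flash-attn" if compute_capability_major >= 8 else "xformers"


def build_setup_command(compute_capability_major, new_env=True):
    """Return the setup.sh command line as a string (it is not executed here)."""
    flags = list(BASE_FLAGS)
    if choose_attn_backend(compute_capability_major) == "flash-attn":
        flags.insert(2, "--flash-attn")  # keep the flag order used in this README
    if new_env:
        flags.insert(0, "--new-env")
    return ". ./setup.sh " + " ".join(flags)


def configure_attn_backend(compute_capability_major, environ=os.environ):
    """Set ATTN_BACKEND; call this before importing trellis."""
    environ["ATTN_BACKEND"] = choose_attn_backend(compute_capability_major)
    return environ["ATTN_BACKEND"]


if __name__ == "__main__":
    print(build_setup_command(8))  # e.g. A100 (compute capability 8.0)
    print(build_setup_command(7))  # e.g. V100 (compute capability 7.0)
```

The environment variable only takes effect if it is set before `trellis` is imported, which is why the Minimal Example sets it at the very top of the script.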
| | |
| | <!-- Pretrained Models --> |
| | ## 🤖 Pretrained Models |
| |
|
| | We provide the following pretrained models: |
| |
|
| | | Model | Description | #Params | Download | |
| | | --- | --- | --- | --- | |
| | | TRELLIS-image-large | Large image-to-3D model | 1.2B | [Download](https://huggingface.co/JeffreyXiang/TRELLIS-image-large) | |
| | | TRELLIS-text-base | Base text-to-3D model | 342M | Coming Soon | |
| | | TRELLIS-text-large | Large text-to-3D model | 1.1B | Coming Soon | |
| | | TRELLIS-text-xlarge | Extra-large text-to-3D model | 2.0B | Coming Soon | |
| |
|
| | The models are hosted on Hugging Face. You can directly load the models with their repository names in the code: |
| | ```python |
| | from trellis.pipelines import TrellisImageTo3DPipeline |
| | TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large") |
| | ``` |
| |
|
| | If you prefer to load the model from a local folder, download the model files from the links above and pass the folder path instead (the folder structure must be preserved): |
| | ```python |
| | TrellisImageTo3DPipeline.from_pretrained("/path/to/TRELLIS-image-large") |
| | ``` |
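One way to get a local copy with the folder structure intact is `snapshot_download` from the `huggingface_hub` package. The sketch below is an illustration only: `resolve_model_source` and `download_model` are hypothetical helper names, and the local path is a placeholder.

```python
from pathlib import Path


def resolve_model_source(repo_id, local_dir):
    """Return the local folder if it exists and is non-empty, else the Hub repository name."""
    local_path = Path(local_dir)
    if local_path.is_dir() and any(local_path.iterdir()):
        return str(local_path)
    return repo_id


def download_model(repo_id, local_dir):
    """Download a full snapshot so the repository's folder structure is kept."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    source = resolve_model_source("JeffreyXiang/TRELLIS-image-large",
                                  "/path/to/TRELLIS-image-large")
    print(source)  # pass this value to TrellisImageTo3DPipeline.from_pretrained
```

Either return value can be passed to `from_pretrained`, so the rest of a script does not need to know where the weights came from.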
| |
|
| | <!-- Usage --> |
| | ## 💡 Usage |
| |
|
| | ### Minimal Example |
| |
|
| | Here is an [example](example.py) of how to use the pretrained models for 3D asset generation. |
| |
|
| | ```python |
| | import os |
| | # os.environ['ATTN_BACKEND'] = 'xformers' # Can be 'flash-attn' or 'xformers', default is 'flash-attn' |
| | os.environ['SPCONV_ALGO'] = 'native' # Can be 'native' or 'auto', default is 'auto'. |
| | # 'auto' is faster but will do benchmarking at the beginning. |
| | # Recommended to set to 'native' if run only once. |
| | |
| | import imageio |
| | from PIL import Image |
| | from trellis.pipelines import TrellisImageTo3DPipeline |
| | from trellis.utils import render_utils, postprocessing_utils |
| | |
| | # Load a pipeline from a model folder or a Hugging Face model hub. |
| | pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large") |
| | pipeline.cuda() |
| | |
| | # Load an image |
| | image = Image.open("assets/example_image/T.png") |
| | |
| | # Run the pipeline |
| | outputs = pipeline.run( |
| |     image, |
| |     seed=1, |
| |     # Optional parameters |
| |     # sparse_structure_sampler_params={ |
| |     #     "steps": 12, |
| |     #     "cfg_strength": 7.5, |
| |     # }, |
| |     # slat_sampler_params={ |
| |     #     "steps": 12, |
| |     #     "cfg_strength": 3, |
| |     # }, |
| | ) |
| | # outputs is a dictionary containing generated 3D assets in different formats: |
| | # - outputs['gaussian']: a list of 3D Gaussians |
| | # - outputs['radiance_field']: a list of radiance fields |
| | # - outputs['mesh']: a list of meshes |
| | |
| | # Render the outputs |
| | video = render_utils.render_video(outputs['gaussian'][0])['color'] |
| | imageio.mimsave("sample_gs.mp4", video, fps=30) |
| | video = render_utils.render_video(outputs['radiance_field'][0])['color'] |
| | imageio.mimsave("sample_rf.mp4", video, fps=30) |
| | video = render_utils.render_video(outputs['mesh'][0])['normal'] |
| | imageio.mimsave("sample_mesh.mp4", video, fps=30) |
| | |
| | # GLB files can be extracted from the outputs |
| | glb = postprocessing_utils.to_glb( |
| |     outputs['gaussian'][0], |
| |     outputs['mesh'][0], |
| |     # Optional parameters |
| |     simplify=0.95,          # Ratio of triangles to remove in the simplification process |
| |     texture_size=1024,      # Size of the texture used for the GLB |
| | ) |
| | glb.export("sample.glb") |
| | |
| | # Save Gaussians as PLY files |
| | outputs['gaussian'][0].save_ply("sample.ply") |
| | ``` |
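The `simplify` argument in the example is described as the ratio of triangles to remove, so `0.95` targets a mesh with roughly 5% of the original triangles. The arithmetic can be sketched as follows; `remaining_triangles` is a hypothetical helper, and the count an actual simplifier produces may differ from this target.

```python
def remaining_triangles(triangle_count, simplify):
    """Target number of triangles left after removing the `simplify` fraction."""
    if not 0.0 <= simplify < 1.0:
        raise ValueError("simplify must be in [0, 1)")
    return round(triangle_count * (1.0 - simplify))


if __name__ == "__main__":
    # A hypothetical mesh with one million triangles, simplified with the default ratio.
    print(remaining_triangles(1_000_000, 0.95))
```

Lower values keep more geometric detail at the cost of a larger GLB file.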
| |
|
| | After running the code, you will get the following files: |
| | - `sample_gs.mp4`: a video showing the 3D Gaussian representation |
| | - `sample_rf.mp4`: a video showing the Radiance Field representation |
| | - `sample_mesh.mp4`: a video showing the mesh representation |
| | - `sample.glb`: a GLB file containing the extracted textured mesh |
| | - `sample.ply`: a PLY file containing the 3D Gaussian representation |
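To confirm that a run produced the files listed above, a small standard-library check can help. This is a sketch with hypothetical helper names. It relies on two format facts: a GLB file starts with a 12-byte header (the magic bytes `glTF`, a version, and the total length), and a PLY header declares its element counts as text. That the Gaussians are stored as `vertex` elements is an assumption based on common 3D Gaussian splatting exports.

```python
import struct
from pathlib import Path

EXPECTED_FILES = ["sample_gs.mp4", "sample_rf.mp4", "sample_mesh.mp4",
                  "sample.glb", "sample.ply"]


def missing_outputs(folder="."):
    """List the expected output files that are not present in `folder`."""
    return [name for name in EXPECTED_FILES if not (Path(folder) / name).is_file()]


def read_glb_header(path):
    """Return (version, total_length) from the 12-byte GLB header."""
    with open(path, "rb") as f:
        magic, version, length = struct.unpack("<4sII", f.read(12))
    if magic != b"glTF":
        raise ValueError(f"{path} is not a GLB file")
    return version, length


def count_ply_vertices(path):
    """Read the PLY header and return the declared number of vertex elements."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element vertex"):
                return int(line.split()[2])
            if line == "end_header":
                break
    return None  # no vertex element declared in the header


if __name__ == "__main__":
    print("missing:", missing_outputs("."))
```

Only the headers are read, so the check stays fast even for large files.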
| |
|
| |
|
| | ### Web Demo |
| |
|
| | [app.py](app.py) provides a simple web demo for 3D asset generation. Since this demo is based on [Gradio](https://gradio.app/), additional dependencies are required: |
| | ```sh |
| | . ./setup.sh --demo |
| | ``` |
| |
|
| | After installing the dependencies, you can run the demo with the following command: |
| | ```sh |
| | python app.py |
| | ``` |
| |
|
| | Then, you can access the demo at the address shown in the terminal. |
| |
|
| | ***The web demo is also available on [Hugging Face Spaces](https://huggingface.co/spaces/JeffreyXiang/TRELLIS)!*** |
| |
|
| |
|
| | <!-- Dataset --> |
| | ## 📚 Dataset |
| |
|
| | We provide **TRELLIS-500K**, a large-scale dataset containing 500K 3D assets curated from [Objaverse(XL)](https://objaverse.allenai.org/), [ABO](https://amazon-berkeley-objects.s3.amazonaws.com/index.html), [3D-FUTURE](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-future), [HSSD](https://huggingface.co/datasets/hssd/hssd-models), and [Toys4k](https://github.com/rehg-lab/lowshot-shapebias/tree/main/toys4k), filtered based on aesthetic scores. Please refer to the [dataset README](DATASET.md) for more details. |
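Filtering by aesthetic score amounts to keeping only the assets whose score reaches a threshold. The sketch below shows the idea on made-up rows. The column names `object_id` and `aesthetic_score`, the scores, and the threshold of 5.0 are all assumptions for illustration, so consult the dataset README for the actual metadata layout and the values used.

```python
import csv
import io


def filter_by_aesthetic_score(rows, threshold):
    """Keep rows whose (hypothetical) aesthetic_score column is at least `threshold`."""
    return [row for row in rows if float(row["aesthetic_score"]) >= threshold]


# Made-up sample metadata, not taken from TRELLIS-500K.
SAMPLE_CSV = """object_id,aesthetic_score
obj_a,6.1
obj_b,4.2
obj_c,5.0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = filter_by_aesthetic_score(rows, threshold=5.0)
print([row["object_id"] for row in kept])
```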
| |
|
| | <!-- License --> |
| | ## ⚖️ License |
| |
|
| | TRELLIS models and the majority of the code are licensed under the [MIT License](LICENSE). The following submodules may have different licenses: |
| | - [**diffoctreerast**](https://github.com/JeffreyXiang/diffoctreerast): We developed a CUDA-based real-time differentiable octree renderer for rendering radiance fields as part of this project. This renderer is derived from the [diff-gaussian-rasterization](https://github.com/graphdeco-inria/diff-gaussian-rasterization) project and is available under the [LICENSE](https://github.com/JeffreyXiang/diffoctreerast/blob/master/LICENSE). |
| |
|
| |
|
| | - [**Modified Flexicubes**](https://github.com/MaxtirError/FlexiCubes): In this project, we used a modified version of [Flexicubes](https://github.com/nv-tlabs/FlexiCubes) to support vertex attributes. This modified version is licensed under the [LICENSE](https://github.com/nv-tlabs/FlexiCubes/blob/main/LICENSE.txt). |
| |
|
| |
|
| |
|
| |
|
| | <!-- Citation --> |
| | ## 📜 Citation |
| |
|
| | If you find this work helpful, please consider citing our paper: |
| |
|
| | ```bibtex |
| | @article{xiang2024structured, |
| | title = {Structured 3D Latents for Scalable and Versatile 3D Generation}, |
| | author = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong}, |
| | journal = {arXiv preprint arXiv:2412.01506}, |
| | year = {2024} |
| | } |
| | ``` |