tao-hunter committed 7 days ago

Commit 2ecd3b2 (verified)

1 Parent(s): 9cbe5b0

Upload folder using huggingface_hub

Files changed (20)

README.md +139 -0
ckpts/shape_dec_next_dc_f16c32_fp16.json +24 -0
ckpts/shape_dec_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/shape_enc_next_dc_f16c32_fp16.json +24 -0
ckpts/shape_enc_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json +19 -0
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors +3 -0
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json +19 -0
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors +3 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json +19 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors +3 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json +19 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors +3 -0
ckpts/ss_flow_img_dit_1_3B_64_bf16.json +19 -0
ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors +3 -0
ckpts/tex_dec_next_dc_f16c32_fp16.json +25 -0
ckpts/tex_dec_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/tex_enc_next_dc_f16c32_fp16.json +24 -0
ckpts/tex_enc_next_dc_f16c32_fp16.safetensors +3 -0
pipeline.json +95 -0

README.md ADDED

	@@ -0,0 +1,139 @@

+---
+license: mit
+pipeline_tag: image-to-3d
+library_name: trellis2
+language:
+- en
+---
+# TRELLIS.2: Native and Compact Structured Latents for 3D Generation
+**Model Name:** TRELLIS.2-4B
+**Paper:** [https://arxiv.org/abs/2512.14692](https://arxiv.org/abs/2512.14692)
+**Repository:** [https://github.com/microsoft/TRELLIS.2](https://github.com/microsoft/TRELLIS.2)
+**Project Page:** [https://microsoft.github.io/trellis.2](https://microsoft.github.io/trellis.2)
+## Introduction
+**TRELLIS.2** is a state-of-the-art large 3D generative model designed for high-fidelity **image-to-3D** generation. It leverages a novel "field-free" sparse voxel structure termed **O-Voxel** and a large-scale flow-matching transformer (4 Billion parameters).
+Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes), which struggle with open surfaces and non-manifold geometry, TRELLIS.2 can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials, including transparency and translucency.
+## Model Details
+*   **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
+*   **Model Type:** Flow-matching transformers with a sparse-voxel-based 3D VAE
+*   **Parameters:** 4 Billion
+*   **Input:** Single Image
+*   **Output:** 3D Asset (Mesh with PBR Materials)
+*   **Resolution:** Varies from 512³ to 1536³ (Voxel Grid Resolution)
+## Key Features
+*   **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
+    *   **Arbitrary Topology:** Handles open surfaces, non-manifold geometry, and fully-enclosed structures without lossy conversion.
+    *   **Rich Appearance:** Captures PBR attributes (including opacity for translucent surfaces) aligned with geometry.
+    *   **Efficiency:** Optimization-free bidirectional conversion between meshes and O-Voxels, taking milliseconds to seconds.
+*   **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
+*   **High-Fidelity yet Compact Latent Space:** Uses a sparse 3D VAE with **16× spatial downsampling**, encoding a 1024³ asset into only ~9.6K latent tokens with negligible perceptual degradation.
+*   **Shape-Conditioned Texture Generation:** Generates textures for an input 3D mesh, guided by a reference image.
+*   **State-of-the-Art Speed:** Inference is highly efficient; see table below.
+## Inference Speed (NVIDIA H100 GPU)
+| Resolution | Time |
+| :--- | :--- |
+| 512³ | ~3 seconds |
+| 1024³ | ~17 seconds |
+| 1536³ | ~60 seconds |
+## Requirements
+- **System**: The model is currently tested only on **Linux**.
+- **Hardware**: An NVIDIA GPU with at least 24GB of memory is necessary. The code has been verified on NVIDIA A100 and H100 GPUs.
+- **Software**:
+  - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain packages. The recommended version is 12.4.
+  - [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended for managing dependencies.
+  - Python version 3.8 or higher is required.
+## Known Limitations
+*   **Geometric Artifacts (Small Holes):** While O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. For applications requiring strictly watertight geometry (e.g., 3D printing), we provide accompanying mesh post-processing scripts, such as hole-filling algorithms.
+*   **Base Model without Alignment:** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Consequently, the outputs reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
+We are actively working on improving the model and addressing these limitations.
+## Usage
+*Note: Please refer to the official [GitHub Repository](https://github.com/microsoft/TRELLIS.2) for installation instructions and dependencies.*
+```python
+import os
+os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
+os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
+import cv2
+import imageio
+from PIL import Image
+import torch
+from trellis2.pipelines import Trellis2ImageTo3DPipeline
+from trellis2.utils import render_utils
+from trellis2.renderers import EnvMap
+import o_voxel
+# 1. Setup Environment Map
+envmap = EnvMap(torch.tensor(
+    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
+    dtype=torch.float32, device='cuda'
+))
+# 2. Load Pipeline
+pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
+pipeline.cuda()
+# 3. Load Image & Run
+image = Image.open("assets/example_image/T.png")
+mesh = pipeline.run(image)[0]
+mesh.simplify(16777216)  # cap at 2**24 (nvdiffrast limit)
+# 4. Render Video
+video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
+imageio.mimsave("sample.mp4", video, fps=15)
+# 5. Export to GLB
+glb = o_voxel.postprocess.to_glb(
+    vertices            =   mesh.vertices,
+    faces               =   mesh.faces,
+    attr_volume         =   mesh.attrs,
+    coords              =   mesh.coords,
+    attr_layout         =   mesh.layout,
+    voxel_size          =   mesh.voxel_size,
+    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
+    decimation_target   =   1000000,
+    texture_size        =   4096,
+    remesh              =   True,
+    remesh_band         =   1,
+    remesh_project      =   0,
+    verbose             =   True
+)
+glb.export("sample.glb", extension_webp=True)
+```
+## Citation
+If you find this model useful for your research, please cite our work:
+```
+@article{xiang2025trellis2,
+    title={Native and Compact Structured Latents for 3D Generation},
+    author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
+    journal={Tech report},
+    year={2025}
+}
+```
+## License
+This model is released under the MIT License. The code and dataset are publicly released to facilitate reproduction and further research.
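
Reviewer note on the README's compression claim: the 16× spatial downsampling and the ~9.6K token figure can be sanity-checked with a little arithmetic. The sketch below assumes that the latent grid side is simply the voxel resolution divided by 16; the helper names are illustrative and are not part of `trellis2`.

```python
# Back-of-the-envelope check of the README's latent-size claim.
# Assumption: latent grid side = voxel resolution // downsampling factor.

DOWNSAMPLE = 16  # "16x spatial downsampling" from the README


def latent_side(voxel_resolution: int, factor: int = DOWNSAMPLE) -> int:
    """Side length of the latent grid for a given voxel resolution."""
    return voxel_resolution // factor


def dense_cells(voxel_resolution: int, factor: int = DOWNSAMPLE) -> int:
    """Number of cells in the dense latent grid (side cubed)."""
    return latent_side(voxel_resolution, factor) ** 3


def occupancy(active_tokens: int, voxel_resolution: int) -> float:
    """Fraction of the dense latent grid covered by active tokens."""
    return active_tokens / dense_cells(voxel_resolution)


if __name__ == "__main__":
    for res in (512, 1024, 1536):
        print(res, latent_side(res), dense_cells(res))
    print(f"{occupancy(9600, 1024):.2%}")
```

Under this assumption a 1024³ asset maps to a 64³ latent grid (262,144 cells), so ~9.6K tokens cover roughly 3.7% of the dense grid, which is what one would expect from a sparse, surface-aligned representation. The sides 32 and 64 also match the `resolution` fields of the 512 and 1024 `SLatFlowModel` configs in this commit.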

ckpts/shape_dec_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "FlexiDualGridVaeDecoder",
+    "args": {
+        "resolution": 256,
+        "model_channels": [1024, 512, 256, 128, 64],
+        "latent_channels": 32,
+        "num_blocks": [4, 16, 8, 4, 0],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}
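
Reviewer note: the per-stage lists in this config have to line up, and the number of resampling blocks suggests where the `f16` in the file name comes from. The sketch below checks list lengths and derives the overall spatial factor. It assumes that each entry in `up_block_type` changes resolution by 2× per axis, which this commit does not confirm; `summarize_vae` is a hypothetical helper, not a `trellis2` function.

```python
# Consistency check for a sparse VAE config like the one above.
# Assumption: every resampling block rescales each axis by a factor of 2.

SHAPE_DECODER_ARGS = {
    "resolution": 256,
    "model_channels": [1024, 512, 256, 128, 64],
    "latent_channels": 32,
    "num_blocks": [4, 16, 8, 4, 0],
    "block_type": ["SparseConvNeXtBlock3d"] * 5,
    "up_block_type": ["SparseResBlockC2S3d"] * 4,
    "block_args": [{}, {}, {}, {}, {}],
    "use_fp16": True,
}


def summarize_vae(args: dict) -> dict:
    """Validate per-stage list lengths and derive a few summary numbers."""
    stages = len(args["model_channels"])
    for key in ("num_blocks", "block_type", "block_args"):
        if len(args[key]) != stages:
            raise ValueError(f"{key} must have {stages} entries")
    if len(args["up_block_type"]) != stages - 1:
        raise ValueError("expected one resampling block between adjacent stages")
    return {
        "stages": stages,
        "total_blocks": sum(args["num_blocks"]),
        "spatial_factor": 2 ** len(args["up_block_type"]),
        "latent_channels": args["latent_channels"],
    }


if __name__ == "__main__":
    print(summarize_vae(SHAPE_DECODER_ARGS))
```

With four resampling blocks the derived factor is 16 and the latent width is 32, which lines up with the `f16c32` suffix and with the 16× downsampling stated in the README.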

ckpts/shape_dec_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
+size 948490494
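
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. A downloaded file can be checked against the pointer by comparing its size and SHA-256 digest. The following is a minimal sketch using only the standard library; the function names are illustrative.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
size 948490494
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines of a Git LFS pointer into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a file from disk and compare its size and hash to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(info["algo"], info["size"])
```

Streaming in chunks keeps memory use flat, which matters for the multi-gigabyte flow-model checkpoints elsewhere in this commit.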

ckpts/shape_enc_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "FlexiDualGridVaeEncoder",
+    "args": {
+        "resolution": 256,
+        "model_channels": [64, 128, 256, 512, 1024],
+        "latent_channels": 32,
+        "num_blocks": [0, 4, 8, 16, 4],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}

ckpts/shape_enc_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f37c5ff5b983b68e9946060000f09bc131f3e84318a2c8b7430a81e4b4636c41
+size 708797208

ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 64,
+        "in_channels": 32,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}
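
Reviewer note: a few derived quantities make this transformer config easier to read. The sketch assumes that attention heads split `model_channels` evenly and that the MLP hidden width is `int(model_channels * mlp_ratio)`; both are common conventions but are not confirmed by this commit.

```python
# Derived sizes for the SLatFlowModel config above (assumptions noted inline).

FLOW_ARGS = {
    "resolution": 64,
    "model_channels": 1536,
    "num_heads": 12,
    "mlp_ratio": 5.3334,
}


def head_dim(args: dict) -> int:
    """Per-head width, assuming heads divide the model width evenly."""
    channels, heads = args["model_channels"], args["num_heads"]
    if channels % heads:
        raise ValueError("model_channels must be divisible by num_heads")
    return channels // heads


def mlp_hidden(args: dict) -> int:
    """MLP hidden width, assuming truncation of channels * ratio."""
    return int(args["model_channels"] * args["mlp_ratio"])


def max_positions(args: dict) -> int:
    """Upper bound on token positions in a dense grid of this resolution."""
    return args["resolution"] ** 3


if __name__ == "__main__":
    print(head_dim(FLOW_ARGS), mlp_hidden(FLOW_ARGS), max_positions(FLOW_ARGS))
```

Under these assumptions the head width is 128 and the MLP width is 8192, which would explain the otherwise odd-looking ratio of 5.3334 (just above 16/3, so that truncation lands on 8192).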

ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07cd0596f634c5adc1890023d16023afc5eed02fb84b22bb23aff5bf0030fbbd
+size 2584574424
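
Reviewer note: the pointer size gives a quick estimate of the parameter count. The sketch divides the file size by the bytes per element of the dtype declared in the matching JSON config. It ignores the safetensors header, so the result is a slight overestimate; `estimate_params` is an illustrative helper.

```python
# Rough parameter count from checkpoint size and storage dtype.
# Assumption: the file is almost entirely tensor data of a single dtype.

BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def estimate_params(file_size_bytes: int, dtype: str) -> int:
    """Upper-bound estimate of the number of stored parameters."""
    return file_size_bytes // BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    params = estimate_params(2584574424, "bfloat16")
    print(f"{params / 1e9:.2f}B parameters")
```

For this file the estimate is about 1.29 billion parameters, in line with the `1_3B` tag in the checkpoint name.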

ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 32,
+        "in_channels": 32,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec5e0917ef9b7e25ad51dffc7d19687a42019871f94239f2fa7f86264c55b70f
+size 2584574424

ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 64,
+        "in_channels": 64,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:580401269059a339b8318ab9ced459a13ba63391721c83a6c383198c29e77686
+size 2584672728
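
Reviewer note: this texture checkpoint is slightly larger than the shape checkpoint of the same resolution (2584574424 bytes), and the two configs differ only in `in_channels` (64 versus 32). The sketch below checks whether the size gap is what a wider input projection would add. It assumes a single linear input layer from `in_channels` to `model_channels`, stored in bf16, and identical safetensors headers; none of this is confirmed by the commit.

```python
# Does the size gap between the tex and shape flow checkpoints match
# a wider input projection? All structural details here are assumptions.

SHAPE_FLOW_SIZE = 2584574424  # slat_flow_img2shape_dit_1_3B_1024_bf16
TEX_FLOW_SIZE = 2584672728    # slat_flow_imgshape2tex_dit_1_3B_1024_bf16
MODEL_CHANNELS = 1536
BF16_BYTES = 2


def extra_input_bytes(in_a: int, in_b: int,
                      model_channels: int = MODEL_CHANNELS,
                      bytes_per_param: int = BF16_BYTES) -> int:
    """Bytes added to a linear input layer when its input widens from in_b to in_a."""
    return (in_a - in_b) * model_channels * bytes_per_param


if __name__ == "__main__":
    observed = TEX_FLOW_SIZE - SHAPE_FLOW_SIZE
    predicted = extra_input_bytes(64, 32)
    print(observed, predicted, observed == predicted)
```

The observed gap and the predicted gap are both 98,304 bytes, which is consistent with the texture model taking 32 additional input channels, plausibly the shape latent concatenated with the noisy texture latent.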

ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 32,
+        "in_channels": 64,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8371aa1c5d13be79dcd5ddfd2cf3835e902e204dc34427169a1c702828e1a94d
+size 2584672728

ckpts/ss_flow_img_dit_1_3B_64_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SparseStructureFlowModel",
+    "args": {
+        "resolution": 16,
+        "in_channels": 8,
+        "out_channels": 8,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca01377c485bec418076d38ee80166d32dc776d744f2553b835cba1e97a7abf6
+size 2584426920

ckpts/tex_dec_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,25 @@

+{
+    "name": "SparseUnetVaeDecoder",
+    "args": {
+        "out_channels": 6,
+        "model_channels": [1024, 512, 256, 128, 64],
+        "latent_channels": 32,
+        "num_blocks": [4, 16, 8, 4, 0],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "pred_subdiv": false,
+        "use_fp16": true
+    }
+}

ckpts/tex_dec_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:97ea69addea2ecd9312910f5f548234665eef51c088386180b7cd5b258645e3c
+size 948458812

ckpts/tex_enc_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "SparseUnetVaeEncoder",
+    "args": {
+        "in_channels": 6,
+        "model_channels": [64, 128, 256, 512, 1024],
+        "latent_channels": 32,
+        "num_blocks": [0, 4, 8, 16, 4],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}

ckpts/tex_enc_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd109f75f84b90fa411554ed6b0e4a87f430841163156fc0ebda2ebdc4752493
+size 708797208

pipeline.json ADDED

	@@ -0,0 +1,95 @@

+{
+    "name": "Trellis2ImageTo3DPipeline",
+    "args": {
+        "models": {
+            "sparse_structure_decoder": "microsoft/TRELLIS-image-large/ckpts/ss_dec_conv3d_16l8_fp16",
+            "sparse_structure_flow_model": "ckpts/ss_flow_img_dit_1_3B_64_bf16",
+            "shape_slat_decoder": "ckpts/shape_dec_next_dc_f16c32_fp16",
+            "shape_slat_flow_model_512": "ckpts/slat_flow_img2shape_dit_1_3B_512_bf16",
+            "shape_slat_flow_model_1024": "ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16",
+            "tex_slat_decoder": "ckpts/tex_dec_next_dc_f16c32_fp16",
+            "tex_slat_flow_model_512": "ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16",
+            "tex_slat_flow_model_1024": "ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16"
+        },
+        "sparse_structure_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 7.5,
+                "guidance_rescale": 0.7,
+                "guidance_interval": [0.6, 1.0],
+                "rescale_t": 5.0
+            }
+        },
+        "shape_slat_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 7.5,
+                "guidance_rescale": 0.5,
+                "guidance_interval": [0.6, 1.0],
+                "rescale_t": 3.0
+            }
+        },
+        "shape_slat_normalization": {
+            "mean": [
+                0.781296, 0.018091, -0.495192, -0.558457, 1.060530, 0.093252, 1.518149, -0.933218,
+                -0.732996, 2.604095, -0.118341, -2.143904, 0.495076, -2.179512, -2.130751, -0.996944,
+                0.261421, -2.217463, 1.260067, -0.150213, 3.790713, 1.481266, -1.046058, -1.523667,
+                -0.059621, 2.220780, 1.621212, 0.877230, 0.567247, -3.175944, -3.186688, 1.578665
+            ],
+            "std": [
+                5.972266, 4.706852, 5.445010, 5.209927, 5.320220, 4.547237, 5.020802, 5.444004,
+                5.226681, 5.683095, 4.831436, 5.286469, 5.652043, 5.367606, 5.525084, 4.730578,
+                4.805265, 5.124013, 5.530808, 5.619001, 5.103930, 5.417670, 5.269677, 5.547194,
+                5.634698, 5.235274, 6.110351, 5.511298, 6.237273, 4.879207, 5.347008, 5.405691
+            ]
+        },
+        "tex_slat_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 1.0,
+                "guidance_rescale": 0.0,
+                "guidance_interval": [0.6, 0.9],
+                "rescale_t": 3.0
+            }
+        },
+        "tex_slat_normalization": {
+            "mean": [
+                3.501659, 2.212398, 2.226094, 0.251093, -0.026248, -0.687364, 0.439898, -0.928075,
+                0.029398, -0.339596, -0.869527, 1.038479, -0.972385, 0.126042, -1.129303, 0.455149,
+                -1.209521, 2.069067, 0.544735, 2.569128, -0.323407, 2.293000, -1.925608, -1.217717,
+                1.213905, 0.971588, -0.023631, 0.106750, 2.021786, 0.250524, -0.662387, -0.768862
+            ],
+            "std": [
+                2.665652, 2.743913, 2.765121, 2.595319, 3.037293, 2.291316, 2.144656, 2.911822,
+                2.969419, 2.501689, 2.154811, 3.163343, 2.621215, 2.381943, 3.186697, 3.021588,
+                2.295916, 3.234985, 3.233086, 2.260140, 2.874801, 2.810596, 3.292720, 2.674999,
+                2.680878, 2.372054, 2.451546, 2.353556, 2.995195, 2.379849, 2.786195, 2.775190
+            ]
+        },
+        "image_cond_model": {
+            "name": "DinoV3FeatureExtractor",
+            "args": {
+                "model_name": "tao-hunter/dinov3-vitl16-pretrain-lvd1689m"
+            }
+        },
+        "rembg_model": {
+            "name": "BiRefNet",
+            "args": {
+                "model_name": "tao-hunter/RMBG-2.0"
+            }
+        },
+        "default_pipeline_type": "1024_cascade"
+    }
+}
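
Reviewer note: the sampler blocks in `pipeline.json` combine a step count, a time rescaling factor, and a guidance interval. The sketch below shows how such parameters could translate into a step plan. The rescaling formula `rescale_t * t / (1 + (rescale_t - 1) * t)` is an assumption borrowed from the flow Euler sampler in the original TRELLIS codebase; this commit contains no sampler code, so the actual `FlowEulerGuidanceIntervalSampler` may differ. All function names are illustrative.

```python
# Sketch of a rescaled Euler time schedule with interval-gated guidance.
# The rescaling formula and the gating rule are assumptions, not taken
# from this commit.

SPARSE_STRUCTURE_PARAMS = {
    "steps": 12,
    "guidance_strength": 7.5,
    "guidance_rescale": 0.7,
    "guidance_interval": [0.6, 1.0],
    "rescale_t": 5.0,
}


def rescaled_timesteps(steps: int, rescale_t: float) -> list:
    """Timesteps from 1 down to 0, warped so more steps sit near t = 1."""
    linear = [1.0 - i / steps for i in range(steps + 1)]
    return [rescale_t * t / (1.0 + (rescale_t - 1.0) * t) for t in linear]


def guidance_active(t: float, interval) -> bool:
    """Guidance is applied only while t lies inside the closed interval."""
    low, high = interval
    return low <= t <= high


def step_plan(params: dict) -> list:
    """List of (t_current, t_next, guidance_on) tuples, one per Euler step."""
    ts = rescaled_timesteps(params["steps"], params["rescale_t"])
    return [
        (t_cur, t_next, guidance_active(t_cur, params["guidance_interval"]))
        for t_cur, t_next in zip(ts[:-1], ts[1:])
    ]


if __name__ == "__main__":
    for t_cur, t_next, guided in step_plan(SPARSE_STRUCTURE_PARAMS):
        print(f"{t_cur:.3f} -> {t_next:.3f}  guidance={'on' if guided else 'off'}")
```

Under these assumptions, the sparse-structure settings apply guidance on the first 10 of the 12 steps and drop it for the final two low-noise steps. How `guidance_strength` and `guidance_rescale` enter the update is deliberately left out, since the convention is not visible in this commit.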