tao-hunter committed on
Commit
2ecd3b2
·
verified ·
1 Parent(s): 9cbe5b0

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ license: mit
+ pipeline_tag: image-to-3d
+ library_name: trellis2
+ language:
+ - en
+ ---
+
+ # TRELLIS.2: Native and Compact Structured Latents for 3D Generation
+
+ **Model Name:** TRELLIS.2-4B
+
+ **Paper:** [https://arxiv.org/abs/2512.14692](https://arxiv.org/abs/2512.14692)
+
+ **Repository:** [https://github.com/microsoft/TRELLIS.2](https://github.com/microsoft/TRELLIS.2)
+
+ **Project Page:** [https://microsoft.github.io/trellis.2](https://microsoft.github.io/trellis.2)
+
+ ## Introduction
+
+ **TRELLIS.2** is a state-of-the-art large 3D generative model designed for high-fidelity **image-to-3D** generation. It leverages a novel "field-free" sparse voxel structure termed **O-Voxel** and a large-scale flow-matching transformer (4 billion parameters).
+
+ Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes), which struggle with open surfaces or non-manifold geometry, TRELLIS.2 can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials, including transparency/translucency.
+
+ ## Model Details
+
+ * **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
+ * **Model Type:** Flow-matching transformers with a sparse-voxel-based 3D VAE
+ * **Parameters:** 4 billion
+ * **Input:** Single Image
+ * **Output:** 3D Asset (Mesh with PBR Materials)
+ * **Resolution:** Varies from 512³ to 1536³ (Voxel Grid Resolution)
+
+ ## Key Features
+
+ * **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
+     * **Arbitrary Topology:** Handles open surfaces, non-manifold geometry, and fully enclosed structures without lossy conversion.
+     * **Rich Appearance:** Captures PBR attributes (including opacity for translucent surfaces) aligned with geometry.
+     * **Efficiency:** Instant, optimization-free bidirectional conversion between meshes and O-Voxels (milliseconds to seconds).
+ * **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
+ * **High-Fidelity yet Compact Latent Space:** Utilizes a Sparse 3D VAE with **16× spatial downsampling**, encoding a 1024³ asset into only ~9.6K latent tokens with negligible perceptual degradation.
+ * **Shape-conditioned Texture Generation:** Generates textures for an input 3D mesh, guided by a reference image.
+ * **State-of-the-Art Speed:** Inference is highly efficient; see table below.
+
+ ## Inference Speed (NVIDIA H100 GPU)
+
+ | Resolution | Time |
+ | :--- | :--- |
+ | 512³ | ~3 seconds |
+ | 1024³ | ~17 seconds |
+ | 1536³ | ~60 seconds |
+
+ ## Requirements
+ - **System**: The model is currently tested only on **Linux**.
+ - **Hardware**: An NVIDIA GPU with at least 24GB of memory is necessary. The code has been verified on NVIDIA A100 and H100 GPUs.
+ - **Software**:
+     - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain packages. Recommended version is 12.4.
+     - [Conda](https://docs.anaconda.com/miniconda/install/#quick-command-line-install) is recommended for managing dependencies.
+     - Python version 3.8 or higher is required.
+
+ ## Known Limitations
+
+ * **Geometric Artifacts (Small Holes):** While O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. For applications requiring strictly watertight geometry (e.g., 3D printing), we provide accompanying mesh post-processing scripts, such as hole-filling algorithms.
+ * **Base Model without Alignment:** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Consequently, the outputs reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
+
+ We are actively working on improving the model and addressing these limitations.
+
+ ## Usage
+
+ *Note: Please refer to the official [GitHub Repository](https://github.com/microsoft/TRELLIS.2) for installation instructions and dependencies.*
+
+ ```python
+ import os
+ os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
+ os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
+ import cv2
+ import imageio
+ from PIL import Image
+ import torch
+ from trellis2.pipelines import Trellis2ImageTo3DPipeline
+ from trellis2.utils import render_utils
+ from trellis2.renderers import EnvMap
+ import o_voxel
+
+ # 1. Setup Environment Map
+ envmap = EnvMap(torch.tensor(
+     cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
+     dtype=torch.float32, device='cuda'
+ ))
+
+ # 2. Load Pipeline
+ pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
+ pipeline.cuda()
+
+ # 3. Load Image & Run
+ image = Image.open("assets/example_image/T.png")
+ mesh = pipeline.run(image)[0]
+ mesh.simplify(16777216)  # nvdiffrast limit
+
+ # 4. Render Video
+ video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
+ imageio.mimsave("sample.mp4", video, fps=15)
+
+ # 5. Export to GLB
+ glb = o_voxel.postprocess.to_glb(
+     vertices=mesh.vertices,
+     faces=mesh.faces,
+     attr_volume=mesh.attrs,
+     coords=mesh.coords,
+     attr_layout=mesh.layout,
+     voxel_size=mesh.voxel_size,
+     aabb=[[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
+     decimation_target=1000000,
+     texture_size=4096,
+     remesh=True,
+     remesh_band=1,
+     remesh_project=0,
+     verbose=True
+ )
+ glb.export("sample.glb", extension_webp=True)
+ ```
+
+ ## Citation
+
+ If you find this model useful for your research, please cite our work:
+
+ ```
+ @article{xiang2025trellis2,
+     title={Native and Compact Structured Latents for 3D Generation},
+     author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
+     journal={Tech report},
+     year={2025}
+ }
+ ```
+
+ ## License
+
+ This model is released under the MIT License. The code and dataset are publicly released to facilitate reproduction and further research.
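Review note: the README's "16× spatial downsampling" and "~9.6K latent tokens" figures can be sanity-checked with a little arithmetic. The sketch below assumes that the latent grid side is the voxel resolution divided by 16 and that the ~9.6K tokens are the occupied cells of that sparse grid. Both are readings of the README text, not facts taken from the TRELLIS.2 code, and the helper names are hypothetical.

```python
def dense_latent_cells(voxel_resolution: int, downsample: int = 16) -> int:
    """Cells in a dense cubic latent grid after spatial downsampling."""
    if voxel_resolution % downsample:
        raise ValueError("resolution must be divisible by the downsampling factor")
    side = voxel_resolution // downsample
    return side ** 3


def occupancy(tokens: int, voxel_resolution: int, downsample: int = 16) -> float:
    """Fraction of the dense latent grid covered by `tokens` sparse tokens."""
    return tokens / dense_latent_cells(voxel_resolution, downsample)


if __name__ == "__main__":
    for res in (512, 1024, 1536):  # resolutions listed in the README
        print(f"{res}^3 voxels -> {dense_latent_cells(res)} dense latent cells")
    # 9600 is a rounded stand-in for the README's "~9.6K" figure.
    print(f"occupancy at 1024^3: {occupancy(9600, 1024):.1%}")
```

Under these assumptions a 1024³ asset maps to a 64³ latent grid (262,144 cells), so ~9.6K tokens would cover under 4% of it, which is the kind of sparsity that makes a sparse-voxel latent worthwhile.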
ckpts/shape_dec_next_dc_f16c32_fp16.json ADDED
@@ -0,0 +1,24 @@
+ {
+     "name": "FlexiDualGridVaeDecoder",
+     "args": {
+         "resolution": 256,
+         "model_channels": [1024, 512, 256, 128, 64],
+         "latent_channels": 32,
+         "num_blocks": [4, 16, 8, 4, 0],
+         "block_type": [
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d"
+         ],
+         "up_block_type": [
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d"
+         ],
+         "block_args": [{}, {}, {}, {}, {}],
+         "use_fp16": true
+     }
+ }
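Review note: the per-stage lists in this decoder config have to line up (five stages, four transitions between them). The following is a minimal sketch of a consistency check, not part of the `trellis2` package. It assumes each `up_block_type` entry doubles the spatial resolution, which would give a 16× total factor, consistent with the `f16` in the file name and the README's 16× figure. The `summarize` helper is hypothetical.

```python
import json

# Config copied from ckpts/shape_dec_next_dc_f16c32_fp16.json in this commit.
CONFIG = json.loads("""
{
    "name": "FlexiDualGridVaeDecoder",
    "args": {
        "resolution": 256,
        "model_channels": [1024, 512, 256, 128, 64],
        "latent_channels": 32,
        "num_blocks": [4, 16, 8, 4, 0],
        "block_type": ["SparseConvNeXtBlock3d", "SparseConvNeXtBlock3d",
                       "SparseConvNeXtBlock3d", "SparseConvNeXtBlock3d",
                       "SparseConvNeXtBlock3d"],
        "up_block_type": ["SparseResBlockC2S3d", "SparseResBlockC2S3d",
                          "SparseResBlockC2S3d", "SparseResBlockC2S3d"],
        "block_args": [{}, {}, {}, {}, {}],
        "use_fp16": true
    }
}
""")


def summarize(config: dict) -> dict:
    """Check that per-stage lists agree in length and report basic totals."""
    args = config["args"]
    stages = len(args["model_channels"])
    for key in ("num_blocks", "block_type", "block_args"):
        if len(args[key]) != stages:
            raise ValueError(f"{key} has {len(args[key])} entries, expected {stages}")
    if len(args["up_block_type"]) != stages - 1:
        raise ValueError("expected one up block between each pair of stages")
    return {
        "stages": stages,
        "total_blocks": sum(args["num_blocks"]),
        # ASSUMPTION: every up block doubles the spatial resolution.
        "upsample_factor": 2 ** len(args["up_block_type"]),
    }


if __name__ == "__main__":
    print(summarize(CONFIG))
```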
ckpts/shape_dec_next_dc_f16c32_fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
+ size 948490494
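Review note: each `.safetensors` entry in this commit is a Git LFS pointer (version, oid, size), not the weights themselves. The sketch below parses such a pointer and derives a rough parameter count from the byte size. The 2-bytes-per-parameter figure is an assumption based on the `fp16` suffix in the file name; it ignores the safetensors header and any tensors stored in another dtype. Both helper names are hypothetical.

```python
# Pointer text copied from ckpts/shape_dec_next_dc_f16c32_fp16.safetensors.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
size 948490494
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def approx_param_count(size_bytes: int, bytes_per_param: int = 2) -> int:
    """Rough parameter count; ASSUMES uniform dtype and negligible header."""
    return size_bytes // bytes_per_param


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["hash_algo"], pointer["size"], approx_param_count(pointer["size"]))
```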
ckpts/shape_enc_next_dc_f16c32_fp16.json ADDED
@@ -0,0 +1,24 @@
+ {
+     "name": "FlexiDualGridVaeEncoder",
+     "args": {
+         "resolution": 256,
+         "model_channels": [64, 128, 256, 512, 1024],
+         "latent_channels": 32,
+         "num_blocks": [0, 4, 8, 16, 4],
+         "block_type": [
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d"
+         ],
+         "up_block_type": [
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d"
+         ],
+         "block_args": [{}, {}, {}, {}, {}],
+         "use_fp16": true
+     }
+ }
ckpts/shape_enc_next_dc_f16c32_fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f37c5ff5b983b68e9946060000f09bc131f3e84318a2c8b7430a81e4b4636c41
+ size 708797208
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json ADDED
@@ -0,0 +1,19 @@
+ {
+     "name": "SLatFlowModel",
+     "args": {
+         "resolution": 64,
+         "in_channels": 32,
+         "out_channels": 32,
+         "model_channels": 1536,
+         "cond_channels": 1024,
+         "num_blocks": 30,
+         "num_heads": 12,
+         "mlp_ratio": 5.3334,
+         "pe_mode": "rope",
+         "share_mod": true,
+         "initialization": "scaled",
+         "qk_rms_norm": true,
+         "qk_rms_norm_cross": true,
+         "dtype": "bfloat16"
+     }
+ }
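Review note: the `1_3B` in this file name can be cross-checked against the config with a back-of-the-envelope parameter count. The sketch below assumes a conventional DiT-style block (self-attention, cross-attention over the 1024-channel condition, a two-layer MLP, and one shared adaLN modulation layer because `share_mod` is true) and ignores biases, norms, and input/output embeddings. The real `SLatFlowModel` layout may differ, so treat the result as an estimate only. The function name is hypothetical.

```python
def estimate_dit_params(model_channels: int, cond_channels: int, num_blocks: int,
                        mlp_ratio: float, share_mod: bool = True) -> int:
    """Weight-matrix parameter estimate for an ASSUMED DiT-style block layout."""
    d = model_channels
    hidden = round(d * mlp_ratio)                     # MLP hidden width
    self_attn = 4 * d * d                             # q, k, v, out projections
    cross_attn = 2 * d * d + 2 * cond_channels * d    # q/out from d, k/v from cond
    mlp = 2 * d * hidden                              # up and down projections
    per_block = self_attn + cross_attn + mlp
    modulation = 6 * d * d                            # adaLN shift/scale/gate x2
    shared = modulation if share_mod else num_blocks * modulation
    return num_blocks * per_block + shared


if __name__ == "__main__":
    # Values taken from slat_flow_img2shape_dit_1_3B_1024_bf16.json.
    estimate = estimate_dit_params(1536, 1024, 30, 5.3334, share_mod=True)
    # LFS size of the matching .safetensors file, at 2 bytes per bf16 weight.
    from_file = 2584574424 // 2
    print(f"estimate: {estimate:,}  from file size: {from_file:,}")
    print(f"head dim: {1536 // 12}")
```

The estimate and the size-derived count agree to within about half a percent, which suggests the assumed layout is at least close, though it does not confirm it.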
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:07cd0596f634c5adc1890023d16023afc5eed02fb84b22bb23aff5bf0030fbbd
+ size 2584574424
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json ADDED
@@ -0,0 +1,19 @@
+ {
+     "name": "SLatFlowModel",
+     "args": {
+         "resolution": 32,
+         "in_channels": 32,
+         "out_channels": 32,
+         "model_channels": 1536,
+         "cond_channels": 1024,
+         "num_blocks": 30,
+         "num_heads": 12,
+         "mlp_ratio": 5.3334,
+         "pe_mode": "rope",
+         "share_mod": true,
+         "initialization": "scaled",
+         "qk_rms_norm": true,
+         "qk_rms_norm_cross": true,
+         "dtype": "bfloat16"
+     }
+ }
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ec5e0917ef9b7e25ad51dffc7d19687a42019871f94239f2fa7f86264c55b70f
+ size 2584574424
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json ADDED
@@ -0,0 +1,19 @@
+ {
+     "name": "SLatFlowModel",
+     "args": {
+         "resolution": 64,
+         "in_channels": 64,
+         "out_channels": 32,
+         "model_channels": 1536,
+         "cond_channels": 1024,
+         "num_blocks": 30,
+         "num_heads": 12,
+         "mlp_ratio": 5.3334,
+         "pe_mode": "rope",
+         "share_mod": true,
+         "initialization": "scaled",
+         "qk_rms_norm": true,
+         "qk_rms_norm_cross": true,
+         "dtype": "bfloat16"
+     }
+ }
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:580401269059a339b8318ab9ced459a13ba63391721c83a6c383198c29e77686
+ size 2584672728
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json ADDED
@@ -0,0 +1,19 @@
+ {
+     "name": "SLatFlowModel",
+     "args": {
+         "resolution": 32,
+         "in_channels": 64,
+         "out_channels": 32,
+         "model_channels": 1536,
+         "cond_channels": 1024,
+         "num_blocks": 30,
+         "num_heads": 12,
+         "mlp_ratio": 5.3334,
+         "pe_mode": "rope",
+         "share_mod": true,
+         "initialization": "scaled",
+         "qk_rms_norm": true,
+         "qk_rms_norm_cross": true,
+         "dtype": "bfloat16"
+     }
+ }
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8371aa1c5d13be79dcd5ddfd2cf3835e902e204dc34427169a1c702828e1a94d
+ size 2584672728
ckpts/ss_flow_img_dit_1_3B_64_bf16.json ADDED
@@ -0,0 +1,19 @@
+ {
+     "name": "SparseStructureFlowModel",
+     "args": {
+         "resolution": 16,
+         "in_channels": 8,
+         "out_channels": 8,
+         "model_channels": 1536,
+         "cond_channels": 1024,
+         "num_blocks": 30,
+         "num_heads": 12,
+         "mlp_ratio": 5.3334,
+         "pe_mode": "rope",
+         "share_mod": true,
+         "initialization": "scaled",
+         "qk_rms_norm": true,
+         "qk_rms_norm_cross": true,
+         "dtype": "bfloat16"
+     }
+ }
ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca01377c485bec418076d38ee80166d32dc776d744f2553b835cba1e97a7abf6
+ size 2584426920
ckpts/tex_dec_next_dc_f16c32_fp16.json ADDED
@@ -0,0 +1,25 @@
+ {
+     "name": "SparseUnetVaeDecoder",
+     "args": {
+         "out_channels": 6,
+         "model_channels": [1024, 512, 256, 128, 64],
+         "latent_channels": 32,
+         "num_blocks": [4, 16, 8, 4, 0],
+         "block_type": [
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d"
+         ],
+         "up_block_type": [
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d",
+             "SparseResBlockC2S3d"
+         ],
+         "block_args": [{}, {}, {}, {}, {}],
+         "pred_subdiv": false,
+         "use_fp16": true
+     }
+ }
ckpts/tex_dec_next_dc_f16c32_fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97ea69addea2ecd9312910f5f548234665eef51c088386180b7cd5b258645e3c
+ size 948458812
ckpts/tex_enc_next_dc_f16c32_fp16.json ADDED
@@ -0,0 +1,24 @@
+ {
+     "name": "SparseUnetVaeEncoder",
+     "args": {
+         "in_channels": 6,
+         "model_channels": [64, 128, 256, 512, 1024],
+         "latent_channels": 32,
+         "num_blocks": [0, 4, 8, 16, 4],
+         "block_type": [
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d",
+             "SparseConvNeXtBlock3d"
+         ],
+         "up_block_type": [
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d",
+             "SparseResBlockS2C3d"
+         ],
+         "block_args": [{}, {}, {}, {}, {}],
+         "use_fp16": true
+     }
+ }
ckpts/tex_enc_next_dc_f16c32_fp16.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd109f75f84b90fa411554ed6b0e4a87f430841163156fc0ebda2ebdc4752493
+ size 708797208
pipeline.json ADDED
@@ -0,0 +1,95 @@
+ {
+     "name": "Trellis2ImageTo3DPipeline",
+     "args": {
+         "models": {
+             "sparse_structure_decoder": "microsoft/TRELLIS-image-large/ckpts/ss_dec_conv3d_16l8_fp16",
+             "sparse_structure_flow_model": "ckpts/ss_flow_img_dit_1_3B_64_bf16",
+             "shape_slat_decoder": "ckpts/shape_dec_next_dc_f16c32_fp16",
+             "shape_slat_flow_model_512": "ckpts/slat_flow_img2shape_dit_1_3B_512_bf16",
+             "shape_slat_flow_model_1024": "ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16",
+             "tex_slat_decoder": "ckpts/tex_dec_next_dc_f16c32_fp16",
+             "tex_slat_flow_model_512": "ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16",
+             "tex_slat_flow_model_1024": "ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16"
+         },
+         "sparse_structure_sampler": {
+             "name": "FlowEulerGuidanceIntervalSampler",
+             "args": {
+                 "sigma_min": 1e-5
+             },
+             "params": {
+                 "steps": 12,
+                 "guidance_strength": 7.5,
+                 "guidance_rescale": 0.7,
+                 "guidance_interval": [0.6, 1.0],
+                 "rescale_t": 5.0
+             }
+         },
+         "shape_slat_sampler": {
+             "name": "FlowEulerGuidanceIntervalSampler",
+             "args": {
+                 "sigma_min": 1e-5
+             },
+             "params": {
+                 "steps": 12,
+                 "guidance_strength": 7.5,
+                 "guidance_rescale": 0.5,
+                 "guidance_interval": [0.6, 1.0],
+                 "rescale_t": 3.0
+             }
+         },
+         "shape_slat_normalization": {
+             "mean": [
+                 0.781296, 0.018091, -0.495192, -0.558457, 1.060530, 0.093252, 1.518149, -0.933218,
+                 -0.732996, 2.604095, -0.118341, -2.143904, 0.495076, -2.179512, -2.130751, -0.996944,
+                 0.261421, -2.217463, 1.260067, -0.150213, 3.790713, 1.481266, -1.046058, -1.523667,
+                 -0.059621, 2.220780, 1.621212, 0.877230, 0.567247, -3.175944, -3.186688, 1.578665
+             ],
+             "std": [
+                 5.972266, 4.706852, 5.445010, 5.209927, 5.320220, 4.547237, 5.020802, 5.444004,
+                 5.226681, 5.683095, 4.831436, 5.286469, 5.652043, 5.367606, 5.525084, 4.730578,
+                 4.805265, 5.124013, 5.530808, 5.619001, 5.103930, 5.417670, 5.269677, 5.547194,
+                 5.634698, 5.235274, 6.110351, 5.511298, 6.237273, 4.879207, 5.347008, 5.405691
+             ]
+         },
+         "tex_slat_sampler": {
+             "name": "FlowEulerGuidanceIntervalSampler",
+             "args": {
+                 "sigma_min": 1e-5
+             },
+             "params": {
+                 "steps": 12,
+                 "guidance_strength": 1.0,
+                 "guidance_rescale": 0.0,
+                 "guidance_interval": [0.6, 0.9],
+                 "rescale_t": 3.0
+             }
+         },
+         "tex_slat_normalization": {
+             "mean": [
+                 3.501659, 2.212398, 2.226094, 0.251093, -0.026248, -0.687364, 0.439898, -0.928075,
+                 0.029398, -0.339596, -0.869527, 1.038479, -0.972385, 0.126042, -1.129303, 0.455149,
+                 -1.209521, 2.069067, 0.544735, 2.569128, -0.323407, 2.293000, -1.925608, -1.217717,
+                 1.213905, 0.971588, -0.023631, 0.106750, 2.021786, 0.250524, -0.662387, -0.768862
+             ],
+             "std": [
+                 2.665652, 2.743913, 2.765121, 2.595319, 3.037293, 2.291316, 2.144656, 2.911822,
+                 2.969419, 2.501689, 2.154811, 3.163343, 2.621215, 2.381943, 3.186697, 3.021588,
+                 2.295916, 3.234985, 3.233086, 2.260140, 2.874801, 2.810596, 3.292720, 2.674999,
+                 2.680878, 2.372054, 2.451546, 2.353556, 2.995195, 2.379849, 2.786195, 2.775190
+             ]
+         },
+         "image_cond_model": {
+             "name": "DinoV3FeatureExtractor",
+             "args": {
+                 "model_name": "tao-hunter/dinov3-vitl16-pretrain-lvd1689m"
+             }
+         },
+         "rembg_model": {
+             "name": "BiRefNet",
+             "args": {
+                 "model_name": "tao-hunter/RMBG-2.0"
+             }
+         },
+         "default_pipeline_type": "1024_cascade"
+     }
+ }
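Review note: the sampler `params` above (`steps`, `rescale_t`, `guidance_interval`) are easier to reason about with the timestep schedule written out. The sketch below is an illustration under stated assumptions, not the `FlowEulerGuidanceIntervalSampler` implementation. It assumes that time runs from 1 (noise) to 0 (data) in `steps` Euler steps, that `rescale_t` applies the shift `t' = r·t / (1 + (r − 1)·t)` commonly used with flow-matching samplers, and that guidance is only applied while `t` lies inside `guidance_interval`. All function names are hypothetical.

```python
from typing import List, Tuple


def rescale(t: float, rescale_t: float) -> float:
    """ASSUMED time shift: maps [0, 1] onto itself, pushing steps toward t = 1."""
    return rescale_t * t / (1.0 + (rescale_t - 1.0) * t)


def timesteps(steps: int, rescale_t: float) -> List[float]:
    """steps + 1 boundaries from t = 1 down to t = 0, after rescaling."""
    return [rescale(1.0 - i / steps, rescale_t) for i in range(steps + 1)]


def guidance_active(t: float, interval: Tuple[float, float]) -> bool:
    """True when t falls inside the closed guidance interval (ASSUMED semantics)."""
    lo, hi = interval
    return lo <= t <= hi


if __name__ == "__main__":
    # Values from "shape_slat_sampler" in pipeline.json.
    schedule = timesteps(steps=12, rescale_t=3.0)
    for t in schedule[:-1]:  # the last boundary (t = 0) starts no step
        flag = "guided" if guidance_active(t, (0.6, 1.0)) else "unguided"
        print(f"t = {t:.4f}  {flag}")
```

With `rescale_t` greater than 1, more of the 12 steps land at high `t`, which under the assumed interval semantics is also where guidance is switched on.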