Image-to-3D
Yang2001 committed on
Commit 52618c7 · verified · 1 Parent(s): 0005834

Update README.md

  1. README.md +204 -11
README.md CHANGED
@@ -1,18 +1,18 @@
1
  ---
2
  license: mit
3
- license_name: pixal3d-license
4
  license_link: LICENSE
5
  extra_gated_eu_disallowed: true
6
  pipeline_tag: image-to-3d
7
  ---
8
 
 
9
  <div align="center">
10
 
11
  # Pixal3D: Pixel-Aligned 3D Generation from Images
12
 
13
  <h3>SIGGRAPH 2026</h3>
14
 
15
- [Dong-Yang Li](https://ldyang694.github.io/)¹ · [Wang Zhao](https://thuzhaowang.github.io/)²* · [Yuxin Chen](https://orcid.org/0000-0002-7854-1072)² · [Wenbo Hu](https://wbhu.github.io/)² · [Meng-Hao Guo](https://menghaoguo.github.io/)¹ · [Fang-Lue Zhang](https://fanglue.github.io/)³ · [Ying Shan](https://www.linkedin.com/in/YingShanProfile)² · [Shi-Min Hu](https://cg.cs.tsinghua.edu.cn/shimin.htm)¹✉
16
 
17
  ¹Tsinghua University (BNRist) &nbsp;&nbsp; ²Tencent ARC Lab &nbsp;&nbsp; ³Victoria University of Wellington
18
 
@@ -23,16 +23,19 @@ pipeline_tag: image-to-3d
23
  <div align="center">
24
  <a href="https://ldyang694.github.io/projects/pixal3d/"><img src=https://img.shields.io/badge/Project%20Page-333399.svg?logo=googlehome height=22px></a>
25
  <a href="https://huggingface.co/spaces/TencentARC/Pixal3D"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg height=22px></a>
26
- <a href="https://github.com/TencentARC/Pixal3D"><img src=https://img.shields.io/badge/Code-Github-black.svg?logo=github height=22px></a>
27
  <a href="https://arxiv.org/abs/2605.10922"><img src=https://img.shields.io/badge/Arxiv-b5212f.svg?logo=arxiv height=22px></a>
 
28
  </div>
29
 
 
30
  **Pixal3D** generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.
31
 
32
  ---
33
 
34
  ## ✨ News
35
 
 
36
  - **May 2026**: Release the improved version based on [Trellis.2](https://github.com/microsoft/TRELLIS.2) backbone. 💪
37
  - **May 2026**: Release inference code and online demo. 🤗
38
  - **Apr 2026**: Our paper is accepted to SIGGRAPH 2026! 🎉
@@ -66,12 +69,22 @@ Please first follow the installation guide of [TRELLIS.2](https://github.com/mic
66
  pip install -r requirements.txt
67
  ```
68
 
69
- #### Step 3: Install utils3d
70
 
71
  ```bash
72
  pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl
73
  ```
74
 
 
 
75
  ### Usage
76
 
77
  #### Inference
@@ -79,9 +92,30 @@ pip install https://github.com/LDYang694/Storages/releases/download/20260430/uti
79
  Generate a GLB mesh from a single image:
80
 
81
  ```bash
82
- python inference.py --image assets/test_image/0.png --output ./output.glb
83
  ```
84
 
85
  ### Web Demo
86
 
87
  We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.
@@ -90,9 +124,164 @@ We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes
90
  python app.py
91
  ```
92
 
93
  ## 🤗 Acknowledgements
94
 
95
- This project is heavily built upon [Trellis.2](https://github.com/microsoft/TRELLIS.2) and [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We also thank the following repos for their great contributions: [Trellis](https://github.com/microsoft/TRELLIS).
96
 
97
  ## 📄 Citation
98
 
@@ -100,9 +289,13 @@ If you find this work useful, please consider citing:
100
 
101
  ```bibtex
102
  @article{li2026pixal3d,
103
- title = {Pixal3D: Pixel-Aligned 3D Generation from Images},
104
- author = {Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},
105
- journal = {arXiv preprint arXiv:2605.10922},
106
- year = {2026}
107
  }
108
- ```
 
1
  ---
2
  license: mit
 
3
  license_link: LICENSE
4
  extra_gated_eu_disallowed: true
5
  pipeline_tag: image-to-3d
6
  ---
7
 
8
+
9
  <div align="center">
10
 
11
  # Pixal3D: Pixel-Aligned 3D Generation from Images
12
 
13
  <h3>SIGGRAPH 2026</h3>
14
 
15
+ <small>[Dong-Yang Li](https://ldyang694.github.io/)¹ · [Wang Zhao](https://thuzhaowang.github.io/)²* · [Yuxin Chen](https://orcid.org/0000-0002-7854-1072)² · [Wenbo Hu](https://wbhu.github.io/)² · [Meng-Hao Guo](https://menghaoguo.github.io/)¹ · [Fang-Lue Zhang](https://fanglue.github.io/)³ · [Ying Shan](https://www.linkedin.com/in/YingShanProfile)² · [Shi-Min Hu](https://cg.cs.tsinghua.edu.cn/shimin.htm)¹✉</small>
16
 
17
  ¹Tsinghua University (BNRist) &nbsp;&nbsp; ²Tencent ARC Lab &nbsp;&nbsp; ³Victoria University of Wellington
18
 
 
23
  <div align="center">
24
  <a href="https://ldyang694.github.io/projects/pixal3d/"><img src=https://img.shields.io/badge/Project%20Page-333399.svg?logo=googlehome height=22px></a>
25
  <a href="https://huggingface.co/spaces/TencentARC/Pixal3D"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg height=22px></a>
26
+ <a href="https://huggingface.co/TencentARC/Pixal3D"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
27
  <a href="https://arxiv.org/abs/2605.10922"><img src=https://img.shields.io/badge/Arxiv-b5212f.svg?logo=arxiv height=22px></a>
28
+ <a href="LICENSE"><img src=https://img.shields.io/badge/License-MIT-yellow.svg height=22px></a>
29
  </div>
30
 
31
+
32
  **Pixal3D** generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.
33
 
34
  ---
35
 
36
  ## ✨ News
37
 
38
+ - **May 2026**: Release training code and data preparation toolkit. 🔧
39
  - **May 2026**: Release the improved version based on [Trellis.2](https://github.com/microsoft/TRELLIS.2) backbone. 💪
40
  - **May 2026**: Release inference code and online demo. 🤗
41
  - **Apr 2026**: Our paper is accepted to SIGGRAPH 2026! 🎉
 
69
  pip install -r requirements.txt
70
  ```
71
 
72
+ #### Step 3: Install natten
73
+
74
+ ```bash
75
+ NATTEN_CUDA_ARCH="xx" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation
76
+ ```
77
+
78
+ Replace the first `xx` with your GPU's CUDA architecture and the second `xx` with the number of parallel build workers suitable for your machine.
79
+
80
+ #### Step 4: Install utils3d
81
 
82
  ```bash
83
  pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl
84
  ```
85
 
86
+ > **Note**: `requirements-hfdemo.txt` is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.
87
+
88
  ### Usage
89
 
90
  #### Inference
 
92
  Generate a GLB mesh from a single image:
93
 
94
  ```bash
95
+ python inference.py --image assets/images/0_img.png --output ./output.glb
96
+ ```
97
+
98
+ **Low-VRAM mode** (reduces peak VRAM by loading models on-demand):
99
+
100
+ ```bash
101
+ python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram
102
+ ```
103
+
104
+ By default, the pipeline resolution is **1536** (standard mode) or **1024** (low-VRAM mode). You can override this with `--resolution`:
105
+
106
+ ```bash
107
+ # Force 1536 even in low-VRAM mode
108
+ python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram --resolution 1536
109
+
110
+ # Force 1024 in standard mode
111
+ python inference.py --image assets/images/0_img.png --output ./output.glb --resolution 1024
112
  ```
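
The default-versus-override rule described above can be sketched in a few lines of Python. This only illustrates the documented behaviour and is not code taken from `inference.py`; the function name `resolve_resolution` and the two constants are hypothetical.

```python
from typing import Optional

# Defaults stated in this README: 1536 in standard mode, 1024 in low-VRAM mode.
STANDARD_RESOLUTION = 1536
LOW_VRAM_RESOLUTION = 1024


def resolve_resolution(low_vram: bool, resolution: Optional[int] = None) -> int:
    """Return the explicit --resolution value if given, else the mode's default."""
    if resolution is not None:
        return resolution
    return LOW_VRAM_RESOLUTION if low_vram else STANDARD_RESOLUTION


if __name__ == "__main__":
    # The four combinations covered by the commands above.
    for low_vram, override in [(False, None), (True, None), (True, 1536), (False, 1024)]:
        print(low_vram, override, resolve_resolution(low_vram, override))
```

An explicit `--resolution` always wins; the mode only decides the fallback.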
113
 
114
+ **Tip**: If you don't have `flash_attn` installed, you can use PyTorch's built-in SDPA backend instead:
115
+ > ```bash
116
+ > ATTN_BACKEND=sdpa python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram
117
+ > ```
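
If you launch inference from a script, the fallback in the tip can be automated. The sketch below checks whether `flash_attn` is importable and, if not, sets `ATTN_BACKEND=sdpa` in a copy of the environment. The helper name `attention_env` is hypothetical, and only the `sdpa` value is taken from this README; no other backend value is assumed.

```python
import importlib.util
import os
from typing import Dict, Mapping, Optional


def attention_env(base_env: Optional[Mapping[str, str]] = None,
                  module_name: str = "flash_attn") -> Dict[str, str]:
    """Copy the environment and set ATTN_BACKEND=sdpa when flash_attn is missing."""
    env = dict(os.environ if base_env is None else base_env)
    if importlib.util.find_spec(module_name) is None:
        # "sdpa" is the only backend value documented in this README.
        env["ATTN_BACKEND"] = "sdpa"
    return env


if __name__ == "__main__":
    env = attention_env()
    print("ATTN_BACKEND =", env.get("ATTN_BACKEND", "<unset>"))
    # The returned mapping could then be passed to a subprocess, for example:
    # subprocess.run([sys.executable, "inference.py", "--image", "...", "--output", "..."], env=env)
```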
118
+
119
  ### Web Demo
120
 
121
  We provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.
 
124
  python app.py
125
  ```
126
 
127
+ Low-VRAM mode is also available for the web demo. The frontend default resolution will automatically switch to 1024 in low-VRAM mode (1536 otherwise), but can be changed manually in the UI.
128
+
129
+ ```bash
130
+ python app.py --low_vram
131
+ # or via environment variable:
132
+ LOW_VRAM=1 python app.py
133
+ ```
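
A minimal sketch of how a command-line flag and an environment variable can be combined into one switch is shown below. It is not the actual `app.py` logic: the helper name `low_vram_enabled` is hypothetical, and it assumes that only the value `1` enables the mode, since that is the only value shown above.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def low_vram_enabled(argv: Optional[Sequence[str]] = None,
                     env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if --low_vram is passed or LOW_VRAM=1 is set."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--low_vram", action="store_true")
    # parse_known_args ignores any other flags the real script may define.
    args, _unknown = parser.parse_known_args(argv)
    env = os.environ if env is None else env
    return args.low_vram or env.get("LOW_VRAM") == "1"


if __name__ == "__main__":
    print(low_vram_enabled(["--low_vram"], {}))   # flag only
    print(low_vram_enabled([], {"LOW_VRAM": "1"}))  # environment variable only
    print(low_vram_enabled([], {}))               # neither
```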
134
+ ## 🔧 Training
135
+
136
+ We provide the full training codebase for reproducing Pixal3D from scratch.
137
+
138
+ ### Data Preparation
139
+
140
+ Prepare view-aligned O-Voxel data and rendered condition images by following the data toolkit instructions:
141
+
142
+ > 📂 **[data_toolkit/README.md](data_toolkit/README.md)**
143
+
144
+ ### Overview
145
+
146
+ Pixal3D is trained as a three-stage cascade; within each stage, training proceeds from lower to higher resolution:
147
+
148
+ | Stage | Model | Resolutions | Config Prefix |
149
+ |-------|-------|-------------|---------------|
150
+ | 1 | Sparse Structure | 32 → 64 | `ss_flow_img_dit_*_proj_finetune` |
151
+ | 2 | Shape | 256 → 512 → 1024 | `slat_flow_img2shape_*_proj_finetune` |
152
+ | 3 | Texture | 256 → 512 → 1024 | `slat_flow_imgshape2tex_*_proj_finetune` |
153
+
154
+ All stages use **pixel-aligned projection conditioning** and **view-aligned latents** (2 views by default). Within each stage, start from the lowest resolution and progressively fine-tune to higher resolutions by setting `finetune_ckpt` in the config.
155
+
156
+ ### Quick Start
157
+
158
+ ```sh
159
+ python train.py \
160
+ --config <CONFIG_JSON> \
161
+ --output_dir <OUTPUT_DIR> \
162
+ --data_dir '<DATA_DIR_JSON>'
163
+ ```
164
+
165
+ `--data_dir` is a JSON string describing the dataset layout. Different stages require different keys:
166
+
167
+ | Stage | Required keys |
168
+ |-------|---------------|
169
+ | Sparse Structure | `base`, `ss_latent`, `render_cond` |
170
+ | Shape | `base`, `shape_latent`, `render_cond` |
171
+ | Texture | `base`, `shape_latent`, `pbr_latent`, `render_cond` |
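
Hand-writing the JSON string is error-prone, so it can help to build it with `json.dumps` and check the required keys first. The sketch below encodes the table above; the helper name `build_data_dir` and the stage labels `sparse_structure`, `shape`, and `texture` are hypothetical conveniences, not part of `train.py`.

```python
import json
from typing import Dict

# Required keys per stage, copied from the table above.
REQUIRED_KEYS = {
    "sparse_structure": ("base", "ss_latent", "render_cond"),
    "shape": ("base", "shape_latent", "render_cond"),
    "texture": ("base", "shape_latent", "pbr_latent", "render_cond"),
}


def build_data_dir(stage: str, datasets: Dict[str, Dict[str, str]]) -> str:
    """Validate each dataset entry for the given stage and return the JSON string."""
    required = REQUIRED_KEYS[stage]
    for name, paths in datasets.items():
        missing = [key for key in required if key not in paths]
        if missing:
            raise ValueError(f"{name}: missing keys for stage {stage!r}: {missing}")
    return json.dumps(datasets)


if __name__ == "__main__":
    root = "datasets/ObjaverseXL_sketchfab"
    data_dir = build_data_dir("shape", {
        "ObjaverseXL_sketchfab": {
            "base": root,
            "shape_latent": f"{root}/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view",
            "render_cond": f"{root}/renders_cond",
        }
    })
    print(data_dir)  # pass this string as the --data_dir argument
```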
172
+
173
+ ### Example: Training All Three Stages
174
+
175
+ Below we show the full training sequence using ObjaverseXL as an example. Each higher-resolution step requires updating `finetune_ckpt` in its config JSON to point to the previous checkpoint.
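
Editing `finetune_ckpt` by hand between steps can be scripted. The sketch below overwrites every existing `finetune_ckpt` key in a config JSON, wherever it is nested. This README does not specify where the key sits in the config or what form its value takes (a path string or something else), so the helper names are hypothetical and the value is passed through unchanged; check an existing config for the expected form.

```python
import json
from pathlib import Path
from typing import Any


def set_finetune_ckpt(node: Any, value: Any) -> int:
    """Recursively overwrite every existing `finetune_ckpt` key; return how many were set."""
    count = 0
    if isinstance(node, dict):
        for key in node:
            if key == "finetune_ckpt":
                node[key] = value
                count += 1
            else:
                count += set_finetune_ckpt(node[key], value)
    elif isinstance(node, list):
        for item in node:
            count += set_finetune_ckpt(item, value)
    return count


def update_config_file(config_path: str, value: Any) -> int:
    """Load a config JSON, update `finetune_ckpt` in place, and write it back."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    updated = set_finetune_ckpt(config, value)
    if updated == 0:
        raise KeyError(f"no 'finetune_ckpt' key found in {config_path}")
    path.write_text(json.dumps(config, indent=4) + "\n")
    return updated


if __name__ == "__main__":
    # Hypothetical config layout, for illustration only.
    demo = {"trainer": {"args": {"finetune_ckpt": None, "max_steps": 1000}}}
    print(set_finetune_ckpt(demo, "<PATH_TO_PREVIOUS_CHECKPOINT>"), demo)
```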
176
+
177
+ <details>
178
+ <summary><b>Stage 1: Sparse Structure (32 → 64)</b></summary>
179
+
180
+ ```sh
181
+ # Resolution 32
182
+ python train.py \
183
+ --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune.json \
184
+ --output_dir results/ss_32 \
185
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
186
+
187
+ # Resolution 64 (set finetune_ckpt → results/ss_32 checkpoint)
188
+ python train.py \
189
+ --config configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune_ft64.json \
190
+ --output_dir results/ss_ft64 \
191
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "ss_latent": "datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
192
+ ```
193
+ </details>
194
+
195
+ <details>
196
+ <summary><b>Stage 2: Shape (256 → 512 → 1024)</b></summary>
197
+
198
+ ```sh
199
+ # Resolution 256
200
+ python train.py \
201
+ --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune.json \
202
+ --output_dir results/shape_256 \
203
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
204
+
205
+ # Resolution 512
206
+ python train.py \
207
+ --config configs/gen/slat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune_ft512.json \
208
+ --output_dir results/shape_ft512 \
209
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
210
+
211
+ # Resolution 1024
212
+ python train.py \
213
+ --config configs/gen/slat_flow_img2shape_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
214
+ --output_dir results/shape_ft1024 \
215
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
216
+ ```
217
+ </details>
218
+
219
+ <details>
220
+ <summary><b>Stage 3: Texture (256 → 512 → 1024)</b></summary>
221
+
222
+ ```sh
223
+ # Resolution 256
224
+ python train.py \
225
+ --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_256_bf16_proj_finetune.json \
226
+ --output_dir results/tex_256 \
227
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_256_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_256_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
228
+
229
+ # Resolution 512
230
+ python train.py \
231
+ --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune.json \
232
+ --output_dir results/tex_512 \
233
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_512_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
234
+
235
+ # Resolution 1024
236
+ python train.py \
237
+ --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune_ft1024.json \
238
+ --output_dir results/tex_ft1024 \
239
+ --data_dir '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab", "shape_latent": "datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_1024_view", "pbr_latent": "datasets/ObjaverseXL_sketchfab/pbr_latents/tex_enc_next_dc_f16c32_fp16_1024_view", "render_cond": "datasets/ObjaverseXL_sketchfab/renders_cond"}}'
240
+ ```
241
+ </details>
242
+
243
+ ### Additional Options
244
+
245
+ <details>
246
+ <summary><b>All command-line arguments</b></summary>
247
+
248
+ | Argument | Description | Default |
249
+ |----------|-------------|---------|
250
+ | `--config` | Config JSON path | *required* |
251
+ | `--output_dir` | Output directory | *required* |
252
+ | `--data_dir` | Dataset JSON string | `./data/` |
253
+ | `--load_dir` | Checkpoint load directory | `output_dir` |
254
+ | `--ckpt` | Resume from step | `latest` |
255
+ | `--auto_retry` | Retries on failure | `3` |
256
+ | `--tryrun` | Dry run | `false` |
257
+ | `--profile` | Profiling | `false` |
258
+ | `--num_nodes` | Number of nodes | `1` |
259
+ | `--node_rank` | Current node rank | `0` |
260
+ | `--num_gpus` | GPUs per node | all |
261
+ | `--master_addr` | Master address | `localhost` |
262
+ | `--master_port` | Master port | `12666` |
263
+ | `--use_wandb` | Enable W&B logging | `false` |
264
+ | `--wandb_project` | W&B project | `trellis2-training` |
265
+ | `--wandb_name` | W&B run name | basename of `output_dir` |
266
+ | `--wandb_id` | W&B run ID (resume) | (none) |
267
+
268
+ </details>
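
For multi-node runs, the distributed flags from the table can be assembled programmatically. The sketch below builds one `train.py` command per node. It assumes each node is launched separately with its own `--node_rank`, which is inferred from the flag names rather than stated in this README; the helper name `build_train_command` and the address `10.0.0.1` are hypothetical.

```python
import shlex
from typing import List


def build_train_command(config: str, output_dir: str, data_dir: str,
                        num_nodes: int = 1, node_rank: int = 0,
                        master_addr: str = "localhost",
                        master_port: int = 12666) -> List[str]:
    """Return the argv list for one node, using only flags listed in the table."""
    return [
        "python", "train.py",
        "--config", config,
        "--output_dir", output_dir,
        "--data_dir", data_dir,
        "--num_nodes", str(num_nodes),
        "--node_rank", str(node_rank),
        "--master_addr", master_addr,
        "--master_port", str(master_port),
    ]


if __name__ == "__main__":
    data_dir = '{"ObjaverseXL_sketchfab": {"base": "datasets/ObjaverseXL_sketchfab"}}'
    for rank in range(2):
        cmd = build_train_command(
            "configs/gen/ss_flow_img_dit_1_3B_32_bf16_proj_finetune.json",
            "results/ss_32", data_dir,
            num_nodes=2, node_rank=rank, master_addr="10.0.0.1",
        )
        # shlex.join quotes the JSON argument so the line can be pasted into a shell.
        print(shlex.join(cmd))
```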
269
+
270
+ ## 🌐 Community Projects
271
+
272
+ We thank the community for building extensions and deployment guides for Pixal3D!
273
+
274
+ - [Pixal3D-ComfyUI](https://github.com/Saganaki22/Pixal3D-ComfyUI): ComfyUI integration with deployment guides for Windows, WSL, and more.
275
+
276
  ## 🤗 Acknowledgements
277
 
278
+ This project builds heavily on [Trellis.2](https://github.com/microsoft/TRELLIS.2) and [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We sincerely thank the authors for their outstanding work on scalable 3D generation, which serves as the foundation of our codebase and model architecture.
279
+
280
+ We also thank the following repos for their great contributions:
281
+
282
+ - [Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2)
283
+ - [Trellis](https://github.com/microsoft/TRELLIS)
284
+ - [Trellis.2](https://github.com/microsoft/TRELLIS.2)
285
 
286
  ## 📄 Citation
287
 
 
289
 
290
  ```bibtex
291
  @article{li2026pixal3d,
292
+ title={Pixal3D: Pixel-Aligned 3D Generation from Images},
293
+ author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},
294
+ journal={arXiv preprint arXiv:2605.10922},
295
+ year={2026}
296
  }
297
+ ```
298
+
299
+ ## 📜 License
300
+
301
+ This project is released under the [MIT License](LICENSE). The third-party components included in this project remain licensed under their respective original terms; see [NOTICE](NOTICE) for the full list of dependencies and their licenses.