Image-to-3D
Diffusers
Safetensors
SpatialGenDiffusionPipeline
File size: 5,768 Bytes
b78ed69
 
 
db553fa
 
 
 
 
b78ed69
db553fa
 
b78ed69
 
 
 
 
 
fa85ab7
 
 
 
b78ed69
 
 
cdee095
 
 
 
b78ed69
 
 
 
 
 
 
 
db553fa
 
fa85ab7
b78ed69
db553fa
b78ed69
 
 
 
 
db553fa
 
 
b78ed69
db553fa
 
 
b78ed69
 
 
 
 
db553fa
 
 
 
b78ed69
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
db553fa
 
b78ed69
 
 
 
 
db553fa
b78ed69
 
 
 
 
db553fa
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
---
base_model:
- stabilityai/stable-diffusion-2-1
datasets:
- manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
library_name: diffusers
---

# SpatialGen: Layout-guided 3D Indoor Scene Generation

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <picture>
    <source srcset="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/myrWYVNd4m-DuxV39VQZ0.png" media="(prefers-color-scheme: dark)">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/QQvDtmokH4ZjwH0wppqFC.png" width="60%" alt="SpatialLM""/>
  </picture>
</div>
<hr style="margin-top: 0; margin-bottom: 8px;">
<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
    <a href="https://manycore-research.github.io/SpatialGen" target="_blank" style="margin: 2px;"><img alt="Project"
    src="https://img.shields.io/badge/🌐%20Project-SpatialGen-ffc107?color=42a5f5&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://arxiv.org/abs/2509.14981" target="_blank" style="margin: 2px;"><img alt="arXiv"
    src="https://img.shields.io/badge/arXiv-SpatialGen-b31b1b?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub"
    src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
    <a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
    src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>

<div align="center">

| Image-to-Scene Results                   | Text-to-Scene Results                      |
| :--------------------------------------: | :----------------------------------------: |
| ![Img2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/ksN5t8QEu3Iv6KhpsYsk6.png) | ![Text2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/waCRa3kp01KAsKgmqS1bb.png) |

<p>TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.</p>
</div>

## ✨ News

- [Aug, 2025] Initial release of SpatialGen-1.0!
- [Sep, 2025] We release the paper of SpatialGen!

## πŸ“‹ Release Plan

- [x] Provide inference code of SpatialGen.
- [ ] Provide training instruction for SpatialGen.
- [ ] Release SpatialGen dataset.

## SpatialGen Models

<div align="center">

| **Model**                | **Download**                                                                        |
| :----------------------: | ----------------------------------------------------------------------------------- |
| SpatialGen-1.0           | [πŸ€— HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0)           |
| FLUX.1-Layout-ControlNet | [πŸ€— HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) |

</div>

## Usage

### πŸ”§ Installation

Tested with the following environment:
* Python 3.10
* PyTorch 2.3.1
* CUDA Version 12.1

```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```

### πŸ“Š Dataset

We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset) with 48 rooms, which labeled with 3D layout and 4.8K rendered images (48 x 100 views, including RGB, normal, depth maps and semantic maps) for MVD inference.

### Inference

```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D Scene
# in captions/spatialgen_testset_captions.jsonl, we provide text prompts of different styles for each room, 
# choose a pair of scene_id and prompt to run the text2scene experiment
bash scripts/infer_spatialgen_t2s.sh
```

## License

[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL). [FLUX.1-Layout-ControlNet](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) is licensed under the [FLUX.1-dev Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).

## Acknowledgements

We would like to thank the following projects that made this work possible:

[DiffSplat](https://github.com/chenguolin/DiffSplat) | [SD 2.1](https://github.com/Stability-AI/stablediffusion) | [TAESD](https://github.com/madebyollin/taesd) | [FLUX](https://github.com/black-forest-labs/flux/) | [SpatialLM](https://github.com/manycore-research/SpatialLM)

## Citation

```bibtex
@article{wu2024spatialgen,
  title={SPATIALGEN: Layout-guided 3D Indoor Scene Generation},
  author={Zhenqing Wu and Zhenxiong Tan and Guolin Chen and Wenbo Zhao and Xingyi Yang and Xiaofeng Wang and Jianmin Li and Bo Dai and Dahua Lin and Xinchao Wang},
  journal={arXiv preprint arXiv:2509.14981},
  year={2025}
}
```