---
base_model:
- stabilityai/stable-diffusion-2-1
datasets:
- manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
---
SpatialGen: Layout-guided 3D Indoor Scene Generation
TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.
✨ News
- [Sep, 2025] We released the SpatialGen paper!
- [Aug, 2025] Initial release of SpatialGen-1.0!
Release Plan
- Provide inference code of SpatialGen.
- Provide training instructions for SpatialGen.
- Release SpatialGen dataset.
SpatialGen Models
| Model | Download |
|---|---|
| SpatialGen-1.0 | 🤗 HuggingFace |
| FLUX.1-Wireframe-dev-lora | 🤗 HuggingFace |
Usage
🔧 Installation
Tested with the following environment:
- Python 3.10
- PyTorch 2.3.1
- CUDA Version 12.1
```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```
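After installing, it can help to confirm that the active environment matches the tested versions listed above. The following is a minimal sketch, not part of the SpatialGen repository: the helper names `matches_tested` and `report_environment` are hypothetical, and the script only compares version strings without importing PyTorch or probing CUDA.

```python
import sys
from importlib import metadata

# Versions listed under "Tested with the following environment".
TESTED_PYTHON = (3, 10)
TESTED_TORCH = "2.3.1"


def matches_tested(installed: str, tested: str) -> bool:
    """Return True if the installed release equals the tested one.

    Local build tags such as "+cu121" are ignored before comparing.
    """
    release = installed.split("+", 1)[0]
    return release == tested or release.startswith(tested + ".")


def report_environment() -> dict:
    """Collect Python and PyTorch version checks without importing torch."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None  # PyTorch is not installed in this environment
    return {
        "python_ok": sys.version_info[:2] == TESTED_PYTHON,
        "torch_version": torch_version,
        "torch_ok": torch_version is not None
        and matches_tested(torch_version, TESTED_TORCH),
    }


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

A mismatch does not necessarily mean inference will fail; it only indicates that the environment differs from the one the authors tested.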
Dataset
We provide SpatialGen-Testset, which contains 48 rooms labeled with 3D layouts and 4.8K rendered images (48 x 100 views, including RGB, normal, depth, and semantic maps) for multi-view diffusion (MVD) inference.
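One possible way to fetch the test set is through the `huggingface_hub` library, sketched below. The repository ID comes from the model card metadata; the `local_dir` default and the helper names `download_testset` and `expected_view_count` are assumptions made for illustration, and the inference scripts may expect the data in a different location.

```python
from pathlib import Path

# Dataset repo ID taken from the model card metadata.
TESTSET_REPO = "manycore-research/SpatialGen-Testset"
NUM_ROOMS = 48
VIEWS_PER_ROOM = 100


def expected_view_count(rooms: int = NUM_ROOMS, views: int = VIEWS_PER_ROOM) -> int:
    """Number of rendered views implied by the description (rooms x views)."""
    return rooms * views


def download_testset(local_dir: str = "data/SpatialGen-Testset") -> Path:
    """Download a snapshot of the test set.

    ASSUMPTION: `local_dir` is a hypothetical location, not a path
    documented by the SpatialGen repository.
    """
    # Imported lazily so the counting helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=TESTSET_REPO,
        repo_type="dataset",
        local_dir=local_dir,
    )
    return Path(path)
```

Calling `download_testset()` requires network access and the `huggingface_hub` package; `expected_view_count()` simply reproduces the 48 x 100 arithmetic behind the 4.8K figure.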
Inference
```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D Scene
# in captions/spatialgen_testset_captions.jsonl, we provide text prompts of different styles for each room,
# choose a pair of scene_id and prompt to run the text2scene experiment
bash scripts/infer_spatialgen_t2s.sh
```
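To pick a `scene_id` and prompt pair for the text-to-scene run, the captions file can be inspected with a few lines of Python. This is a rough sketch under one loud assumption: each JSONL record is taken to have `scene_id` and `prompt` keys, which the README implies but does not specify. The helper names are hypothetical and not part of the repository.

```python
import json
from typing import Iterable, Iterator

CAPTIONS_PATH = "captions/spatialgen_testset_captions.jsonl"


def iter_captions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def prompts_for_scene(records: Iterable[dict], scene_id: str) -> list[str]:
    """Return all prompts listed for one scene.

    ASSUMPTION: records use the keys "scene_id" and "prompt".
    """
    return [r["prompt"] for r in records if r.get("scene_id") == scene_id]


def load_captions(path: str = CAPTIONS_PATH) -> list[dict]:
    """Read every record from the captions file."""
    with open(path, encoding="utf-8") as f:
        return list(iter_captions(f))
```

For example, `prompts_for_scene(load_captions(), some_scene_id)` would list the style variants available for one room, assuming the key names above match the actual file.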
License
SpatialGen-1.0 is derived from Stable-Diffusion-v2.1, which is licensed under the CreativeML Open RAIL++-M License.
Citation
```bibtex
@inproceedings{SpatialGen,
  title     = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
  author    = {Fang, Chuan and Li, Heng and Liang, Yixu and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
  booktitle = {International Conference on 3D Vision},
  year      = {2026}
}
```
Acknowledgements
We would like to thank the following projects that made this work possible:

