---
language:
- en
license: mit
pipeline_tag: image-to-3d
arxiv: 2508.15769
tags:
- 3d
- scene-generation
---

# SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass (3DV 2026)

This repository contains the official PyTorch implementation of SceneGen, introduced in [SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass](https://huggingface.co/papers/2508.15769).

**The training code, inference code, and pretrained models have all been released! Feel free to reach out for discussions!**
## 🌟 Resources

[**Project Page**](https://mengmouxu.github.io/SceneGen/) · [**Paper**](https://arxiv.org/abs/2508.15769/) · [**Code**](https://github.com/Mengmouxu/SceneGen) · [**Checkpoints**](https://huggingface.co/haoningwu/SceneGen/)

## ⏩ News

- [2025.11] Evaluation code has been released.
- [2025.11] We are glad to share that SceneGen has been accepted to 3DV 2026.
- [2025.9] Our training code and data processing code have been released.
- [2025.8] The inference code and checkpoints have been released.
- [2025.8] Our preprint has been released on arXiv.

## 📦 Installation & Pretrained Models

### Prerequisites

- **Hardware**: An NVIDIA GPU with at least 16GB of memory is required. The code has been verified on NVIDIA A100 and RTX 3090 GPUs.
- **Software**:
  - The [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit-archive) is needed to compile certain submodules. The code has been tested with CUDA 12.1.
  - Python 3.8 or higher is required.

### Installation Steps

1. Clone the repo:
```sh
git clone https://github.com/Mengmouxu/SceneGen.git
cd SceneGen
```
2. Create a new conda environment named `scenegen` and install the dependencies:
```sh
. ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast --demo
```
The detailed usage of `setup.sh` can be found by running `. ./setup.sh --help`.

### Pretrained Models

1. Create a directory in the SceneGen folder to store the checkpoints:
```sh
mkdir -p checkpoints
```
2. Download the pretrained models for **SAM2-Hiera-Large** and **VGGT-1B** from [SAM2](https://huggingface.co/facebook/sam2-hiera-large/) and [VGGT](https://huggingface.co/facebook/VGGT-1B/), then place them in the `checkpoints` directory. (**SAM2** installation and its checkpoints are required for interactive generation with segmentation.)
3. Download our pretrained SceneGen model from [here](https://huggingface.co/haoningwu/SceneGen/) and place it in the `checkpoints` directory as follows:
```
SceneGen/
├── checkpoints/
│   ├── sam2-hiera-large
│   ├── VGGT-1B
│   └── scenegen
│       ├── ckpts
│       └── pipeline.json
└── ...
```

## 💡 Inference

We provide two scripts for inference: `inference.py` for batch processing and `interactive_demo.py` for an interactive Gradio demo.

### Interactive Demo

This script launches a Gradio web interface for interactive scene generation.

- **Features**: It uses SAM2 for interactive image segmentation, allows various generation parameters to be adjusted, and supports scene generation from single or multiple images.
- **Usage**:
```sh
python interactive_demo.py
```

> ## 🚀 Quick Start Guide
>
> ### 📷 Step 1: Input & Segment
> 1. **Upload your scene image.**
> 2. **Use the mouse to draw bounding boxes** around objects.
> 3. Click **"Run Segmentation"** to segment the objects.
>
> *※ For multi-image generation: maintain a consistent object annotation order across all images.*
>
> ### 🗃️ Step 2: Manage Cache
> 1. Click **"Add to Cache"** when satisfied with the segmentation.
> 2. Repeat Steps 1-2 for multiple images.
> 3. Use **"Delete Selected"** or **"Clear All"** to manage cached images.
>
> ### 🎮 Step 3: Generate Scene
> 1. Adjust the generation parameters (optional).
> 2. Click **"Generate 3D Scene"**.
> 3. Download the generated GLB file when ready.
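Once the demo has produced a scene, the downloaded GLB can be inspected programmatically. Below is a minimal sketch using `trimesh`; both the package choice and the `scene.glb` filename are illustrative assumptions, not part of the SceneGen codebase:

```python
# Hypothetical post-generation check (assumes `pip install trimesh`
# and that the file downloaded from the demo was saved as `scene.glb`).
import trimesh

scene = trimesh.load("scene.glb")         # GLB files load as a trimesh.Scene
print(len(scene.geometry), "geometries")  # number of meshes in the scene graph
print("extents:", scene.extents)          # axis-aligned bounding-box size
scene.show()                              # interactive preview (requires pyglet)
```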
[Watch the demo video](https://github.com/user-attachments/assets/d0d53506-70cd-4bd3-a6ab-2f9b5b16f4d8)

### Pre-segmented Image Inference

This script processes a directory of pre-segmented images.

- **Input**: The input folder structure should be similar to `assets/masked_image_test`, containing segmented scene images.
- **Visualization**: For scenes with ground-truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model.
- **Usage**:
```sh
python inference.py --gradio
```

## 📚 Dataset

To train and evaluate SceneGen, we use the [3D-FUTURE](https://tianchi.aliyun.com/dataset/98063) dataset. Please refer to the [GitHub repository](https://github.com/Mengmouxu/SceneGen#dataset) for detailed preprocessing and data handling instructions.

## 🏋️‍♂️ Training

With the processed 3D-FUTURE dataset and the pretrained `ss_flow_img_dit_L_16l8_fp16.safetensors` checkpoint from [TRELLIS](https://huggingface.co/microsoft/TRELLIS-image-large) placed in the `checkpoints/scenegen/ckpts` directory, you can train SceneGen with the following command:
```sh
bash scripts/train.sh
```

## 🧪 Evaluation

To generate 3D scenes on the 3D-FUTURE test set:
```sh
bash scenegen_eval.sh
```
To evaluate the trained model on the 3D-FUTURE test set:
```sh
cd evalscene
bash eval_scenegen.sh
```

## 📜 Citation

If you use this code and data for your research or project, please cite:
```bibtex
@inproceedings{meng2026scenegen,
  author    = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
  title     = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
  booktitle = {International Conference on 3D Vision 2026},
  year      = {2026},
}
```

## Acknowledgements

Many thanks to the codebases of [TRELLIS](https://github.com/microsoft/TRELLIS), [DINOv2](https://github.com/facebookresearch/dinov2), and [VGGT](https://github.com/facebookresearch/vggt).

## Contact

If you have any questions, please feel free to contact [meng-mou-xu@sjtu.edu.cn](mailto:meng-mou-xu@sjtu.edu.cn) and [haoningwu3639@gmail.com](mailto:haoningwu3639@gmail.com).