Enhance model card with metadata, links, and usage
This PR enhances the model card by:
- Adding `pipeline_tag: text-to-image` for better discoverability on the Hugging Face Hub.
- Specifying `library_name: diffusers`, because the model's configuration (`_diffusers_version`, `SD3ControlNetModel`) indicates compatibility with the Diffusers library. This enables the automated 'how to use' widget.
- Linking to the official paper: [SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation](https://huggingface.co/papers/2511.16666).
- Including links to the project page (`https://henghuiding.com/SceneDesigner/`) and the GitHub repository (`https://github.com/FudanCVL/SceneDesigner`).
- Incorporating the full abstract and the "Quick Start" instructions for installation and running the Gradio demo from the original GitHub README.
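
The metadata changes listed above can be sanity-checked before merging. The following is a minimal sketch, assuming the model card keeps only flat `key: value` pairs in its front matter; `parse_front_matter` and `MODEL_CARD` are hypothetical names used for illustration, and a real YAML parser would be needed for nested metadata.

```python
def parse_front_matter(readme_text: str) -> dict[str, str]:
    """Return the flat `key: value` pairs from a leading `---` block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return metadata
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter, so this is not a front-matter block


# Hypothetical sample mirroring the front matter added in this PR.
MODEL_CARD = """---
license: apache-2.0
pipeline_tag: text-to-image
library_name: diffusers
---

# SceneDesigner
"""

metadata = parse_front_matter(MODEL_CARD)
required = ("license", "pipeline_tag", "library_name")
missing = [key for key in required if key not in metadata]
print(metadata)
print("missing keys:", missing)
```

An empty `missing` list means all three keys are present in the front matter; it says nothing about whether the Hub accepts their values.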
Please review and merge this PR if everything looks good.
|
@@ -1,3 +1,75 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: text-to-image
|
| 4 |
+
library_name: diffusers
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
# SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation
|
| 8 |
+
|
| 9 |
+
This repository contains the model presented in the paper [SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation](https://huggingface.co/papers/2511.16666).
|
| 10 |
+
|
| 11 |
+
Project Page: https://henghuiding.com/SceneDesigner/
|
| 12 |
+
Code: https://github.com/FudanCVL/SceneDesigner
|
| 13 |
+
|
| 14 |
+

|
| 15 |
+
|
| 16 |
+
## Abstract
|
| 17 |
+
Controllable image generation has attracted increasing attention in recent years, enabling users to manipulate visual content such as identity and style. However, achieving simultaneous control over the 9D poses (location, size, and orientation) of multiple objects remains an open challenge. Despite recent progress, existing methods often suffer from limited controllability and degraded quality, falling short of comprehensive multi-object 9D pose control. To address these limitations, we propose SceneDesigner, a method for accurate and flexible multi-object 9-DoF pose manipulation. SceneDesigner incorporates a branched network to the pre-trained base model and leverages a new representation, CNOCS map, which encodes 9D pose information from the camera view. This representation exhibits strong geometric interpretation properties, leading to more efficient and stable training. To support training, we construct a new dataset, ObjectPose9D, which aggregates images from diverse sources along with 9D pose annotations. To further address data imbalance issues, particularly performance degradation on low-frequency poses, we introduce a two-stage training strategy with reinforcement learning, where the second stage fine-tunes the model using a reward-based objective on rebalanced data. At inference time, we propose Disentangled Object Sampling, a technique that mitigates insufficient object generation and concept confusion in complex multi-object scenes. Moreover, by integrating user-specific personalization weights, SceneDesigner enables customized pose control for reference subjects. Extensive qualitative and quantitative experiments demonstrate that SceneDesigner significantly outperforms existing approaches in both controllability and quality. Code is publicly available at https://github.com/FudanCVL/SceneDesigner.
|
| 18 |
+
|
| 19 |
+
## ⚙️ Quick Start
|
| 20 |
+
|
| 21 |
+
### 1. Installation
|
| 22 |
+
|
| 23 |
+
1. Install the Python environment (uv is recommended)
|
| 24 |
+
```bash
|
| 25 |
+
uv sync
|
| 26 |
+
```
|
| 27 |
+
Or alternatively:
|
| 28 |
+
```bash
|
| 29 |
+
pip install -r requirements.txt
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
2. Install the Blender environment
|
| 33 |
+
```bash
|
| 34 |
+
cd render
|
| 35 |
+
python install.py
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
If the automatic installation script fails, you can install manually:
|
| 39 |
+
* First download [Blender](https://download.blender.org/release/Blender4.2/) and extract it to the `./render` directory
|
| 40 |
+
* Then locate the Blender Python path and install the Python dependencies for Blender, for example:
|
| 41 |
+
```bash
|
| 42 |
+
cd render
|
| 43 |
+
blender-4.2.8-linux-x64/4.2/python/bin/python3.11 -m pip install -r blender_requirements.txt
|
| 44 |
+
```
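
The "locate the Blender Python path" step can also be scripted rather than typed by hand. The sketch below assumes the Linux release layout shown in the example path (`blender-<release>/<version>/python/bin/python3.x`); `find_blender_python` and `pip_install_command` are hypothetical helpers, not part of the SceneDesigner repository.

```python
from pathlib import Path
from typing import Optional


def find_blender_python(render_dir: Path) -> Optional[Path]:
    """Return the first bundled interpreter under an extracted Blender release."""
    if not render_dir.is_dir():
        return None
    # Assumed layout, taken from the example path: <release>/<version>/python/bin/python3.x
    for candidate in sorted(render_dir.glob("blender-*/*/python/bin/python3*")):
        if candidate.is_file():
            return candidate
    return None


def pip_install_command(python_path: Path,
                        requirements: str = "blender_requirements.txt") -> list[str]:
    """Build the pip command to run with Blender's bundled interpreter."""
    return [str(python_path), "-m", "pip", "install", "-r", requirements]


python_path = find_blender_python(Path("render"))
if python_path is None:
    print("No bundled Blender Python found under ./render")
else:
    # Print the command instead of running it, so the step stays explicit.
    print(" ".join(pip_install_command(python_path)))
```

The printed command is relative to the directory the script runs from, so adjust the requirements path if you run it from outside `render`.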
|
| 45 |
+
|
| 46 |
+
### 2. Download Checkpoints
|
| 47 |
+
|
| 48 |
+
1. Download the [SceneDesigner](https://huggingface.co/FudanCVL/SceneDesigner) weights to the `checkpoints` directory
|
| 49 |
+
2. Download the [Stable Diffusion 3.5](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) base model weights to the `checkpoints` directory
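
The two downloads above can be sketched with `snapshot_download` from the `huggingface_hub` package. The repository IDs come from the links in the list; the sub-directory names under `checkpoints` are assumptions, so check what `app.py` expects before relying on this layout. The Stable Diffusion 3.5 repository may also require accepting its license and logging in first.

```python
from pathlib import Path

# Repository IDs are taken from the links above. The target sub-directory
# names are assumptions; the README only says "the `checkpoints` directory".
CHECKPOINT_REPOS = {
    "FudanCVL/SceneDesigner": "SceneDesigner",
    "stabilityai/stable-diffusion-3.5-medium": "stable-diffusion-3.5-medium",
}


def plan_downloads(root: Path = Path("checkpoints")) -> dict[str, Path]:
    """Map each repository ID to the local directory it would be saved in."""
    return {repo_id: root / subdir for repo_id, subdir in CHECKPOINT_REPOS.items()}


def download_checkpoints(root: Path = Path("checkpoints")) -> None:
    """Download every repository in the plan (needs network access)."""
    # Imported lazily so plan_downloads works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id, target in plan_downloads(root).items():
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))


# Show the plan only; call download_checkpoints() to fetch the weights.
for repo_id, target in plan_downloads().items():
    print(f"{repo_id} -> {target}")
```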
|
| 50 |
+
|
| 51 |
+
### 3. Run Demo
|
| 52 |
+
|
| 53 |
+
Launch the Gradio app:
|
| 54 |
+
```bash
|
| 55 |
+
python app.py \
|
| 56 |
+
--blender_path render/blender/blender \
|
| 57 |
+
--device cuda:0 \
|
| 58 |
+
--port 7861
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
- Adjust the 9D pose of the cube in the **Cube Controls** panel
|
| 62 |
+
- Enter text prompts in the **Generation Config** panel and click the **Generate Images** button to create images
|
| 63 |
+
|
| 64 |
+
## ✒️ Citation
|
| 65 |
+
|
| 66 |
+
If you find our work useful for your research and applications, please kindly cite using this BibTeX:
|
| 67 |
+
|
| 68 |
+
```bibtex
|
| 69 |
+
@inproceedings{SceneDesigner,
|
| 70 |
+
title={SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation},
|
| 71 |
+
author={Qin, Zhenyuan and Shuai, Xincheng and Ding, Henghui},
|
| 72 |
+
booktitle={NeurIPS},
|
| 73 |
+
year={2025}
|
| 74 |
+
}
|
| 75 |
+
```
|