| --- |
| license: cc-by-nc-4.0 |
| pipeline_tag: text-to-image |
| tags: |
| - depth-estimation |
| - text-to-3d |
| - diffusion |
| - flux |
| library_name: flux_rgbd |
| --- |
| |
| # Modality Forcing for Scalable Spatial Generation |
|
|
| Joint **text → RGB + depth** generation with a single diffusion
| transformer, built on FLUX.2. Modality Forcing assigns separate noise levels
| per modality during post-training, so one model supports joint generation
| (text → RGB-D), image-to-depth, and depth-to-image at inference.
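As a rough illustration of how per-modality noise levels can select a task at inference, the sketch below maps each of the three modes to a starting noise level for RGB and depth. It is not taken from the repository: `NoiseLevels` and `initial_noise_levels` are hypothetical names, and the convention that a conditioning modality is held at noise level 0 while a generated modality starts at 1 is an assumption, not something this card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseLevels:
    """Starting noise level per modality: 0.0 = clean (given), 1.0 = pure noise."""
    rgb: float
    depth: float


def initial_noise_levels(mode: str) -> NoiseLevels:
    """Hypothetical helper: pick the starting noise level of each modality.

    Assumption: a modality supplied as conditioning stays clean (0.0),
    while a modality to be generated starts from pure noise (1.0).
    """
    levels = {
        "joint": NoiseLevels(rgb=1.0, depth=1.0),           # text -> RGB-D
        "image_to_depth": NoiseLevels(rgb=0.0, depth=1.0),  # RGB given
        "depth_to_image": NoiseLevels(rgb=1.0, depth=0.0),  # depth given
    }
    if mode not in levels:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(levels)}")
    return levels[mode]
```

Under this assumption the three tasks differ only in which modality starts clean, which is why a single set of weights could serve all of them.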
|
|
| - Paper: [arXiv:2606.13676](https://arxiv.org/abs/2606.13676)
| - Code: [github.com/Duisterhof/modality-forcing](https://github.com/Duisterhof/modality-forcing)
| - Demo: [Hugging Face Space](https://huggingface.co/spaces/bartduis/modality_forcing)
| - Project page: [modality-forcing.github.io](https://modality-forcing.github.io/)
|
|
| ## Files |
|
|
| | File | Description | |
| |------|-------------| |
| | `model.safetensors` | FluxRGBD DiT (12B total: 9B-class FLUX.2 backbone + depth streams, bf16) | |
| | `config.json` | Model variant config (`flux_rgbd_9b_v2`) | |
| | `ae_encoder.safetensors` / `ae_decoder.safetensors` | FLUX.2 autoencoder | |
|
|
| The Qwen3-8B text encoder is pulled separately from |
| [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B). |
|
|
| ## Usage |
|
|
| ```bash |
| git clone https://github.com/Duisterhof/modality-forcing.git |
| cd modality-forcing |
| bash install.sh |
| python scripts/joint.py --prompt "a cozy sunlit kitchen with wooden cabinets" |
| ``` |
|
|
| The scripts download these weights automatically (`bartduis/modality_forcing` |
| is the default `--model`). |
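If you would rather fetch the weights ahead of time (for example on a machine with better bandwidth), a minimal sketch using `huggingface_hub.snapshot_download` is shown below. The file list comes from the Files table above; `prefetch_weights` is a hypothetical helper, and whether the repository's scripts reuse the Hugging Face cache populated this way is an assumption.

```python
REPO_ID = "bartduis/modality_forcing"

# File names as listed in the Files table of this card.
WEIGHT_FILES = [
    "model.safetensors",
    "config.json",
    "ae_encoder.safetensors",
    "ae_decoder.safetensors",
]


def prefetch_weights(repo_id: str = REPO_ID, files=WEIGHT_FILES) -> str:
    """Hypothetical helper: download the listed files into the local HF cache.

    Returns the local snapshot directory. Requires `pip install huggingface_hub`.
    """
    # Imported lazily so the module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=list(files))
```

Calling `prefetch_weights()` once downloads the four files; the Qwen3-8B text encoder lives in a separate repository and is not covered by this call.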
|
|
| ## License |
|
|
| The model weights are released under **CC BY-NC 4.0** (non-commercial). The |
| inference code is Apache-2.0; see the GitHub repository. |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{duisterhof2026mofo, |
| title = {Modality Forcing for Scalable Spatial Generation}, |
| author = {Duisterhof, Bardienus Pieter and Ramanan, Deva and Ichnowski, Jeffrey and Johnson, Justin and Park, Keunhong}, |
| journal = {arXiv preprint arXiv:2606.13676}, |
| year = {2026} |
| } |
| ``` |
|
|