---
license: cc-by-nc-4.0
pipeline_tag: text-to-image
tags:
- depth-estimation
- text-to-3d
- diffusion
- flux
library_name: flux_rgbd
---
# Modality Forcing for Scalable Spatial Generation
Joint **text → RGB + depth** generation with a single diffusion
transformer, built on FLUX.2. Modality Forcing assigns each modality (RGB and
depth) its own noise level during post-training, so a single model supports
joint generation (text → RGB-D), image-to-depth, and depth-to-image at inference.
- Paper: [arXiv:2606.13676](https://arxiv.org/abs/2606.13676)
- Code: [github.com/Duisterhof/modality-forcing](https://github.com/Duisterhof/modality-forcing)
- Demo: [Hugging Face Space](https://huggingface.co/spaces/bartduis/modality_forcing)
- Project page: [modality-forcing.github.io](https://modality-forcing.github.io/)
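
The per-modality noise levels described above can be illustrated with a minimal Python sketch. It assumes a FLUX-style rectified-flow interpolation `x_t = (1 - t) * x0 + t * noise` with a velocity target, samples an independent `t` for RGB and for depth during training, and maps the three inference tasks to starting noise levels (`t = 0` keeps a modality clean as a condition, `t = 1` starts it from pure noise). The function names (`noise_latent`, `modality_forcing_step`), the uniform timestep sampling, and the `TASK_START_LEVELS` table are hypothetical illustrations, not the repository's API or the paper's exact schedule; the toy arrays stand in for FLUX.2 autoencoder latents.

```python
import numpy as np

# Hypothetical sketch of per-modality noising. The rectified-flow form
# x_t = (1 - t) * x0 + t * noise is an assumption (FLUX-style), not the paper's exact recipe.


def noise_latent(x0: np.ndarray, t: float, rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Noise a clean latent x0 to level t in [0, 1]; return (x_t, velocity target)."""
    noise = rng.standard_normal(x0.shape)
    xt = (1.0 - t) * x0 + t * noise
    return xt, noise - x0  # d x_t / d t = noise - x0


def modality_forcing_step(rgb0: np.ndarray, depth0: np.ndarray, rng: np.random.Generator) -> dict:
    """Training-time noising: RGB and depth each get an independently sampled noise level."""
    t_rgb = rng.uniform(0.0, 1.0)
    t_depth = rng.uniform(0.0, 1.0)
    rgb_t, v_rgb = noise_latent(rgb0, t_rgb, rng)
    depth_t, v_depth = noise_latent(depth0, t_depth, rng)
    return {"rgb": (rgb_t, t_rgb, v_rgb), "depth": (depth_t, t_depth, v_depth)}


# Hypothetical inference table: starting noise level per modality for each task.
# t = 0.0 means the modality is given (clean) and held fixed as a condition;
# t = 1.0 means it starts from pure noise and is denoised toward t = 0.
TASK_START_LEVELS = {
    "joint": {"rgb": 1.0, "depth": 1.0},           # text -> RGB-D
    "image_to_depth": {"rgb": 0.0, "depth": 1.0},  # RGB given, depth generated
    "depth_to_image": {"rgb": 1.0, "depth": 0.0},  # depth given, RGB generated
}

rng = np.random.default_rng(0)
rgb0 = rng.standard_normal((4, 8, 8))    # toy stand-in for an RGB latent
depth0 = rng.standard_normal((4, 8, 8))  # toy stand-in for a depth latent
batch = modality_forcing_step(rgb0, depth0, rng)
print({k: round(float(v[1]), 3) for k, v in batch.items()})  # sampled per-modality levels
```

Because the two levels are sampled independently, training covers the corner cases where one modality is clean and the other is noisy; holding the conditioning modality at `t = 0` is one plausible way a single model can then serve all three tasks. See the paper and code for the actual procedure.
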
## Files
| File | Description |
|------|-------------|
| `model.safetensors` | FluxRGBD DiT (12B parameters total: 9B-class FLUX.2 backbone + depth streams, bf16) |
| `config.json` | Model variant config (`flux_rgbd_9b_v2`) |
| `ae_encoder.safetensors` / `ae_decoder.safetensors` | FLUX.2 autoencoder |

The Qwen3-8B text encoder is pulled separately from
[`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B).
## Usage
```bash
git clone https://github.com/Duisterhof/modality-forcing.git
cd modality-forcing
bash install.sh
python scripts/joint.py --prompt "a cozy sunlit kitchen with wooden cabinets"
```
The scripts download these weights automatically (`bartduis/modality_forcing`
is the default `--model`).
## License
The model weights are released under **CC BY-NC 4.0** (non-commercial). The
inference code is Apache-2.0; see the GitHub repository.
## Citation
```bibtex
@article{duisterhof2026mofo,
title = {Modality Forcing for Scalable Spatial Generation},
author = {Duisterhof, Bardienus Pieter and Ramanan, Deva and Ichnowski, Jeffrey and Johnson, Justin and Park, Keunhong},
journal = {arXiv preprint arXiv:2606.13676},
year = {2026}
}
```