---
license: cc-by-nc-4.0
pipeline_tag: text-to-image
tags:
- depth-estimation
- text-to-3d
- diffusion
- flux
library_name: flux_rgbd
---
# Modality Forcing for Scalable Spatial Generation
Joint **text → RGB + depth** generation with a single diffusion
transformer, built on FLUX.2. Modality Forcing assigns separate noise levels
per modality during post-training, so one model supports joint generation
(text → RGB-D), image-to-depth, and depth-to-image at inference.
- 📄 Paper: [arXiv:2606.13676](https://arxiv.org/abs/2606.13676)
- 💻 Code: [github.com/Duisterhof/modality-forcing](https://github.com/Duisterhof/modality-forcing)
- 🚀 Demo: [Hugging Face Space](https://huggingface.co/spaces/bartduis/modality_forcing)
- 🌐 Project page: [modality-forcing.github.io](https://modality-forcing.github.io/)
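
The per-modality noise levels described above can be illustrated with a small sketch. This is not the repository's code: it assumes a rectified-flow style linear blend between a clean latent and Gaussian noise, uniform sampling of the training noise levels, and a mapping from inference mode to noise levels that is inferred from the description rather than taken from the paper. The names `noise_latent`, `sample_training_levels`, and `inference_levels` are hypothetical.

```python
import random


def noise_latent(clean, t, rng):
    """Blend a clean latent with Gaussian noise (assumed linear schedule).

    t = 0.0 returns the clean latent, t = 1.0 returns pure noise.
    """
    return [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in clean]


def sample_training_levels(rng):
    """Draw an independent noise level per modality (assumed uniform on [0, 1])."""
    return {"rgb": rng.random(), "depth": rng.random()}


def inference_levels(mode, t):
    """Noise level per modality at denoising time t (assumed mode mapping)."""
    if mode == "joint":            # text -> RGB-D: both modalities are denoised
        return {"rgb": t, "depth": t}
    if mode == "image_to_depth":   # RGB is given and stays clean, depth is denoised
        return {"rgb": 0.0, "depth": t}
    if mode == "depth_to_image":   # depth is given and stays clean, RGB is denoised
        return {"rgb": t, "depth": 0.0}
    raise ValueError(f"unknown mode: {mode}")


rng = random.Random(0)
rgb_latent = [0.5, -1.0, 0.25]   # toy stand-ins for real latents
depth_latent = [2.0, 2.5, 3.0]

# Training: each modality is noised with its own, independently drawn level.
levels = sample_training_levels(rng)
noisy_rgb = noise_latent(rgb_latent, levels["rgb"], rng)
noisy_depth = noise_latent(depth_latent, levels["depth"], rng)

# Inference: a conditioning modality at level 0 passes through unchanged.
cond = inference_levels("image_to_depth", t=1.0)
assert noise_latent(rgb_latent, cond["rgb"], rng) == rgb_latent
```

Under these assumptions, the three inference modes differ only in which modality is pinned to noise level 0, which is why a single set of weights could serve all of them.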
## Files
| File | Description |
|------|-------------|
| `model.safetensors` | FluxRGBD DiT (12B total: 9B-class FLUX.2 backbone + depth streams, bf16) |
| `config.json` | Model variant config (`flux_rgbd_9b_v2`) |
| `ae_encoder.safetensors` / `ae_decoder.safetensors` | FLUX.2 autoencoder |

The Qwen3-8B text encoder is pulled separately from
[`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B).
## Usage
```bash
git clone https://github.com/Duisterhof/modality-forcing.git
cd modality-forcing
bash install.sh
python scripts/joint.py --prompt "a cozy sunlit kitchen with wooden cabinets"
```
The scripts download these weights automatically (`bartduis/modality_forcing`
is the default `--model`).
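
If you prefer to fetch the files ahead of time, for example to warm a cache before running the scripts, a minimal sketch using `hf_hub_download` from the `huggingface_hub` package might look like the following. It is an assumption that this matches what the scripts do internally; the helper name `fetch_weights` is hypothetical, and the filenames are taken from the Files table.

```python
REPO_ID = "bartduis/modality_forcing"

# Filenames as listed in the Files table of this model card.
FILENAMES = [
    "model.safetensors",
    "config.json",
    "ae_encoder.safetensors",
    "ae_decoder.safetensors",
]


def fetch_weights(cache_dir=None):
    """Download each listed file and return a {filename: local path} dict."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name, cache_dir=cache_dir)
        for name in FILENAMES
    }
```

Calling `fetch_weights()` requires network access and enough disk space for the checkpoint. The Qwen3-8B text encoder is not covered by this sketch and would still be pulled from its own repository.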
## License
The model weights are released under **CC BY-NC 4.0** (non-commercial). The
inference code is Apache-2.0; see the GitHub repository.
## Citation
```bibtex
@article{duisterhof2026mofo,
title = {Modality Forcing for Scalable Spatial Generation},
author = {Duisterhof, Bardienus Pieter and Ramanan, Deva and Ichnowski, Jeffrey and Johnson, Justin and Park, Keunhong},
journal = {arXiv preprint arXiv:2606.13676},
year = {2026}
}
```