bartduis committed on
Commit
1da4105
·
0 Parent(s):

Initial public release

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
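
The `.gitattributes` rules route any file whose name matches one of these patterns through Git LFS. As a rough illustration of that routing, the sketch below checks file basenames against a subset of the patterns using Python's `fnmatch`. This is only an approximation: Git applies its own matching rules, path patterns such as `saved_model/**/*` are deliberately left out, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the basename patterns from this commit's .gitattributes.
# Path patterns such as saved_model/**/* are omitted on purpose, because
# fnmatch does not implement gitattributes path semantics.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches any listed LFS pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["model.safetensors", "config.json", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions the result lines up with the rest of the commit: the three `.safetensors` files appear as LFS pointers, while `config.json` and `README.md` are stored inline.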
README.md ADDED
@@ -0,0 +1,61 @@
+ ---
+ license: cc-by-nc-4.0
+ pipeline_tag: text-to-image
+ tags:
+ - depth-estimation
+ - text-to-3d
+ - diffusion
+ - flux
+ library_name: flux_rgbd
+ ---
+
+ # Modality Forcing for Scalable Spatial Generation
+
+ Joint **text → RGB + depth** generation with a single diffusion
+ transformer, built on FLUX.2. Modality Forcing assigns separate noise levels
+ per modality during post-training, so one model supports joint generation
+ (text → RGB-D), image-to-depth, and depth-to-image at inference.
+
+ - 📄 Paper: [arXiv:2606.13676](https://arxiv.org/abs/2606.13676)
+ - 💻 Code: [github.com/Duisterhof/modality-forcing](https://github.com/Duisterhof/modality-forcing)
+ - 🚀 Demo: [Hugging Face Space](https://huggingface.co/spaces/bartduis/modality_forcing)
+ - 🌐 Project page: [modality-forcing.github.io](https://modality-forcing.github.io/)
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `model.safetensors` | FluxRGBD DiT (12B total: 9B-class FLUX.2 backbone + depth streams, bf16) |
+ | `config.json` | Model variant config (`flux_rgbd_9b_v2`) |
+ | `ae_encoder.safetensors` / `ae_decoder.safetensors` | FLUX.2 autoencoder |
+
+ The Qwen3-8B text encoder is pulled separately from
+ [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B).
+
+ ## Usage
+
+ ```bash
+ git clone https://github.com/Duisterhof/modality-forcing.git
+ cd modality-forcing
+ bash install.sh
+ python scripts/joint.py --prompt "a cozy sunlit kitchen with wooden cabinets"
+ ```
+
+ The scripts download these weights automatically (`bartduis/modality_forcing`
+ is the default `--model`).
+
+ ## License
+
+ The model weights are released under **CC BY-NC 4.0** (non-commercial). The
+ inference code is Apache-2.0; see the GitHub repository.
+
+ ## Citation
+
+ ```bibtex
+ @article{duisterhof2026mofo,
+ title = {Modality Forcing for Scalable Spatial Generation},
+ author = {Duisterhof, Bardienus Pieter and Ramanan, Deva and Ichnowski, Jeffrey and Johnson, Justin and Park, Keunhong},
+ journal = {arXiv preprint arXiv:2606.13676},
+ year = {2026}
+ }
+ ```
ae_decoder.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:393477cff9c15055e512525c7b90cda02b06e0378c93dfc75b261d10825842b1
+ size 198495428
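
Each `.safetensors` entry in this commit is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, a SHA-256 object ID, and the payload size in bytes. The sketch below shows one way to parse such a pointer and check a downloaded file against it. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for this illustration and are not part of the repository's code.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing the whole file
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text for ae_decoder.safetensors, copied from this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:393477cff9c15055e512525c7b90cda02b06e0378c93dfc75b261d10825842b1
size 198495428
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

With a local copy of the file, `matches_pointer(Path("ae_decoder.safetensors"), pointer)` would report whether the download is complete and uncorrupted.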
ae_encoder.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fde73aad493e1343b138660c87e6a376dc5b2d6bf9b061206b580088b78ff6e3
+ size 137714880
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "variant": "flux_rgbd_9b_v2",
+   "model_type": "flux_rgbd",
+   "torch_dtype": "bfloat16"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3da9bbd5e45ba92e22bfb7c338f6bbcce7ba11e9119c409381177396e17b1f0f
+ size 23972604108
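
The pointer sizes allow a quick sanity check against the README's "12B total" figure. A bfloat16 value occupies 2 bytes, so dividing the `model.safetensors` payload size by 2 gives a rough parameter count. The sketch below assumes every tensor in that file is stored as bf16 (as `config.json` suggests) and ignores the small safetensors header, so the result is an estimate rather than an exact count. It also totals the three LFS payloads to estimate the download size for this repository, which excludes the separately hosted Qwen3-8B text encoder.

```python
BYTES_PER_BF16 = 2

# Payload sizes in bytes, copied from the three LFS pointers in this commit.
LFS_SIZES = {
    "model.safetensors": 23_972_604_108,
    "ae_encoder.safetensors": 137_714_880,
    "ae_decoder.safetensors": 198_495_428,
}


def approx_params_billions(size_bytes: int, bytes_per_value: int = BYTES_PER_BF16) -> float:
    """Estimate the parameter count in billions, ignoring file header overhead."""
    return size_bytes / bytes_per_value / 1e9


model_params = approx_params_billions(LFS_SIZES["model.safetensors"])
total_bytes = sum(LFS_SIZES.values())

print(f"model.safetensors: about {model_params:.2f}B bf16 values")
print(f"total LFS payload: {total_bytes / 1e9:.2f} GB")
```

The estimate comes out just under 12 billion values, which is consistent with the description of `model.safetensors` in the README's file table.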