YHLLEO committed on
Commit 0a2435b · verified · 1 Parent(s): 2454481

Update README.md

Files changed (1):
  1. README.md  +37 −30
README.md CHANGED
@@ -1,8 +1,8 @@
- ---
- license: apache-2.0
- datasets:
- - ILSVRC/imagenet-1k
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - ILSVRC/imagenet-1k
+ ---

  # Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe

@@ -13,34 +13,13 @@ datasets:
  </p>


- ## 1. 🔥 Updates
- - __[2025.12.15]__: Release the codes of [DSMoE](./DSMoE) and [JiTMoE](./JiTMoE).
-
-
- ## 2. 📖 Introduction
+ ## 📖 Introduction

  We release an MoE Transformer that can be applied to both latent- and pixel-space diffusion frameworks, employing DeepSeek-style expert modules, alternative intermediate widths, varying expert counts, and enhanced attention positional encodings. The models are already released to Hugging Face. <br>

- ## 3. Preparation
-
- ### 3.1 Dataset
- Download the [ImageNet](http://image-net.org/download) dataset and place it in your `IMAGENET_PATH`.
-
- ### 3.2 Installation
-
- Please follow the installation instructions of [DiffMoE](https://github.com/KlingTeam/DiffMoE) and [JiT](https://github.com/LTH14/JiT), respectively.
-
- ### 3.3 Training
-
- See details in [DSMoE](./DSMoE) and [JiTMoE](./JiTMoE), respectively.
-
- ### 3.4 Evaluation
-
- We follow the evaluation protocols provided by [DiffMoE](https://github.com/KlingTeam/DiffMoE) and [JiT](https://github.com/LTH14/JiT).
-
- ## 4. Main results
+ ## Main results

- ### 4.1 Latent diffusion framework
+ ### Latent diffusion framework

  - Our DSMoE vs. [DiffMoE](https://arxiv.org/pdf/2503.14487) at 700K training steps with CFG = 1.0 (* denotes the results reported in the official paper):

@@ -67,4 +46,32 @@ We follow the evaluation protocols provided by [DiffMoE](https://github.com/Klin
  |DiffMoE-B-E16|130M|4.87|183.43|
  |DSMoE-B-E16|132M|4.50|186.79|
  |DSMoE-B-E48|118M|4.27|191.03|
- |DiffMoE-L-E16|458M|2.84|256.57|
+ |DiffMoE-L-E16|458M|2.84|256.57|
+ |DSMoE-L-E16|465M|2.59|272.55|
+ |DSMoE-L-E48|436M|2.55|278.35|
+ |DSMoE-3B-E16|965M|2.38|304.93|
+
+
+ ### Pixel-space diffusion framework
+
+ - Our JiTMoE vs. [JiT](https://arxiv.org/pdf/2511.13720) at 200 training epochs with a CFG interval (* denotes the results reported in the official paper):
+
+ | Model Name | # Act. Params | FID-50K↓ | Inception Score↑ |
+ |------------|---------------|----------|------------------|
+ |JiT-B/16|131M|4.81 (4.37*)|222.32 (-)|
+ |JiTMoE-B/16-E16|133M|4.23|245.53|
+ |JiT-L/16|459M|3.19 (2.79*)|309.72 (-)|
+ |JiTMoE-L/16-E16|465M|3.10|311.34|
+
+
+ ## 🌟 Citation
+
+ ```
+ @article{liu2025efficient,
+   title={Efficient Training of Diffusion Mixture-of-Experts Models: A Practical Recipe},
+   author={Liu, Yahui and Yue, Yang and Zhang, Jingyuan and Sun, Chenxi and Zhou, Yang and Zeng, Wencong and Tang, Ruiming and Zhou, Guorui},
+   journal={arXiv preprint arXiv:2512.01252},
+   year={2025}
+ }
+ ```
+
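The updated introduction mentions DeepSeek-style expert modules, alternative intermediate widths, and varying expert counts. As a rough mental model only, here is a minimal PyTorch sketch of such an MoE feed-forward block with top-k token routing plus an always-active shared expert. The class names (`ExpertFFN`, `DeepSeekStyleMoE`), default sizes, and routing details are hypothetical illustrations, not the actual DSMoE/JiTMoE code.

```python
# Illustrative sketch only: a DeepSeek-style MoE feed-forward block with
# top-k token routing and one always-active shared expert. All names and
# default sizes are assumptions, not the DSMoE/JiTMoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """One expert: a small two-layer MLP (a narrow intermediate width)."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(F.gelu(self.fc1(x)))


class DeepSeekStyleMoE(nn.Module):
    """Top-k routed experts plus one shared expert applied to every token."""

    def __init__(self, dim: int = 768, hidden: int = 192,
                 num_experts: int = 16, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(ExpertFFN(dim, hidden) for _ in range(num_experts))
        self.shared_expert = ExpertFFN(dim, hidden)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        tokens = x.reshape(-1, d)                              # (N, d), N = b * t
        scores = self.router(tokens).softmax(dim=-1)           # (N, E) routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)         # (N, k) selected experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the selected k

        out = self.shared_expert(tokens)                       # shared expert sees every token
        for e, expert in enumerate(self.experts):
            hit = idx == e                                     # (N, k): routing slots that picked expert e
            rows = hit.any(dim=-1).nonzero(as_tuple=True)[0]   # tokens routed to expert e
            if rows.numel() == 0:
                continue
            w = (weights * hit.to(weights.dtype))[rows].sum(dim=-1, keepdim=True)
            out = out.index_add(0, rows, w * expert(tokens[rows]))
        return out.reshape(b, t, d)


if __name__ == "__main__":
    layer = DeepSeekStyleMoE(dim=768, hidden=192, num_experts=16, top_k=2)
    y = layer(torch.randn(2, 16, 768))
    print(y.shape)  # torch.Size([2, 16, 768])
```

Varying `num_experts` (e.g. 16 vs. 48) while shrinking the per-expert `hidden` width is, roughly, the trade-off the result tables above explore at comparable activated-parameter budgets.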