rhli Alex11556666 committed on
Commit 2e012f2 · 0 Parent(s)

Duplicate from deepgenteam/DeepGen-1.0

Co-authored-by: Alex Wang(SII) <Alex11556666@users.noreply.huggingface.co>

Files changed (5)
  1. .gitattributes +52 -0
  2. README.md +116 -0
  3. docs/arch.png +3 -0
  4. docs/bubble_chart.png +3 -0
  5. docs/teaser.png +3 -0
.gitattributes ADDED
@@ -0,0 +1,52 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ arch.png filter=lfs diff=lfs merge=lfs -text
+ bubble_chart.png filter=lfs diff=lfs merge=lfs -text
+ teaser.png filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00000 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00001 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00002 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00003 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00004 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00005 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00006 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00007 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00008 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00009 filter=lfs diff=lfs merge=lfs -text
+ DeepGen_CKPT.zip.part-00010 filter=lfs diff=lfs merge=lfs -text
+ docs/bubble_chart.png filter=lfs diff=lfs merge=lfs -text
+ docs/arch.png filter=lfs diff=lfs merge=lfs -text
+ docs/teaser.png filter=lfs diff=lfs merge=lfs -text
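The patterns in this `.gitattributes` hunk decide which paths are stored through Git LFS. The sketch below is a rough Python approximation of that matching, not Git's own implementation: the helper name `is_lfs_tracked` and the shortened pattern list are illustrative assumptions, and `fnmatch` only approximates gitattributes globbing (it does not handle `**`, for instance). It does suggest why the split archive parts carry their own entries: `*.zip` does not match a name that ends in `.part-00000`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A shortened selection of the patterns from the .gitattributes hunk.
LFS_PATTERNS = [
    "*.pt",
    "*.safetensors",
    "*.zip",
    "*tfevents*",
    "DeepGen_CKPT.zip.part-00000",
    "docs/arch.png",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check of whether a repo-relative path matches an LFS pattern.

    Assumption: patterns without a slash are compared against the file name
    only, patterns with a slash against the whole relative path.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in ["model.pt", "README.md", "DeepGen_CKPT.zip.part-00000"]:
    print(candidate, is_lfs_tracked(candidate))
```

In a real checkout, `git check-attr filter -- <path>` gives the authoritative answer.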
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - Alex11556666/Reason_Tuning
+ base_model:
+ - Qwen/Qwen2.5-VL-3B-Instruct
+ pipeline_tag: image-to-image
+ ---
+
+ > **DeepGen 1.0 Checkpoints**
+ >
+ > | Stage | Repository | Description |
+ > | :--- | :--- | :--- |
+ > | Pretrain | [deepgenteam/DeepGen-1.0-Pretrain](https://huggingface.co/deepgenteam/DeepGen-1.0-Pretrain) | Alignment pre-training checkpoint |
+ > | SFT | [deepgenteam/DeepGen-1.0-SFT](https://huggingface.co/deepgenteam/DeepGen-1.0-SFT) | Supervised fine-tuning checkpoint |
+ > | **RL** | **[deepgenteam/DeepGen-1.0](https://huggingface.co/deepgenteam/DeepGen-1.0)** | Reinforcement learning checkpoint (MR-GDPO) *(this repo)* |
+
+ # 💡 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
+ <p align="left">
+ <a href="http://arxiv.org/abs/2602.12205">
+ <img
+ src="https://img.shields.io/badge/DeepGen 1.0-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+ alt="DeepGen 1.0 Paper on arXiv"
+ />
+ </a>
+ <a href="https://github.com/deepgenteam/deepgen" target="_blank" style="margin: 2px;">
+ <img
+ src="https://img.shields.io/badge/DeepGen 1.0-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+ alt="DeepGen 1.0 Codebase"
+ />
+ </a>
+ <a href="https://deepgenteam.github.io/" target="_blank" style="margin: 2px;">
+ <img
+ src="https://img.shields.io/badge/Website-project page-orange" style="display: inline-block; vertical-align: middle;"
+ alt="DeepGen 1.0 page"
+ />
+ </a>
+ </p>
+ DeepGen 1.0 is a lightweight unified multimodal model with only 5B parameters (3B VLM + 2B DiT). It integrates five core capabilities (general image generation, general image editing, reasoning image generation, reasoning image editing, and text rendering) within a single model. Across multiple authoritative benchmarks, DeepGen 1.0 is competitive with or surpasses state-of-the-art unified multimodal models that are 3× to 16× larger, demonstrating that massive scaling is not the sole path to high-performance multimodal generation.
+ <p align="left"><img src="docs/bubble_chart.png" width="80%"></p>
+
+ ## 🧠 Method
+ Our core observation is that a lightweight model, when empowered by synergistic architecture design and data-centric training strategies, can achieve comprehensive capabilities competitive with or even surpassing much larger counterparts.
+ To overcome the limitations of lightweight models in semantic understanding and fine-grained control, we introduce **Stacked Channel Bridging (SCB)**, a deep alignment framework that extracts hierarchical features from multiple VLM layers and fuses them with learnable "think tokens" to provide the generative backbone with structured, reasoning-rich guidance.
+ We further design a data-centric training strategy spanning three progressive stages: (1) **Alignment Pre-training** on large-scale image-text pairs and editing triplets to synchronize VLM and DiT representations, (2) **Joint Supervised Fine-tuning** on a high-quality mixture of generation, editing, and reasoning tasks to foster omni-capabilities, and (3) **Reinforcement Learning with MR-GRPO**, which leverages a mixture of reward functions and supervision signals, resulting in substantial gains in generation quality and alignment with human preferences, while maintaining stable training progress and avoiding visual artifacts.
+
+ <p align="left"><img src="docs/arch.png" width="80%"></p>
+
+ ## 📊 Benchmarks
+
+ ### 1. General Image Generation
+ | Model | Params | Geneval ↑ | DPGBench ↑ | UniGenBench ↑ |
+ | --------------------- | ----------- | ----------- | ------------ | ------------- |
+ | OmniGen2 | 3B + 4B | 0.80 | 83.57 | 63.09 |
+ | BAGEL | 14B | 0.82 | 85.10 | 61.53 |
+ | X-Omni | 7B + 12B | 0.83 | 87.65 🥉 | 53.77 |
+ | Lumina-DiMOO | 8B | 0.88 🥇 | 86.04 | 71.12 |
+ | Hunyuan-Image-3.0 | 80B | 0.72 | 86.10 | - |
+ | Qwen-Image | 7B + 20B | 0.87 🥈 | 88.32 🥇 | 78.81 🥇 |
+ | LongCat-Image | 7B + 6B | 0.87 🥈 | 86.80 | - |
+ | Z-Image-Turbo | 4B + 6B | 0.84 | 85.15 | 71.40 |
+ | GLM-Image | 9B + 7B | - | 84.78 | - |
+ | **DeepGen 1.0 (SFT)** | **3B + 2B** | 0.86 🥉 | 87.05 | 74.18 🥉 |
+ | **DeepGen 1.0 (RL)** | **3B + 2B** | 0.87 🥈 | 87.90 🥈 | 75.74 🥈 |
+
+
+
+ ### 2. General Image Editing
+
+ | Model | Params | GEdit-EN ↑ | ImgEdit ↑ |
+ | :--- | :--- | :--- | :--- |
+ | BAGEL | 14B | 6.52 | 3.20 |
+ | Qwen-Image-Edit [2509] | 7B + 20B | 7.54 🥈 | 4.35 🥈 |
+ | LongCat-Image-Edit | 7B + 6B | 7.60 🥇 | 4.50 🥇 |
+ | Mammoth2 | 8B + 3B + 2B | 6.60 | 4.06 |
+ | **DeepGen 1.0 (SFT)** | **3B + 2B** | 7.12 | 4.09 |
+ | **DeepGen 1.0 (RL)** | **3B + 2B** | 7.17 🥉 | 4.14 🥉 |
+
+ ### 3. Reasoning Image Generation
+ | Model | Params | WISE ↑ | T2I-CoREBench ↑ |
+ | :--- | :--- | :--- | :--- |
+ | OmniGen2 | 3B + 4B | 0.47 | 36.1 |
+ | BAGEL | 14B | 0.70 🥉 | 41.1 |
+ | Hunyuan-Image-3.0 | 80B | 0.57 | 46.0 |
+ | Qwen-Image | 7B + 20B | 0.62 | 46.3 🥉 |
+ | LongCat-Image | 7B + 6B | 0.65 | 52.2 🥇 |
+ | Z-Image-Turbo | 4B + 6B | - | 43.7 |
+ | **DeepGen 1.0 (SFT)** | **3B + 2B** | 0.72 🥈 | 45.7 |
+ | **DeepGen 1.0 (RL)** | **3B + 2B** | 0.73 🥇 | 46.5 🥈 |
+
+ ### 4. Reasoning Image Editing
+
+ | Model | Params | RISE ↑ | UniREditBench ↑ |
+ | :--- | :--- | :--- | :--- |
+ | OmniGen2 | 3B + 4B | - | 43.4 |
+ | BAGEL | 14B | 11.9 🥈 | 51.0 |
+ | Qwen-Image-Edit [2509] | 7B + 20B | 8.9 | 56.5 🥉 |
+ | **DeepGen 1.0 (SFT)** | **3B + 2B** | 13.3 🥇 | 77.5 🥇 |
+ | **DeepGen 1.0 (RL)** | **3B + 2B** | 10.8 🥉 | 75.7 🥈 |
+
+ ## 🎨 Qualitative results
+ <p align="left"><img src="docs/teaser.png" width="80%"></p>
+
+ ## 🛠️ Usage
+
+ ### Download Checkpoint
+ This repository contains the **Reinforcement Learning (MR-GDPO)** checkpoint, the final release model.
+
+ ```bash
+ # Using hf CLI
+ hf download deepgenteam/DeepGen-1.0 model.pt --local-dir .
+
+ # Or using Python (wrapped in python -c so the whole block runs in a shell)
+ python -c 'from huggingface_hub import hf_hub_download; hf_hub_download("deepgenteam/DeepGen-1.0", "model.pt", local_dir=".")'
+ ```
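The Python route from the Usage section can also be written as a small standalone script. This is a minimal sketch: the function name `download_checkpoint` and the skip-if-present check are illustrative assumptions, while the repository id, the filename, and the `hf_hub_download` call are taken from the README.

```python
from pathlib import Path

REPO_ID = "deepgenteam/DeepGen-1.0"  # repository named in the README
FILENAME = "model.pt"                # checkpoint file named in the README


def download_checkpoint(local_dir: str = ".") -> Path:
    """Return the local checkpoint path, downloading it only if it is missing."""
    target = Path(local_dir) / FILENAME
    if target.exists():
        # A copy is already present, so no network call is made.
        return target
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(REPO_ID, FILENAME, local_dir=local_dir))


if __name__ == "__main__":
    print(download_checkpoint("."))
```

The early return is a convenience for repeated runs; `hf_hub_download` also avoids re-downloading files it has already fetched.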
docs/arch.png ADDED

Git LFS Details

  • SHA256: c8f8ae60d50414f205e4dbd11e6fa5408339cad2498894629c20b0dfe5d95c1e
  • Pointer size: 131 Bytes
  • Size of remote file: 899 kB
docs/bubble_chart.png ADDED

Git LFS Details

  • SHA256: f12af963afaa66a9050e82ec74a18bfe9e67dbea52804b177a1ee709096d3faa
  • Pointer size: 131 Bytes
  • Size of remote file: 187 kB
docs/teaser.png ADDED

Git LFS Details

  • SHA256: e45fb40357997614f659ed5e3de903b978341f959bad7507c70bbb726788d296
  • Pointer size: 132 Bytes
  • Size of remote file: 5.64 MB
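The Git LFS details above list a SHA256 digest and a pointer size for each image. The sketch below, which uses only the Python standard library, shows one way to check a downloaded file against such a digest and how a pointer file's size follows from its three text lines. The function names are illustrative assumptions, and the byte count `899_000` is a placeholder, since the page reports only the rounded size "899 kB".

```python
import hashlib
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def pointer_text(oid_hex: str, size: int) -> str:
    """Build the text of a Git LFS pointer file for a SHA-256 object id."""
    return f"version {LFS_SPEC}\noid sha256:{oid_hex}\nsize {size}\n"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_details(path: Path, expected_sha256: str) -> bool:
    """Compare a local file's digest with the SHA256 shown in the LFS details."""
    return sha256_of(path) == expected_sha256.lower()


# Digest listed for docs/arch.png; the size is a placeholder (see above).
ARCH_PNG_SHA256 = "c8f8ae60d50414f205e4dbd11e6fa5408339cad2498894629c20b0dfe5d95c1e"
example_pointer = pointer_text(ARCH_PNG_SHA256, 899_000)
print(len(example_pointer.encode("ascii")))  # 131 for any six-digit size
```

A pointer is 125 bytes plus the number of digits in the size field, which is consistent with the 131-byte pointers for the two images under 1 MB and the 132-byte pointer for the 5.64 MB teaser. After fetching an image, `matches_lfs_details(Path("docs/arch.png"), ARCH_PNG_SHA256)` would confirm the download.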