Alex11556666 committed on Mar 9

Commit

2ada238

0 Parent(s):

Duplicate from deepgenteam/DeepGen-1.0

Co-authored-by: Alex Wang(SII) <Alex11556666@users.noreply.huggingface.co>

Files changed (18)

.gitattributes +49 -0
DeepGen_CKPT.zip.part-00000 +3 -0
DeepGen_CKPT.zip.part-00001 +3 -0
DeepGen_CKPT.zip.part-00002 +3 -0
DeepGen_CKPT.zip.part-00003 +3 -0
DeepGen_CKPT.zip.part-00004 +3 -0
DeepGen_CKPT.zip.part-00005 +3 -0
DeepGen_CKPT.zip.part-00006 +3 -0
DeepGen_CKPT.zip.part-00007 +3 -0
DeepGen_CKPT.zip.part-00008 +3 -0
DeepGen_CKPT.zip.part-00009 +3 -0
DeepGen_CKPT.zip.part-00010 +3 -0
README.md +128 -0
arch.png +3 -0
bubble_chart.png +3 -0
config.json +6 -0
model.pt +3 -0
teaser.png +3 -0

.gitattributes ADDED Viewed

	@@ -0,0 +1,49 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+arch.png filter=lfs diff=lfs merge=lfs -text
+bubble_chart.png filter=lfs diff=lfs merge=lfs -text
+teaser.png filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00000 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00001 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00002 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00003 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00004 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00005 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00006 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00007 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00008 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00009 filter=lfs diff=lfs merge=lfs -text
+DeepGen_CKPT.zip.part-00010 filter=lfs diff=lfs merge=lfs -text
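
The hunk above lists each `DeepGen_CKPT.zip.part-*` file explicitly, presumably because the generic `*.zip` pattern does not cover names that end in `.part-NNNNN`. The following is a minimal sketch of that matching behaviour, assuming Python's `fnmatchcase` is a close enough stand-in for gitattributes matching on bare file names (it is an approximation, not Git's actual matcher). The helper `is_lfs_tracked` and both pattern lists are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase

# Generic patterns copied from the .gitattributes hunk (a subset, for brevity).
GENERIC_PATTERNS = ["*.pt", "*.pth", "*.zip", "*.tar.*", "*.safetensors"]

# Explicit per-file entries, as listed at the end of the hunk.
EXPLICIT_PATTERNS = ["arch.png", "bubble_chart.png", "teaser.png"] + [
    f"DeepGen_CKPT.zip.part-{i:05d}" for i in range(11)
]


def is_lfs_tracked(filename, patterns):
    """Approximation: True if any pattern matches the bare file name."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    all_patterns = GENERIC_PATTERNS + EXPLICIT_PATTERNS
    for name in ["model.pt", "DeepGen_CKPT.zip.part-00003", "README.md", "config.json"]:
        print(
            f"{name}: generic only={is_lfs_tracked(name, GENERIC_PATTERNS)}, "
            f"with explicit entries={is_lfs_tracked(name, all_patterns)}"
        )
```

Under this approximation, `model.pt` is covered by `*.pt`, the archive parts are covered only once the explicit entries are included, and `README.md` and `config.json` stay as regular Git objects.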

DeepGen_CKPT.zip.part-00000 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:49f94464b6b16f559dc6b09e31c2abe020a7cd7f1de4c4d5b046e307c1814776
+size 5368709120

DeepGen_CKPT.zip.part-00001 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9ca0fd771ae4061dc2d0d36f62b50bca675beab9b3dca424119d828a1b23b28e
+size 5368709120

DeepGen_CKPT.zip.part-00002 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:16671fcd9f92a879b831d2297a281757b473546ab758679a170de792abd787b6
+size 5368709120

DeepGen_CKPT.zip.part-00003 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b8022e2693aa690ad14e184616d7ef0d49dd519fc423a71582aa9c8d10b2136f
+size 5368709120

DeepGen_CKPT.zip.part-00004 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c8fd4a4c59212d45a35006f4b4dcd03f8dec684c46d852f3b720dff0bdddd9a4
+size 5368709120

DeepGen_CKPT.zip.part-00005 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd0fba2bd69be06c705779a6c1a9a1d41366e0310ef696eae944872083109a2a
+size 5368709120

DeepGen_CKPT.zip.part-00006 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:db9573c3dc0d980450fc8e51e19b9c36971bf16d91ea560a9a9edf54824b672b
+size 5368709120

DeepGen_CKPT.zip.part-00007 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5eb739f5b16b92057487b7e9055119fb532872caf9857c13f76ea49fa45832ea
+size 5368709120

DeepGen_CKPT.zip.part-00008 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ce4eac2c5010cc7067f556344eaa4e93e1b6df1986eeaa9ced1aa4f20e3411f0
+size 5368709120

DeepGen_CKPT.zip.part-00009 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9cd1e26bcb9ca5dba536c1e86bde3619c3f8f9d0bb80fa705994bbdcb7c4f00
+size 5368709120

DeepGen_CKPT.zip.part-00010 ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ce439a0e4c6dea344e5736e2b09dd17e835047eda9757eb58ebca485f27f506
+size 1993873384
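
Each pointer above records a SHA-256 digest and a byte size for one archive part (ten parts of 5368709120 bytes plus a final part of 1993873384 bytes, 55680964584 bytes in total). A downloaded part can be checked against its pointer before merging. The sketch below is one way to do that with the standard library only; the function names `parse_lfs_pointer` and `verify_download` are hypothetical, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # tolerate diff-style '+' prefixes
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer, chunk_size=1 << 20):
    """Return True if the file's size and digest both match the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte parts never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text for the last part, copied from the hunk above.
POINTER_PART_10 = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6ce439a0e4c6dea344e5736e2b09dd17e835047eda9757eb58ebca485f27f506
size 1993873384
"""
pointer = parse_lfs_pointer(POINTER_PART_10)
```

With the part downloaded into the working directory, `verify_download("DeepGen_CKPT.zip.part-00010", pointer)` would report whether the file is intact. The size check runs first because it is cheap and catches truncated downloads without hashing.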

README.md ADDED Viewed

	@@ -0,0 +1,128 @@

+---
+license: apache-2.0
+datasets:
+- Alex11556666/Reason_Tuning
+base_model:
+- Qwen/Qwen2.5-VL-3B-Instruct
+pipeline_tag: text-to-image
+---
+# 💡 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
+<p align="left">
+  <a href="http://arxiv.org/abs/2602.12205">
+    <img
+      src="https://img.shields.io/badge/DeepGen 1.0-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+      alt="DeepGen 1.0 Paper on arXiv"
+    />
+  </a>
+  <a href="https://github.com/deepgenteam/deepgen" target="_blank" style="margin: 2px;">
+    <img
+      src="https://img.shields.io/badge/DeepGen 1.0-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+      alt="DeepGen 1.0 Codebase"
+    />
+  </a>
+  <a href="https://deepgenteam.github.io/" target="_blank" style="margin: 2px;">
+    <img
+      src="https://img.shields.io/badge/Website-project page-orange" style="display: inline-block; vertical-align: middle;"
+      alt="DeepGen 1.0 page"
+    />
+  </a>
+</p>
+DeepGen 1.0 is a lightweight unified multimodal model with only 5B parameters (3B VLM + 2B DiT). It integrates five core capabilities (general image generation, general image editing, reasoning image generation, reasoning image editing, and text rendering) within a single model. Across multiple authoritative benchmarks, DeepGen 1.0 is competitive with or surpasses state-of-the-art unified multimodal models that are 3× to 16× larger, demonstrating that massive scaling is not the sole path to high-performance multimodal generation.
+<p align="left"><img src="bubble_chart.png" width="80%"></p>
+## 🧠 Method
+Our core observation is that a lightweight model, when empowered by synergistic architecture design and data-centric training strategies, can achieve comprehensive capabilities competitive with or even surpassing much larger counterparts.
+To overcome the limitations of lightweight models in semantic understanding and fine-grained control, we introduce **Stacked Channel Bridging (SCB)**, a deep alignment framework that extracts hierarchical features from multiple VLM layers and fuses them with learnable "think tokens" to provide the generative backbone with structured, reasoning-rich guidance.
+We further design a data-centric training strategy spanning three progressive stages: (1) **Alignment Pre-training** on large-scale image-text pairs and editing triplets to synchronize VLM and DiT representations, (2) **Joint Supervised Fine-tuning** on a high-quality mixture of generation, editing, and reasoning tasks to foster omni-capabilities, and (3) **Reinforcement Learning with MR-GRPO**, which leverages a mixture of reward functions and supervision signals, resulting in substantial gains in generation quality and alignment with human preferences, while maintaining stable training progress and avoiding visual artifacts.
+<p align="left"><img src="arch.png" width="80%"></p>
+## 📊 Benchmarks
+### 1. General Image Generation
+| Model                 | Params      | GenEval ↑ | DPGBench ↑ | UniGenBench ↑ |
+| --------------------- | ----------- | --------- | ---------- | ------------- |
+| OmniGen2              | 3B + 4B     | 0.80      | 83.57      | 63.09         |
+| BAGEL                 | 14B         | 0.82      | 85.10      | 61.53         |
+| X-Omni                | 7B + 12B    | 0.83      | 87.65 🥉   | 53.77         |
+| Lumina-DiMOO          | 8B          | 0.88 🥇   | 86.04      | 71.12         |
+| Hunyuan-Image-3.0     | 80B         | 0.72      | 86.10      | -             |
+| Qwen-Image            | 7B + 20B    | 0.87 🥈   | 88.32 🥇   | 78.81 🥇      |
+| LongCat-Image         | 7B + 6B     | 0.87 🥈   | 86.80      | -             |
+| Z-Image-Turbo         | 4B + 6B     | 0.84      | 85.15      | 71.40         |
+| GLM-Image             | 9B + 7B     | -         | 84.78      | -             |
+| **DeepGen 1.0 (SFT)** | **3B + 2B** | 0.86 🥉   | 87.05      | 74.18 🥉      |
+| **DeepGen 1.0 (RL)**  | **3B + 2B** | 0.87 🥈   | 87.90 🥈   | 75.74 🥈      |
+### 2. General Image Editing
+| Model | Params | GEdit-EN ↑ | ImgEdit ↑ |
+| :--- | :--- | :--- | :--- |
+| BAGEL | 14B | 6.52 | 3.20 |
+| Qwen-Image-Edit [2509] | 7B + 20B | 7.54 🥈 | 4.35 🥈 |
+| LongCat-Image-Edit | 7B + 6B | 7.60 🥇 | 4.50 🥇 |
+| Mammoth2 | 8B + 3B + 2B | 6.60 | 4.06 |
+| **DeepGen 1.0 (SFT)** | **3B + 2B** | 7.12 | 4.09 |
+| **DeepGen 1.0 (RL)** | **3B + 2B** | 7.17 🥉 | 4.14 🥉 |
+### 3. Reasoning Image Generation
+| Model | Params | WISE ↑ | T2I-CoREBench ↑ |
+| :--- | :--- | :--- | :--- |
+| OmniGen2 | 3B + 4B | 0.47 | 36.1 |
+| BAGEL | 14B | 0.70 🥉 | 41.1 |
+| Hunyuan-Image-3.0 | 80B | 0.57 | 46.0 |
+| Qwen-Image | 7B + 20B | 0.62 | 46.3 🥉 |
+| LongCat-Image | 7B + 6B | 0.65 | 52.2 🥇 |
+| Z-Image-Turbo | 4B + 6B | - | 43.7 |
+| **DeepGen 1.0 (SFT)** | **3B + 2B** | 0.72 🥈 | 45.7 |
+| **DeepGen 1.0 (RL)** | **3B + 2B** | 0.73 🥇 | 46.5 🥈 |
+### 4. Reasoning Image Editing
+| Model | Params | RISE ↑ | UniREditBench ↑ |
+| :--- | :--- | :--- | :--- |
+| OmniGen2 | 3B + 4B | - | 43.4 |
+| BAGEL | 14B | 11.9 🥈 | 51.0 |
+| Qwen-Image-Edit [2509] | 7B + 20B | 8.9 | 56.5 🥉 |
+| **DeepGen 1.0 (SFT)** | **3B + 2B** | 13.3 🥇 | 77.5 🥇 |
+| **DeepGen 1.0 (RL)** | **3B + 2B** | 10.8 🥉 | 75.7 🥈 |
+## 🎨 Qualitative results
+<p align="left"><img src="teaser.png" width="80%"></p>
+## 🛠️ Usage
+### Merge ZIP Files
+To use the DeepGen checkpoints, merge the sharded archive parts first. We release Pre-training, Supervised Fine-Tuning, and Reinforcement Learning checkpoints.
+```bash
+# Merge zip
+cat DeepGen_CKPT.zip.part-* > DeepGen_CKPT.zip
+# Unzip DeepGen checkpoints
+unzip DeepGen_CKPT.zip
+```
+```text
+checkpoints/
+└── DeepGen_CKPT
+    ├── Pretrain
+    │   └── iter_200000.pth
+    ├── SFT
+    │   └── iter_400000.pth
+    └── RL
+        └── MR-GDPO_final.pt
+```
+If you only need the final model weights, use `model.pt` directly; it is the same as `MR-GDPO_final.pt`.
+The `Pretrain/iter_200000.pth` and `SFT/iter_400000.pth` checkpoints can be loaded to continue training.
+## ⭐ Citation
+```bibtex
+@article{wang2026deepgen,
+  title={DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing},
+  author={Wang, Dianyi and Li, Ruihang and Han, Feng and Ma, Chaofan and Song, Wei and Wang, Siyuan and Wang, Yibin and Xin, Yi and Liu, Hongjian and Zhang, Zhixiong and others},
+  journal={arXiv preprint arXiv:2602.12205},
+  year={2026}
+}
+```
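
The merge step in the README's Usage section relies on `cat` and `unzip`, which may not be available on every platform. A rough Python equivalent is sketched below, assuming the parts sit together in one directory and that their zero-padded suffixes sort in the intended order. The function names `merge_parts` and `extract_archive` are hypothetical; only the file-name prefix comes from the README.

```python
import shutil
import zipfile
from pathlib import Path


def merge_parts(part_dir, output, prefix="DeepGen_CKPT.zip.part-"):
    """Concatenate all parts in name order into a single archive file."""
    parts = sorted(Path(part_dir).glob(prefix + "*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {prefix}* in {part_dir}")
    with open(output, "wb") as merged:
        for part in parts:
            with part.open("rb") as source:
                # copyfileobj streams in blocks, so large parts are not
                # loaded into memory at once.
                shutil.copyfileobj(source, merged)
    return parts


def extract_archive(archive, destination):
    """Unpack the merged zip archive into the destination directory."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(destination)


if __name__ == "__main__":
    merge_parts(".", "DeepGen_CKPT.zip")
    extract_archive("DeepGen_CKPT.zip", ".")
```

Sorting the names explicitly mirrors the order in which the shell expands `DeepGen_CKPT.zip.part-*`. The merged archive needs roughly 55.7 GB of free disk space on top of the parts themselves.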

arch.png ADDED Viewed

Git LFS Details

SHA256: c8f8ae60d50414f205e4dbd11e6fa5408339cad2498894629c20b0dfe5d95c1e
Pointer size: 131 Bytes
Size of remote file: 899 kB

bubble_chart.png ADDED Viewed

Git LFS Details

SHA256: f12af963afaa66a9050e82ec74a18bfe9e67dbea52804b177a1ee709096d3faa
Pointer size: 131 Bytes
Size of remote file: 187 kB

config.json ADDED Viewed

	@@ -0,0 +1,6 @@

+{
+  "_class_name": "DeepGen-1.0",
+  "_diffusers_version": "0.35.2",
+  "_transformers_version": "4.56.1",
+  "_name_or_path": "deepgenteam/DeepGen-1.0"
+}

model.pt ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d2a4cc7b12a69bc3d481ba331b4ceec2dfd12f39de9d0985fad6431ef14d578
+size 16380038579

teaser.png ADDED Viewed

Git LFS Details

SHA256: e45fb40357997614f659ed5e3de903b978341f959bad7507c70bbb726788d296
Pointer size: 132 Bytes
Size of remote file: 5.64 MB