Add pipeline tag and citation (#1)
Opened by nielsr (HF Staff)

README.md (changed)
---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
license: apache-2.0
pipeline_tag: any-to-any
---

# 🎨 UniReason • Unified Reasoning Framework for World Knowledge–Aligned Image Generation and Editing
<p align="left">
  <a href="https://arxiv.org/abs/2602.02437">
    <img
  </a>
</p>

UniReason is a unified framework that harmonizes text-to-image generation and image editing through a dual reasoning paradigm. We formulate generation as world-knowledge-enhanced planning that injects implicit constraints, and we leverage editing capabilities for fine-grained visual refinement that corrects residual visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement.

<p align="left"><img src="unireason.png" width="80%"></p>

## 🧠 Method

Our core objective is to equip the unified multimodal model with the ability to infer the implicit world knowledge underlying abstract instructions, integrating both world-knowledge inference and surface-level organization into textual reasoning. This process provides explicit, structured guidance for synthesizing an initial visual output, mirroring human conceptual planning prior to rendering. The second, complementary component is fine-grained editing-like visual refinement: it re-assesses the initial synthesized image in light of the prior textual reasoning and reflectively identifies and verbalizes inconsistencies or missing details, enabling iterative reflection and correction.

<p align="left"><img src="unireason_pipeline.png" width="80%"></p>
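The two components above can be sketched as a minimal plan-then-refine control loop. This is a toy illustration with a stub model and hypothetical method names (`reason`, `generate`, `reflect`, `edit`), not the released UniReason API:

```python
# Toy sketch of the dual reasoning loop; all class and method names
# here are hypothetical stand-ins, not the released model interface.

class StubModel:
    """Stand-in so the control flow is runnable end to end."""

    def reason(self, prompt):
        # Stage 1: world-knowledge-enhanced textual planning
        return f"plan for: {prompt}"

    def generate(self, prompt, plan):
        # Initial synthesis guided by the textual plan
        return {"prompt": prompt, "plan": plan, "refinements": 0}

    def reflect(self, image, plan):
        # Verbalize inconsistencies w.r.t. the plan; empty critique = done
        return "fix missing detail" if image["refinements"] == 0 else ""

    def edit(self, image, critique):
        # Stage 2: editing-like refinement driven by the critique
        return {**image, "refinements": image["refinements"] + 1}


def unireason_generate(model, prompt, max_refinements=2):
    plan = model.reason(prompt)               # plan before rendering
    image = model.generate(prompt, plan)      # initial visual output
    for _ in range(max_refinements):          # iterative reflection/correction
        critique = model.reflect(image, plan)
        if not critique:                      # nothing left to fix
            break
        image = model.edit(image, critique)
    return image


result = unireason_generate(StubModel(), "a glass of water on a winter day")
print(result["refinements"])  # 1
```

The key design point the loop illustrates: refinement terminates either when the reflection step produces an empty critique or when the refinement budget is exhausted.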

## 📊 Benchmarks

### 1. Text-to-Image Generation

| Model             | GenEval ↑ | DPGBench ↑ | WISE ↑   |
| ----------------- | --------- | ---------- | -------- |
| BAGEL             | 0.88      | 85.07      | 0.70     |
| Hunyuan-Image-3.0 | 0.72      | 86.10      | 0.57     |
| Qwen-Image        | 0.74      | **88.32**  | 0.62     |
| UniCoT            | 0.83      | -          | 0.75     |
| **UniReason**     | **0.90**  | 86.21      | **0.78** |

### 2. Image Editing

| Model             | GEdit-EN ↑ | KrisBench ↑ | UniREditBench ↑ |
| ----------------- | ---------- | ----------- | --------------- |
| BAGEL             | 6.52       | 60.18       | 50.96           |
| Qwen-Image-Edit   | **7.56**   | -           | 56.52           |
| LightFusion-World | 6.58       | 61.85       | -               |
| UniCoT            | 6.74       | 68.00       | -               |
| **UniReason**     | 6.94       | **68.23**   | **70.06**       |

## 🛠️ Usage

### Merge Model Files

To use the UniReason checkpoints, please merge the sharded model files first. We release both stage_1 (Foundational Generation Strengthening) and stage_2 (Interleaved Reasoning Tuning) checkpoints.

```bash
# Merge model weights
cat model_part_* > model.safetensors

# Merge EMA weights
cat ema_part_* > ema.safetensors
```
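Note that `cat model_part_*` relies on the shell expanding the glob in sorted (lexicographic) order, so shard names must sort correctly. The equivalent concatenation logic can be sketched in Python using dummy shard files (toy byte strings standing in for the real shards):

```python
import glob
import os
import tempfile

# Create dummy shard files standing in for the released model_part_* shards
workdir = tempfile.mkdtemp()
for i, chunk in enumerate([b"AAA", b"BBB", b"CCC"]):
    with open(os.path.join(workdir, f"model_part_{i:02d}"), "wb") as f:
        f.write(chunk)

# Equivalent of `cat model_part_* > model.safetensors`:
# concatenate the shards in sorted order
out_path = os.path.join(workdir, "model.safetensors")
with open(out_path, "wb") as out:
    for part in sorted(glob.glob(os.path.join(workdir, "model_part_*"))):
        with open(part, "rb") as f:
            out.write(f.read())

with open(out_path, "rb") as f:
    merged = f.read()
print(merged == b"AAABBBCCC")  # True: shards joined in order
```

The same ordering caveat applies to `ema_part_*`: if shard names did not zero-pad their indices (e.g. `part_2` vs `part_10`), lexicographic expansion would concatenate them out of order and produce a corrupt file.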

## ✍️ Citation

```bibtex
@misc{wang2026unireason10unifiedreasoning,
  title={UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing},
  author={Dianyi Wang and Chaofan Ma and Feng Han and Size Wu and Wei Song and Yibin Wang and Zhixiong Zhang and Tianhang Wang and Siyuan Wang and Zhongyu Wei and Jiaqi Wang},
  year={2026},
  eprint={2602.02437},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02437},
}
```