Add pipeline tag and citation (#1)
Opened by nielsr (HF Staff)

README.md (changed)
---
base_model:
- ByteDance-Seed/BAGEL-7B-MoT
license: apache-2.0
pipeline_tag: any-to-any
---

# 🎨 UniReason • Unified Reasoning Framework for World Knowledge–Aligned Image Generation and Editing
<p align="left">
  <a href="https://arxiv.org/abs/2602.02437">
    <img
  </a>
</p>

UniReason is a unified framework that harmonizes text-to-image generation and image editing through a dual reasoning paradigm. We formulate generation as world-knowledge-enhanced planning that injects implicit constraints, and we leverage editing capabilities for fine-grained visual refinement that corrects residual visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement.

<p align="left"><img src="unireason.png" width="80%"></p>

## 🧠 Method

Our core objective is to equip the unified multimodal model with the ability to infer the implicit world knowledge underlying abstract instructions, integrating both world-knowledge inference and surface-level organization into textual reasoning. This process provides explicit, structured guidance for synthesizing an initial visual output, mirroring human conceptual planning prior to rendering. The second, complementary component is fine-grained editing-like visual refinement: it re-assesses the initial synthesized image in light of the prior textual reasoning and reflectively identifies and verbalizes inconsistencies or missing details, enabling iterative reflection and correction.

<p align="left"><img src="unireason_pipeline.png" width="80%"></p>
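The two components above can be sketched as a minimal plan-then-refine control loop. This is a toy illustration with a stub model and hypothetical method names (`reason`, `generate`, `reflect`, `edit`), not the released UniReason API:

```python
# Toy sketch of the dual reasoning loop; all class and method names
# here are hypothetical stand-ins, not the released model interface.

class StubModel:
    """Stand-in so the control flow is runnable end to end."""

    def reason(self, prompt):
        # Stage 1: world-knowledge-enhanced textual planning
        return f"plan for: {prompt}"

    def generate(self, prompt, plan):
        # Initial synthesis guided by the textual plan
        return {"prompt": prompt, "plan": plan, "refinements": 0}

    def reflect(self, image, plan):
        # Verbalize inconsistencies w.r.t. the plan; empty critique = done
        return "fix missing detail" if image["refinements"] == 0 else ""

    def edit(self, image, critique):
        # Stage 2: editing-like refinement driven by the critique
        return {**image, "refinements": image["refinements"] + 1}


def unireason_generate(model, prompt, max_refinements=2):
    plan = model.reason(prompt)               # plan before rendering
    image = model.generate(prompt, plan)      # initial visual output
    for _ in range(max_refinements):          # iterative reflection/correction
        critique = model.reflect(image, plan)
        if not critique:                      # nothing left to fix
            break
        image = model.edit(image, critique)
    return image


result = unireason_generate(StubModel(), "a glass of water on a winter day")
print(result["refinements"])  # 1
```

The key design point the loop illustrates: refinement terminates either when the reflection step produces an empty critique or when the refinement budget is exhausted.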

## 📊 Benchmarks

### 1. Text-to-Image Generation

| Model             | GenEval ↑ | DPGBench ↑ | WISE ↑   |
| ----------------- | --------- | ---------- | -------- |
| BAGEL             | 0.88      | 85.07      | 0.70     |
| Hunyuan-Image-3.0 | 0.72      | 86.10      | 0.57     |
| Qwen-Image        | 0.74      | **88.32**  | 0.62     |
| UniCoT            | 0.83      | -          | 0.75     |
| **UniReason**     | **0.90**  | 86.21      | **0.78** |

### 2. Image Editing

| Model             | GEdit-EN ↑ | KrisBench ↑ | UniREditBench ↑ |
| ----------------- | ---------- | ----------- | --------------- |
| BAGEL             | 6.52       | 60.18       | 50.96           |
| Qwen-Image-Edit   | **7.56**   | -           | 56.52           |
| LightFusion-World | 6.58       | 61.85       | -               |
| UniCoT            | 6.74       | 68.00       | -               |
| **UniReason**     | 6.94       | **68.23**   | **70.06**       |

## 🛠️ Usage

### Merge Model Files

To use the UniReason checkpoints, please merge the sharded model files first. We release both stage_1 (Foundational Generation Strengthening) and stage_2 (Interleaved Reasoning Tuning) checkpoints.

```bash
# Merge model weights
cat model_part_* > model.safetensors

# Merge EMA weights
cat ema_part_* > ema.safetensors
```
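Note that `cat model_part_*` relies on the shell expanding the glob in sorted (lexicographic) order, so shard names must sort correctly. The equivalent concatenation logic can be sketched in Python using dummy shard files (toy byte strings standing in for the real shards):

```python
import glob
import os
import tempfile

# Create dummy shard files standing in for the released model_part_* shards
workdir = tempfile.mkdtemp()
for i, chunk in enumerate([b"AAA", b"BBB", b"CCC"]):
    with open(os.path.join(workdir, f"model_part_{i:02d}"), "wb") as f:
        f.write(chunk)

# Equivalent of `cat model_part_* > model.safetensors`:
# concatenate the shards in sorted order
out_path = os.path.join(workdir, "model.safetensors")
with open(out_path, "wb") as out:
    for part in sorted(glob.glob(os.path.join(workdir, "model_part_*"))):
        with open(part, "rb") as f:
            out.write(f.read())

with open(out_path, "rb") as f:
    merged = f.read()
print(merged == b"AAABBBCCC")  # True: shards joined in order
```

The same ordering caveat applies to `ema_part_*`: if shard names did not zero-pad their indices (e.g. `part_2` vs `part_10`), lexicographic expansion would concatenate them out of order and produce a corrupt file.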

## ✍️ Citation

```bibtex
@misc{wang2026unireason10unifiedreasoning,
  title={UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing},
  author={Dianyi Wang and Chaofan Ma and Feng Han and Size Wu and Wei Song and Yibin Wang and Zhixiong Zhang and Tianhang Wang and Siyuan Wang and Zhongyu Wei and Jiaqi Wang},
  year={2026},
  eprint={2602.02437},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02437},
}
```