Add pipeline tag and citation

#1
by nielsr HF Staff - opened
Files changed (1): README.md (+42 −25)
README.md CHANGED
@@ -1,9 +1,12 @@
  ---
- license: apache-2.0
  base_model:
  - ByteDance-Seed/BAGEL-7B-MoT
+ license: apache-2.0
+ pipeline_tag: any-to-any
  ---
- # 🎨 UniReason • Unified Reasoning Framework for World Knowledge–Aligned Image Generation and Editing
+
+ # 🎨 UniReason • Unified Reasoning Framework for World Knowledge–Aligned Image Generation and Editing
+
  <p align="left">
  <a href="https://arxiv.org/abs/2602.02437">
  <img
@@ -19,43 +22,57 @@ base_model:
  </a>
  </p>

- we propose UniReason, a unified framework that harmonizes these two tasks through a dual reasoning paradigm. We formulate generation as world knowledge-enhanced planning to inject implicit constraints, and leverage editing capabilities for fine-grained visual refinement to further correct visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement.
+ UniReason is a unified framework that harmonizes text-to-image generation and image editing through a dual reasoning paradigm. We formulate generation as world knowledge-enhanced planning to inject implicit constraints, and leverage editing capabilities for fine-grained visual refinement to further correct visual errors via self-reflection. This approach unifies generation and editing within a shared representation, mirroring the human cognitive process of planning followed by refinement.

  <p align="left"><img src="unireason.png" width="80%"></p>

  ## 🧠 Method
- Our core objective is to equip the unified multimodal
- model to infer implicit world knowledge underlying abstract instructions, and integrate world knowledge inference
- and surface-level organization into textual reasoning. This
- process provides explicit and structured guidance for synthesizing an initial visual output, mirroring human conceptual
- planning prior to rendering. The second complementary components is Fine-grained Editing-like Visual Refinement that re-assesses the initial synthesized image considering prior textual reasoning, reflectively identifies and verbalizes inconsistencies or missing details or incorporating a second round
- of textual reasoning to think twice, enabling iterative reflection and correction.
+ Our core objective is to equip the unified multimodal model to infer implicit world knowledge underlying abstract instructions, and integrate world knowledge inference and surface-level organization into textual reasoning. This process provides explicit and structured guidance for synthesizing an initial visual output, mirroring human conceptual planning prior to rendering. The second complementary component is Fine-grained Editing-like Visual Refinement that re-assesses the initial synthesized image considering prior textual reasoning, reflectively identifies and verbalizes inconsistencies or missing details, enabling iterative reflection and correction.
+
  <p align="left"><img src="unireason_pipeline.png" width="80%"></p>

  ## 📊 Benchmarks
+
  ### 1. Text-to-Image Generation
- | Model | Geneval ↑ |DPGBench ↑ |WISE ↑ |
- | ------------ | --------- | --------- |--------- |
- | BAGEL | 0.88 |85.07|0.70|
- | Hunyuan-Image-3.0 | 0.72 |86.10|0.57|
- | Qwen-Image | 0.74 |**88.32** |0.62|
- | UniCoT | 0.83 |- |0.75|
- | **UniReason** | **0.90** |86.21|**0.78**|
+ | Model | Geneval ↑ | DPGBench ↑ | WISE ↑ |
+ | ------------ | --------- | ---------- | ------ |
+ | BAGEL | 0.88 | 85.07 | 0.70 |
+ | Hunyuan-Image-3.0 | 0.72 | 86.10 | 0.57 |
+ | Qwen-Image | 0.74 | **88.32** | 0.62 |
+ | UniCoT | 0.83 | - | 0.75 |
+ | **UniReason** | **0.90** | 86.21 | **0.78** |

  ### 2. Image Editing
- | Model |GEdit-EN ↑ |KrisBench ↑ |UniREditBench ↑ |
- | ------------ | --------- | --------- |--------- |
- | BAGEL | 6.52 |60.18|50.96|
- | Qwen-Image-Edit | **7.56** |-|56.52|
- | LightFusion-World | 6.58 |61.85|-|
- | UniCoT | 6.74 |68.00|-|
- | **UniReason** | 6.94 |**68.23**|**70.06**|
+ | Model | GEdit-EN ↑ | KrisBench ↑ | UniREditBench ↑ |
+ | ----------------- | ---------- | ----------- | --------------- |
+ | BAGEL | 6.52 | 60.18 | 50.96 |
+ | Qwen-Image-Edit | **7.56** | - | 56.52 |
+ | LightFusion-World | 6.58 | 61.85 | - |
+ | UniCoT | 6.74 | 68.00 | - |
+ | **UniReason** | 6.94 | **68.23** | **70.06** |

- **Merge Model Files**
+ ## 🛠️ Usage
+
+ ### Merge Model Files
+ To use the UniReason checkpoints, please merge the sharded model files first. We release both stage_1 (Foundational Generation Strengthening) and stage_2 (Interleaved Reasoning Tuning) checkpoints.

- To use the UniReason checkpoints please merge model files first, We release both stage_1(Foundational Generation Strengthening) and stage_2(Interleaved Reasoning Tuning) checkpoints
  ```bash
+ # Merge model weights
  cat model_part_* > model.safetensors

+ # Merge EMA weights
  cat ema_part_* > ema.safetensors
+ ```
+
+ ## ✍️ Citation
+ ```bibtex
+ @misc{wang2026unireason10unifiedreasoning,
+       title={UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing},
+       author={Dianyi Wang and Chaofan Ma and Feng Han and Size Wu and Wei Song and Yibin Wang and Zhixiong Zhang and Tianhang Wang and Siyuan Wang and Zhongyu Wei and Jiaqi Wang},
+       year={2026},
+       eprint={2602.02437},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2602.02437},
+ }
  ```
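
A note on the merge commands added in this diff: they rely on `cat` concatenating the shards byte-for-byte in the shell's sorted glob order. A minimal sanity check of that behavior on throwaway files (the `demo_part_*` names are invented for illustration, mirroring the `model_part_*` naming above):

```shell
# `cat` emits bytes in the shell's sorted glob order, so split parts
# named with ascending suffixes merge back in the original sequence.
printf 'AAA' > demo_part_aa
printf 'BBB' > demo_part_ab
cat demo_part_* > demo_merged.bin
wc -c < demo_merged.bin   # 6 bytes, the sum of both parts
rm demo_part_aa demo_part_ab demo_merged.bin
```

The same check applies to the real merge: `model.safetensors` should be exactly as large as the combined `model_part_*` shards.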