Update README.md
README.md
CHANGED
@@ -54,11 +54,10 @@ Code: https://github.com/Franklin-Zhang0/Image-RL
 
 ## 1. Introduction
 
-Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based
-
-
-Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation.
-
+Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).
+To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.
+Our GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
+Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation.
 <div align="center">
 <img alt="image" src="images/model_structure_white_bg.png" style="width:90%;">
 <br>