Commit 0a439d0 (verified) · 1 parent: ae7318d
Franklin0 committed

Update README.md

Files changed (1): README.md (+4 −5)
README.md CHANGED

```diff
@@ -54,11 +54,10 @@ Code: https://github.com/Franklin-Zhang0/Image-RL
 
 ## 1. Introduction
 
-Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based thinking skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Generation-Reward Proximal Optimization (GRPO).
-Text-based CoT reasoning dataset for image synthesis. We automatically generate and release a corpus of step-by-step, model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.
-RL refinement with GRPO. Our GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
-Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation.
-
+Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).
+To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.
+Our GRPO algorithm uses reward signals from a pretrained vision–language model to assess overall visual quality, optimizing the policy in each update.
+Evaluations on Geneval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation.
 <div align="center">
 <img alt="image" src="images/model_structure_white_bg.png" style="width:90%;">
 <br>
```
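The group-relative reward handling that the updated introduction describes can be sketched as follows. This is a minimal illustration of GRPO's core idea (standardizing each sample's reward against its sampling group, replacing a learned value baseline), not code from the ReasonGen-R1 repository; the function name and reward values are made up.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of samples.

    For a single prompt, the policy samples a group of images and a reward
    model (e.g. a pretrained vision-language model judging visual quality)
    assigns each one a scalar score. Standardizing each score against the
    group mean and standard deviation yields the per-sample advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Four images sampled for one prompt, scored by the reward model
# (illustrative numbers): above-average samples get positive advantage.
adv = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

In a full GRPO update, each advantage would then weight the token-level log-probability ratios of its sample inside a clipped, PPO-style objective.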