---
base_model:
- deepseek-ai/Janus-Pro-7B
datasets:
- Franklin0/ReasonGen-R1-SFT-230k
library_name: transformers
license: apache-2.0
pipeline_tag: text-to-image
---

# Model Card for ReasonGen-R1 (SFT Only)

ReasonGen-R1 (SFT Only) is a text-to-image model obtained by supervised fine-tuning (SFT) the deepseek-ai/Janus-Pro-7B base model on a dataset of image prompts paired with written rationales. It is described in the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

Arxiv: https://arxiv.org/abs/2505.24875

## 1. Introduction

Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs with Group Relative Policy Optimization (GRPO).

To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.
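
As a rough sketch of what such a paired example might look like during SFT, the snippet below assembles a prompt and its rationale into a single training target in which the text-based "thinking" precedes image generation. The `<think>`/`<image>` tags and field names are illustrative assumptions, not the released data schema:

```python
def build_sft_example(prompt: str, rationale: str) -> dict:
    """Assemble one supervised example: the model is trained to emit a
    written rationale (the 'thinking' step) before the image tokens.
    The tags and keys here are hypothetical, for illustration only."""
    target = f"<think>{rationale}</think><image>"
    return {"input": prompt, "target": target}

example = build_sft_example(
    "a red cube on a blue sphere",
    "Place the sphere at the bottom center; the cube rests on top of it.",
)
```

Training on such pairs teaches the generator to plan the layout in text before committing to image tokens.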

Our GRPO algorithm uses reward signals from a pretrained vision-language model to assess overall visual quality, optimizing the policy at each update.
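
The group-relative part of the update can be sketched in a few lines. This is a minimal illustration, not the released training code: `group_relative_advantages` is a hypothetical helper, and the reward values stand in for scores that the pretrained vision-language model would assign to a group of images sampled for one prompt.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each sample's reward is normalized by the
    mean and standard deviation of its own sampling group,
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical VLM-judge scores for 4 images generated from one prompt:
advantages = group_relative_advantages([0.9, 0.7, 0.4, 0.2])
```

Because advantages are computed within each group, GRPO needs no learned value function: above-average samples in a group are reinforced and below-average ones suppressed.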

Evaluations on GenEval, DPG, and the T2I benchmark show that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning-driven image generation.

## 4. Acknowledgements

We would like to thank the Verl project, upon which our repository is built.