---
base_model:
- deepseek-ai/Janus-Pro-7B
datasets:
- Franklin0/ReasonGen-R1-SFT-230k
library_name: transformers
license: apache-2.0
pipeline_tag: text-to-image
---
# Model Card for ReasonGen-R1 (SFT Only)

ReasonGen-R1 (SFT Only) is a text-to-image model trained with supervised fine-tuning (SFT) on a dataset of image prompts paired with written rationales. It is based on deepseek-ai/Janus-Pro-7B and is described in the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)".

Website: https://aka.ms/reasongen

Code: https://github.com/Franklin-Zhang0/Image-RL

arXiv: https://arxiv.org/abs/2505.24875
## 1. Introduction

Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO).

To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions.

In the GRPO stage, a pretrained vision-language model scores the overall visual quality of each generated image, and these scores serve as the reward signal for each policy update.
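
As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of several images sampled for the same prompt against that group's own mean and standard deviation. It follows the commonly published GRPO formulation and is not taken from the ReasonGen-R1 training code; the function name, the `eps` constant, the use of the population standard deviation, and the example reward values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for the same prompt.

    Hypothetical helper: `rewards` stands in for the scores a
    vision-language reward model might assign to each sampled image.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not from the source
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up reward scores for four images generated from one prompt
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples scored above the group mean receive a positive advantage and those below receive a negative one, so the policy update favors the better images within each group without needing a separately learned value function.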
Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based, reasoning-driven image generation.

## 4. Acknowledgements

We would like to thank Verl, upon which our repo is built.