Add model card for GoT-R1-1B

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +42 -0
README.md ADDED
@@ -0,0 +1,42 @@
+ ---
+ license: mit
+ pipeline_tag: text-to-image
+ ---
+
+ # GoT-R1-1B
+
+ GoT-R1 is a framework that applies reinforcement learning to enhance semantic-spatial reasoning in visual generation, as presented in [GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning](https://huggingface.co/papers/2505.17022).
+
+ ## Introduction
+
+ Visual generation models often struggle with complex prompts specifying multiple objects with precise spatial relationships. **GoT-R1** addresses this by applying reinforcement learning to enhance semantic-spatial reasoning. Building upon the Generation Chain-of-Thought (GoT) approach, GoT-R1 enables models to autonomously discover effective reasoning strategies through a dual-stage multi-dimensional reward framework.
+
+ - **Enhanced Semantic-Spatial Reasoning**: Uses RL to improve planning of complex scenes.
+ - **Autonomous Reasoning Chain Discovery**: Moves beyond fixed templates to allow the model to explore more effective reasoning paths.
+ - **Comprehensive MLLM-based Rewards**: Evaluates both the intermediate reasoning process and the final visual output.
+
+ ## Resources
+
+ - **GitHub Repository**: [https://github.com/gogoduan/GoT-R1](https://github.com/gogoduan/GoT-R1)
+ - **Paper**: [GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning](https://huggingface.co/papers/2505.17022)
+
+ ## Usage
+
+ For installation and setup, please refer to the [official GitHub repository](https://github.com/gogoduan/GoT-R1). To run inference using the provided script from the repository:
+
+ ```bash
+ python infer.py --ckpt_path <Your GoT-R1 checkpoint path>
+ ```
+
+ ## Citation
+
+ If you find this work helpful, please consider citing the paper:
+
+ ```bibtex
+ @article{duan2025got,
+ title={GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning},
+ author={Duan, Chengqi and Fang, Rongyao and Wang, Yuqing and Wang, Kun and Huang, Linjiang and Zeng, Xingyu and Li, Hongsheng and Liu, Xihui},
+ journal={arXiv preprint arXiv:2505.17022},
+ year={2025}
+ }
+ ```