Update README.md
README.md
@@ -12,8 +12,8 @@ library_name: transformers
 <!-- Provide a quick summary of what the model is/does. -->
 R1-VL-7B is a reasoning model trained with step-wise group relative policy optimization (StepGRPO).
 
-## Paper: https://arxiv.org/pdf/2503.12937
+### Paper: https://arxiv.org/pdf/2503.12937
 
-## Github: https://github.com/jingyi0000/R1-VL
+### Github: https://github.com/jingyi0000/R1-VL
 
-## Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct
+### Base model: https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct