Improve model card #2
Opened by nielsr (HF Staff)
README.md CHANGED
@@ -1,10 +1,10 @@
 ---
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
 pipeline_tag: video-text-to-text
 ---
 
-This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141).
+This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141). GRPO-CARE is a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision. It introduces a two-tiered reward system to address limitations of standard outcome-supervised GRPO, improving both accuracy and logical consistency in multimodal reasoning.
 
 Code released at [GRPO-CARE](https://github.com/TencentARC/GRPO-CARE).
 
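The updated card text describes a two-tiered reward that scores both answer correctness and reasoning coherence. The sketch below illustrates one way such a reward could feed group-relative advantages. It is not the released implementation: the `Rollout` container, the `coherence_score` field, the `bonus` value, and the "at least the mean of correct rollouts" rule are all assumptions made for illustration, and the linked repository remains the reference for the actual method.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled response in a group (hypothetical container)."""
    answer_correct: bool    # outcome check against the ground-truth answer
    coherence_score: float  # assumed score of how well the reasoning supports the answer


def two_tier_rewards(group, bonus=0.5):
    """Tier 1: 1.0 for a correct answer, 0.0 otherwise.
    Tier 2 (assumed rule): an extra bonus for correct rollouts whose
    coherence score is at least the mean over the correct rollouts."""
    correct_scores = [r.coherence_score for r in group if r.answer_correct]
    threshold = mean(correct_scores) if correct_scores else 0.0
    rewards = []
    for r in group:
        reward = 1.0 if r.answer_correct else 0.0
        if r.answer_correct and r.coherence_score >= threshold:
            reward += bonus
        rewards.append(reward)
    return rewards


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within the group (population std chosen here)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


group = [
    Rollout(answer_correct=True, coherence_score=0.9),
    Rollout(answer_correct=True, coherence_score=0.3),
    Rollout(answer_correct=False, coherence_score=0.8),
    Rollout(answer_correct=False, coherence_score=0.1),
]
rewards = two_tier_rewards(group)  # [1.5, 1.0, 0.0, 0.0]
advantages = group_advantages(rewards)
```

In this toy group, the two correct rollouts receive different rewards, so the one with more coherent reasoning gets the larger advantage even though both answers are right. That is the behavior outcome-only rewards cannot express.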