Instructions to use TencentARC/GRPO-CARE with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers

How to use TencentARC/GRPO-CARE with Transformers:

```python
# Load the model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("TencentARC/GRPO-CARE", dtype="auto")
```

- Notebooks
  - Google Colab
  - Kaggle
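Since the model card tags the repo as `video-text-to-text`, inference typically pairs a video with a text question in a chat-style message list. The helper below is a minimal sketch of that payload, assuming GRPO-CARE accepts the Qwen2.5-VL-style message format common for video models on the Hub; the function name and format are illustrative assumptions, not confirmed by this card.

```python
# Hypothetical helper (assumed Qwen2.5-VL-style message format — verify against
# the GRPO-CARE repo before use): pair a video file with a text question in a
# single-turn chat conversation.
def build_video_messages(video_path: str, question: str) -> list[dict]:
    """Return a one-turn conversation with a video part and a text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "video", "video": video_path},
                {"type": "text", "text": question},
            ],
        }
    ]


messages = build_video_messages("demo.mp4", "What happens in this clip?")
print(messages[0]["role"])          # user
print(len(messages[0]["content"]))  # 2
```

Such a `messages` list is what a processor's `apply_chat_template` would normally consume before generation.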
Improve model card (#2) — opened by nielsr (HF Staff)
README.md (changed):

```diff
@@ -1,10 +1,10 @@
 ---
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
 pipeline_tag: video-text-to-text
 ---
 
-This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141).
+This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141). GRPO-CARE is a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision. It introduces a two-tiered reward system to address limitations of standard outcome-supervised GRPO, improving both accuracy and logical consistency in multimodal reasoning.
 
 Code released at [GRPO-CARE](https://github.com/TencentARC/GRPO-CARE).
 
```
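The updated card describes a two-tiered reward: a base reward for answer correctness plus a bonus tied to reasoning coherence. The toy sketch below illustrates that idea only; it is not the paper's implementation, and the `consistency` score stands in for the learned consistency estimate GRPO-CARE actually computes.

```python
# Toy sketch (NOT the paper's implementation) of a two-tiered reward in the
# spirit of GRPO-CARE: base correctness reward, plus a consistency bonus for
# correct answers whose reasoning-consistency score beats the group average.
def care_style_reward(correct: bool, consistency: float,
                      group_mean_consistency: float,
                      bonus: float = 0.5) -> float:
    """Return base reward (1.0 if correct), plus `bonus` when the answer is
    correct and its consistency score exceeds the group mean."""
    base = 1.0 if correct else 0.0
    if correct and consistency > group_mean_consistency:
        base += bonus
    return base


# A sampled group of responses: (answer_correct, consistency_score)
group = [(True, 0.9), (True, 0.4), (False, 0.8)]
mean_c = sum(c for _, c in group) / len(group)  # 0.7
rewards = [care_style_reward(ok, c, mean_c) for ok, c in group]
print(rewards)  # [1.5, 1.0, 0.0]
```

The point of the two tiers is visible in the output: both correct answers receive the base reward, but only the one whose reasoning is more consistent than its group's average earns the extra bonus.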