TencentARC/GRPO-CARE

Video-Text-to-Text

Improve model card #2

by nielsr (HF Staff), opened Jun 24, 2025

base: refs/heads/main ← from: refs/pr/2

README.md +2 -2

@@ -1,10 +1,10 @@
 ---
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
 pipeline_tag: video-text-to-text
 ---
-This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141).
+This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141). GRPO-CARE is a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision.  It introduces a two-tiered reward system to address limitations of standard outcome-supervised GRPO, improving both accuracy and logical consistency in multimodal reasoning.
 Code released at [GRPO-CARE](https://github.com/TencentARC/GRPO-CARE).