Video-Text-to-Text
Transformers
Safetensors

Improve model card

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -1,10 +1,10 @@
 ---
-license: apache-2.0
 library_name: transformers
+license: apache-2.0
 pipeline_tag: video-text-to-text
 ---
 
-This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141).
+This repository contains the GRPO-CARE model, presented in the paper [GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning](https://huggingface.co/papers/2506.16141). GRPO-CARE is a novel consistency-aware RL framework that jointly optimizes for both answer correctness and reasoning coherence, without requiring explicit process supervision. It introduces a two-tiered reward system to address limitations of standard outcome-supervised GRPO, improving both accuracy and logical consistency in multimodal reasoning.
 
 Code released at [GRPO-CARE](https://github.com/TencentARC/GRPO-CARE).
 
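
The "two-tiered reward system" named in the updated description can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `consistency` scores (a stand-in for however reasoning-to-answer coherence is measured), the median threshold, and the `bonus` value are all hypothetical. The one grounded idea is the split into a base reward for answer correctness plus an extra reward tied to reasoning coherence, followed by the group-relative normalization that GRPO-style methods use.

```python
from statistics import median


def two_tier_rewards(correct, consistency, bonus=0.5):
    """Toy two-tier reward for one group of sampled responses.

    correct:     list of bools, whether each sampled answer is right.
    consistency: list of floats, a HYPOTHETICAL score for how well each
                 reasoning trace supports its own final answer.
    bonus:       HYPOTHETICAL size of the second-tier reward.
    """
    if len(correct) != len(consistency):
        raise ValueError("correct and consistency must have the same length")

    # Tier 1: base reward for answer correctness.
    rewards = [1.0 if c else 0.0 for c in correct]

    # Tier 2 (assumed rule): among correct samples only, add a bonus to those
    # whose consistency is at least the median of the correct samples.
    correct_scores = [s for c, s in zip(correct, consistency) if c]
    if correct_scores:
        threshold = median(correct_scores)
        for i, (c, s) in enumerate(zip(correct, consistency)):
            if c and s >= threshold:
                rewards[i] += bonus
    return rewards


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: rewards standardized within the group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rewards = two_tier_rewards([True, True, False, True], [0.9, 0.2, 0.8, 0.5])
    print(rewards)
    print(group_advantages(rewards))
```

In this toy version a wrong answer never receives the bonus, however coherent its reasoning looks, so the second tier only separates correct responses from one another.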