|
|
--- |
|
|
license: apache-2.0 |
|
|
pipeline_tag: video-text-to-text |
|
|
library_name: transformers |
|
|
datasets: |
|
|
- Video-R1/Video-R1-data |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- accuracy |
|
|
base_model: |
|
|
- Qwen/Qwen2.5-VL-7B-Instruct |
|
|
--- |
|
|
|
|
|
This repository contains the Video-R1-7B model as presented in [Video-R1: Reinforcing Video Reasoning in MLLMs](https://arxiv.org/pdf/2503.21776). |
|
|
|
|
|
For training and evaluation, please refer to the Code: https://github.com/tulerfeng/Video-R1 |
|
|
|
|
|
For inference on a single example, you may refer to: https://github.com/tulerfeng/Video-R1/blob/main/src/inference_example.py |