---
license: apache-2.0
pipeline_tag: video-text-to-text
library_name: transformers
---
**<center><span style="font-size:2em;">TinyLLaVA-Video-R1</span></center>**
[arXiv](https://arxiv.org/abs/2504.09641) | [GitHub](https://github.com/ZhangXJ199/TinyLLaVA-Video-R1)
We introduce TinyLLaVA-Video-R1, a small-scale video reasoning model built on the traceably trained model [TinyLLaVA-Video](https://github.com/ZhangXJ199/TinyLLaVA-Video). After reinforcement learning on general Video-QA datasets, the model significantly improves its reasoning and thinking abilities and also exhibits the emergent characteristic of "aha moments".
### Results
| Model (HF Path) | Video-MME | MVBench | MLVU | MMVU |
| :---: | :---: | :---: | :---: | :---: |
| [Zhang199/TinyLLaVA-Video-R1](https://huggingface.co/Zhang199/TinyLLaVA-Video-R1) | 46.6 | 49.5 | 52.4 | 46.9 |
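
The checkpoint listed in the table can be fetched locally before running it with the code in the GitHub repository. The following is a minimal sketch, not the authors' documented procedure: it assumes the `huggingface_hub` package is installed, and the helper name `download_checkpoint` and the default target directory are hypothetical choices made for illustration.

```python
from typing import Optional

# Repository ID taken from the results table above.
REPO_ID = "Zhang199/TinyLLaVA-Video-R1"


def download_checkpoint(local_dir: Optional[str] = None) -> str:
    """Download all files of the model repository and return the local path.

    `local_dir` is an assumption for illustration; when it is None, the
    files are stored in the default Hugging Face cache directory.
    """
    # Imported lazily so that defining this helper does not require
    # huggingface_hub to be installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoint("./TinyLLaVA-Video-R1")` needs network access and returns the directory containing the downloaded files. Loading and inference are not covered by this sketch; see the GitHub repository for the model's own scripts.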