---
license: apache-2.0
language:
- en
tags:
- video-language-model
- long-video-understanding
- reinforcement-learning
- self-correction
- reflection
- qwen2.5-vl
---

# Reflect-R1

Model checkpoints for **Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding**.

- Paper: https://arxiv.org/abs/2606.27922
- Code: https://github.com/ShuimuChen-hyq/Reflect-R1
- Data: https://huggingface.co/datasets/CSDDSFSFSAFSAF/Reflect-R1-data

## Checkpoints

```text
Reflect-R1-SFT-6000/      Cold-start SFT checkpoint.
Reflect-R1-GRPO-Final/    Final SD-GRPO checkpoint.
```

Both checkpoints are based on Qwen2.5-VL-7B and include sharded `safetensors` weights together with the corresponding tokenizer and processor configuration files.
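
Since both checkpoints ship weights plus tokenizer and processor configs, loading one could look roughly like the sketch below. It is not an official loading recipe: it assumes the two directories above are subfolders of a single Hub repository, that the weights load with the stock `Qwen2_5_VLForConditionalGeneration` class from `transformers`, and that `accelerate` is installed for `device_map="auto"`. The `repo_id` argument and the helper names `checkpoint_subfolder` and `load_checkpoint` are hypothetical.

```python
import os

# Checkpoint directory names as listed in this model card.
CHECKPOINT_DIRS = {
    "sft": "Reflect-R1-SFT-6000",
    "grpo": "Reflect-R1-GRPO-Final",
}


def checkpoint_subfolder(stage: str) -> str:
    """Map a short stage name ("sft" or "grpo") to its checkpoint directory."""
    try:
        return CHECKPOINT_DIRS[stage]
    except KeyError:
        raise ValueError(
            f"unknown stage {stage!r}; expected one of {sorted(CHECKPOINT_DIRS)}"
        ) from None


def load_checkpoint(repo_id: str, stage: str = "grpo"):
    """Download one checkpoint directory and load its model and processor.

    Assumption: `repo_id` is a Hub repo that contains the directories above
    as subfolders, and the weights are compatible with the stock
    Qwen2.5-VL model class.
    """
    # Imported lazily so checkpoint_subfolder works without heavy dependencies.
    from huggingface_hub import snapshot_download
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    subfolder = checkpoint_subfolder(stage)
    # Fetch only the files under the selected checkpoint directory.
    local_root = snapshot_download(repo_id, allow_patterns=[f"{subfolder}/*"])
    path = os.path.join(local_root, subfolder)

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        path,
        torch_dtype="auto",
        device_map="auto",  # requires the `accelerate` package
    )
    processor = AutoProcessor.from_pretrained(path)
    return model, processor
```

A call would then be `model, processor = load_checkpoint("<repo-id-of-this-model>", stage="grpo")`, with the placeholder replaced by the actual repository id. Passing `stage="sft"` selects the cold-start checkpoint instead.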

## Citation

```bibtex
@article{chen2026reflectr1,
  title   = {Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding},
  author  = {Shuimu Chen and Yuteng Chen and Yuanshen Guan and Zebang Cheng and Zeyu Zhang and Shengqian Qin and Bin Xia and Jiaran Li and Wenming Yang and Fei Ma},
  journal = {arXiv preprint arXiv:2606.27922},
  year    = {2026}
}
```