---
license: apache-2.0
language:
- en
tags:
- video-language-model
- long-video-understanding
- reinforcement-learning
- self-correction
- reflection
- qwen2.5-vl
---

# Reflect-R1

Model checkpoints for **Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding**.

- Paper: https://arxiv.org/abs/2606.27922
- Code: https://github.com/ShuimuChen-hyq/Reflect-R1
- Data: https://huggingface.co/datasets/CSDDSFSFSAFSAF/Reflect-R1-data

## Checkpoints

```text
Reflect-R1-SFT-6000/     Cold-start SFT checkpoint.
Reflect-R1-GRPO-Final/   Final SD-GRPO checkpoint.
```

Both checkpoints are based on Qwen2.5-VL-7B. Each one includes sharded `safetensors` weights together with the corresponding tokenizer and processor configuration files.

## Citation

```bibtex
@article{chen2026reflectr1,
  title   = {Reflect-R1: Evidence-Driven Reflection for Self-Correction in Long Video Understanding},
  author  = {Shuimu Chen and Yuteng Chen and Yuanshen Guan and Zebang Cheng and Zeyu Zhang and Shengqian Qin and Bin Xia and Jiaran Li and Wenming Yang and Fei Ma},
  journal = {arXiv preprint arXiv:2606.27922},
  year    = {2026}
}
```
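## Verifying a Download

Each checkpoint folder is split into several weight shards, which makes a partial download easy to miss. One way to check a local copy before loading it is sketched below in Python. The sketch assumes the standard Hugging Face layout: a `model.safetensors.index.json` file that maps tensor names to shard files, plus `config.json`, `tokenizer_config.json`, and `preprocessor_config.json`. This card does not confirm those file names. `missing_checkpoint_files` is a hypothetical helper and is not part of the Reflect-R1 code release.

```python
import json
from pathlib import Path

# ASSUMPTION: standard Hugging Face file names for a sharded safetensors
# checkpoint with tokenizer/processor configs. Not confirmed by this card;
# adjust if the actual checkpoint folders differ.
INDEX_FILE = "model.safetensors.index.json"
REQUIRED_CONFIGS = ("config.json", "tokenizer_config.json", "preprocessor_config.json")


def missing_checkpoint_files(checkpoint_dir):
    """Hypothetical helper: return a sorted list of expected files absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    missing = [name for name in REQUIRED_CONFIGS if not (root / name).is_file()]

    index_path = root / INDEX_FILE
    if not index_path.is_file():
        # Without the index, the shard file names cannot be determined.
        missing.append(INDEX_FILE)
        return sorted(missing)

    # The index's "weight_map" maps each tensor name to the shard that stores it;
    # many tensors share a shard, so deduplicate before checking.
    weight_map = json.loads(index_path.read_text())["weight_map"]
    for shard in sorted(set(weight_map.values())):
        if not (root / shard).is_file():
            missing.append(shard)
    return sorted(missing)
```

For example, calling `missing_checkpoint_files("Reflect-R1-GRPO-Final")` on a complete local copy should return an empty list. Any names it does return are files that still need to be downloaded.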