| title: VideoSearch-R1 | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| app_file: app.py | |
| pinned: false | |
| license: apache-2.0 | |
| short_description: Video retrieval with SQR. | |
| <div align="center"> | |
| # Welcome to VideoSearch-R1 | |
| ### Iterative Video Retrieval and Reasoning via Soft Query Refinement | |
| <p> | |
| <a href="https://github.com/mlvlab/VideoSearch-R1"> | |
| <img src="https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github" alt="GitHub"> | |
| </a> | |
| <a href="https://mlvlab.github.io/VideoSearch-R1/"> | |
| <img src="https://img.shields.io/badge/Project-Page-2b4f9e?style=for-the-badge" alt="Project Page"> | |
| </a> | |
| <a href="https://arxiv.org/abs/2607.00446"> | |
| <img src="https://img.shields.io/badge/arXiv-2607.00446-b31b1b?style=for-the-badge" alt="arXiv"> | |
| </a> | |
| <img src="https://img.shields.io/badge/ECCV-2026-4c6fff?style=for-the-badge" alt="ECCV 2026"> | |
| </p> | |
| **VideoSearch-R1** is an agentic framework for video corpus moment retrieval. It unifies inter-video retrieval and intra-video temporal reasoning through a retrieve → verify → refine → ground loop, with **Soft Query Refinement (SQR)** in the continuous query embedding space. | |
| </div> | |
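To make the retrieve → verify → refine → ground loop concrete, here is a toy sketch in pure Python. It is not the authors' implementation: `search_loop`, the `verify` and `ground` callbacks, the `alpha` step size, and the Rocchio-style negative-feedback update are all illustrative assumptions standing in for the real SQR update, which is described in the paper. The only idea it tries to convey is that refinement changes the query embedding rather than the query text.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Span = Tuple[float, float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search_loop(
    query: Sequence[float],
    corpus: Dict[str, Vector],
    verify: Callable[[str], bool],   # hypothetical: does this video match the query?
    ground: Callable[[str], Span],   # hypothetical: (start, end) of the moment, in seconds
    alpha: float = 0.5,              # assumed step size for the embedding update
    max_rounds: int = 3,
) -> Optional[Tuple[str, Span]]:
    q = normalize(query)
    rejected = set()
    for _ in range(max_rounds):
        # Retrieve: pick the most similar video that has not been rejected yet.
        candidates = [vid for vid in corpus if vid not in rejected]
        if not candidates:
            break
        best = max(candidates, key=lambda vid: cosine(q, corpus[vid]))
        # Verify: accept the candidate, then ground the moment inside it.
        if verify(best):
            return best, ground(best)
        # Refine (assumption): push the query embedding away from the rejected
        # video, a Rocchio-style negative-feedback step in embedding space.
        rejected.add(best)
        away = normalize(corpus[best])
        q = normalize([qi - alpha * vi for qi, vi in zip(q, away)])
    return None
```

Calling `search_loop` with a small dictionary of video embeddings and a `verify` callback that accepts only one of them returns that video's ID together with whatever span `ground` reports, or `None` if no candidate is accepted within `max_rounds`.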
| --- | |
| ## News | |
| - <img src="https://img.shields.io/badge/2026.06.17-4c6fff?style=flat-square" alt="2026.06.17"> VideoSearch-R1 has been accepted to **ECCV 2026**. | |
| - <img src="https://img.shields.io/badge/2026.06.20-18a058?style=flat-square" alt="2026.06.20"> Code released. | |
| - <img src="https://img.shields.io/badge/2026.06.20-f59e0b?style=flat-square" alt="2026.06.20"> Trained model checkpoints released. | |
| - <img src="https://img.shields.io/badge/2026.07.01-b31b1b?style=flat-square" alt="2026.07.01"> Paper preprint released on [arXiv](https://arxiv.org/abs/2607.00446). | |
| ## Released Resources | |
| | Resource | Status | Link | | |
| |---|---:|---| | |
| | Code | Released | [mlvlab/VideoSearch-R1](https://github.com/mlvlab/VideoSearch-R1) | | |
| | Project page | Released | [mlvlab.github.io/VideoSearch-R1](https://mlvlab.github.io/VideoSearch-R1/) | | |
| | Trained checkpoints | Released | See model repos below | | |
| | Paper preprint | Released | [arXiv:2607.00446](https://arxiv.org/abs/2607.00446) | | |
| ## Model Checkpoints | |
| | Dataset | Stage 1 SFT | Stage 2 GRPO | | |
| |---|---|---| | |
| | DiDeMo | [didemo-sft](https://huggingface.co/VideoSearchR1/didemo-sft) | [didemo-grpo](https://huggingface.co/VideoSearchR1/didemo-grpo) | | |
| | Charades-STA | [charades-sft](https://huggingface.co/VideoSearchR1/charades-sft) | [charades-grpo](https://huggingface.co/VideoSearchR1/charades-grpo) | | |
| | ActivityNet Captions | Coming soon | Coming soon | | |
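One possible way to fetch a released checkpoint is through the `huggingface_hub` client. The sketch below copies the repository IDs from the table above; the `CHECKPOINTS` mapping and the helper names `checkpoint_repo` and `download_checkpoint` are illustrative and not part of the VideoSearch-R1 codebase. It assumes `huggingface_hub` is installed and that the repositories can be downloaded as plain model snapshots; see the GitHub repository for how the checkpoints are actually loaded.

```python
from typing import Dict, Optional, Tuple

# Repository IDs copied from the checkpoint table; ActivityNet Captions is
# omitted because its checkpoints are listed as "Coming soon".
CHECKPOINTS: Dict[Tuple[str, str], str] = {
    ("didemo", "sft"): "VideoSearchR1/didemo-sft",
    ("didemo", "grpo"): "VideoSearchR1/didemo-grpo",
    ("charades", "sft"): "VideoSearchR1/charades-sft",
    ("charades", "grpo"): "VideoSearchR1/charades-grpo",
}


def checkpoint_repo(dataset: str, stage: str) -> str:
    """Map a (dataset, stage) pair to its Hub repository ID (hypothetical helper)."""
    key = (dataset.lower(), stage.lower())
    if key not in CHECKPOINTS:
        raise KeyError(f"No released checkpoint for {dataset!r} / {stage!r}")
    return CHECKPOINTS[key]


def download_checkpoint(dataset: str, stage: str, local_dir: Optional[str] = None) -> str:
    """Download every file of the checkpoint repo and return the local path."""
    # Imported lazily so the mapping above works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(dataset, stage), local_dir=local_dir)
```

For example, `download_checkpoint("didemo", "grpo")` would place the Stage 2 DiDeMo checkpoint in the local Hugging Face cache and return its directory.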
| ## Links | |
| - [GitHub repository](https://github.com/mlvlab/VideoSearch-R1) | |
| - [Project page](https://mlvlab.github.io/VideoSearch-R1/) | |
| - [Paper](https://arxiv.org/abs/2607.00446) | |
| ## Acknowledgements | |
| VideoSearch-R1 builds on the open-source video-language and reinforcement learning ecosystem and is evaluated on VERIFIED with ActivityNet Captions, DiDeMo, and Charades-STA. We thank the benchmark and dataset creators for making these resources available to the community. | |