---
base_model: Qwen/Qwen3-VL-4B-Instruct
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
- video-retrieval
- temporal-grounding
- videosearch-r1
---

# VideoSearch-R1 DiDeMo Stage 2

This is the Stage 2 VideoSearch-R1 checkpoint trained for DiDeMo, presented in the paper [VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement](https://huggingface.co/papers/2607.00446).

- **Project Page:** [mlvlab.github.io/VideoSearch-R1](https://mlvlab.github.io/VideoSearch-R1/)
- **Repository:** [GitHub - mlvlab/VideoSearch-R1](https://github.com/mlvlab/VideoSearch-R1)

## Usage

Use with the VideoSearch-R1 codebase:

```bash
bash scripts/data_construct/download_preextracted_data.bash didemo
EVAL_GPUS=0 bash scripts/inference/inference.bash didemo --checkpoint VideoSearchR1/didemo-stage2
```

## Citation

```bibtex
@inproceedings{lee2026videosearchr1,
  title     = {VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement},
  author    = {Lee, Seohyun and Choi, Seoung and Ko, Dohwan and Kim, Jongha and Kim, Hyunwoo J.},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}
```