
base_model: Qwen/Qwen3-VL-4B-Instruct
library_name: transformers
license: apache-2.0
pipeline_tag: video-text-to-text
tags:
  - video-retrieval
  - temporal-grounding
  - videosearch-r1

VideoSearch-R1 DiDeMo Stage 2

This is the Stage 2 VideoSearch-R1 checkpoint trained on DiDeMo, with Qwen/Qwen3-VL-4B-Instruct as its base model. The model is presented in the paper VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement.

Project Page: mlvlab.github.io/VideoSearch-R1
Repository: mlvlab/VideoSearch-R1 (GitHub)

Usage

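Run this checkpoint from the root of the VideoSearch-R1 codebase. The first command downloads the pre-extracted DiDeMo data; the second runs inference with this checkpoint, with EVAL_GPUS selecting the GPU to use: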

bash scripts/data_construct/download_preextracted_data.bash didemo
EVAL_GPUS=0 bash scripts/inference/inference.bash didemo --checkpoint VideoSearchR1/didemo-stage2
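
The two steps above can also be wrapped in a small script so that inference only starts if the data download succeeds. The following is a minimal sketch, not part of the VideoSearch-R1 codebase: the `DATASET`, `CHECKPOINT`, and `DRY_RUN` variables and the `run` helper are hypothetical names introduced here for illustration, and the script assumes it is launched from the repository root. By default it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch: run the two README steps in order and stop at the first failure.
# DATASET, CHECKPOINT, DRY_RUN, and run() are illustrative names only;
# they are not defined by the VideoSearch-R1 codebase.
set -euo pipefail

DATASET="${DATASET:-didemo}"
CHECKPOINT="${CHECKPOINT:-VideoSearchR1/didemo-stage2}"
EVAL_GPUS="${EVAL_GPUS:-0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: fetch the pre-extracted data for the chosen dataset.
run bash scripts/data_construct/download_preextracted_data.bash "$DATASET"

# Step 2: run inference with the checkpoint on the selected GPU.
run env EVAL_GPUS="$EVAL_GPUS" bash scripts/inference/inference.bash \
  "$DATASET" --checkpoint "$CHECKPOINT"
```

Setting `DRY_RUN=0` executes the commands instead of printing them. Because of `set -e`, a failed download aborts the script before inference is attempted.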

Citation

@inproceedings{lee2026videosearchr1,
  title     = {VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement},
  author    = {Lee, Seohyun and Choi, Seoung and Ko, Dohwan and Kim, Jongha and Kim, Hyunwoo J.},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}