Use from the Transformers library
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("VideoSearchR1/activitynet-stage2")
model = AutoModelForMultimodalLM.from_pretrained("VideoSearchR1/activitynet-stage2")

VideoSearch-R1 ActivityNet Stage 2

This is the Stage 2 VideoSearch-R1 checkpoint trained for ActivityNet, presented in the paper VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement.

Stage 2 starts from the ActivityNet Stage 1 checkpoint and optimizes iterative retrieval and temporal grounding behavior with the VideoSearch-R1 training pipeline.

Usage

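Use with the VideoSearch-R1 codebase. The first command downloads the pre-extracted ActivityNet data, and the second runs inference with this checkpoint: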

bash scripts/data_construct/download_preextracted_data.bash activitynet
EVAL_GPUS=0 bash scripts/inference/inference.bash activitynet --checkpoint VideoSearchR1/activitynet-stage2
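
When evaluation is driven from Python (for example, to loop over several checkpoints), the two commands above can be assembled programmatically. The following is a minimal sketch, not part of the VideoSearch-R1 codebase: the script paths, the `EVAL_GPUS` variable, and the `--checkpoint` flag are copied from the commands above, while `build_commands`, `format_command`, `run_evaluation`, and the `dry_run` switch are hypothetical helper names. It assumes the process is started from the repository root.

```python
import os
import shlex
import subprocess

# Script paths and the --checkpoint flag are copied from the commands above.
# The function names and the dry_run switch are hypothetical helpers.
DOWNLOAD_SCRIPT = "scripts/data_construct/download_preextracted_data.bash"
INFERENCE_SCRIPT = "scripts/inference/inference.bash"


def build_commands(dataset, checkpoint, gpus="0"):
    """Return (extra_env, argv) pairs in the order they should run."""
    download = ({}, ["bash", DOWNLOAD_SCRIPT, dataset])
    inference = (
        {"EVAL_GPUS": str(gpus)},
        ["bash", INFERENCE_SCRIPT, dataset, "--checkpoint", checkpoint],
    )
    return [download, inference]


def format_command(extra_env, argv):
    """Render one step as a single shell line for logging."""
    prefix = [f"{key}={shlex.quote(value)}" for key, value in extra_env.items()]
    return " ".join(prefix + [shlex.quote(arg) for arg in argv])


def run_evaluation(dataset, checkpoint, gpus="0", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    lines = []
    for extra_env, argv in build_commands(dataset, checkpoint, gpus):
        lines.append(format_command(extra_env, argv))
        print(lines[-1])
        if not dry_run:
            # Assumes the current working directory is the repository root.
            subprocess.run(argv, env={**os.environ, **extra_env}, check=True)
    return lines


if __name__ == "__main__":
    run_evaluation("activitynet", "VideoSearchR1/activitynet-stage2")
```

With `dry_run=True` (the default here) the sketch only prints the commands, which makes it easy to check them before launching a run. Passing `dry_run=False` executes each step and stops on the first failure.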

Citation

@inproceedings{lee2026videosearchr1,
  title     = {VideoSearch-R1: Iterative Video Retrieval and Reasoning via Soft Query Refinement},
  author    = {Lee, Seohyun and Choi, Seoung and Ko, Dohwan and Kim, Jongha and Kim, Hyunwoo J.},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}
Model size: 2B params (Safetensors, tensor type BF16)