RankVideo

RankVideo is a video-native reasoning reranker for text-to-video retrieval, fine-tuned from Qwen3-VL-8B-Instruct.

The model reasons explicitly over each query-video pair, using the video content itself to assess relevance. It was introduced in the paper RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval.

Repository: https://github.com/tskow99/RANKVIDEO-Reasoning-Reranker
Paper: RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval (https://arxiv.org/abs/2602.02444)

Training Data

This model was trained on the MultiVENT 2.0 dataset and RankVideo-Dataset.

Usage

You can use the model for scoring query-video pairs via the rankvideo library as follows:

from rankvideo import VLMReranker

reranker = VLMReranker(model_path="hltcoe/RankVideo")

# Score query-video pairs for relevance
scores = reranker.score_batch(
    queries=["person playing guitar"],
    video_paths=["/path/to/video.mp4"],
)

print(f"Relevance score: {scores[0]['logit_delta_yes_minus_no']:.3f}")
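
Reranking usually means ordering several candidate videos for one query rather than reading a single score. The sketch below shows one way to do that with the output of score_batch. It assumes that score_batch returns one dict per query-video pair, in input order, each carrying the logit_delta_yes_minus_no key used above; the helper names rank_candidates and sigmoid are hypothetical and not part of the rankvideo library. Converting the logit difference to a probability with a sigmoid is an illustrative choice (it equals a two-way softmax over the "yes" and "no" logits), not necessarily the scoring used in the paper. The scores in the demo are placeholders, not real model outputs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rank_candidates(video_paths, scores, key="logit_delta_yes_minus_no"):
    """Return (path, logit_delta, p_yes) tuples sorted by relevance, best first.

    Assumption: `scores` is a list of dicts aligned with `video_paths`,
    as the usage snippet above suggests for score_batch.
    """
    if len(video_paths) != len(scores):
        raise ValueError("video_paths and scores must have the same length")
    ranked = []
    for path, score in zip(video_paths, scores):
        delta = float(score[key])
        # sigmoid(yes - no) is the softmax probability of "yes" over {yes, no}
        ranked.append((path, delta, sigmoid(delta)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # Placeholder scores standing in for reranker.score_batch(...) output.
    paths = ["a.mp4", "b.mp4", "c.mp4"]
    fake_scores = [
        {"logit_delta_yes_minus_no": -1.0},
        {"logit_delta_yes_minus_no": 2.5},
        {"logit_delta_yes_minus_no": 0.0},
    ]
    for path, delta, p_yes in rank_candidates(paths, fake_scores):
        print(f"{path}: delta={delta:.3f} p_yes={p_yes:.3f}")
```

To rerank candidates for a single query with the real model, one would presumably repeat the query once per candidate in queries, pass the candidates in video_paths, and hand the result to rank_candidates.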

BibTeX

@misc{skow2026rankvideoreasoningrerankingtexttovideo,
      title={RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval}, 
      author={Tyler Skow and Alexander Martin and Benjamin Van Durme and Rama Chellappa and Reno Kriz},
      year={2026},
      eprint={2602.02444},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2602.02444}, 
}

Model size: 9B params (Safetensors)
Tensor type: BF16
