---
base_model: Qwen/Qwen3-VL-8B-Instruct
language:
- en
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
arxiv: 2602.02444
tags:
- video
- retrieval
- reranking
- qwen3-vl
---

# RankVideo

RankVideo is a video-native reasoning reranker for text-to-video retrieval, fine-tuned from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct).

The model explicitly reasons over query-video pairs using video content to assess relevance. It was introduced in the paper [RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval](https://huggingface.co/papers/2602.02444).

- **Repository:** [https://github.com/tskow99/RANKVIDEO-Reasoning-Reranker](https://github.com/tskow99/RANKVIDEO-Reasoning-Reranker)
- **Paper:** [RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval](https://arxiv.org/abs/2602.02444)

## Training Data

This model was trained using the [MultiVENT 2.0 dataset](https://huggingface.co/datasets/hltcoe/MultiVENT2.0).

## Usage

You can use the model for scoring query-video pairs via the `rankvideo` library as follows:

```python
from rankvideo import VLMReranker

reranker = VLMReranker(model_path="hltcoe/RankVideo")

# Score query-video pairs for relevance
scores = reranker.score_batch(
    queries=["person playing guitar"],
    video_paths=["/path/to/video.mp4"],
)

print(f"Relevance score: {scores[0]['logit_delta_yes_minus_no']:.3f}")
```
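
To rerank several candidate videos for one query, the per-pair scores can be sorted in descending order. The following is a minimal sketch, not part of the `rankvideo` library: `rank_candidates` and `stable_sigmoid` are hypothetical helpers, the score values are made up for illustration, and it assumes `score_batch` returns one dict per pair in the same order as the inputs. Because the score is a difference of two logits, a sigmoid maps it to the probability of "yes" under a softmax restricted to the yes/no pair.

```python
import math


def stable_sigmoid(x: float) -> float:
    """Numerically stable sigmoid that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_candidates(video_paths, scores, key="logit_delta_yes_minus_no"):
    """Return candidates sorted from most to least relevant.

    Assumes scores[i] belongs to video_paths[i] (an assumption about the
    output order of score_batch, not something the model card states).
    """
    ranked = []
    for path, score in zip(video_paths, scores):
        delta = score[key]
        ranked.append(
            {
                "video_path": path,
                "logit_delta": delta,
                # Two-way softmax over {yes, no} reduces to sigmoid(yes - no)
                "p_yes": stable_sigmoid(delta),
            }
        )
    ranked.sort(key=lambda row: row["logit_delta"], reverse=True)
    return ranked


# Hypothetical values standing in for the output of reranker.score_batch(...)
candidates = ["/path/to/a.mp4", "/path/to/b.mp4", "/path/to/c.mp4"]
mock_scores = [
    {"logit_delta_yes_minus_no": -1.2},
    {"logit_delta_yes_minus_no": 2.5},
    {"logit_delta_yes_minus_no": 0.0},
]

ranked = rank_candidates(candidates, mock_scores)
for rank, row in enumerate(ranked, start=1):
    print(f"{rank}. {row['video_path']}  delta={row['logit_delta']:.3f}  p_yes={row['p_yes']:.3f}")
```

With a real reranker, `mock_scores` would be replaced by the result of calling `score_batch` with the same query repeated once per candidate path.
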
## BibTeX

```bibtex
@misc{skow2026rankvideoreasoningrerankingtexttovideo,
      title={RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval},
      author={Tyler Skow and Alexander Martin and Benjamin Van Durme and Rama Chellappa and Reno Kriz},
      year={2026},
      eprint={2602.02444},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2602.02444},
}
```