UFVideo-7B

This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

📥 Installation

Environment

First, clone the repository and navigate to the project folder.

git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo

Then, install the required packages.

conda create -n UFVideo python=3.10.14
conda activate UFVideo

# our cuda version is 'cu124'
pip install -r requirements.txt
# other versions have not been verified
pip install flash-attn --no-build-isolation
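
After the installation steps above, a quick sanity check can confirm that the environment looks as expected. The following is a minimal sketch, not part of the UFVideo codebase: the helper name `check_environment` is hypothetical, and it assumes that `torch` is among the packages pulled in by `requirements.txt` (not confirmed here). `flash_attn` is the import name of the `flash-attn` package. The check only tests whether the modules can be found; it does not import them or verify the CUDA version.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "flash_attn")):
    """Return a dict mapping each check name to True (ok) or False (missing).

    `packages` lists top-level import names. The defaults are assumptions:
    `torch` is presumed to come from requirements.txt, and `flash_attn`
    is the module installed by `pip install flash-attn`.
    """
    # The README creates the conda env with Python 3.10.14.
    results = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in packages:
        # find_spec returns None when a top-level module cannot be located.
        results[name] = importlib.util.find_spec(name) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `UFVideo` conda environment; any line reported as `MISSING` suggests that the corresponding install step did not complete.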

For evaluation and training, please refer to the UFVideo repository.

📑 Citation

Please cite our paper if you find this project helpful.

@article{pan2025ufvideo,
  title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
  author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
  journal={arXiv preprint arXiv:2512.11336},
  year={2025}
}

Model size: 9B params (Safetensors, tensor type BF16)

Base model: DAMO-NLP-SG/VideoRefer-7B (Hevven/UFVideo-7B is a fine-tune of it)

Paper: UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models (arXiv 2512.11336, published Dec 12, 2025)