UFVideo-7B

This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

📥 Installation

Environment

First, clone the repository and navigate to the project folder.

git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo

Then, install the required packages.

conda create -n UFVideo python=3.10.14
conda activate UFVideo

# our CUDA version is 'cu124'
pip install -r requirements.txt
# other versions have not been verified
pip install flash-attn --no-build-isolation
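
After installation, a quick sanity check can confirm that the key packages are present and that the PyTorch build targets the CUDA version mentioned above. The following is a minimal sketch, not part of the UFVideo repository: the helper names `installed_versions` and `cuda_matches` are hypothetical, and it assumes that `requirements.txt` installs `torch` and that "cu124" corresponds to the string "12.4" reported by `torch.version.cuda`.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each pip distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_matches(torch_cuda, expected="12.4"):
    """Return True when torch's CUDA build string equals the expected version.

    The default "12.4" is an assumption derived from the 'cu124' note above.
    """
    return torch_cuda == expected


if __name__ == "__main__":
    # Hypothetical check list: torch (assumed to come from requirements.txt)
    # and flash-attn (installed explicitly in the steps above).
    found = installed_versions(["torch", "flash-attn"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")

    if found["torch"] is not None:
        import torch

        # torch.version.cuda is None for CPU-only builds.
        print("CUDA build:", torch.version.cuda)
        print("Matches cu124:", cuda_matches(torch.version.cuda))
        print("GPU visible:", torch.cuda.is_available())
```

Run it inside the activated `UFVideo` conda environment. A missing package or a CUDA mismatch does not necessarily mean the setup is broken, only that it differs from the configuration the authors report having verified.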

For evaluation and training, please refer to the UFVideo repository.

📑 Citation

Please cite our paper if you find this project helpful.

@article{pan2025ufvideo,
  title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
  author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
  journal={arXiv preprint arXiv:2512.11336},
  year={2025}
}

Model size: 9B params (Safetensors). Tensor type: BF16.