UFVideo-7B
This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.
Installation
Environment
First, clone the repository and navigate to the project folder.
git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo
Then, create a conda environment and install the required packages.
conda create -n UFVideo python=3.10.14
conda activate UFVideo
# our cuda version is 'cu124'
pip install -r requirements.txt
# other versions have not been verified
pip install flash-attn --no-build-isolation
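After the install commands finish, a quick sanity check can confirm that the key packages are visible to the new environment. The snippet below is a minimal sketch, not part of the UFVideo repository: the helper names `check_environment` and `describe_cuda` are hypothetical, and it assumes that `requirements.txt` installs PyTorch (import name `torch`) and that flash-attn is importable as `flash_attn`.

```python
import importlib.util
import sys


def check_environment(required=("torch", "flash_attn")):
    """Return a dict mapping each package name to True if it can be located."""
    # find_spec locates a top-level package without importing it.
    return {name: importlib.util.find_spec(name) is not None for name in required}


def describe_cuda():
    """Return the CUDA version PyTorch was built against, or None if unavailable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the script still runs without PyTorch

    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
    print("PyTorch CUDA build:", describe_cuda())
```

Run it inside the activated `UFVideo` environment. If the reported CUDA build differs from the `cu124` setup mentioned above, keep in mind that other versions have not been verified.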
For evaluation and training, please refer to the UFVideo repository.
Citation
Please cite our paper if you find this project helpful.
@article{pan2025ufvideo,
title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
journal={arXiv preprint arXiv:2512.11336},
year={2025}
}
Base model: DAMO-NLP-SG/VideoRefer-7B