---
license: mit
base_model:
- DAMO-NLP-SG/VideoRefer-7B
---
# UFVideo-7B
This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.
<!-- <p align="center"><img width="750" src="https://raw.githubusercontent.com/Heven-Pan/UFVideo/refs/heads/main/figs/overall_tasks.png"></p> -->
## 📥 Installation
### Environment
First, clone the repository and navigate to the project folder.
```bash
git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo
```
Then, install the required packages.
```bash
conda create -n UFVideo python=3.10.14
conda activate UFVideo
# our cuda version is 'cu124'
pip install -r requirements.txt
# other versions have not been verified
pip install flash-attn --no-build-isolation
```
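After installation, a quick sanity check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of the UFVideo codebase: the helper name `check_environment` is hypothetical, and it assumes `torch` is among the packages installed by `requirements.txt` (`flash_attn` is the import name of the `flash-attn` package). It only checks that the modules can be found, without importing them.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "flash_attn")):
    """Return a dict mapping each check name to True or False.

    Hypothetical helper for illustration; the default package list is an
    assumption about what this project needs at import time.
    """
    # The install steps above create a Python 3.10 environment.
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Run it inside the activated `UFVideo` conda environment; any line reported as `MISSING` suggests the corresponding install step did not complete.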
#### For evaluation and training, please refer to the [UFVideo](https://github.com/Heven-Pan/UFVideo) repository.
## 📑 Citation
Please kindly cite our paper if you find this project helpful.
```bibtex
@article{pan2025ufvideo,
title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
journal={arXiv preprint arXiv:2512.11336},
year={2025}
}
```