---
license: mit
base_model:
- DAMO-NLP-SG/VideoRefer-7B
---

# UFVideo-7B

This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

<!-- <p align="center"><img width="750" src="https://raw.githubusercontent.com/Heven-Pan/UFVideo/refs/heads/main/figs/overall_tasks.png"></p> -->

## 📥 Installation

### Environment

First, clone the repository and navigate to the project folder.

```bash
git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo
```

Then, install the required packages.

```bash
conda create -n UFVideo python=3.10.14
conda activate UFVideo

# Our CUDA version is 'cu124'.
pip install -r requirements.txt
# Other versions have not been verified.
pip install flash-attn --no-build-isolation
```
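Before running the commands above, it can help to confirm that the command-line tools they rely on are available. The following is a minimal sketch under that assumption; `check_tools` is a hypothetical helper written for this README and is not part of the UFVideo repository.

```bash
# Hypothetical helper (not part of UFVideo): report whether each named
# command-line tool can be found on PATH.
# Returns 0 if all tools are found, 1 if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For the steps above, one might call `check_tools git conda pip` and install anything reported as missing before continuing.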

#### For evaluation and training, please refer to the [UFVideo](https://github.com/Heven-Pan/UFVideo) repository.

## 📑 Citation

Please cite our paper if you find this project helpful.

```
@article{pan2025ufvideo,
  title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
  author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
  journal={arXiv preprint arXiv:2512.11336},
  year={2025}
}
```