	---
	license: mit
	base_model:
	- DAMO-NLP-SG/VideoRefer-7B
	---

	# UFVideo-7B

	This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

	<!-- <p align="center"><img width="750" src="https://raw.githubusercontent.com/Heven-Pan/UFVideo/refs/heads/main/figs/overall_tasks.png"></p> -->

	## 📥 Installation
	### Environment
	First, clone the repository and navigate to the project folder.
	```bash
	git clone https://github.com/Heven-Pan/UFVideo
	cd UFVideo
	```
	Then, install the required packages.
	```bash
	conda create -n UFVideo python=3.10.14
	conda activate UFVideo

	# our cuda version is 'cu124'
	pip install -r requirements.txt
	# other versions have not been verified
	pip install flash-attn --no-build-isolation
	```
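
After installation, a quick sanity check of the environment can catch a failed build early. The snippet below is a minimal sketch and is not part of the UFVideo repository. It assumes that `requirements.txt` installs `torch` and that `flash-attn` is importable as `flash_attn`; the helper name `has_module` is hypothetical.

```bash
# Sanity-check sketch (not part of the UFVideo repository).
# Use whichever interpreter is on PATH; inside the conda env this is the env's Python.
PY="$(command -v python || command -v python3)"

# Return 0 if the named top-level Python module can be found, 1 otherwise.
has_module() {
  "$PY" -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Report the interpreter version (the environment above is created with 3.10.14).
"$PY" -c 'import sys; print("python", sys.version.split()[0])'

# Check the packages the install steps are assumed to provide.
for mod in torch flash_attn; do
  if has_module "$mod"; then
    echo "$mod: found"
  else
    echo "$mod: MISSING"
  fi
done

# If torch is present, print the CUDA version it was built against.
if has_module torch; then
  "$PY" -c 'import torch; print("torch CUDA build:", torch.version.cuda, "| GPU available:", torch.cuda.is_available())'
fi
```

If `flash_attn` is reported as missing, the most likely cause is a mismatch between the local CUDA toolkit and the installed `torch` build, since only `cu124` has been verified.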

	#### For evaluation and training, please refer to the [UFVideo](https://github.com/Heven-Pan/UFVideo) repository.

	## 📑 Citation

	Please cite our paper if you find this project helpful.

	```bibtex
	@article{pan2025ufvideo,
	title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
	author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
	journal={arXiv preprint arXiv:2512.11336},
	year={2025}
	}
	```