---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---

<p align="center">

<h2 align="center">UniAVGen: Unified Audio and Video Generation with <br> Asymmetric Cross-Modal Interactions</h2>
<p align="center">
<a href="https://scholar.google.com/citations?user=48vfuRAAAAAJ&hl=zh-CN"><strong>Guozhen Zhang</strong></a>
·
<a href="https://scholar.google.cz/citations?user=F2cnLlIAAAAJ&hl=zh-CN&oi=ao"><strong>Zixiang Zhou</strong></a>
·
<a href="https://scholar.google.cz/citations?user=Jm5qsAYAAAAJ&hl=zh-CN&authuser=1"><strong>Teng Hu</strong></a>
·
<a href="https://scholar.google.com/citations?user=gYTyZGYAAAAJ&hl=zh-CN&oi=sra"><strong>Ziqiao Peng</strong></a>
·
<a href="https://github.com/angzong"><strong>Youliang Zhang</strong></a>
<br>
<a href="https://scholar.google.com/citations?user=dmdhJjgAAAAJ&hl=zh-CN"><strong>Yi Chen</strong></a>
·
<a href="https://openreview.net/profile?id=~Yuan_Zhou12"><strong>Yuan Zhou</strong></a>
·
<a href="https://openreview.net/profile?id=~Qinglin_Lu2"><strong>Qinglin Lu</strong></a>
·
<a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ&hl=en"><strong>Limin Wang</strong></a>
<br>
<b>MCG-NJU &nbsp;|&nbsp; Tencent Hunyuan</b>
<br><br>
<a href="https://arxiv.org/pdf/2511.03334"><img src='https://img.shields.io/badge/arXiv-2511.03334-red' alt='Paper PDF'></a>
<a href='https://mcg-nju.github.io/UniAVGen/'><img src='https://img.shields.io/badge/Project-Page-blue' alt='Project Page'></a>
<a href='https://github.com/MCG-NJU/Sora2-mini'><img src='https://img.shields.io/badge/Github-UniAVGen-orange' alt='GitHub'></a>
<a href='https://huggingface.co/MCG-NJU/UniAVGen'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow' alt='Hugging Face Model'></a>
<br>
</p>
</p>

This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods, such as poor lip synchronization, insufficient semantic consistency, and limited task generalization.
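
A minimal sketch for fetching this checkpoint with the `huggingface_hub` library, assuming it is installed (`pip install huggingface_hub`). `snapshot_download` is a real `huggingface_hub` function; the helper name `download_uniavgen_checkpoint` and the default `./UniAVGen` target folder are hypothetical choices for illustration. The sketch only downloads the repository files and does not load or run the model; the project's GitHub repository linked above is presumably the place to look for inference code.

```python
from pathlib import Path

# Model repo id on the Hugging Face Hub (taken from this model card's links).
REPO_ID = "MCG-NJU/UniAVGen"


def download_uniavgen_checkpoint(local_dir: str = "./UniAVGen") -> Path:
    """Download every file in the MCG-NJU/UniAVGen model repo into local_dir.

    Hypothetical helper. Assumes `huggingface_hub` is installed; it is
    imported inside the function so this module can be imported without it.
    """
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the local folder it populated.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_uniavgen_checkpoint()` returns the local folder as a `pathlib.Path`. Passing `local_dir` places the files in a folder you choose instead of the default Hugging Face cache location.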



## Citation

If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:

```BibTeX
@misc{zhang2025uniavgenunifiedaudiovideo,
  title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
  author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
  year={2025},
  eprint={2511.03334},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.03334},
}
```