UniAVGen: Unified Audio and Video Generation with
Asymmetric Cross-Modal Interactions
Guozhen Zhang · Zixiang Zhou · Teng Hu · Ziqiao Peng · Youliang Zhang · Yi Chen · Yuan Zhou · Qinglin Lu · Limin Wang
MCG-NJU | Tencent Hunyuan
This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation that addresses key limitations of existing methods, such as poor lip synchronization, insufficient semantic consistency, and limited task generalization.
Citation
If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:
@misc{zhang2025uniavgenunifiedaudiovideo,
title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
year={2025},
eprint={2511.03334},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.03334},
}
Model tree for MCG-NJU/UniAVGen
Base model: Wan-AI/Wan2.2-TI2V-5B