---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---

# UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

Guozhen Zhang · Zixiang Zhou · Teng Hu · Ziqiao Peng · Youliang Zhang
Yi Chen · Yuan Zhou · Qinglin Lu · Limin Wang
MCG-NJU   |   Tencent Hunyuan

[Paper](https://arxiv.org/abs/2511.03334) | Project Page

This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods: poor lip synchronization, insufficient semantic consistency, and limited task generalization.

## Citation

If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:

```BibTeX
@misc{zhang2025uniavgenunifiedaudiovideo,
      title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
      author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
      year={2025},
      eprint={2511.03334},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.03334},
}
```