---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---
# UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
Guozhen Zhang · Zixiang Zhou · Teng Hu · Ziqiao Peng · Youliang Zhang · Yi Chen · Yuan Zhou · Qinglin Lu · Limin Wang

MCG-NJU | Tencent Hunyuan

This repository hosts the model checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation that addresses key limitations of existing methods, such as poor lip synchronization, insufficient semantic consistency, and limited task generalization.
## Citation
If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:
```BibTeX
@misc{zhang2025uniavgenunifiedaudiovideo,
title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
year={2025},
eprint={2511.03334},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.03334},
}
```