UniAVGen: Unified Audio and Video Generation with
Asymmetric Cross-Modal Interactions

Guozhen Zhang · Zixiang Zhou · Teng Hu · Ziqiao Peng · Youliang Zhang
Yi Chen · Yuan Zhou · Qinglin Lu · Limin Wang
MCG-NJU | Tencent Hunyuan

This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods: poor lip synchronization, insufficient semantic consistency, and limited task generalization.

Citation

If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:

@misc{zhang2025uniavgenunifiedaudiovideo,
      title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions}, 
      author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
      year={2025},
      eprint={2511.03334},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.03334}, 
}

Model tree for MCG-NJU/UniAVGen

Base model: Wan-AI/Wan2.2-TI2V-5B (this model is a finetune of it)

Paper for MCG-NJU/UniAVGen

UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions (arXiv:2511.03334, published Nov 5, 2025)