UniAVGen: Unified Audio and Video Generation with
Asymmetric Cross-Modal Interactions

Guozhen Zhang, Zixiang Zhou, Teng Hu, Ziqiao Peng, Youliang Zhang,
Yi Chen, Yuan Zhou, Qinglin Lu, Limin Wang
MCG-NJU   |   Tencent Hunyuan

Paper PDF   |   Project Page

This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods: poor lip synchronization, insufficient semantic consistency, and limited task generalization.

Citation

If you find this project helpful for your research or applications, please feel free to leave a star ⭐️ and cite our paper:

@misc{zhang2025uniavgenunifiedaudiovideo,
      title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions}, 
      author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
      year={2025},
      eprint={2511.03334},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.03334}, 
}