---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---
<p align="center">
<h2 align="center">UniAVGen: Unified Audio and Video Generation with <br> Asymmetric Cross-Modal Interactions</h2>
<p align="center">
<a href="https://scholar.google.com/citations?user=48vfuRAAAAAJ&hl=zh-CN"><strong>Guozhen Zhang</strong></a>
<a href="https://scholar.google.cz/citations?user=F2cnLlIAAAAJ&hl=zh-CN&oi=ao"><strong>Zixiang Zhou</strong></a>
<a href="https://scholar.google.cz/citations?user=Jm5qsAYAAAAJ&hl=zh-CN&authuser=1"><strong>Teng Hu</strong></a>
<a href="https://scholar.google.com/citations?user=gYTyZGYAAAAJ&hl=zh-CN&oi=sra"><strong>Ziqiao Peng</strong></a>
<a href="https://github.com/angzong"><strong>Youliang Zhang</strong></a>
<br>
<a href="https://scholar.google.com/citations?user=dmdhJjgAAAAJ&hl=zh-CN"><strong>Yi Chen</strong></a>
<a href="https://openreview.net/profile?id=~Yuan_Zhou12"><strong>Yuan Zhou</strong></a>
<a href="https://openreview.net/profile?id=~Qinglin_Lu2"><strong>Qinglin Lu</strong></a>
<a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ&hl=en"><strong>Limin Wang</strong></a>
<br>
<b>MCG-NJU &nbsp; | &nbsp; Tencent Hunyuan</b>
<br><br>
<a href="https://arxiv.org/pdf/2511.03334"><img src='https://img.shields.io/badge/arXiv-2511.03334-red' alt='Paper PDF'></a>
<a href='https://mcg-nju.github.io/UniAVGen/'><img src='https://img.shields.io/badge/Project-Page-blue' alt='Project Page'></a>
<a href='https://github.com/MCG-NJU/Sora2-mini'><img src='https://img.shields.io/badge/Github-UniAVGen-orange'></a>
<a href='https://huggingface.co/MCG-NJU/UniAVGen'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
<br>
</p>
</p>
This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation. It addresses key limitations of existing methods: poor lip synchronization, insufficient semantic consistency, and limited task generalization.
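
As a minimal sketch of how the checkpoint files could be fetched locally, the snippet below uses `snapshot_download` from the `huggingface_hub` package. The repository id is taken from the Hugging Face badge link above. The `download_checkpoint` helper and the `checkpoints/UniAVGen` target folder are hypothetical names chosen for illustration. This README does not describe the layout of the downloaded files or how to load them, so the sketch makes no assumptions about either.

```python
from pathlib import Path

# Repository id taken from the Hugging Face badge link in this README.
REPO_ID = "MCG-NJU/UniAVGen"


def download_checkpoint(local_dir: str = "checkpoints/UniAVGen") -> Path:
    """Download all files of the model repository and return the local folder.

    The default `local_dir` is an illustrative assumption, not a path
    required by the project.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` requires network access and `pip install huggingface_hub`. For inference instructions, refer to the GitHub repository linked above.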
## Citation
If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:
```BibTeX
@misc{zhang2025uniavgenunifiedaudiovideo,
title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
year={2025},
eprint={2511.03334},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.03334},
}
```