---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-TI2V-5B
pipeline_tag: image-to-video
---

<p align="center">
<h2 align="center">UniAVGen: Unified Audio and Video Generation with <br> Asymmetric Cross-Modal Interactions</h2>
<p align="center">
<a href="https://scholar.google.com/citations?user=48vfuRAAAAAJ&hl=zh-CN"><strong>Guozhen Zhang</strong></a>
·
<a href="https://scholar.google.cz/citations?user=F2cnLlIAAAAJ&hl=zh-CN&oi=ao"><strong>Zixiang Zhou</strong></a>
·
<a href="https://scholar.google.cz/citations?user=Jm5qsAYAAAAJ&hl=zh-CN&authuser=1"><strong>Teng Hu</strong></a>
·
<a href="https://scholar.google.com/citations?user=gYTyZGYAAAAJ&hl=zh-CN&oi=sra"><strong>Ziqiao Peng</strong></a>
·
<a href="https://github.com/angzong"><strong>Youliang Zhang</strong></a>
<br>
<a href="https://scholar.google.com/citations?user=dmdhJjgAAAAJ&hl=zh-CN"><strong>Yi Chen</strong></a>
·
<a href="https://openreview.net/profile?id=~Yuan_Zhou12"><strong>Yuan Zhou</strong></a>
·
<a href="https://openreview.net/profile?id=~Qinglin_Lu2"><strong>Qinglin Lu</strong></a>
·
<a href="https://scholar.google.com/citations?user=HEuN8PcAAAAJ&hl=en"><strong>Limin Wang</strong></a>
<br>
<b>MCG-NJU | Tencent Hunyuan</b>
<br><br>
<a href="https://arxiv.org/pdf/2511.03334"><img src='https://img.shields.io/badge/arXiv-2511.03334-red' alt='Paper PDF'></a>
<a href='https://mcg-nju.github.io/UniAVGen/'><img src='https://img.shields.io/badge/Project-Page-blue' alt='Project Page'></a>
<a href='https://github.com/MCG-NJU/Sora2-mini'><img src='https://img.shields.io/badge/Github-UniAVGen-orange'></a>
<a href='https://huggingface.co/MCG-NJU/UniAVGen'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
<br>
</p>
</p>

This repository hosts the checkpoint for the paper "UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions". UniAVGen is a unified framework for high-fidelity joint audio-video generation that addresses key limitations of existing methods, such as poor lip synchronization, insufficient semantic consistency, and limited task generalization.
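
A minimal sketch for fetching the checkpoint files, assuming the `huggingface_hub` Python package is installed. The repository ID is taken from the Hugging Face badge above; the helper name `download_checkpoint` and the default `local_dir` are illustrative choices, not part of the UniAVGen codebase.

```python
from pathlib import Path

# Repository ID taken from the Hugging Face link in this model card.
REPO_ID = "MCG-NJU/UniAVGen"


def download_checkpoint(local_dir: str = "./UniAVGen") -> Path:
    """Download every file in the model repository into local_dir.

    Hypothetical helper for illustration; the default directory is an assumption.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the folder holding the files.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_checkpoint()` downloads the repository snapshot and returns the local directory path.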

## Citation

If you find this project helpful in your research or applications, please consider leaving a star ⭐️ and citing our paper:

```BibTeX
@misc{zhang2025uniavgenunifiedaudiovideo,
      title={UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions},
      author={Guozhen Zhang and Zixiang Zhou and Teng Hu and Ziqiao Peng and Youliang Zhang and Yi Chen and Yuan Zhou and Qinglin Lu and Limin Wang},
      year={2025},
      eprint={2511.03334},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.03334},
}
```