| <h1 align="center">From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping</h1> |
|
|
| <div align='center'> |
| <p> |
| <a href="https://scholar.google.com/citations?user=KMrFk2MAAAAJ&hl=en&oi=sra">Xu He</a><sup>1,*</sup> |
| <a href="">Haoxian Zhang</a><sup>2,†</sup> |
| <a href="">Hejia Chen</a><sup>3</sup> |
| <a href="">Changyuan Zheng</a><sup>1</sup> |
| <a href="">Liyang Chen</a><sup>1</sup> |
| <br> |
| <a href="">Songlin Tang</a><sup>2</sup> |
| <a href="">Jiehui Huang</a><sup>4</sup> |
| <a href="">Xiaoqiang Liu</a><sup>2</sup> |
| <a href="">Pengfei Wan</a><sup>2</sup> |
| <a href="">Zhiyong Wu</a><sup>1,5,✉</sup> |
| </p> |
| <p> |
| <sup>1</sup>Tsinghua University |
| <sup>2</sup>Kling Team, Kuaishou Technology |
| <sup>3</sup>Beihang University |
| <sup>4</sup>HKUST |
| <sup>5</sup>CUHK |
| <br> |
| <sup>*</sup>Work done at Kling Team, Kuaishou Technology |
| <sup>†</sup>Project leader |
| <sup>✉</sup>Corresponding author |
| </p> |
| |
| |
| </div> |
|
|
| <p align="center"> |
| <a href="https://hjrphoebus.github.io/X-Dub/"><img src="https://img.shields.io/badge/Project-Homepage-green"></a> |
| |
| <a href="https://arxiv.org/abs/2512.25066"><img src="https://img.shields.io/static/v1?label=ArXiv&message=X-Dub&color=red&logo=arxiv"></a> |
| |
| <a href="https://github.com/KlingAIResearch/X-Dub"><img src="https://img.shields.io/badge/GitHub-X--Dub-black?logo=github"></a> |
| </p> |
|
|
| Please refer to the [GitHub README](https://github.com/KlingAIResearch/X-Dub) for usage. |
|
|
| * Paper: [https://arxiv.org/abs/2512.25066](https://arxiv.org/abs/2512.25066) |
| * Project Page: [https://hjrphoebus.github.io/X-Dub/](https://hjrphoebus.github.io/X-Dub/) |
| * Code: [https://github.com/KlingAIResearch/X-Dub](https://github.com/KlingAIResearch/X-Dub) |
|
|
|
|
| ## TL;DR |
| X-Dub is a visual dubbing system that synchronizes a character's lip movements in a video to match arbitrary input audio. This repository hosts the public Wan-based X-Dub release and its pretrained weights. |
|
|
|
|
| ## Citation |
| Please cite our paper if you find our work helpful. |
|
|
| ```bibtex |
| @article{he2025from, |
| title={From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing}, |
| author={He, Xu and Zhang, Haoxian and Chen, Hejia and Zheng, Changyuan and Chen, Liyang and Tang, Songlin and Huang, Jiehui and Liu, Xiaoqiang and Wan, Pengfei and Wu, Zhiyong}, |
| journal={arXiv preprint arXiv:2512.25066}, |
| year={2025} |
| } |
| ``` |
|
|