From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping

Xu He1,* Haoxian Zhang2,† Hejia Chen3 Changyuan Zheng1 Liyang Chen1
Songlin Tang2 Jiehui Huang4 Xiaoqiang Liu2 Pengfei Wan2 Zhiyong Wu1,5,✉

1Tsinghua University    2Kling Team, Kuaishou Technology    3Beihang University    4HKUST    5CUHK
*Work done at Kling Team, Kuaishou Technology    †Project leader    ✉Corresponding author


Please refer to the [GitHub README](https://github.com/KlingAIResearch/X-Dub) for usage.

* Paper: [https://arxiv.org/abs/2512.25066](https://arxiv.org/abs/2512.25066)
* Project Page: [https://hjrphoebus.github.io/X-Dub/](https://hjrphoebus.github.io/X-Dub/)
* Code: [https://github.com/KlingAIResearch/X-Dub](https://github.com/KlingAIResearch/X-Dub)

## 📌 TL;DR

X-Dub is a visual dubbing system that synchronizes a character's lip movements in a video to match arbitrary input audio. This repository hosts the public Wan-based X-Dub release and its pretrained weights.

## 🌟 Citation

Please cite our paper if you find our work helpful.

```bibtex
@article{he2025from,
  title={From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing},
  author={He, Xu and Zhang, Haoxian and Chen, Hejia and Zheng, Changyuan and Chen, Liyang and Tang, Songlin and Huang, Jiehui and Liu, Xiaoqiang and Wan, Pengfei and Wu, Zhiyong},
  journal={arXiv preprint arXiv:2512.25066},
  year={2025}
}
```