From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping

Xu He1,* Haoxian Zhang2,† Hejia Chen3 Changyuan Zheng1 Liyang Chen1
Songlin Tang2 Jiehui Huang4 Xiaoqiang Liu2 Pengfei Wan2 Zhiyong Wu1,5,✉

1Tsinghua University    2Kling Team, Kuaishou Technology    3Beihang University    4HKUST    5CUHK
*Work done at Kling Team, Kuaishou Technology    †Project leader    ✉Corresponding author


Please refer to the [GitHub README](https://github.com/KlingAIResearch/X-Dub) for usage.

* Paper: [https://arxiv.org/abs/2512.25066](https://arxiv.org/abs/2512.25066)
* Project Page: [https://hjrphoebus.github.io/X-Dub/](https://hjrphoebus.github.io/X-Dub/)
* Code: [https://github.com/KlingAIResearch/X-Dub](https://github.com/KlingAIResearch/X-Dub)

## 📌 TL;DR

X-Dub is a visual dubbing system that synchronizes a character's lip movements in a video to match arbitrary input audio. This repository hosts the public Wan-based X-Dub release and its pretrained weights.

## 🌟 Citation

Please cite our paper if you find our work helpful.

```bibtex
@article{he2025from,
  title={From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing},
  author={He, Xu and Zhang, Haoxian and Chen, Hejia and Zheng, Changyuan and Chen, Liyang and Tang, Songlin and Huang, Jiehui and Liu, Xiaoqiang and Wan, Pengfei and Wu, Zhiyong},
  journal={arXiv preprint arXiv:2512.25066},
  year={2025}
}
```