From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping

Xu He¹,* Haoxian Zhang²,† Hejia Chen³ Changyuan Zheng¹ Liyang Chen¹
Songlin Tang² Jiehui Huang⁴ Xiaoqiang Liu² Pengfei Wan² Zhiyong Wu¹,⁵,✉

¹Tsinghua University ²Kling Team, Kuaishou Technology ³Beihang University ⁴HKUST ⁵CUHK
*Work done at Kling Team, Kuaishou Technology †Project leader ✉Corresponding author

Please refer to the GitHub README for usage.

Paper: https://arxiv.org/abs/2512.25066
Project Page: https://hjrphoebus.github.io/X-Dub/
Code: https://github.com/KlingAIResearch/X-Dub

📌 TL;DR

X-Dub is a visual dubbing system that synchronizes a character's lip movements in a video to match arbitrary input audio. This repository hosts the public Wan-based X-Dub release and its pretrained weights.

🌟 Citation

Please cite our paper if you find our work helpful.

@article{he2025from,
  title={From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing},
  author={He, Xu and Zhang, Haoxian and Chen, Hejia and Zheng, Changyuan and Chen, Liyang and Tang, Songlin and Huang, Jiehui and Liu, Xiaoqiang and Wan, Pengfei and Wu, Zhiyong},
  journal={arXiv preprint arXiv:2512.25066},
  year={2025}
}

Paper for KlingTeam/X-Dub: From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing (arXiv:2512.25066, published Dec 31, 2025)