Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation
Yingjie Chen · Shilun Lin · Cai Xing · Qixin Yan · Wenjing Wang · Dingming Liu · Hao Liu · Chen Li · Jing LYU
Multimedia Computing Group, Nanjing University | Platform and Content Group (PCG), Tencent
This repository is the official implementation of "Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation". Identity-as-Presence is a unified and scalable framework for identity-aware joint audio-video generation, enabling high-fidelity and consistent personalization. Specifically, we introduce a data curation pipeline that automatically extracts identity-bearing information with paired annotations across audio and visual modalities, covering diverse scenarios from single-subject to multi-subject interactions. We further propose a flexible and scalable identity injection mechanism for single- and multi-subject scenarios, in which both facial appearance and vocal timbre act as identity-bearing control signals.
🔗 Citation
If you find this model useful for your research, please cite it using the following BibTeX entry.
@article{chen2026identity,
  title={Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation},
  author={Chen, Yingjie and Lin, Shilun and Xing, Cai and Yan, Qixin and Wang, Wenjing and Liu, Dingming and Liu, Hao and Li, Chen and LYU, Jing},
  journal={arXiv preprint arXiv:TBD},
  website={https://chen-yingjie.github.io/projects/Identity-as-Presence/index.html},
  year={2026}
}
