Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation

Yingjie Chen · Shilun Lin · Cai Xing · Qixin Yan · Wenjing Wang
Dingming Liu · Hao Liu · Chen Li · Jing LYU

Multimedia Computing Group, Nanjing University   |   Platform and Content Group (PCG), Tencent


This repository is the official implementation of "Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation". Identity-as-Presence is a unified and scalable framework for identity-aware joint audio-video generation, enabling high-fidelity and consistent personalization. Specifically, we introduce a data curation pipeline that automatically extracts identity-bearing information with paired annotations across audio and visual modalities, covering diverse scenarios from single-subject to multi-subject interactions. We further propose a flexible and scalable identity injection mechanism for single- and multi-subject scenarios, in which both facial appearance and vocal timbre act as identity-bearing control signals.
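The release does not include the injection code itself; as a rough, non-authoritative sketch of the multi-subject identity-injection idea described above (our own illustration, not the authors' implementation), each subject's facial-appearance embedding and vocal-timbre embedding can be fused into a single identity token, and tokens from any number of subjects stacked into one conditioning sequence so the single- and multi-subject cases share an interface. All names (`identity_tokens`, `pack_subjects`), dimensions, and the concat-plus-projection fusion are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def identity_tokens(face_emb, voice_emb, proj_dim=64):
    """Fuse one subject's face and voice embeddings into a shared
    conditioning token (hypothetical concat + linear projection)."""
    fused = np.concatenate([face_emb, voice_emb])        # (face_dim + voice_dim,)
    W = rng.standard_normal((proj_dim, fused.shape[0]))  # stand-in for a learned projection
    return W @ fused                                     # (proj_dim,)

def pack_subjects(subjects, proj_dim=64):
    """Stack per-subject identity tokens into one conditioning sequence,
    so single- and multi-subject scenarios use the same interface."""
    return np.stack([identity_tokens(f, v, proj_dim) for f, v in subjects])

# Two subjects, each with a 128-d face embedding and a 96-d voice embedding.
subjects = [(rng.standard_normal(128), rng.standard_normal(96)) for _ in range(2)]
cond = pack_subjects(subjects)
print(cond.shape)  # (2, 64): one identity token per subject
```

In the actual model the projection would be learned and the token sequence would condition the joint audio-video generator; the point of the sketch is only that appearance and timbre enter through one shared, variable-length control signal.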

🔗 Citation

If you find this model useful for your research, please use the following BibTeX entry.

@article{chen2026identity,
  title={Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation},
  author={Chen, Yingjie and Lin, Shilun and Xing, Cai and Yan, Qixin and Wang, Wenjing and Liu, Dingming and Liu, Hao and Li, Chen and LYU, Jing},
  journal={arXiv preprint arXiv:TBD},
  website={https://chen-yingjie.github.io/projects/Identity-as-Presence/index.html},
  year={2026}
}
Model tree for echoanran/Identity-as-Presence: finetuned from chetwinlow1/Ovi.