Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation

Yingjie Chen · Shilun Lin · Cai Xing · Qixin Yan · Wenjing Wang
Dingming Liu · Hao Liu · Chen Li · Jing LYU

Multimedia Computing Group, Nanjing University   |   Platform and Content Group (PCG), Tencent


This repository is the official implementation of "Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation". Identity-as-Presence is a unified and scalable framework for identity-aware joint audio-video generation, enabling high-fidelity and consistent personalization. Specifically, we introduce a data curation pipeline that automatically extracts identity-bearing information with paired annotations across audio and visual modalities, covering diverse scenarios from single-subject to multi-subject interactions. We further propose a flexible and scalable identity injection mechanism for single- and multi-subject scenarios, in which both facial appearance and vocal timbre act as identity-bearing control signals.
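The release does not include the injection code itself; as a rough, non-authoritative sketch of the multi-subject identity-injection idea described above (our own illustration, not the authors' implementation), each subject's facial-appearance embedding and vocal-timbre embedding can be fused into a single identity token, and tokens from any number of subjects stacked into one conditioning sequence so the single- and multi-subject cases share an interface. All names (`identity_tokens`, `pack_subjects`), dimensions, and the concat-plus-projection fusion are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def identity_tokens(face_emb, voice_emb, proj_dim=64):
    """Fuse one subject's face and voice embeddings into a shared
    conditioning token (hypothetical concat + linear projection)."""
    fused = np.concatenate([face_emb, voice_emb])        # (face_dim + voice_dim,)
    W = rng.standard_normal((proj_dim, fused.shape[0]))  # stand-in for a learned projection
    return W @ fused                                     # (proj_dim,)

def pack_subjects(subjects, proj_dim=64):
    """Stack per-subject identity tokens into one conditioning sequence,
    so single- and multi-subject scenarios use the same interface."""
    return np.stack([identity_tokens(f, v, proj_dim) for f, v in subjects])

# Two subjects, each with a 128-d face embedding and a 96-d voice embedding.
subjects = [(rng.standard_normal(128), rng.standard_normal(96)) for _ in range(2)]
cond = pack_subjects(subjects)
print(cond.shape)  # (2, 64): one identity token per subject
```

In the actual model the projection would be learned and the token sequence would condition the joint audio-video generator; the point of the sketch is only that appearance and timbre enter through one shared, variable-length control signal.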

🔗 Citation

If you find this model useful for your research, please use the following BibTeX entry.

@article{chen2026identity,
  title={Identity as Presence: Towards Appearance and Voice Personalized Joint Audio-Video Generation},
  author={Chen, Yingjie and Lin, Shilun and Xing, Cai and Yan, Qixin and Wang, Wenjing and Liu, Dingming and Liu, Hao and Li, Chen and LYU, Jing},
  journal={arXiv preprint arXiv:TBD},
  website={https://chen-yingjie.github.io/projects/Identity-as-Presence/index.html},
  year={2026}
}
Model tree for echoanran/Identity-as-Presence: finetuned from chetwinlow1/Ovi.