ID-LoRA-TalkVid

ID-LoRA (Identity-Driven In-Context LoRA) enables identity-preserving audio–video generation in a single model. This repository contains the ID-LoRA checkpoint trained on the TalkVid dataset.

Project Page | GitHub | Paper

Model Description

ID-LoRA jointly generates a subject's appearance and voice, letting a text prompt, a reference image, and a short audio clip govern both modalities together. Built on top of LTX-2, it is the first method to personalize visual appearance and voice within a single generative pass.

Unlike cascaded pipelines that treat audio and video separately, ID-LoRA operates in a unified latent space where a single text prompt can simultaneously dictate the scene's visual content, environmental acoustics, and speaking style—while preserving the subject's vocal identity and visual likeness.
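As an illustration of how one prompt drives both modalities, the tagged prompt format used by the inference script can be assembled programmatically. The helper below is a hypothetical sketch (a `build_prompt` function is not part of the official repository); it only concatenates the [VISUAL], [SPEECH], and [SOUNDS] sections shown in the Usage example.

```python
def build_prompt(visual: str, speech: str, sounds: str = "") -> str:
    """Assemble the tagged prompt string passed to the inference script.

    The [VISUAL], [SPEECH], and [SOUNDS] tags mirror the prompt format in
    the Usage section; this helper itself is illustrative, not official.
    """
    parts = [f"[VISUAL]: {visual}", f"[SPEECH]: {speech}"]
    if sounds:  # the sounds section is optional in this sketch
        parts.append(f"[SOUNDS]: {sounds}")
    return " ".join(parts)

prompt = build_prompt(
    visual="A person speaks in a sunlit park",
    speech="Hello world",
    sounds="birds chirping",
)
print(prompt)
```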

Details

Property          Value
Base model        LTX-2 19B
Training dataset  TalkVid
LoRA rank         128
Training steps    6,000
Strategy          audio_ref_only_ic with negative temporal positions

Usage

To use this checkpoint, clone the official repository and run the inference script:

python scripts/inference_two_stage.py \
  --lora-path lora_weights.safetensors \
  --reference-audio reference_speaker.wav \
  --first-frame first_frame.png \
  --prompt "[VISUAL]: A person speaks in a sunlit park... [SPEECH]: Hello world... [SOUNDS]: ..." \
  --output-dir outputs/

Files

  • lora_weights.safetensors -- LoRA adapter weights (~1.1 GB)
  • training_config.yaml -- Training configuration used to produce this checkpoint

Citation

@misc{dahan2026idloraidentitydrivenaudiovideopersonalization,
  title     = {ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA},
  author    = {Aviad Dahan and Moran Yanuka and Noa Kraicer and Lior Wolf and Raja Giryes},
  year      = {2026},
  eprint    = {2603.10256},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url       = {https://arxiv.org/abs/2603.10256}
}