---
language:
- en
pipeline_tag: image-to-video
tags:
- image-to-video
- audio-conditioned
- diffusion
- talking-avatar
- pytorch
---
AvatarForcing is a **one-step streaming diffusion** framework for talking avatars. It generates video from **one reference image + speech audio + (optional) text prompt**, using **local-future sliding-window denoising** with **heterogeneous noise levels** and **dual-anchor temporal forcing** for long-form stability.

For method details, see: https://arxiv.org/abs/2603.14331

This Hugging Face repo (`lycui/AvatarForcing`) provides two training-stage checkpoints:

- `ode_audio_init.pt`: stage-1 **ODE** initialization weights
- `model.pt`: stage-2 **DMD** weights

## Model Download

| Models | Download Link | Notes |
|---|---|---|
| Wan2.1-T2V-1.3B | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) | Base model (student) |
| AvatarForcing | 🤗 [Huggingface](https://huggingface.co/lycui/AvatarForcing) | `ode_audio_init.pt` (ODE) + `model.pt` (DMD) |
| Wav2Vec | 🤗 [Huggingface](https://huggingface.co/facebook/wav2vec2-base-960h) | Audio encoder |

Download the models with `huggingface-cli`:

```sh
pip install "huggingface_hub[cli]"
mkdir -p pretrained_models
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./pretrained_models/Wan2.1-T2V-1.3B
huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./pretrained_models/wav2vec2-base-960h
huggingface-cli download lycui/AvatarForcing --local-dir ./pretrained_models/AvatarForcing
```
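To build intuition for the "local-future sliding-window denoising with heterogeneous noise levels" mentioned above, here is a toy sketch of one plausible schedule: within a window, frames nearer the already-generated past get low noise levels and frames further into the future get progressively higher ones. This is an illustrative assumption, not the paper's exact schedule; see the arXiv link for the actual method.

```python
# Toy sketch (my assumption, not the paper's exact schedule):
# assign linearly increasing noise levels across a local-future window,
# so near-past frames are nearly clean and far-future frames are noisiest.

def window_noise_levels(window_size: int, max_level: float = 1.0) -> list:
    """Linearly increasing noise levels across a local-future window."""
    if window_size == 1:
        return [max_level]
    return [max_level * i / (window_size - 1) for i in range(window_size)]

# A 5-frame window: the first frame is (almost) clean, the last is fully noisy.
print(window_noise_levels(5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

As the window slides forward, each frame's noise level decreases step by step until it is fully denoised and leaves the window, which is what enables streaming (one-step-per-frame) generation.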
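The Wav2Vec audio encoder listed above (`wav2vec2-base-960h`) emits roughly one feature vector per 320 samples of 16 kHz audio, i.e. about 50 features per second. The arithmetic below shows how that rate maps onto video frames; the 25 fps target and the exact feature count are assumptions for illustration (the real conv stack trims a few frames at the boundaries), so adapt them to the actual pipeline settings.

```python
# Sketch: approximate alignment of Wav2Vec2 audio features to video frames.
# Assumptions (not stated in this card): 16 kHz mono input, 25 fps video.

SAMPLE_RATE = 16_000   # wav2vec2-base-960h expects 16 kHz audio
HOP_SAMPLES = 320      # one feature per ~320 samples (conv feature-extractor stride)
VIDEO_FPS = 25         # assumed target frame rate

def audio_feature_count(num_samples: int) -> int:
    """Approximate number of audio feature vectors for a waveform."""
    return num_samples // HOP_SAMPLES

def features_per_video_frame() -> float:
    """How many audio features fall on each video frame."""
    features_per_sec = SAMPLE_RATE / HOP_SAMPLES  # ~50 features/sec
    return features_per_sec / VIDEO_FPS

print(audio_feature_count(SAMPLE_RATE))   # 50 features for 1 s of audio
print(features_per_video_frame())         # 2.0 features per video frame
```

In other words, under these assumptions each generated video frame is conditioned on about two audio feature vectors, which is why the audio track length bounds the length of the generated clip.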