lycui/AvatarForcing

audio-conditioned


lycui committed on Mar 26

Commit 3d20648 · verified · 1 Parent(s): 5d16cc8

Update README.md

Files changed (1)

README.md +68 -3


@@ -1,3 +1,68 @@
----
-license: apache-2.0
----

+---
+language:
+- en
+pipeline_tag: image-to-video
+tags:
+- image-to-video
+- audio-conditioned
+- diffusion
+- talking-avatar
+- pytorch
+---
+<div align="center">
+<h1>AvatarForcing</h1>
+<h3>One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising</h3>
+<p>
+  <a href="https://huggingface.co/lycui/AvatarForcing"><img src="https://img.shields.io/badge/HuggingFace-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="Hugging Face Model"></a>
+  <a href="https://arxiv.org/abs/2603.14331"><img src="https://img.shields.io/badge/arXiv-2603.14331-b31b1b?style=for-the-badge" alt="arXiv"></a>
+  <a href="https://cuiliyuan121.github.io/AvatarForcing/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>
+</p>
+</div>
+
+AvatarForcing is a **one-step streaming diffusion** framework for talking avatars. It generates video from **one reference image + speech audio + (optional) text prompt**, using **local-future sliding-window denoising** with **heterogeneous noise levels** and **dual-anchor temporal forcing** for long-form stability. For method details, see the paper: https://arxiv.org/abs/2603.14331
+
+This Hugging Face repo (`lycui/AvatarForcing`) provides two training-stage checkpoints:
+- `ode_audio_init.pt`: stage-1 **ODE** initialization weights
+- `model.pt`: stage-2 **DMD** weights
+
+## Model Download
+
+| Model | Download Link | Notes |
+|---|---|---|
+| Wan2.1-T2V-1.3B | 🤗 [Hugging Face](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) | Base model (student) |
+| AvatarForcing | 🤗 [Hugging Face](https://huggingface.co/lycui/AvatarForcing) | `ode_audio_init.pt` (ODE) + `model.pt` (DMD) |
+| Wav2Vec | 🤗 [Hugging Face](https://huggingface.co/facebook/wav2vec2-base-960h) | Audio encoder |
+
+Download the models with `huggingface-cli`:
+
+```sh
+pip install "huggingface_hub[cli]"
+mkdir -p pretrained_models
+huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./pretrained_models/Wan2.1-T2V-1.3B
+huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./pretrained_models/wav2vec2-base-960h
+huggingface-cli download lycui/AvatarForcing --local-dir ./pretrained_models/AvatarForcing
+```
+
+<details>
+<summary><strong>Citation</strong></summary>
+
+```bibtex
+@misc{cui2026avatarforcingonestepstreamingtalking,
+      title={AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising},
+      author={Liyuan Cui and Wentao Hu and Wenyuan Zhang and Zesong Yang and Fan Shi and Xiaoqiang Liu},
+      year={2026},
+      eprint={2603.14331},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV},
+      url={https://arxiv.org/abs/2603.14331},
+}
+```
+</details>
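
As a follow-up to the download commands added in this README change, here is a minimal sketch for checking that the expected directories and checkpoints are in place. It assumes the `./pretrained_models` layout implied by the `--local-dir` arguments, and that `ode_audio_init.pt` and `model.pt` sit at the top level of the `AvatarForcing` directory. `EXPECTED` and `missing_downloads` are hypothetical names used for illustration, not part of the repository.

```python
from pathlib import Path

# Directory names follow the README's `--local-dir` arguments. The checkpoint
# file names come from the README; their location at the top level of the
# AvatarForcing directory is an assumption.
EXPECTED = {
    "Wan2.1-T2V-1.3B": [],
    "wav2vec2-base-960h": [],
    "AvatarForcing": ["ode_audio_init.pt", "model.pt"],
}


def missing_downloads(root="pretrained_models"):
    """Return expected directories and checkpoint files that are absent under root."""
    root = Path(root)
    missing = []
    for dirname, files in EXPECTED.items():
        model_dir = root / dirname
        if not model_dir.is_dir():
            missing.append(model_dir)
            continue  # skip per-file checks when the directory itself is missing
        missing.extend(
            model_dir / name for name in files if not (model_dir / name).is_file()
        )
    return missing


if __name__ == "__main__":
    for path in missing_downloads():
        print(f"missing: {path}")
```

Run from the directory that contains `pretrained_models`, the script prints one line per missing item and nothing when everything is present.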