---
license: apache-2.0
language:
- en
pipeline_tag: image-to-video
tags:
- image-to-video
- audio-conditioned
- diffusion
- talking-avatar
- pytorch
---

<div align="center">

<h1>AvatarForcing</h1>

<h3>One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising</h3>

<p>
<a href="https://huggingface.co/lycui/AvatarForcing"><img src="https://img.shields.io/badge/HuggingFace-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="Hugging Face Model"></a>
<a href="https://arxiv.org/abs/2603.14331"><img src="https://img.shields.io/badge/arXiv-2603.14331-b31b1b?style=for-the-badge" alt="arXiv"></a>
<a href="https://cuiliyuan121.github.io/AvatarForcing/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>
</p>

</div>

AvatarForcing is a **one-step streaming diffusion** framework for talking avatars. It generates video from **one reference image + speech audio + (optional) text prompt**, using **local-future sliding-window denoising** with **heterogeneous noise levels** and **dual-anchor temporal forcing** for long-form stability. For method details, see the paper: https://arxiv.org/abs/2603.14331

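To make the windowing idea concrete, here is a purely illustrative sketch (not the released implementation) of one plausible reading of "local-future sliding-window denoising with heterogeneous noise levels": committed past frames in the window stay clean, while the local-future look-ahead slots carry progressively more noise. The window size, look-ahead length, and linear noise schedule below are all assumptions.

```python
def window_noise_levels(window: int = 5, future: int = 2) -> list[float]:
    """Assign a noise level to each slot in the denoising window.

    Illustrative schedule: the first `window - future` slots hold
    already-committed frames (noise 0.0); the `future` look-ahead
    slots ramp linearly up to fully noised (1.0).
    """
    past = window - future
    return [0.0] * past + [i / future for i in range(1, future + 1)]


def sliding_windows(num_frames: int, window: int = 5) -> list[list[int]]:
    """Frame indices covered by each streaming step as the window
    slides forward one frame at a time."""
    return [list(range(s, s + window)) for s in range(num_frames - window + 1)]
```

Each streaming step would then run a single denoising pass over the current window with these heterogeneous levels and commit the oldest look-ahead frame before sliding forward.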
This Hugging Face repo (`lycui/AvatarForcing`) provides two training-stage checkpoints:

- `ode_audio_init.pt`: stage-1 **ODE** initialization weights
- `model.pt`: stage-2 **DMD** weights

## Model Download

| Models | Download Link | Notes |
|---|---|---|
| Wan2.1-T2V-1.3B | 🤗 [Hugging Face](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) | Base model (student) |
| AvatarForcing | 🤗 [Hugging Face](https://huggingface.co/lycui/AvatarForcing) | `ode_audio_init.pt` (ODE) + `model.pt` (DMD) |
| Wav2Vec | 🤗 [Hugging Face](https://huggingface.co/facebook/wav2vec2-base-960h) | Audio encoder |

Download the models using `huggingface-cli`:

```sh
pip install "huggingface_hub[cli]"
mkdir -p pretrained_models

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./pretrained_models/Wan2.1-T2V-1.3B
huggingface-cli download facebook/wav2vec2-base-960h --local-dir ./pretrained_models/wav2vec2-base-960h
huggingface-cli download lycui/AvatarForcing --local-dir ./pretrained_models/AvatarForcing
```

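The same downloads can be driven from Python. This sketch only enumerates the repo-to-directory pairs implied by the CLI commands above; each pair would then be passed to `huggingface_hub.snapshot_download(repo_id=..., local_dir=...)`, which performs the actual fetch (the helper name `download_plan` is ours, not part of any library).

```python
from pathlib import Path

# Mirror of the three huggingface-cli commands above: repo id -> subdirectory.
REPOS = {
    "Wan-AI/Wan2.1-T2V-1.3B": "Wan2.1-T2V-1.3B",
    "facebook/wav2vec2-base-960h": "wav2vec2-base-960h",
    "lycui/AvatarForcing": "AvatarForcing",
}


def download_plan(root: str = "pretrained_models") -> list[tuple[str, str]]:
    """Return (repo_id, local_dir) pairs, one per checkpoint repo.

    Each pair is suitable for huggingface_hub.snapshot_download:
        snapshot_download(repo_id=repo, local_dir=path)
    """
    return [(repo, str(Path(root) / sub)) for repo, sub in REPOS.items()]
```

Looping over `download_plan()` and calling `snapshot_download` on each pair reproduces the directory layout the CLI commands create.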
<details>
<summary><strong>Citation</strong></summary>

```bibtex
@misc{cui2026avatarforcingonestepstreamingtalking,
      title={AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising},
      author={Liyuan Cui and Wentao Hu and Wenyuan Zhang and Zesong Yang and Fan Shi and Xiaoqiang Liu},
      year={2026},
      eprint={2603.14331},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.14331},
}
```

</details>