---
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.2-S2V-14B
pipeline_tag: image-to-video
---
<div align="center">
<p align="center">
<img src="./assets/logo.png" width="200px" alt="Live Avatar Teaser">
</p>
<h1>🎬 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length</h1>
<p>
<a href="https://github.com/Yubo-Shankui" style="color: inherit;">Yubo Huang</a><sup>1,2</sup> ·
<a href="#" style="color: inherit;">Hailong Guo</a><sup>1,3</sup> ·
<a href="#" style="color: inherit;">Fangtai Wu</a><sup>1,4</sup> ·
<a href="#" style="color: inherit;">Shifeng Zhang</a><sup>1</sup> ·
<a href="#" style="color: inherit;">Shijie Huang</a><sup>1</sup> ·
<a href="#" style="color: inherit;">Qijun Gan</a><sup>4</sup> ·
<a href="#" style="color: inherit;">Lin Liu</a><sup>2</sup> ·
<a href="#" style="color: inherit;">Sirui Zhao</a><sup>2,*</sup> ·
<a href="http://staff.ustc.edu.cn/~cheneh/" style="color: inherit;">Enhong Chen</a><sup>2,*</sup> ·
<a href="https://openreview.net/profile?id=%7EJiaming_Liu7" style="color: inherit;">Jiaming Liu</a><sup>1,‡</sup> ·
<a href="https://sites.google.com/view/stevenhoi/" style="color: inherit;">Steven Hoi</a><sup>1</sup>
</p>
<p style="font-size: 0.9em;">
<sup>1</sup> Alibaba Group
<sup>2</sup> University of Science and Technology of China
<sup>3</sup> Beijing University of Posts and Telecommunications
<sup>4</sup> Zhejiang University
</p>
<p style="font-size: 0.9em;">
<sup>*</sup> Corresponding authors. <sup>‡</sup> Project leader.
</p>
<!-- Badges -->
<a href="https://arxiv.org/abs/2512.04677"><img src="https://img.shields.io/badge/arXiv-2512.04677-b31b1b.svg?style=for-the-badge" alt="arXiv"></a> <a href="https://huggingface.co/papers/2512.04677"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-ff9d00?style=for-the-badge" alt="Daily Paper"></a> <a href="https://huggingface.co/Quark-Vision/Live-Avatar"><img src="https://img.shields.io/badge/Hugging%20Face-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> <a href="https://github.com/Alibaba-Quark/LiveAvatar"><img src="https://img.shields.io/badge/Github-Code-black?style=for-the-badge&logo=github" alt="Github"></a> <a href="https://liveavatar.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>
</div>
> **TL;DR:** **Live Avatar** is an algorithm–system co-designed framework for real-time, streaming, infinite-length interactive avatar video generation. Powered by a **14B-parameter** diffusion model, it achieves **20 FPS** on **5×H800** GPUs with **4-step** sampling, and its **block-wise autoregressive** design supports streaming videos of **10,000+** seconds.
<div align="center">
<a href="https://www.youtube.com/watch?v=srbsGlLNpAc"><strong>▶️ Watch the Demo Video on YouTube</strong></a><br>
<strong>👀 More Demos:</strong> <br>
🤖 Human-AI Conversation | ♾️ Infinite Video | 🎭 Diverse Characters | 🎬 Animated Tech Explanation <br>
<a href="https://liveavatar.github.io/">
<strong>👉 Click Here to Visit Project Page! 🌐</strong>
</a>
<br>
</div>
---
## ✨ Highlights
> - ⚡ **Real-time Streaming Interaction** - Achieves **20** FPS real-time streaming with low latency
> - ♾️ **Infinite-length Autoregressive Generation** - Supports **10,000+**-second continuous video generation
> - 🎨 **Strong Generalization** - Generalizes across cartoon characters, singing, and diverse scenarios
---
## 📰 News
- **[2025.12.08]** 🚀 We released the real-time inference [code](infinite_inference_multi_gpu.sh) and the model [weights](https://huggingface.co/Quark-Vision/Live-Avatar).
- **[2025.12.08]** 🎉 LiveAvatar was the Hugging Face [#1 Paper of the Day](https://huggingface.co/papers/date/2025-12-05)!
- **[2025.12.04]** 🏃‍♂️ We committed to open-sourcing the code in **early December**.
- **[2025.12.04]** 🔥 We released the [paper](https://arxiv.org/abs/2512.04677) and the [demo page](https://liveavatar.github.io/).
---
## 📑 Todo List
### 🌟 **Early December** (core code release)
- ✅ Release the paper
- ✅ Release the demo website
- ✅ Release checkpoints on Hugging Face
- ✅ Release Gradio Web UI
- ✅ Experimental real-time streaming inference (requires H800-class GPUs or better)
- ✅ Distribution-matching distillation to 4 steps
- ✅ Timestep-forcing pipeline parallelism
### ⚙️ **Later updates**
- ⬜ UI integration for easy streaming interaction
- ⬜ Inference code supporting single GPU (offline generation)
- ⬜ Multi-character support
- ⬜ Training code
- ⬜ TTS integration
- ⬜ LiveAvatar v1.1
## 🛠️ Installation
Please follow the steps below to set up the environment.
### 1. Create Environment
```bash
conda create -n liveavatar python=3.10 -y
conda activate liveavatar
```
### 2. Install CUDA Dependencies (optional)
```bash
conda install nvidia/label/cuda-12.4.1::cuda -y
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit -y
```
### 3. Install PyTorch & Flash Attention
```bash
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
pip install flash-attn==2.8.3 --no-build-isolation
```
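After installation, you can optionally run a quick sanity check to confirm that PyTorch sees your GPUs and that flash-attn imports cleanly:
```bash
# Optional sanity check: verify the CUDA build of PyTorch and the flash-attn install
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
python -c "import flash_attn; print(flash_attn.__version__)"
```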
### 4. Install Python Requirements
```bash
pip install -r requirements.txt
```
### 5. Install FFMPEG
```bash
apt-get update && apt-get install -y ffmpeg
```
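You can confirm the binary is on your `PATH` with:
```bash
ffmpeg -version | head -n 1
```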
---
## 📥 Download Models
Please download the pretrained checkpoints from the links below and place them in the `./ckpt/` directory.
| Model Component | Description | Link |
| :--- | :--- | :---: |
| `WanS2V-14B` | Base model | 🤗 [Huggingface](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B) |
| `liveAvatar` | Our LoRA model | 🤗 [Huggingface](https://huggingface.co/Quark-Vision/Live-Avatar) |
```bash
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./ckpt/Wan2.2-S2V-14B
huggingface-cli download Quark-Vision/Live-Avatar --local-dir ./ckpt/LiveAvatar
```
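As an optional check, confirm that both checkpoint directories exist and look complete (the 14B base model alone is tens of gigabytes):
```bash
# Optional: verify both checkpoint directories exist and check their sizes
du -sh ./ckpt/Wan2.2-S2V-14B ./ckpt/LiveAvatar
```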
After downloading, your directory structure should look like this:
```
ckpt/
├── Wan2.2-S2V-14B/ # Base model
│ ├── config.json
│ ├── diffusion_pytorch_model-*.safetensors
│ └── ...
└── LiveAvatar/ # Our LoRA model
├── liveavatar.safetensors
└── ...
```
## 🚀 Inference
### Real-time Inference with TPP
> 💡 Currently, this command requires GPUs with at least 80GB of VRAM.
```bash
# CLI Inference
bash infinite_inference_multi_gpu.sh
# Gradio Web UI
bash gradio_multi_gpu.sh
```
> 💡 The model generates video from an audio input combined with a reference image and an optional text prompt.
> 💡 The `size` parameter specifies the target area (in pixels) of the generated video; the aspect ratio follows that of the input image.
> 💡 The `--num_clip` parameter controls the number of video clips generated, which is useful for quick previews with shorter generation time.
> 💡 Currently, our TPP pipeline requires **five** GPUs for inference (a quick GPU check is sketched below). We plan to develop a 3-step version that can be deployed on a 4-GPU cluster, and to integrate the [LightX2V](https://github.com/ModelTC/LightX2V) VAE component, which will remove the dependency on a dedicated single-GPU VAE stage and enable 4-step inference on a 4-GPU setup.
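Before launching, you may want to confirm that your machine exposes enough suitable GPUs (five devices with roughly 80GB of memory each, per the notes above):
```bash
# Should list at least five GPUs, each with roughly 80GB of memory
nvidia-smi --query-gpu=index,name,memory.total --format=csv
```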
Please visit our [project page](https://liveavatar.github.io/) to see more examples and learn about the scenarios suitable for this model.
## 📝 Citation
If you find this project useful for your research, please consider citing our paper:
```bibtex
@misc{huang2025liveavatarstreamingrealtime,
title={Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length},
author={Yubo Huang and Hailong Guo and Fangtai Wu and Shifeng Zhang and Shijie Huang and Qijun Gan and Lin Liu and Sirui Zhao and Enhong Chen and Jiaming Liu and Steven Hoi},
year={2025},
eprint={2512.04677},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.04677},
}
```
## ⭐ Star History
[](https://www.star-history.com/#Alibaba-Quark/LiveAvatar&type=date&legend=top-left)
## 📜 License Agreement
* The majority of this project is released under the Apache 2.0 license, as found in the [LICENSE](LICENSE) file.
* The Wan model (our base model) is also released under the Apache 2.0 license, as found in its [LICENSE](https://github.com/Wan-Video/Wan2.2/blob/main/LICENSE.txt).
* This project is a research preview. Please contact us at jmliu1217@gmail.com if you find any potential violations.
## 🙏 Acknowledgements
We would like to express our gratitude to the following projects:
* [CausVid](https://github.com/tianweiy/CausVid)
* [Longlive](https://github.com/NVlabs/LongLive)
* [WanS2V](https://humanaigc.github.io/wan-s2v-webpage/)