---
base_model:
- Wan-AI/Wan2.2-S2V-14B
language:
- en
license: apache-2.0
pipeline_tag: image-to-video
tags:
- lora
- talking-head
- audio-driven
- avatar-generation
---
<div align="center">
<p align="center">
<img src="./assets/logo.png" width="200px" alt="Live Avatar Teaser">
</p>
<h1>🎬 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length</h1>
<p>
<a href="https://huggingface.co/Yubo-Huang" style="color: inherit;">Yubo Huang</a><sup>1,2</sup> ·
<a href="#" style="color: inherit;">Hailong Guo</a><sup>2,3</sup> ·
<a href="#" style="color: inherit;">Fangtai Wu</a><sup>2,4</sup> ·
<a href="#" style="color: inherit;">Shifeng Zhang</a><sup>2</sup> ·
<a href="#" style="color: inherit;">Shijie Huang</a><sup>2</sup> ·
<a href="#" style="color: inherit;">Qijun Gan</a><sup>4</sup> ·
<a href="#" style="color: inherit;">Lin Liu</a><sup>1</sup> ·
<a href="#" style="color: inherit;">Sirui Zhao</a><sup>1,*</sup> ·
<a href="https://huggingface.co/Hongni" style="color: inherit;">Enhong Chen</a><sup>1,*</sup> ·
<a href="https://huggingface.co/jamesliu1217" style="color: inherit;">Jiaming Liu</a><sup>2,‡</sup> ·
<a href="https://huggingface.co/stevenhoi" style="color: inherit;">Steven Hoi</a><sup>2</sup>
</p>
<p style="font-size: 0.9em;">
<sup>1</sup> University of Science and Technology of China &nbsp;&nbsp;
<sup>2</sup> Alibaba Group &nbsp;&nbsp;
<sup>3</sup> Beijing University of Posts and Telecommunications &nbsp;&nbsp;
<sup>4</sup> Zhejiang University
</p>
<p style="font-size: 0.9em;">
<sup>*</sup> Corresponding authors. &nbsp;&nbsp; <sup>‡</sup> Project leader.
</p>
<!-- Badges -->
<a href="https://arxiv.org/abs/2512.04677"><img src="https://img.shields.io/badge/arXiv-2512.04677-b31b1b.svg?style=for-the-badge" alt="arXiv"></a> <a href="https://huggingface.co/papers/2512.04677"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-ff9d00?style=for-the-badge" alt="Daily Paper"></a> <a href="https://huggingface.co/Quark-Vision/Live-Avatar"><img src="https://img.shields.io/badge/Hugging%20Face-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> <a href="https://github.com/Alibaba-Quark/LiveAvatar"><img src="https://img.shields.io/badge/Github-Code-black?style=for-the-badge&logo=github" alt="Github"></a> <a href="https://liveavatar.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>
</div>
This repository contains the weights for the paper [Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length](https://huggingface.co/papers/2512.04677).
> **TL;DR:** **Live Avatar** is an algorithm-system co-designed framework that enables real-time, streaming, infinite-length interactive avatar video generation. Powered by a **14B-parameter** diffusion model, it achieves **45 FPS** on multi-card **H800** GPUs with **4-step** sampling and supports **Block-wise Autoregressive** processing for **10,000+** second streaming videos.
<div align="center">
[![Watch the video](assets/demo.png)](https://www.youtube.com/watch?v=srbsGlLNpAc)
<strong>👀 More Demos:</strong> <br>
🤖 Human-AI Conversation &nbsp;|&nbsp; ♾️ Infinite Video &nbsp;|&nbsp; 🎭 Diverse Characters &nbsp;|&nbsp; 🎬 Animated Tech Explanation <br>
<a href="https://liveavatar.github.io/">
<strong>👉 Click Here to Visit Project Page! 🌐</strong>
</a>
<br>
</div>
---
## ✨ Highlights
> - ⚡ **Real-time Streaming Interaction** - Achieves **45** FPS real-time streaming with low latency
> - ♾️ **Infinite-length Autoregressive Generation** - Supports **10,000+** seconds of continuous video generation
> - 🎨 **Generalization Performance** - Strong generalization across cartoon characters, singing, and diverse scenarios
---
## 📰 News
- **[2026.01.20]** 🚀 Major performance breakthrough (**v1.1**)! **FP8 quantization** enables inference on **48GB GPUs**, while advanced **compilation** and **cuDNN** attention boost speed to **~2.5x** peak and **3x** average FPS, reaching a stable **45+ FPS** on multi-H800.
- **[2025.12.16]** 🎉 LiveAvatar has reached **1,000+** stars on GitHub!
- **[2025.12.12]** 🚀 We released **single-GPU** inference [Code](https://github.com/Alibaba-Quark/LiveAvatar/blob/main/infinite_inference_single_gpu.sh): a single GPU with 80GB VRAM is enough to run it.
- **[2025.12.08]** 🚀 We released real-time inference [Code](https://github.com/Alibaba-Quark/LiveAvatar/blob/main/infinite_inference_multi_gpu.sh) and the model [Weights](https://huggingface.co/Quark-Vision/Live-Avatar).
- **[2025.12.08]** 🎉 LiveAvatar won the Hugging Face [#1 Paper of the day](https://huggingface.co/papers/date/2025-12-05)!
- **[2025.12.04]** 🔥 We released the [Paper](https://arxiv.org/abs/2512.04677) and the [demo page](https://liveavatar.github.io/).
---
## 📑 Todo List
### 🌟 **Early December** (core code release)
- ✅ Release the paper
- ✅ Release the demo website
- ✅ Release checkpoints on Hugging Face
- ✅ Release Gradio Web UI
- ✅ Experimental real-time streaming inference on at least H800 GPUs
- ✅ Distribution-matching distillation to 4 steps
- ✅ Timestep-forcing pipeline parallelism
### ⚙️ **Later updates**
- ✅ Inference code supporting single GPU (offline generation)
- ✅ Multi-character support
- ✅ Inference Acceleration Stage 1 (RoPE optimization, compilation, LoRA merge)
- ✅ Streaming-VAE integration
- ✅ Inference Acceleration Stage 2 (further compilation, FP8, cuDNN attention)
- ⬜ UI integration for easy streaming interaction
- ⬜ TTS integration
- ⬜ Training code
- ⬜ LiveAvatar v1.2
## 🛠️ Installation
Please follow the steps below to set up the environment.
### 1. Create Environment
```bash
conda create -n liveavatar python=3.10 -y
conda activate liveavatar
```
### 2. Install CUDA Dependencies (optional)
```bash
conda install nvidia/label/cuda-12.4.1::cuda -y
conda install -c nvidia/label/cuda-12.4.1 cudatoolkit -y
```
### 3. Install PyTorch & Flash Attention
```bash
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
# For H800/H200 setups:
pip install flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch280 --extra-index-url https://download.pytorch.org/whl/cu128
# Otherwise:
pip install flash-attn==2.8.3 --no-build-isolation
```
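The choice between the two Flash Attention commands above can be expressed as a small lookup. This is a minimal sketch, not part of the repository: `flash_attn_install_command` and the name-matching rule are hypothetical, and only the H800/H200 families named in the comment above are treated as FlashAttention 3 targets.

```python
# Hypothetical helper, not part of the LiveAvatar repository.
# The two commands are copied from the installation step above.
FA3_COMMAND = (
    "pip install flash_attn_3 "
    "--find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch280 "
    "--extra-index-url https://download.pytorch.org/whl/cu128"
)
FA2_COMMAND = "pip install flash-attn==2.8.3 --no-build-isolation"

# GPU families the README singles out for the flash_attn_3 wheel.
FA3_GPU_MARKERS = ("H800", "H200")


def flash_attn_install_command(gpu_name: str) -> str:
    """Return the install command from the README that matches a GPU name."""
    name = gpu_name.upper()
    if any(marker in name for marker in FA3_GPU_MARKERS):
        return FA3_COMMAND
    return FA2_COMMAND


if __name__ == "__main__":
    # Example name strings; real names can be read from `nvidia-smi -L`.
    for gpu in ("NVIDIA H800", "NVIDIA A100-SXM4-80GB"):
        print(f"{gpu}: {flash_attn_install_command(gpu)}")
```

Matching on a substring of the GPU name is an assumption made for illustration; check the name your driver actually reports before relying on it.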
### 4. Install Python Requirements
```bash
pip install -r requirements.txt
```
### 5. Install FFMPEG
```bash
apt-get update && apt-get install -y ffmpeg
```
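After the five steps above, a quick sanity check can confirm that the pieces are visible from the active environment. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, and the import name `flash_attn_interface` for the `flash_attn_3` wheel is an assumption.

```python
import importlib.util
import shutil
import sys


def _has_module(name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_environment() -> dict:
    """Report which parts of the setup are visible from this interpreter."""
    return {
        # Step 1 creates the conda environment with python=3.10.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Step 3 installs torch and torchvision.
        "torch": _has_module("torch"),
        "torchvision": _has_module("torchvision"),
        # Step 3 installs one Flash Attention package, depending on the GPU.
        # Assumption: flash_attn_3 is importable as `flash_attn_interface`.
        "flash_attn": _has_module("flash_attn") or _has_module("flash_attn_interface"),
        # Step 5 installs the ffmpeg binary on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:12s} {'ok' if ok else 'MISSING'}")
```

The check only locates packages and binaries; it does not verify versions or that CUDA is usable.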
---
## 📥 Download Models
Please download the pretrained checkpoints and place them in the `./ckpt/` directory.
| Model Component | Description | Link |
| :--- | :--- | :---: |
| `WanS2V-14B` | Base model | 🤗 [Hugging Face](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B) |
| `liveAvatar` | Our LoRA model | 🤗 [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar) |
```bash
pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./ckpt/Wan2.2-S2V-14B
huggingface-cli download Quark-Vision/Live-Avatar --local-dir ./ckpt/LiveAvatar
```
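Before launching inference, it can help to confirm that both downloads landed where the commands above put them. This is a minimal sketch assuming the `./ckpt/Wan2.2-S2V-14B` and `./ckpt/LiveAvatar` directories from those commands; `missing_checkpoints` is a hypothetical helper and does not validate individual files.

```python
from pathlib import Path

# Local directory names used by the download commands above.
EXPECTED_CHECKPOINT_DIRS = ("Wan2.2-S2V-14B", "LiveAvatar")


def missing_checkpoints(ckpt_root: str = "./ckpt") -> list:
    """Return the expected checkpoint directories that are absent or empty."""
    root = Path(ckpt_root)
    missing = []
    for name in EXPECTED_CHECKPOINT_DIRS:
        path = root / name
        # An empty directory usually means the download never started.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing or empty checkpoint directories:", ", ".join(missing))
    else:
        print("Both checkpoint directories are present.")
```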
## 🚀 Inference
### Real-time Inference with TPP
> 💡 Requires multi-GPU setup with at least 80GB VRAM.
```bash
# CLI Inference
bash infinite_inference_multi_gpu.sh
# Gradio Web UI
bash gradio_multi_gpu.sh
```
### Single-GPU Inference
> 💡 Can run on a single GPU with at least 80GB VRAM.
```bash
# CLI Inference
bash infinite_inference_single_gpu.sh
# Gradio Web UI
bash gradio_single_gpu.sh
```
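The VRAM notes above can be checked against the local machine before picking a script. The sketch below is illustrative only: `parse_gpu_memory`, `query_gpu_memory`, and `suggest_script` are hypothetical helpers, the 80,000 MiB threshold is an assumed stand-in for "80GB", and `multi_gpu_count` must be supplied by you because this README does not state how many GPUs the multi-GPU script expects.

```python
import subprocess

# Assumption: approximate MiB threshold standing in for the "80GB" note.
MIN_VRAM_MIB = 80_000


def parse_gpu_memory(nvidia_smi_output: str) -> list:
    """Parse one total-memory value in MiB per non-empty line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_gpu_memory() -> list:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_memory(result.stdout)


def suggest_script(memories_mib, multi_gpu_count, min_vram_mib=MIN_VRAM_MIB):
    """Suggest a script name from the README, or None if no GPU qualifies.

    multi_gpu_count is the number of GPUs the multi-GPU script is configured
    for; read it from the script itself, it is not documented here.
    """
    eligible = sum(1 for mem in memories_mib if mem >= min_vram_mib)
    if eligible >= multi_gpu_count:
        return "infinite_inference_multi_gpu.sh"
    if eligible >= 1:
        return "infinite_inference_single_gpu.sh"
    return None
```

For example, `suggest_script(query_gpu_memory(), multi_gpu_count=N)` with `N` taken from `infinite_inference_multi_gpu.sh` returns the script whose VRAM note the machine satisfies. The `min_vram_mib` argument can be changed if a later release relaxes the memory requirement.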
## 📝 Citation
If you find this project useful for your research, please consider citing our paper:
```bibtex
@misc{huang2025liveavatarstreamingrealtime,
title={Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length},
author={Yubo Huang and Hailong Guo and Fangtai Wu and Shifeng Zhang and Shijie Huang and Qijun Gan and Lin Liu and Sirui Zhao and Enhong Chen and Jiaming Liu and Steven Hoi},
year={2025},
eprint={2512.04677},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.04677},
}
```
## 📜 License Agreement
* The majority of this project is released under the Apache 2.0 license.
* The Wan model (base model) is also released under the Apache 2.0 license.