	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Wan-AI/Wan2.2-S2V-14B
	pipeline_tag: image-to-video
	---
	<div align="center">

	<h1>🎬 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length</h1>
	<!-- <h3>The code will be open-sourced in <strong><span style="color: #87CEEB;">early December</span></strong>.</h3> -->


	<p>
	Yubo Huang<sup>1,2</sup> ·
	Hailong Guo<sup>1,3</sup> ·
	Fangtai Wu<sup>1,4</sup> ·
	Shifeng Zhang<sup>1</sup> ·
	Shijie Huang<sup>1</sup> ·
	Qijun Gan<sup>4</sup> ·
	Lin Liu<sup>2</sup> ·
	Sirui Zhao<sup>2,*</sup> ·
	Enhong Chen<sup>2,*</sup> ·
	Jiaming Liu<sup>1,‡</sup> ·
	Steven Hoi<sup>1</sup>
	</p>

	<p style="font-size: 0.9em;">
	<sup>1</sup> Alibaba Group
	<sup>2</sup> University of Science and Technology of China
	<sup>3</sup> Beijing University of Posts and Telecommunications
	<sup>4</sup> Zhejiang University
	</p>

	<p style="font-size: 0.9em;">
	<sup>*</sup> Corresponding authors.    <sup>‡</sup> Project leader.
	</p>

	<!-- Badges -->
	<a href="https://arxiv.org/abs/2512.04677"><img src="https://img.shields.io/badge/arXiv-2512.04677-b31b1b.svg?style=for-the-badge" alt="arXiv"></a> <a href="https://huggingface.co/papers/2512.04677"><img src="https://img.shields.io/badge/🤗%20Daily%20Paper-ff9d00?style=for-the-badge" alt="Daily Paper"></a> <a href="https://huggingface.co/Quark-Vision/Live-Avatar"><img src="https://img.shields.io/badge/Hugging%20Face-Model-ffbd45?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace"></a> <a href="https://github.com/Alibaba-Quark/LiveAvatar"><img src="https://img.shields.io/badge/Github-Code-black?style=for-the-badge&logo=github" alt="Github"></a> <a href="https://liveavatar.github.io/"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge&logo=googlechrome&logoColor=white" alt="Project Page"></a>

	</div>

	> TL;DR: Live Avatar is an algorithm–system co-designed framework that enables real-time, streaming, infinite-length interactive avatar video generation. Powered by a 14B-parameter diffusion model, it achieves 20 FPS on 5×H800 GPUs with 4-step sampling and supports Block-wise Autoregressive processing for 10,000+ second streaming videos.

	<div align="center">

	[![Watch the video](assets/demo.png)](https://www.youtube.com/watch?v=srbsGlLNpAc)

	<strong>👀 More Demos:</strong> <br>
	🤖 Human-AI Conversation | ♾️ Infinite Video | 🎭 Diverse Characters | 🎬 Animated Tech Explanation <br>
	<a href="https://liveavatar.github.io/">
	<strong>👉 Click Here to Visit Project Page! 🌐</strong>
	</a>
	<br>

	</div>

	---
	## ✨ Highlights

	> - ⚡ Real-time Streaming Interaction - Achieve 20 FPS real-time streaming with low latency
	> - ♾️ Infinite-length Autoregressive Generation - Support 10,000+ second continuous video generation
	> - 🎨 Strong Generalization - Handles cartoon characters, singing, and other diverse scenarios


	---
	## 📰 News
	- [2025.12.08] 🚀 We released the real-time inference [code](infinite_inference_multi_gpu.sh) and the model [weights](https://huggingface.co/Quark-Vision/Live-Avatar).
	- [2025.12.08] 🎉 LiveAvatar won the Hugging Face [#1 Paper of the day](https://huggingface.co/papers/date/2025-12-05)!
	- [2025.12.04] 🏃‍♂️ We committed to open-sourcing the code in early December.
	- [2025.12.04] 🔥 We released the [paper](https://arxiv.org/abs/2512.04677) and the [demo page](https://liveavatar.github.io/).

	---

	## 📑 Todo List

	### 🌟 Early December (core code release)

	- ✅ Release the paper
	- ✅ Release the demo website
	- ✅ Release checkpoints on Hugging Face
	- ✅ Release Gradio Web UI
	- ✅ Experimental real-time streaming inference (requires H800-class GPUs or better)
	- ✅ Distribution-matching distillation to 4 steps
	- ✅ Timestep-forcing pipeline parallelism

	### ⚙️ Later updates

	- ⬜ UI integration for easy streaming interaction
	- ⬜ Inference code supporting single GPU (offline generation)
	- ⬜ Multi-character support
	- ⬜ Training code
	- ⬜ TTS integration
	- ⬜ LiveAvatar v1.1

	## 🛠️ Installation

	Please follow the steps below to set up the environment.

	### 1. Create Environment
	```bash
	conda create -n liveavatar python=3.10 -y
	conda activate liveavatar
	```

	### 2. Install CUDA Dependencies (optional)
	```bash
	conda install nvidia/label/cuda-12.4.1::cuda -y
	conda install -c nvidia/label/cuda-12.4.1 cudatoolkit -y
	```

	### 3. Install PyTorch & Flash Attention
	```bash
	pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128
	pip install flash-attn==2.8.3 --no-build-isolation
	```

	### 4. Install Python Requirements
	```bash
	pip install -r requirements.txt
	```
	### 5. Install FFmpeg
	```bash
	apt-get update && apt-get install -y ffmpeg
	```
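
Once the steps above have run, a quick sanity check can confirm that the pieces are in place. The following is a minimal sketch, not part of the project: the module names (`torch`, `torchvision`, `flash_attn`) are assumed from the install commands above, and `check_environment` is a hypothetical helper. It only looks the packages up and does not import them, so it is safe to run on a machine without a GPU.

```python
import importlib.util
import shutil
import sys

# Module names are assumptions based on the install steps above:
# `torch` and `torchvision` from step 3, `flash_attn` for the flash-attn package.
REQUIRED_MODULES = ("torch", "torchvision", "flash_attn")


def check_environment(modules=REQUIRED_MODULES, python=(3, 10)):
    """Return a dict mapping each check name to True or False."""
    report = {
        f"python {python[0]}.{python[1]}": sys.version_info[:2] == python,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    for name in modules:
        # find_spec only locates the package; it does not import it.
        report[f"module {name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"[{'ok' if ok else 'MISSING'}] {check}")
```

A `MISSING` line points at the install step to repeat. The check says nothing about whether the installed versions match the ones pinned above.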

	---

	## 📥 Download Models

	Please download the pretrained checkpoints from the links below and place them in the `./ckpt/` directory.

	| Model Component | Description | Link |
	| :--- | :--- | :---: |
	| `WanS2V-14B` | Base model | 🤗 [Hugging Face](https://huggingface.co/Wan-AI/Wan2.2-S2V-14B) |
	| `liveAvatar` | Our LoRA model | 🤗 [Hugging Face](https://huggingface.co/Quark-Vision/Live-Avatar) |
	```bash
	# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
	pip install "huggingface_hub[cli]"
	huggingface-cli download Wan-AI/Wan2.2-S2V-14B --local-dir ./ckpt/Wan2.2-S2V-14B
	huggingface-cli download Quark-Vision/Live-Avatar --local-dir ./ckpt/LiveAvatar
	```

	After downloading, your directory structure should look like this:

	```
	ckpt/
	├── Wan2.2-S2V-14B/          # Base model
	│   ├── config.json
	│   ├── diffusion_pytorch_model-*.safetensors
	│   └── ...
	└── LiveAvatar/              # Our LoRA model
	    ├── liveavatar.safetensors
	    └── ...
	```
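
To catch an incomplete download before launching inference, the layout above can be checked with a few lines of Python. This is a sketch under stated assumptions: `missing_checkpoint_files` is a hypothetical helper, and it only checks the entries named explicitly in the tree. Anything hidden behind `...` is not verified.

```python
from pathlib import Path

# Expected entries are copied from the directory tree above; files hidden
# behind the "..." entries are not checked.
EXPECTED = {
    "Wan2.2-S2V-14B": ["config.json", "diffusion_pytorch_model-*.safetensors"],
    "LiveAvatar": ["liveavatar.safetensors"],
}


def missing_checkpoint_files(ckpt_dir="./ckpt", expected=EXPECTED):
    """Return the relative patterns under ckpt_dir that match no file."""
    root = Path(ckpt_dir)
    missing = []
    for subdir, patterns in expected.items():
        for pattern in patterns:
            # glob handles both literal names and the "*" shard wildcard.
            if not any((root / subdir).glob(pattern)):
                missing.append(f"{subdir}/{pattern}")
    return missing


if __name__ == "__main__":
    problems = missing_checkpoint_files()
    print("checkpoint layout looks complete" if not problems else f"missing: {problems}")
```

An empty result means every named entry was found. It does not confirm that the files are complete or uncorrupted.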



	## 🚀 Inference
	### Real-time Inference with TPP
	> 💡 Currently, this command requires GPUs with at least 80 GB of VRAM each.
	```bash
	# CLI Inference
	bash infinite_inference_multi_gpu.sh
	# Gradio Web UI
	bash gradio_multi_gpu.sh
	```
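
Before launching either script, it can help to confirm that the machine meets the hardware requirements stated in this README (80 GB of VRAM per GPU, and five GPUs for the TPP pipeline). The sketch below is illustrative only: the helper names are hypothetical, and the 80,000 MiB threshold is an assumption chosen because some 80 GB cards report slightly less than 81,920 MiB. It relies on `nvidia-smi` being installed.

```python
import subprocess

# Defaults reflect this README: 80 GB of VRAM per GPU and five GPUs for TPP.
# The MiB threshold is an assumption, set a little below 80 * 1024.
MIN_GPUS = 5
MIN_MEMORY_MIB = 80_000


def parse_memory_mib(nvidia_smi_output):
    """Parse one total-memory value (MiB) per non-empty line."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def enough_gpus(memory_mib, min_gpus=MIN_GPUS, min_memory_mib=MIN_MEMORY_MIB):
    """True if at least min_gpus GPUs each have min_memory_mib or more."""
    return sum(m >= min_memory_mib for m in memory_mib) >= min_gpus


def query_gpu_memory():
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_mib(out)


if __name__ == "__main__":
    print("hardware OK" if enough_gpus(query_gpu_memory()) else "not enough 80 GB GPUs")
```

The parsing and the threshold check are kept separate from the `nvidia-smi` call so they can be exercised on a machine without GPUs.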
	> 💡 The model generates videos from audio input combined with a reference image and an optional text prompt.

	> 💡 The `size` parameter represents the area of the generated video, with the aspect ratio following that of the original input image.

	> 💡 The `--num_clip` parameter controls the number of video clips generated, useful for quick preview with shorter generation time.

	> 💡 Currently, our TPP pipeline requires five GPUs for inference. We plan to develop a 3-step version that can be deployed on a 4-GPU cluster.
	Furthermore, we plan to integrate the [LightX2V](https://github.com/ModelTC/LightX2V) VAE component, which will remove the dependency on additional single-GPU VAE parallelism and support 4-step inference within a 4-GPU setup.
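
The `size` tip above says the parameter fixes the output area while the aspect ratio follows the input image. The arithmetic behind that idea can be sketched as follows. This is not the project's resizing code: `dims_from_area` is a hypothetical helper, and rounding down to a multiple of 16 is an assumption made for illustration.

```python
import math


def dims_from_area(area, image_width, image_height, multiple=16):
    """Pick (width, height) with roughly `area` pixels and the image's aspect ratio.

    Rounding down to a multiple of 16 is an assumption for illustration;
    the project's own resizing code may round differently.
    """
    aspect = image_width / image_height
    # Solve width * height = area with width = aspect * height.
    height = math.sqrt(area / aspect)
    width = height * aspect

    def snap(value):
        # Round down to the nearest multiple, but never below one multiple.
        return max(multiple, int(value) // multiple * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    # A square reference image with a 512 * 512 pixel budget.
    print(dims_from_area(512 * 512, 1000, 1000))
```

Under these assumptions, a landscape image yields a wide frame and a portrait image yields a tall one for the same `size`, which is the behavior the tip describes.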

	Please visit our [project page](https://liveavatar.github.io/) to see more examples and learn about the scenarios suitable for this model.
	## 📝 Citation

	If you find this project useful for your research, please consider citing our paper:

	```bibtex
	@misc{huang2025liveavatarstreamingrealtime,
	  title={Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length},
	  author={Yubo Huang and Hailong Guo and Fangtai Wu and Shifeng Zhang and Shijie Huang and Qijun Gan and Lin Liu and Sirui Zhao and Enhong Chen and Jiaming Liu and Steven Hoi},
	  year={2025},
	  eprint={2512.04677},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV},
	  url={https://arxiv.org/abs/2512.04677},
	}
	```
	## ⭐ Star History

	[![Star History Chart](https://api.star-history.com/svg?repos=Alibaba-Quark/LiveAvatar&type=date&legend=top-left)](https://www.star-history.com/#Alibaba-Quark/LiveAvatar&type=date&legend=top-left)

	## 📜 License Agreement
	* The majority of this project is released under the Apache 2.0 license; see [LICENSE](LICENSE).
	* The Wan model (our base model) is also released under the Apache 2.0 license; see its [LICENSE](https://github.com/Wan-Video/Wan2.2/blob/main/LICENSE.txt).
	* This project is a research preview. Please contact us (jmliu1217@gmail.com) if you find any potential violations.



	## 🙏 Acknowledgements

	We would like to express our gratitude to the following projects:

	* [CausVid](https://github.com/tianweiy/CausVid)
	* [LongLive](https://github.com/NVlabs/LongLive)
	* [WanS2V](https://humanaigc.github.io/wan-s2v-webpage/)