|
|
--- |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- portrait-animation |
|
|
- real-time |
|
|
- diffusion |
|
|
pipeline_tag: image-to-video |
|
|
library_name: diffusers |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<h1 align="center" style="font-weight: 900; font-size: 80px; color: #FF6B6B; margin-bottom: 20px;"> |
|
|
PersonaLive! |
|
|
</h1> |
|
|
|
|
|
<h2>Expressive Portrait Image Animation for Live Streaming</h2> |
|
|
|
|
|
<a href='https://arxiv.org/abs/2512.11253'><img src='https://img.shields.io/badge/ArXiv-2512.11253-red'></a> <a href='https://huggingface.co/huaichang/PersonaLive'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-ffc107'></a> <a href='https://modelscope.cn/models/huaichang/PersonaLive'><img src='https://img.shields.io/badge/ModelScope-Model-624AFF'></a> [](https://github.com/GVCLab/PersonaLive) |
|
|
|
|
|
[Zhiyuan Li<sup>1,2,3</sup>](https://huai-chang.github.io/) · [Chi-Man Pun<sup>1,📪</sup>](https://cmpun.github.io/) · [Chen Fang<sup>2</sup>](http://fangchen.org/) · [Jue Wang<sup>2</sup>](https://scholar.google.com/citations?user=Bt4uDWMAAAAJ&hl=en) · [Xiaodong Cun<sup>3,📪</sup>](https://vinthony.github.io/academic/) |
|
|
|
|
|
<sup>1</sup> University of Macau <sup>2</sup> [Dzine.ai](https://www.dzine.ai/) <sup>3</sup> [GVC Lab, Great Bay University](https://gvclab.github.io/) |
|
|
|
|
|
<h3 align="center" style="color: #ff4d4d; font-weight: 900; margin-top: 0;"> |
|
|
⚡️ Real-time, Streamable, Infinite-Length ⚡️ <br> |
|
|
⚡️ Portrait Animation requires only ~12GB VRAM ⚡️ |
|
|
</h3> |
|
|
|
|
|
<table width="100%" align="center" style="border: none;"> |
|
|
<tr> |
|
|
<td width="46.5%" align="center" style="border: none;"> |
|
|
<img src="assets/demo_3.gif" style="width: 100%;"> |
|
|
</td> |
|
|
<td width="41%" align="center" style="border: none;"> |
|
|
<img src="assets/demo_2.gif" style="width: 100%;"> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
</div> |
|
|
|
|
|
## 📋 TODO |
|
|
- [ ] If you find PersonaLive useful or interesting, please give us a Star 🌟 on our [GitHub repo](https://github.com/GVCLab/PersonaLive)! Your support drives us to keep improving. 🍻 |
|
|
- [ ] Fix bugs (If you encounter any issues, please feel free to open an issue or contact me! 🙏) |
|
|
- [ ] Enhance WebUI (Support reference image replacement |
|
|
- [x] **[2025.12.22]** 🔥 Supported streaming strategy in offline inference to generate long videos on 12GB VRAM! |
|
|
- [x] **[2025.12.17]** 🔥 [ComfyUI-PersonaLive](https://github.com/okdalto/ComfyUI-PersonaLive) is now supported! (Thanks to [@okdalto](https://github.com/okdalto)) |
|
|
- [x] **[2025.12.15]** 🔥 Release `paper`! |
|
|
- [x] **[2025.12.12]** 🔥 Release `inference code`, `config`, and `pretrained weights`! |
|
|
|
|
|
## ⚙️ Framework |
|
|
<img src="assets/overview.png" alt="Image 1" width="100%"> |
|
|
|
|
|
|
|
|
We present PersonaLive, a `real-time` and `streamable` diffusion framework capable of generating `infinite-length` portrait animations on a single `12GB GPU`. |
|
|
|
|
|
|
|
|
## 🚀 Getting Started |
|
|
### 🛠 Installation |
|
|
``` |
|
|
# clone this repo |
|
|
git clone https://github.com/GVCLab/PersonaLive |
|
|
cd PersonaLive |
|
|
|
|
|
# Create conda environment |
|
|
conda create -n personalive python=3.10 |
|
|
conda activate personalive |
|
|
|
|
|
# Install packages with pip |
|
|
pip install -r requirements_base.txt |
|
|
``` |
|
|
|
|
|
### ⏬ Download weights |
|
|
Option 1: Download pre-trained weights of base models and other components ([sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)). You can run the following command to download weights automatically: |
|
|
|
|
|
```bash |
|
|
python tools/download_weights.py |
|
|
``` |
|
|
|
|
|
Option 2: Download pre-trained weights into the `./pretrained_weights` folder from one of the below URLs: |
|
|
|
|
|
<a href='https://drive.google.com/drive/folders/1GOhDBKIeowkMpBnKhGB8jgEhJt_--vbT?usp=drive_link'><img src='https://img.shields.io/badge/Google%20Drive-5B8DEF?style=for-the-badge&logo=googledrive&logoColor=white'></a> <a href='https://pan.baidu.com/s/1DCv4NvUy_z7Gj2xCGqRMkQ?pwd=gj64'><img src='https://img.shields.io/badge/Baidu%20Netdisk-3E4A89?style=for-the-badge&logo=baidu&logoColor=white'></a> <a href='https://modelscope.cn/models/huaichang/PersonaLive'><img src='https://img.shields.io/badge/ModelScope-624AFF?style=for-the-badge&logo=alibabacloud&logoColor=white'></a> <a href='https://huggingface.co/huaichang/PersonaLive'><img src='https://img.shields.io/badge/HuggingFace-E67E22?style=for-the-badge&logo=huggingface&logoColor=white'></a> |
|
|
|
|
|
Finally, these weights should be organized as follows: |
|
|
``` |
|
|
pretrained_weights |
|
|
├── onnx |
|
|
│ ├── unet_opt |
|
|
│ │ ├── unet_opt.onnx |
|
|
│ │ └── unet_opt.onnx.data |
|
|
│ └── unet |
|
|
├── personalive |
|
|
│ ├── denoising_unet.pth |
|
|
│ ├── motion_encoder.pth |
|
|
│ ├── motion_extractor.pth |
|
|
│ ├── pose_guider.pth |
|
|
│ ├── reference_unet.pth |
|
|
│ └── temporal_module.pth |
|
|
├── sd-vae-ft-mse |
|
|
│ ├── diffusion_pytorch_model.bin |
|
|
│ └── config.json |
|
|
├── sd-image-variations-diffusers |
|
|
│ ├── image_encoder |
|
|
│ │ ├── pytorch_model.bin |
|
|
│ │ └── config.json |
|
|
│ ├── unet |
|
|
│ │ ├── diffusion_pytorch_model.bin |
|
|
│ │ └── config.json |
|
|
│ └── model_index.json |
|
|
└── tensorrt |
|
|
└── unet_work.engine |
|
|
``` |
|
|
|
|
|
### 🎞️ Offline Inference |
|
|
``` |
|
|
python inference_offline.py |
|
|
``` |
|
|
⚠️ Note for RTX 50-Series (Blackwell) Users: xformers is not yet fully compatible with the new architecture. To avoid crashes, please disable it by running: |
|
|
``` |
|
|
python inference_offline.py --use_xformers False |
|
|
``` |
|
|
|
|
|
### 📸 Online Inference |
|
|
#### 📦 Setup Web UI |
|
|
``` |
|
|
# install Node.js 18+ |
|
|
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash |
|
|
nvm install 18 |
|
|
|
|
|
cd webcam |
|
|
source start.sh |
|
|
``` |
|
|
|
|
|
#### 🏎️ Acceleration (Optional) |
|
|
Converting the model to TensorRT can significantly speed up inference (~ 2x ⚡️). Building the engine may take about `20 minutes` depending on your device. Note that TensorRT optimizations may lead to slight variations or a small drop in output quality. |
|
|
``` |
|
|
pip install -r requirements_trt.txt |
|
|
|
|
|
python torch2trt.py |
|
|
``` |
|
|
*The provided TensorRT model is from an `H100`. We recommend `ALL users` (including H100 users) re-run `python torch2trt.py` locally to ensure best compatibility.* |
|
|
|
|
|
#### ▶️ Start Streaming |
|
|
``` |
|
|
python inference_online.py --acceleration none (for RTX 50-Series) or xformers or tensorrt |
|
|
``` |
|
|
Then open `http://0.0.0.0:7860` in your browser. (*If `http://0.0.0.0:7860` does not work well, try `http://localhost:7860`) |
|
|
|
|
|
**How to use**: Upload Image ➡️ Fuse Reference ➡️ Start Animation ➡️ Enjoy! 🎉 |
|
|
<div align="center"> |
|
|
<img src="assets/guide.png" alt="PersonaLive" width="60%"> |
|
|
</div> |
|
|
|
|
|
**Regarding Latency**: Latency varies depending on your device's computing power. You can try the following methods to optimize it: |
|
|
|
|
|
1. Lower the "Driving FPS" setting in the WebUI to reduce the computational workload. |
|
|
2. You can increase the multiplier (e.g., set to `num_frames_needed * 4` or higher) to better match your device's inference speed. https://github.com/GVCLab/PersonaLive/blob/6953d1a8b409f360a3ee1d7325093622b29f1e22/webcam/util.py#L73 |
|
|
|
|
|
## 📚 Community Contribution |
|
|
|
|
|
Special thanks to the community for providing helpful setups! 🥂 |
|
|
|
|
|
* **Windows + RTX 50-Series Guide**: Thanks to [@dknos](https://github.com/dknos) for providing a [detailed guide](https://github.com/GVCLab/PersonaLive/issues/10#issuecomment-3662785532) on running this project on Windows with Blackwell GPUs. |
|
|
|
|
|
* **TensorRT on Windows**: If you are trying to convert TensorRT models on Windows, [this discussion](https://github.com/GVCLab/PersonaLive/issues/8) might be helpful. Special thanks to [@MaraScott](https://github.com/MaraScott) and [@Jeremy8776](https://github.com/Jeremy8776) for their insights. |
|
|
|
|
|
* **ComfyUI**: Thanks to [@okdalto](https://github.com/okdalto) for helping implement the [ComfyUI-PersonaLive](https://github.com/okdalto/ComfyUI-PersonaLive) support. |
|
|
|
|
|
* **Useful Scripts**: Thanks to [@suruoxi](https://github.com/suruoxi) for implementing `download_weights.py`, and to [@andchir](https://github.com/andchir) for adding audio merging functionality. |
|
|
|
|
|
## 🎬 More Results |
|
|
#### 👀 Visualization results |
|
|
|
|
|
<table width="100%"> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
<video src="https://github.com/user-attachments/assets/cdc885ef-5e1c-4139-987a-2fa50fefd6a4" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="50%"> |
|
|
<video src="https://github.com/user-attachments/assets/014f7bae-74ce-4f56-8621-24bc76f3c123" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
<table width="100%"> |
|
|
<tr> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/1e6a0809-15d2-4cab-ae8f-8cf1728c6281" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/d9cf265d-9db0-4f83-81da-be967bbd5f26" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/86235139-b63e-4f26-b09c-d218466e8e24" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/238785de-3b4c-484e-9ad0-9d90e7962fee" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/c71c4717-d528-4a98-b132-2b0ec8cec22d" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/7e11fe71-fd16-4011-a6b2-2dbaf7e343fb" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/f62e2162-d239-4575-9514-34575c16301c" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
<td width="25%"> |
|
|
<video src="https://github.com/user-attachments/assets/813e7fbd-37e9-47d7-a270-59887fafeca5" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
#### 🤺 Comparisons |
|
|
|
|
|
<table width="100%"> |
|
|
<tr> |
|
|
<td width="100%"> |
|
|
<video src="https://github.com/user-attachments/assets/36407cf9-bf82-43ff-9508-a794d223d3f7" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td width="100%"> |
|
|
<video src="https://github.com/user-attachments/assets/3be99b91-c6a1-4ca4-89e9-8fad42bb9583" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
<tr> |
|
|
<td width="100%"> |
|
|
<video src="https://github.com/user-attachments/assets/5bd21fe4-96ae-4be6-bf06-a7c476b04ec9" controls="controls" style="max-width: 100%; display: block;"></video> |
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
|
|
|
## ⭐ Citation |
|
|
If you find PersonaLive useful for your research, welcome to cite our work using the following BibTeX: |
|
|
```bibtex |
|
|
@article{li2025personalive, |
|
|
title={PersonaLive! Expressive Portrait Image Animation for Live Streaming}, |
|
|
author={Li, Zhiyuan and Pun, Chi-Man and Fang, Chen and Wang, Jue and Cun, Xiaodong}, |
|
|
journal={arXiv preprint arXiv:2512.11253}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## ❤️ Acknowledgement |
|
|
This code is mainly built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [X-NeMo](https://byteaigc.github.io/X-Portrait2/), [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [RAIN](https://pscgylotti.github.io/pages/RAIN/) and [LivePortrait](https://github.com/KlingTeam/LivePortrait), thanks to their invaluable contributions. |