# SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation
Le Shen*, Qian Qiao*, Tan Yu*, Ke Zhou, Tianhang Yu, Yu Zhan, Zhenjie Wang, Dingcheng Zhen, Ming Tao, Shunshun Yin, Siyuan Liu†

*Equal Contribution †Corresponding Author
## 🔥 News
- 2026.01.08 - We released the inference code and the model weights.
- 2025.12.30 - We released the project page for SoulX-FlashTalk.
- 2025.12.30 - We released the SoulX-FlashTalk technical report on arXiv and the GitHub repository.
## 🤫 Coming soon
A 4-GPU version of SoulX-FlashTalk, plus a new open-source real-time streaming digital-human model designed specifically for consumer-grade GPUs such as the RTX 4090.
## 📝 Todo List
- [x] Technical report
- [x] Project Page
- [x] Inference code
- [x] Checkpoint release
- [ ] Online demo
## 🚀 Quickstart
### 🔧 Installation
1. Create a Conda environment

```shell
conda create -n flashtalk python=3.10
conda activate flashtalk
```

2. Install PyTorch with CUDA support

```shell
pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
```

3. Install the remaining dependencies

```shell
pip install -r requirements.txt
```

4. Install flash-attention

```shell
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
```

5. Install FFmpeg

```shell
# Ubuntu / Debian
apt-get install ffmpeg
# CentOS / RHEL
yum install ffmpeg ffmpeg-devel
# Conda (no root required)
conda install -c conda-forge ffmpeg==7
```
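After installing, a quick sanity check can confirm the Python packages are importable and FFmpeg is on `PATH`. This is an optional sketch, not part of the repository's scripts:

```python
import importlib.util
import shutil

def check_environment() -> dict:
    """Report whether each dependency installed above is available."""
    report = {}
    # Python packages: resolvable via find_spec without importing
    # (cheap, and avoids initializing CUDA just for a check).
    for mod in ("torch", "torchvision", "flash_attn"):
        report[mod] = importlib.util.find_spec(mod) is not None
    # FFmpeg must be an executable on PATH.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report

if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```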
### 🤗 Model download
| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashTalk-14B | Our 14B model | [🤗 Huggingface](https://huggingface.co/Soul-AILab/SoulX-FlashTalk-14B) |
| chinese-wav2vec2-base | Chinese wav2vec2 audio encoder | [🤗 Huggingface](https://huggingface.co/TencentGameMate/chinese-wav2vec2-base) |
```shell
# If you are in mainland China, set the mirror endpoint first:
# export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashTalk-14B --local-dir ./models/SoulX-FlashTalk-14B
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./models/chinese-wav2vec2-base
```
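The same downloads can also be scripted with the `huggingface_hub` Python API instead of the CLI. A sketch mirroring the two commands above (repo ids and target directories unchanged):

```python
# Python alternative to the huggingface-cli commands above.
MODELS = {
    "Soul-AILab/SoulX-FlashTalk-14B": "./models/SoulX-FlashTalk-14B",
    "TencentGameMate/chinese-wav2vec2-base": "./models/chinese-wav2vec2-base",
}

def download_all(models: dict = MODELS) -> None:
    """Fetch each checkpoint into its local directory."""
    # Imported lazily so the mapping can be inspected
    # even when huggingface_hub is not installed.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in models.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)

if __name__ == "__main__":
    download_all()
```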
### 🚀 Inference
```shell
# Inference on a single GPU
# Requires more than 64 GB of VRAM
bash inference_script_single_gpu.sh

# Inference on multiple GPUs
# Real-time inference speed requires 8x H800 GPUs or better
bash inference_script_multi_gpu.sh
```
## 🌐 Online Demo
Coming Soon!
## 📧 Contact Us
If you are interested in our work, feel free to email le.shen@mail.dhu.edu.cn, qiaoqian@soulapp.cn, yutan@soulapp.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.
You're also welcome to join our WeChat group for technical discussions and updates.
## 📚 Citation
If you find our work useful in your research, please consider citing:
```bibtex
@misc{shen2025soulxflashtalktechnicalreport,
      title={SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation},
      author={Le Shen and Qian Qiao and Tan Yu and Ke Zhou and Tianhang Yu and Yu Zhan and Zhenjie Wang and Ming Tao and Shunshun Yin and Siyuan Liu},
      year={2025},
      eprint={2512.23379},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.23379},
}
```
## 🙏 Acknowledgement
- InfiniteTalk and Wan: the base models we built upon.
- Self-Forcing: the codebase we built upon.
- DMD and Self-Forcing++: the key distillation techniques used by our method.
If you find our work useful, please also consider starring the original repositories of these foundational methods.
## 💡 Star History