SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation

Le Shen*, Qian Qiao*, Tan Yu*, Ke Zhou, Tianhang Yu, Yu Zhan, Zhenjie Wang, Dingcheng Zhen, Ming Tao, Shunshun Yin, Siyuan Liu βœ‰

*Equal Contribution βœ‰Corresponding Author


πŸ”₯ News

🀫 Coming soon

A 4-GPU version of SoulX-FlashTalk, plus a new open-source real-time streaming digital-human model designed specifically for consumer-grade GPUs such as the RTX 4090.

πŸ“‘ Todo List

  • Technical report
  • Project Page
  • Inference code
  • Checkpoint release
  • Online demo

πŸ“– Quickstart

πŸ”§ Installation

1. Create a Conda environment

conda create -n flashtalk python=3.10
conda activate flashtalk

2. Install PyTorch with CUDA support

pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128

3. Install other dependencies

pip install -r requirements.txt

4. Install flash-attention

pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation

5. Install FFmpeg

# Ubuntu / Debian
apt-get install ffmpeg
# CentOS / RHEL
yum install ffmpeg ffmpeg-devel

or

# Conda (no root required) 
conda install -c conda-forge ffmpeg==7
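In pipelines like this, FFmpeg is typically used to mux the generated video with the driving audio track. As a hedged illustration, the sketch below only builds a standard FFmpeg command line; the file names are placeholders and the flags are generic FFmpeg options, not taken from this repo's scripts:

```python
# Sketch: building an ffmpeg command to mux generated video with the
# driving audio. File names are placeholders; flags are standard
# ffmpeg options, not copied from this repo's inference scripts.

def mux_command(video_path: str, audio_path: str, out_path: str) -> list[str]:
    return [
        "ffmpeg", "-y",
        "-i", video_path,   # generated (silent) video
        "-i", audio_path,   # driving audio track
        "-c:v", "copy",     # keep the video stream as-is, no re-encode
        "-c:a", "aac",      # encode audio to AAC for MP4 containers
        "-shortest",        # stop at the shorter of the two inputs
        out_path,
    ]

cmd = mux_command("result.mp4", "speech.wav", "result_with_audio.mp4")
print(" ".join(cmd))
```

Running the list through `subprocess.run` would perform the actual mux once FFmpeg is on the `PATH`.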

πŸ€— Model download

| Model Component | Description | Link |
| --- | --- | --- |
| SoulX-FlashTalk-14B | Our 14B model | πŸ€— Huggingface |
| chinese-wav2vec2-base | Chinese wav2vec2 audio encoder | πŸ€— Huggingface |
# If you are in mainland China, run this first: export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashTalk-14B --local-dir ./models/SoulX-FlashTalk-14B
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./models/chinese-wav2vec2-base
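As a rough illustration of why a wav2vec2 encoder pairs naturally with frame-by-frame video generation: wav2vec2 downsamples 16 kHz audio by a factor of 320 in its convolutional feature extractor, yielding about 50 feature vectors per second. The 25 fps output rate below is our assumption for illustration, not something confirmed by this repo:

```python
# Sketch: aligning wav2vec2 audio features with video frames.
# Assumptions (illustrative, not from this repo): 16 kHz input audio,
# a total conv stride of 320 samples, and 25 fps generated video.

AUDIO_SR = 16_000      # wav2vec2 expects 16 kHz mono audio
WAV2VEC_STRIDE = 320   # total downsampling of the conv feature extractor
VIDEO_FPS = 25         # assumed output frame rate

features_per_second = AUDIO_SR // WAV2VEC_STRIDE      # 50 vectors per second
features_per_frame = features_per_second / VIDEO_FPS  # 2.0 under these numbers

def audio_features_for_clip(num_frames: int) -> int:
    """Number of wav2vec2 feature vectors covering `num_frames` of video."""
    return int(num_frames * features_per_frame)

print(features_per_second)          # 50
print(audio_features_for_clip(81))  # 162
```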

πŸš€ Inference

# Inference on a single GPU
# Requires more than 64 GB of VRAM
bash inference_script_single_gpu.sh

# Inference on multiple GPUs
# Real-time inference speed requires 8x H800 GPUs or better
bash inference_script_multi_gpu.sh
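"Real-time" streaming means each chunk of frames must be generated faster than it plays back; otherwise the stream stalls. The sketch below works out that latency budget; the frame rate and chunk size are illustrative assumptions, not values taken from this repo's scripts:

```python
# Sketch: the real-time budget for streaming generation.
# Assumptions (illustrative, not from this repo): 25 fps output,
# chunks of 16 frames generated autoregressively.

VIDEO_FPS = 25
CHUNK_FRAMES = 16

# Playback duration of one chunk: generation must finish within this
# window (plus any buffered head start) to sustain an infinite stream.
chunk_budget_s = CHUNK_FRAMES / VIDEO_FPS  # 0.64 s per chunk

def is_realtime(gen_seconds_per_chunk: float) -> bool:
    """True if a chunk is produced faster than it plays back."""
    return gen_seconds_per_chunk <= chunk_budget_s

print(chunk_budget_s)    # 0.64
print(is_realtime(0.5))  # True
print(is_realtime(1.2))  # False
```

This is why real-time speed needs 8x H800 GPUs: the per-chunk generation time must be driven below the chunk's playback duration.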

πŸ‘‹ Online Demo

Coming Soon!

πŸ“§ Contact Us

If you are interested in our work or would like to leave a message, feel free to email le.shen@mail.dhu.edu.cn, qiaoqian@soulapp.cn, yutan@soulapp.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.

You’re welcome to join our WeChat group for technical discussions and updates.


WeChat Group QR Code

πŸ“š Citation

If you find our work useful in your research, please consider citing:

@misc{shen2025soulxflashtalktechnicalreport,
      title={SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation}, 
      author={Le Shen and Qian Qiao and Tan Yu and Ke Zhou and Tianhang Yu and Yu Zhan and Zhenjie Wang and Ming Tao and Shunshun Yin and Siyuan Liu},
      year={2025},
      eprint={2512.23379},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.23379}, 
}

πŸ™‡ Acknowledgement

  • InfiniteTalk and Wan: the base models we build upon.
  • Self-Forcing: the codebase we build upon.
  • DMD and Self-Forcing++: the key distillation techniques used in our method.

    If you find our work useful, please also consider starring the original repositories of these foundational methods.

πŸ’‘ Star History

Star History Chart
