---
license: other
license_name: uc-research-education-not-for-profit
license_link: https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
library_name: pytorch
tags:
- voice-conversion
- speech
- audio
- streaming
- style-transfer
- research
- not-for-profit
---

StyleStream


# StyleStream: Real-Time Zero-Shot Voice Style Conversion

Official PyTorch model weights for streamable voice style conversion in timbre, accent, and emotion.

StyleStream overview

**Release note:** To reduce voice-cloning misuse, this public release excludes the style encoder weights. Public inference uses curated target speaker embeddings, not arbitrary target-speaker cloning.

## News

- 2026/06/11: StyleStream offline / streaming inference code and weights are open sourced! 🔥 🔥 🔥
- 2026/06/03: StyleStream was accepted to the INTERSPEECH 2026 long paper track! 🎉 🎉 🎉

## Files

This Hugging Face repo hosts the public inference assets:

- `stylizer-no-style-enc.ckpt`: stylizer checkpoint without style encoder weights
- `destylizer.ckpt`: destylizer checkpoint
- `vocos_causal_best.ckpt`: causal vocoder checkpoint
- `target_spkrs.tar`: larger curated target speaker inventory

Small target examples and the full inference code are available in the GitHub repo:

```text
https://github.com/Berkeley-Speech-Group/StyleStream
```

## Download

Install the Hugging Face CLI if needed:

```bash
pip install huggingface_hub
```

From the StyleStream project root, download checkpoints:

```bash
hf download Louis0324/StyleStream \
  stylizer-no-style-enc.ckpt destylizer.ckpt vocos_causal_best.ckpt \
  --repo-type model --local-dir assets/ckpts
```

Download the larger target speaker inventory:

```bash
hf download Louis0324/StyleStream target_spkrs.tar --repo-type model --local-dir assets/target_spkrs
```

Expected local layout:

```text
assets/ckpts/
  stylizer-no-style-enc.ckpt
  destylizer.ckpt
  vocos_causal_best.ckpt
assets/target_spkrs/
  target_spkrs.tar
```

## Usage

Clone the GitHub repo and follow its setup instructions:

```bash
git clone https://github.com/Berkeley-Speech-Group/StyleStream.git
cd StyleStream
pip install -r requirements.txt
```

Offline Streamlit app:

```bash
streamlit run inference/offline_app.py
```

Recommended streaming inference:

```bash
python inference/streaming.py
```

Use this terminal script for the fastest real-time performance. It runs the speed test before audio IO, selects a streamable inference-step setting, and lets you switch target styles by typing a target index.

Streaming Streamlit app:

```bash
streamlit run inference/streaming_app.py
```

Use this when you want browser-based target selection, audio device selection, live status, and speed-test visualization. It has the same core streaming functionality, but is slower because of Streamlit overhead.

Command-line examples:

```bash
./inference/run_inference_offline.sh
./inference/run_inference_simulate_streaming.sh
```

## Style Inventory

Target styles use this folder format:

```text
target_name/
  target_name.wav
  target_name.npy
```

The `.wav` file provides target mel/acoustic context. The `.npy` file is the pre-extracted style embedding with shape `[768]`.

## Intended Use

StyleStream is released for educational, research, and not-for-profit use. It is intended for voice style conversion research, benchmarking, comparison, and reproducible inference. The public release does not include style encoder weights and does not support arbitrary target-speaker cloning.

## License

The code is released under a **research, educational, and not-for-profit software license**. Commercial use requires prior written permission from The Regents of the University of California.

See the `LICENSE` file in this Hugging Face model repo:

```text
https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
```

## Acknowledgements

- [F5-TTS](https://arxiv.org/abs/2410.06885): stylizer flow matching modules.

## Citation

If you find StyleStream useful, please consider starring the GitHub repo and citing the paper:

```bibtex
@article{liu2026stylestream,
  title={StyleStream: Real-Time Zero-Shot Voice Style Conversion},
  author={Yisi Liu and Nicholas Lee and Gopala Anumanchipalli},
  journal={arXiv preprint arXiv:2602.20113},
  year={2026}
}
```
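
As a small illustration of the folder format described under Style Inventory, the sketch below lists the target folders that contain both expected files. It is not part of the StyleStream codebase: the function name `list_complete_targets` and the idea of scanning a root directory are assumptions made for this example, and the only convention taken from this README is `target_name/target_name.wav` plus `target_name/target_name.npy`.

```python
from pathlib import Path


def list_complete_targets(inventory_root):
    """Return sorted names of target folders holding both expected files.

    Hypothetical helper, not a StyleStream API. A folder `name/` counts as
    complete when it contains `name.wav` and `name.npy`.
    """
    root = Path(inventory_root)
    names = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        wav = folder / f"{folder.name}.wav"  # target mel/acoustic context
        npy = folder / f"{folder.name}.npy"  # pre-extracted style embedding
        if wav.is_file() and npy.is_file():
            names.append(folder.name)
    return names
```

For example, `list_complete_targets("assets/target_spkrs")` could be called after extracting `target_spkrs.tar`, assuming the archive unpacks into folders of this form. If NumPy is installed, loading a target's `.npy` file with `numpy.load` and checking that its shape is `(768,)` is a further sanity check consistent with the embedding shape stated above.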