+---
+license: other
+license_name: research-only-non-commercial
+license_link: https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
+library_name: pytorch
+tags:
+  - voice-conversion
+  - speech
+  - audio
+  - streaming
+  - style-transfer
+  - research-only
+---
+<h1 align="center">
+  StyleStream
+</h1>
+<p align="center">
+  <a href="http://arxiv.org/abs/2602.20113"><img src="https://img.shields.io/badge/arXiv-2602.20113-b31b1b.svg?logo=arXiv" alt="arXiv" /></a>
+  <a href="https://berkeley-speech-group.github.io/StyleStream/"><img src="https://img.shields.io/badge/GitHub-Demo-orange.svg" alt="demo" /></a>
+  <a href="https://github.com/Berkeley-Speech-Group/StyleStream"><img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub" /></a>
+  <a href="https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Research--Only-blue.svg" alt="license" /></a>
+</p>
+<p align="center">
+  <strong>StyleStream: Real-Time Zero-Shot Voice Style Conversion</strong>
+</p>
+<p align="center">
+  Official PyTorch model weights for streamable voice style conversion in timbre, accent, and emotion.
+</p>
+<p align="center">
+  <img src="assets/figures/overview.png" alt="StyleStream overview" width="100%" />
+</p>
+**Release note:** To reduce voice-cloning misuse, this public release excludes the style encoder weights. Public inference uses curated target speaker embeddings, not arbitrary target-speaker cloning.
+## News
+- 2026/06/11: StyleStream offline and streaming inference code and weights are now open source! 🔥 🔥 🔥
+- 2026/06/03: StyleStream was accepted to the INTERSPEECH 2026 long paper track! 🎉 🎉 🎉
+## Files
+This Hugging Face repo hosts the public inference assets:
+- `stylizer-no-style-enc.ckpt`: stylizer checkpoint without style encoder weights
+- `destylizer.ckpt`: destylizer checkpoint
+- `vocos_causal_best.ckpt`: causal vocoder checkpoint
+- `target_spkrs.tar`: curated target speaker inventory, larger than the small example set in the GitHub repo
+Small target examples and the full inference code are available in the GitHub repo:
+```text
+https://github.com/Berkeley-Speech-Group/StyleStream
+```
+## Download
+Install or upgrade the Hugging Face CLI if needed (the `hf` command used below ships with recent `huggingface_hub` releases):
+```bash
+pip install -U huggingface_hub
+```
+From the root of your local StyleStream clone (see Usage), download the checkpoints:
+```bash
+hf download Louis0324/StyleStream \
+  stylizer-no-style-enc.ckpt destylizer.ckpt vocos_causal_best.ckpt \
+  --repo-type model --local-dir assets/ckpts
+```
+Download the larger target speaker inventory:
+```bash
+hf download Louis0324/StyleStream target_spkrs.tar --repo-type model --local-dir assets/target_spkrs
+```
+Expected local layout:
+```text
+assets/ckpts/
+  stylizer-no-style-enc.ckpt
+  destylizer.ckpt
+  vocos_causal_best.ckpt
+assets/target_spkrs/
+  target_spkrs.tar
+```
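The inventory arrives as a single `.tar` archive, so it will probably need unpacking before individual targets can be used. The following is a minimal sketch using only Python's standard library. The helper name `extract_inventory` is hypothetical, and the archive's internal layout is an assumption, so the sketch returns the member names so you can inspect what was unpacked.

```python
import tarfile
from pathlib import Path


def extract_inventory(tar_path, dest_dir):
    """Unpack a tar archive into dest_dir and return its member names.

    Hypothetical helper: the internal layout of target_spkrs.tar is not
    documented here, so the names are returned for inspection.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        names = tar.getnames()
        # Use the "data" extraction filter where this Python provides it;
        # it rejects absolute paths and members that escape dest_dir.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

With the layout above, a call such as `extract_inventory("assets/target_spkrs/target_spkrs.tar", "assets/target_spkrs")` would unpack the archive next to the downloaded file. Whether the inference scripts expect that exact location is an assumption to check against the GitHub repo.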
+## Usage
+Clone the GitHub repo and follow its setup instructions:
+```bash
+git clone https://github.com/Berkeley-Speech-Group/StyleStream.git
+cd StyleStream
+pip install -r requirements.txt
+```
+Offline Streamlit app:
+```bash
+streamlit run inference/offline_app.py
+```
+Streaming Streamlit app:
+```bash
+streamlit run inference/streaming_app.py
+```
+Command-line examples:
+```bash
+./inference/run_inference_offline.sh
+./inference/run_inference_simulate_streaming.sh
+```
+## Style Inventory
+Target styles use this folder format:
+```text
+target_name/
+  target_name.wav
+  target_name.npy
+```
+The `.wav` provides target mel/acoustic context. The `.npy` file is the pre-extracted style embedding with shape `[768]`.
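A quick way to check that an inventory directory follows this folder format is sketched below. The function name `list_valid_targets` and the idea of scanning a parent directory of target folders are illustrative assumptions, not part of the StyleStream code.

```python
from pathlib import Path


def list_valid_targets(inventory_dir):
    """Return names of target folders holding both <name>.wav and <name>.npy.

    Hypothetical helper: it only checks the file layout described above,
    not the audio content or the embedding values.
    """
    valid = []
    for folder in sorted(Path(inventory_dir).iterdir()):
        if not folder.is_dir():
            continue
        name = folder.name
        has_wav = (folder / f"{name}.wav").is_file()
        has_npy = (folder / f"{name}.npy").is_file()
        if has_wav and has_npy:
            valid.append(name)
    return valid
```

Checking that each embedding really has shape `[768]` would additionally need NumPy (for example, comparing `numpy.load(path).shape` against `(768,)`); that step is left out here to keep the sketch dependency-free.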
+## Intended Use
+StyleStream is released for non-commercial research and education. It is intended for voice style conversion research, benchmarking, comparison, and reproducible inference.
+The public release does not include style encoder weights and does not support arbitrary target-speaker cloning.
+## License
+The code is released under a **research-only, non-commercial license**. Commercial use is not permitted without explicit permission.
+See the `LICENSE` file in this Hugging Face model repo:
+```text
+https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
+```
+## Acknowledgements
+- [F5-TTS](https://arxiv.org/abs/2410.06885): basis for the stylizer's flow-matching modules.
+## Citation
+If you find StyleStream useful, please consider starring the GitHub repo and citing the paper:
+```bibtex
+@article{liu2026stylestream,
+  title={StyleStream: Real-Time Zero-Shot Voice Style Conversion},
+  author={Yisi Liu and Nicholas Lee and Gopala Anumanchipalli},
+  journal={arXiv preprint arXiv:2602.20113},
+  year={2026}
+}
+```