---
license: other
license_name: research-only-non-commercial
license_link: https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
library_name: pytorch
tags:
- voice-conversion
- speech
- audio
- streaming
- style-transfer
- research-only
---

<h1 align="center">
  StyleStream
</h1>

<p align="center">
  <a href="http://arxiv.org/abs/2602.20113"><img src="https://img.shields.io/badge/arXiv-2602.20113-b31b1b.svg?logo=arXiv" alt="arXiv" /></a>
  <a href="https://berkeley-speech-group.github.io/StyleStream/"><img src="https://img.shields.io/badge/GitHub-Demo-orange.svg" alt="demo" /></a>
  <a href="https://github.com/Berkeley-Speech-Group/StyleStream"><img src="https://img.shields.io/badge/GitHub-Code-black.svg?logo=github" alt="GitHub" /></a>
  <a href="https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-Research--Only-blue.svg" alt="license" /></a>
</p>

<p align="center">
  <strong>StyleStream: Real-Time Zero-Shot Voice Style Conversion</strong>
</p>

<p align="center">
  Official PyTorch model weights for streamable voice style conversion in timbre, accent, and emotion.
</p>

<p align="center">
  <img src="assets/figures/overview.png" alt="StyleStream overview" width="100%" />
</p>

**Release note:** To reduce the risk of voice-cloning misuse, this public release excludes the style encoder weights. Public inference is limited to curated, pre-extracted target speaker embeddings; cloning an arbitrary target speaker is not supported.

## News

- 2026/06/11: StyleStream offline / streaming inference code and weights are open-sourced! 🔥 🔥 🔥
- 2026/06/03: StyleStream was accepted to the INTERSPEECH 2026 long paper track! 🎉 🎉 🎉

## Files

This Hugging Face repo hosts the public inference assets:

- `stylizer-no-style-enc.ckpt`: stylizer checkpoint without style encoder weights
- `destylizer.ckpt`: destylizer checkpoint
- `vocos_causal_best.ckpt`: causal vocoder checkpoint
- `target_spkrs.tar`: larger curated target speaker inventory

Small target examples and the full inference code are available in the GitHub repo:

```text
https://github.com/Berkeley-Speech-Group/StyleStream
```

## Download

Install (or upgrade) `huggingface_hub`, which provides the `hf` command-line tool:

```bash
pip install -U huggingface_hub
```

From the root of your local clone of the StyleStream GitHub repo (see Usage), download the checkpoints:

```bash
hf download Louis0324/StyleStream \
  stylizer-no-style-enc.ckpt destylizer.ckpt vocos_causal_best.ckpt \
  --repo-type model --local-dir assets/ckpts
```

Download the larger target speaker inventory:

```bash
hf download Louis0324/StyleStream target_spkrs.tar --repo-type model --local-dir assets/target_spkrs
```
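
If you prefer to fetch the same files from Python, the `hf download` commands above can be mirrored with `huggingface_hub.hf_hub_download`. This is a minimal sketch, not part of the StyleStream code: the `DOWNLOADS` mapping and the `download_assets` helper are hypothetical names, and the target folders simply copy the `--local-dir` values used above.

```python
from pathlib import Path

# Hypothetical mapping: file name -> local folder, copied from the CLI commands above.
REPO_ID = "Louis0324/StyleStream"
DOWNLOADS = {
    "stylizer-no-style-enc.ckpt": "assets/ckpts",
    "destylizer.ckpt": "assets/ckpts",
    "vocos_causal_best.ckpt": "assets/ckpts",
    "target_spkrs.tar": "assets/target_spkrs",
}


def download_assets(project_root: str = ".") -> list[Path]:
    """Download every public asset under `project_root` and return the local paths."""
    # Imported lazily so the mapping above can be inspected without the package installed.
    from huggingface_hub import hf_hub_download

    paths = []
    for filename, subdir in DOWNLOADS.items():
        local_dir = Path(project_root) / subdir
        local_dir.mkdir(parents=True, exist_ok=True)
        path = hf_hub_download(
            repo_id=REPO_ID,
            filename=filename,
            repo_type="model",
            local_dir=local_dir,
        )
        paths.append(Path(path))
    return paths
```

Calling `download_assets()` from the project root should place the files in the same folders as the CLI commands.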

Expected local layout:

```text
assets/ckpts/
  stylizer-no-style-enc.ckpt
  destylizer.ckpt
  vocos_causal_best.ckpt

assets/target_spkrs/
  target_spkrs.tar
```
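
A quick way to confirm the layout above before launching inference is to check for each expected file and, if the inference code expects unpacked target folders, to extract the archive. Both steps are illustrative assumptions: whether `target_spkrs.tar` must be unpacked, and where, is not stated here, so follow the GitHub repo's instructions. `missing_assets` and `extract_target_inventory` are hypothetical helpers.

```python
import tarfile
from pathlib import Path

# Expected files, copied from the layout above (relative to the project root).
EXPECTED_FILES = [
    "assets/ckpts/stylizer-no-style-enc.ckpt",
    "assets/ckpts/destylizer.ckpt",
    "assets/ckpts/vocos_causal_best.ckpt",
    "assets/target_spkrs/target_spkrs.tar",
]


def missing_assets(project_root: str = ".") -> list[str]:
    """Return the expected asset paths that do not exist under `project_root`."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


def extract_target_inventory(project_root: str = ".") -> Path:
    """Unpack target_spkrs.tar next to the archive and return that folder.

    Hypothetical: extracting beside the archive is an assumption,
    not a documented StyleStream requirement.
    """
    archive = Path(project_root) / "assets/target_spkrs/target_spkrs.tar"
    dest = archive.parent
    with tarfile.open(archive) as tar:
        # Use the "data" filter (rejects absolute paths and traversal) when available.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```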

## Usage

Clone the GitHub repo and follow its setup instructions:

```bash
git clone https://github.com/Berkeley-Speech-Group/StyleStream.git
cd StyleStream
pip install -r requirements.txt
```

Offline Streamlit app:

```bash
streamlit run inference/offline_app.py
```

Streaming Streamlit app:

```bash
streamlit run inference/streaming_app.py
```

Command-line examples:

```bash
./inference/run_inference_offline.sh
./inference/run_inference_simulate_streaming.sh
```

## Style Inventory

Target styles use this folder format:

```text
target_name/
  target_name.wav
  target_name.npy
```

The `.wav` file provides the target mel/acoustic context. The `.npy` file is the pre-extracted style embedding with shape `[768]`.
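
The folder format above can be checked with a small loader before running inference. This sketch assumes NumPy and that the `.npy` file holds a plain 1-D array of length 768, as stated above; `load_target_style` is a hypothetical helper, and the StyleStream inference code may read targets differently.

```python
from pathlib import Path

import numpy as np

STYLE_DIM = 768  # embedding shape stated above: [768]


def load_target_style(target_dir: str) -> tuple[Path, np.ndarray]:
    """Validate a `target_name/` folder and return (wav path, style embedding)."""
    folder = Path(target_dir)
    name = folder.name
    wav_path = folder / f"{name}.wav"
    npy_path = folder / f"{name}.npy"
    # Both files must be named after the folder, per the format above.
    for path in (wav_path, npy_path):
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
    embedding = np.load(npy_path)
    if embedding.shape != (STYLE_DIM,):
        raise ValueError(f"expected shape ({STYLE_DIM},), got {embedding.shape}")
    return wav_path, embedding
```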

## Intended Use

StyleStream is released for non-commercial research and education. It is intended for voice style conversion research, benchmarking, comparison, and reproducible inference.

The public release does not include style encoder weights and does not support arbitrary target-speaker cloning.

## License

The code and model weights are released under a **research-only, non-commercial license**. Commercial use is not permitted without explicit permission.

See the `LICENSE` file in this Hugging Face model repo:

```text
https://huggingface.co/Louis0324/StyleStream/blob/main/LICENSE
```

## Acknowledgements

[F5-TTS](https://arxiv.org/abs/2410.06885): the stylizer's flow-matching modules.

## Citation

If you find StyleStream useful, please consider starring the GitHub repo and citing our paper:

```bibtex
@article{liu2026stylestream,
  title={StyleStream: Real-Time Zero-Shot Voice Style Conversion},
  author={Yisi Liu and Nicholas Lee and Gopala Anumanchipalli},
  journal={arXiv preprint arXiv:2602.20113},
  year={2026}
}
```