---
library_name: mlx-audio
tags:
- mlx
- text-to-speech
- speech
- speech generation
- voice cloning
- tts
- mlx-audio
license: other
license_name: fish-audio-research
license_link: https://huggingface.co/fishaudio/s2-pro/blob/main/LICENSE
language:
- en
- zh
- ja
- ko
- es
- pt
- ar
- ru
- fr
- de
- sv
- it
- tr
- "no"
- nl
- cy
- eu
- ca
- da
- gl
- ta
- hu
- fi
- pl
- et
- hi
- la
- ur
- th
- vi
- jv
- bn
- yo
- cs
- sw
- he
- ms
- uk
- id
- kk
- bg
- lv
- my
- tl
- sk
- ne
- fa
- af
- el
- bo
- hr
- ro
- sn
- mi
- yi
- am
- be
- km
- is
- az
- sd
- br
- sq
- ps
- mn
- ht
- ml
- sr
- sa
- te
- kn
- si
- hy
- mr
- as
- gu
- fo
pipeline_tag: text-to-speech
base_model: fishaudio/s2-pro
---

# mlx-community/fish-audio-s2-pro-8bit

This model was converted to MLX format from [`fishaudio/s2-pro`](https://huggingface.co/fishaudio/s2-pro) using mlx-audio version **0.4.0**.

Refer to the [original model card](https://huggingface.co/fishaudio/s2-pro) for more details on the model.

## Model Overview

Fish Audio S2 Pro is a text-to-speech model with fine-grained inline control of prosody and emotion. Trained on **10M+ hours** of audio data across **80+ languages**, it combines reinforcement learning alignment with a Dual-Autoregressive (Dual-AR) architecture: a Slow AR models the time axis and a Fast AR predicts the remaining codebooks at each time step.

### Architecture

| Attribute | Value |
|-----------|-------|
| Total Parameters | 5B |
| Slow AR | 4B (time axis, primary semantic codebook) |
| Fast AR | 400M (residual codebooks per time step) |
| Audio Codec | 10 codebooks @ ~21 Hz frame rate |
| Tensor Type (original checkpoint) | BF16 |

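The table implies a fixed decoding order. For every codec frame (~21 per second), the Slow AR takes one step along the time axis to produce the primary semantic code, and the Fast AR then fills in the remaining codebooks for that same frame. The sketch below is a toy illustration of that order under several assumptions. It assumes the primary code is one of the 10 codebooks, which leaves 9 residual codebooks. It uses the approximate 21 Hz rate. The vocabulary size of 1024 and the `slow_step`/`fast_step` callables are hypothetical stand-ins, not the real networks or any mlx-audio API. It also simplifies conditioning: the real Fast AR conditions on the Slow AR's hidden state rather than on raw history.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Values from the Architecture table; the frame rate is approximate (~21 Hz).
NUM_CODEBOOKS = 10
FRAME_RATE_HZ = 21.0
# Hypothetical codebook size, used only by the dummy predictors below.
VOCAB_SIZE = 1024


@dataclass
class TokenBudget:
    frames: int         # Slow AR steps: one per time step
    fast_ar_codes: int  # residual codes filled in by the Fast AR
    total_codes: int    # all codec tokens for the clip


def token_budget(duration_s: float) -> TokenBudget:
    """Estimate decoding work for a clip, assuming 1 primary + 9 residual codebooks."""
    frames = round(duration_s * FRAME_RATE_HZ)
    residual = NUM_CODEBOOKS - 1
    return TokenBudget(frames, frames * residual, frames * NUM_CODEBOOKS)


Frame = List[int]


def toy_dual_ar_decode(
    num_frames: int,
    slow_step: Callable[[List[Frame]], int],
    fast_step: Callable[[List[Frame], Frame], int],
) -> List[Frame]:
    """Decode frame by frame: Slow AR picks the primary code, Fast AR fills the rest."""
    frames: List[Frame] = []
    for _ in range(num_frames):
        # Slow AR: one step along the time axis, given all earlier frames.
        frame = [slow_step(frames)]
        # Fast AR: residual codebooks 1..9 for this same time step, one at a time.
        while len(frame) < NUM_CODEBOOKS:
            frame.append(fast_step(frames, frame))
        frames.append(frame)
    return frames


# Dummy predictors standing in for the real networks (hypothetical).
_rng = random.Random(0)


def dummy_slow(history: List[Frame]) -> int:
    return _rng.randrange(VOCAB_SIZE)


def dummy_fast(history: List[Frame], partial: Frame) -> int:
    return _rng.randrange(VOCAB_SIZE)
```

For a 2-second clip under these assumptions, `token_budget(2.0)` gives 42 frames, 378 Fast AR codes, and 420 codec tokens in total. Each frame therefore costs only one Slow AR step, and the larger 4B model runs roughly 21 times per second of audio.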
### Fine-Grained Inline Control

Speech generation can be controlled locally by writing `[tag]` markers with free-form textual descriptions directly into the input text, for example:

```
[whisper in small voice]
[professional broadcast tone]
[pitch up]
```

**Common tags (15,000+ supported):**
`[pause]` `[emphasis]` `[laughing]` `[inhale]` `[chuckle]` `[tsk]` `[singing]` `[excited]` `[volume up]` `[echo]` `[angry]` `[whisper]` `[screaming]` `[sad]` `[shocked]` and many more.

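Tags are plain bracketed text inside the input, so you can assemble and check a tagged script with ordinary string handling before passing it as `text=` (for example, to `generate_audio` in the Python example). The helpers below (`add_tag`, `find_tags`, `strip_tags`) are hypothetical and not part of mlx-audio. They assume a tag is placed immediately before the words it should influence. That matches the `[tag]` syntax shown above, but the card does not document it as a rule.

```python
import re
from typing import List

# One inline control tag: a bracketed free-form description such as "[pitch up]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def add_tag(tag: str, text: str) -> str:
    """Place a tag directly before the words it should influence (assumed convention)."""
    tag = tag.strip()
    if not tag or "[" in tag or "]" in tag:
        raise ValueError(f"invalid tag: {tag!r}")
    return f"[{tag}] {text}"


def find_tags(script: str) -> List[str]:
    """Return every tag in a script, in order, without the brackets."""
    return [t.strip() for t in TAG_PATTERN.findall(script)]


def strip_tags(script: str) -> str:
    """Return only the words to be spoken, with tags removed and spaces collapsed."""
    return re.sub(r"\s+", " ", TAG_PATTERN.sub(" ", script)).strip()
```

For example, joining `add_tag("whisper in small voice", "Can you keep a secret?")` and `add_tag("excited", "We won!")` with a space produces `[whisper in small voice] Can you keep a secret? [excited] We won!`. `find_tags` recovers both descriptions from that script, which makes it easy to spot a mistyped tag before generation.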
### Supported Languages

- **Tier 1 (Full Support):** Japanese, English, Chinese
- **Tier 2 (Strong Support):** Korean, Spanish, Portuguese, Arabic, Russian, French, German
- **Additional:** 70+ more languages

## Use with mlx-audio

```bash
pip install -U mlx-audio
```

### CLI Example

```bash
python -m mlx_audio.tts.generate --model mlx-community/fish-audio-s2-pro-8bit --text "Hello, this is a test."
```

### Python Example

```python
from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

# Load the converted MLX weights (fetched from the Hugging Face Hub on first use).
model = load_model("mlx-community/fish-audio-s2-pro-8bit")
generate_audio(
    model=model,
    text="Hello, this is a test.",
    # Reference recording whose voice should be cloned; replace with your own file.
    ref_audio="path_to_audio.wav",
    # Prefix for the generated audio file name.
    file_prefix="test_audio",
)
```

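The card does not state a required format or length for the `ref_audio` clip. Before cloning a voice, it can still help to confirm that the file is a readable WAV and to see its sample rate and duration. The helper below, `inspect_reference_wav`, is a hypothetical convenience and not part of mlx-audio. It uses only Python's standard `wave` module, which reads uncompressed PCM WAV files. It makes no claim about which clip length or sample rate works best with S2 Pro.

```python
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WavInfo:
    sample_rate: int   # samples per second
    channels: int      # 1 = mono, 2 = stereo
    duration_s: float  # length of the clip in seconds


def inspect_reference_wav(path: str) -> WavInfo:
    """Read a PCM WAV header; raises if the file is missing or not a readable WAV."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"reference audio not found: {p}")
    with wave.open(str(p), "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(
            sample_rate=rate,
            channels=wav.getnchannels(),
            duration_s=wav.getnframes() / rate,
        )


# Example (placeholder path from the Python example):
# info = inspect_reference_wav("path_to_audio.wav")
```

If the file is not uncompressed PCM, `wave` raises `wave.Error`. In that case, convert the clip with an audio tool before passing it as `ref_audio`.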
## Citation

```bibtex
@misc{liao2026fishaudios2technical,
  title={Fish Audio S2 Technical Report},
  author={Shijia Liao and Yuxuan Wang and Songting Liu and Yifan Cheng and Ruoyi Zhang and Tianyu Li and Shidong Li and Yisheng Zheng and Xingwei Liu and Qingzheng Wang and Zhizhuo Zhou and Jiahua Liu and Xin Chen and Dawei Han},
  year={2026},
  eprint={2603.08823},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.08823},
}
```

## License

This model is released under the **Fish Audio Research License**:

- Research use: Free
- Non-commercial use: Free
- Commercial use: Requires a separate license from Fish Audio (contact: business@fish.audio)

See the [original model](https://huggingface.co/fishaudio/s2-pro) for full license details.