mlx-community/fish-audio-s2-pro-bf16

This model was converted to MLX format from fishaudio/s2-pro using mlx-audio version 0.4.0.

Refer to the original model card for more details on the model.

Model Overview

Fish Audio S2 Pro is a text-to-speech model with fine-grained inline control of prosody and emotion. It was trained on more than 10M hours of audio across 80+ languages and combines reinforcement learning alignment with a Dual-Autoregressive (Dual-AR) architecture.

Architecture

Attribute	Value
Total Parameters	5B
Slow AR	4B (time-axis, primary semantic codebook)
Fast AR	400M (residual codebooks per time step)
Audio Codec	10 codebooks @ ~21 Hz frame rate
Tensor Type	BF16
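
To make the codec figures concrete, the sketch below estimates how many codec tokens a clip of a given length would involve. It assumes the frame rate is exactly 21 Hz (the table only says roughly 21 Hz) and that the Slow AR emits one code per frame while the Fast AR fills in the remaining nine. That 1 + 9 split is inferred from the table rather than documented, and `token_budget` is a hypothetical helper, not part of mlx-audio.

```python
# Rough token budget implied by the architecture table.
# Assumptions (not from the model card): exactly 21 frames per second,
# and a 1 + 9 split of the 10 codebooks between Slow AR and Fast AR.
FRAME_RATE_HZ = 21
NUM_CODEBOOKS = 10


def token_budget(seconds: float) -> dict:
    """Estimate codec tokens for a clip of the given duration."""
    frames = round(seconds * FRAME_RATE_HZ)
    return {
        "frames": frames,                               # Slow AR steps, one semantic code each
        "fast_ar_codes": frames * (NUM_CODEBOOKS - 1),  # residual codes filled in per frame
        "total_codes": frames * NUM_CODEBOOKS,
    }


budget = token_budget(10.0)
print(budget)
```

Under these assumptions, ten seconds of audio corresponds to 210 Slow AR steps and 2,100 codes in total, which shows why the larger model runs once per frame while the smaller one handles the per-frame residual codes.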

Fine-Grained Inline Control

Speech generation can be controlled locally by placing [tag] markers inline in the input text. Tags are free-form textual descriptions (15,000+ supported tags), for example:

[whisper in small voice]
[professional broadcast tone]
[pitch up]

Common tags: [pause] [emphasis] [laughing] [inhale] [chuckle] [tsk] [singing] [excited] [volume up] [echo] [angry] [whisper] [screaming] [sad] [shocked] and many more.
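
When preparing prompts, it can help to see which tags a piece of text carries and what the spoken text looks like without them. The sketch below is one way to do that. The regular expression is an assumption about the tag syntax based on the examples above (a bracket pair with no nested brackets), not the model's own parser, and `split_tags` is a hypothetical helper.

```python
import re

# Matches one inline control tag such as "[pitch up]": a bracket pair
# with no nested brackets inside. This pattern is an assumption about the
# syntax based on the examples above, not the model's own tokenizer.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(prompt: str) -> tuple[list[str], str]:
    """Return (tags in order of appearance, prompt with tags removed)."""
    tags = [m.strip() for m in TAG_PATTERN.findall(prompt)]
    plain = TAG_PATTERN.sub("", prompt)
    plain = re.sub(r"\s+", " ", plain).strip()  # collapse gaps left by removed tags
    return tags, plain


tags, plain = split_tags("[whisper] I have a secret. [pause] [excited] We won!")
print(tags)
print(plain)
```

The tagged prompt itself is what gets passed to the model; the stripped version is only useful for logging, length checks, or comparing against a transcript.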

Supported Languages

Tier 1 (Full Support): Japanese, English, Chinese
Tier 2 (Strong Support): Korean, Spanish, Portuguese, Arabic, Russian, French, German
Additional: 70+ more languages

Use with mlx-audio

pip install -U mlx-audio

CLI Example:

python -m mlx_audio.tts.generate --model mlx-community/fish-audio-s2-pro-bf16 --text "Hello, this is a test."
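
To synthesize several lines in one go, the same command can be assembled from Python. This is a minimal sketch that uses only the `--model` and `--text` flags shown above; `build_command` is a hypothetical helper, and the `subprocess` call is left commented out so the snippet only prints the commands it would run.

```python
import shlex
import sys

MODEL_ID = "mlx-community/fish-audio-s2-pro-bf16"


def build_command(text: str) -> list[str]:
    """Build the argv for one run of the CLI example above."""
    return [
        sys.executable, "-m", "mlx_audio.tts.generate",
        "--model", MODEL_ID,
        "--text", text,
    ]


lines = ["Hello, this is a test.", "[whisper] It's a secret."]
for line in lines:
    cmd = build_command(line)
    print(shlex.join(cmd))  # shell-safe rendering, handy for logging
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually synthesize
```

Passing the arguments as a list avoids shell quoting problems with brackets and apostrophes in the text.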

Python Example:

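from mlx_audio.tts.utils import load_model
from mlx_audio.tts.generate import generate_audio

# Load the converted weights from the Hugging Face repo.
model = load_model("mlx-community/fish-audio-s2-pro-bf16")

generate_audio(
    model=model,
    text="Hello, this is a test.",
    ref_audio="path_to_audio.wav",  # placeholder: replace with a real reference recording
    file_prefix="test_audio",       # prefix for the generated audio file name
)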

Citation

@misc{liao2026fishaudios2technical,
      title={Fish Audio S2 Technical Report},
      author={Shijia Liao and Yuxuan Wang and Songting Liu and Yifan Cheng and Ruoyi Zhang and Tianyu Li and Shidong Li and Yisheng Zheng and Xingwei Liu and Qingzheng Wang and Zhizhuo Zhou and Jiahua Liu and Xin Chen and Dawei Han},
      year={2026},
      eprint={2603.08823},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2603.08823},
}

License

This model is released under the Fish Audio Research License:

Research use: Free
Non-commercial use: Free
Commercial use: Requires separate license from Fish Audio (contact: business@fish.audio)

See the original model card for full license details.
