Fish Audio S2 Pro — MLX 8-bit

This repository contains a self-contained MLX-native int8 conversion of Fish Audio S2 Pro for local text-to-speech on Apple Silicon.

It is intended for local speech generation with mlx-speech, without requiring a PyTorch runtime at inference time.

Model Details

  • Developed by: AppAutomaton
  • Shared by: AppAutomaton on Hugging Face
  • Upstream model: fishaudio/s2-pro
  • Task: text-to-speech and voice cloning
  • Runtime: MLX on Apple Silicon
  • Precision: int8 main model weights with bundled MLX codec assets

Bundle Contents

This bundle is self-contained and includes:

  • config.json
  • model.safetensors
  • tokenizer files
  • codec-mlx/config.json
  • codec-mlx/model.safetensors

The Fish S2 Pro runtime uses the bundled codec-mlx/ directory to decode model codes into waveform output.
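As a quick sanity check before generation, a short script can confirm that the bundle files listed above are present. The file list below mirrors the Bundle Contents section; the helper function itself is illustrative and not part of mlx-speech.

```python
from pathlib import Path

# Files a complete bundle is expected to contain (per the list above).
REQUIRED_FILES = [
    "config.json",
    "model.safetensors",
    "codec-mlx/config.json",
    "codec-mlx/model.safetensors",
]

def missing_files(model_dir):
    """Return the required bundle files that are absent from model_dir."""
    root = Path(model_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).exists()]
```

An empty return value means the main model weights and the bundled MLX codec are both in place; tokenizer files vary by export and are not checked here.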

How to Get Started

Basic generation:

python scripts/generate/fish_s2_pro.py \
  --text "Hello from Fish S2 Pro." \
  --model-dir /path/to/fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro.wav

Voice cloning:

python scripts/generate/fish_s2_pro.py \
  --text "This is a cloned voice." \
  --reference-audio /path/to/reference.wav \
  --reference-text "Transcript of the reference audio." \
  --model-dir /path/to/fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro_clone.wav

Inline prosody and emotion tags:

Fish S2 Pro supports 15,000+ inline tags placed directly in the text. Each tag is a single open-style [tag] marker with no closing tag. Place a tag immediately before the word or phrase it applies to.

python scripts/generate/fish_s2_pro.py \
  --text "Now Bobby, [clearing throat] I need to talk to you. [whisper] This stays between us. [chuckle] Just kidding." \
  --reference-audio /path/to/reference.wav \
  --reference-text "Transcript of the reference audio." \
  --model-dir /path/to/fishaudio-s2-pro-8bit-mlx \
  --output outputs/fish_s2_pro_emotion.wav

Common tags: [whisper], [chuckle], [laugh], [clearing throat], [excited], [sad], [pause]. See the upstream repo for the full tag list.
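To illustrate the tag format described above, the hypothetical helper below extracts and strips open-style [tag] markers with a regular expression. It is not part of mlx-speech; the pattern and function names are assumptions for illustration only.

```python
import re

# Matches open-style inline tags such as [whisper] or [clearing throat];
# tags are lowercase words, possibly with spaces, and have no closing tag.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")

def extract_tags(text):
    """Return the inline tags appearing in text, in order."""
    return TAG_PATTERN.findall(text)

def strip_tags(text):
    """Remove inline tags, collapsing any doubled spaces left behind."""
    return re.sub(r"\s{2,}", " ", TAG_PATTERN.sub("", text)).strip()
```

For example, extract_tags("Now Bobby, [clearing throat] listen. [whisper] Quiet.") returns ["clearing throat", "whisper"].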

Minimal Python usage:

from pathlib import Path

from mlx_speech.generation.fish_s2_pro import generate_fish_s2_pro

result = generate_fish_s2_pro(
    "Hello from Fish S2 Pro.",
    model_dir=Path("/path/to/fishaudio-s2-pro-8bit-mlx"),
)

Notes

  • This repo contains the quantized MLX runtime artifact only.
  • The conversion keeps the Fish S2 Pro dual-autoregressive model architecture and ships a bundled MLX codec for waveform decode.
  • Upstream defaults: temperature=0.8, top_p=0.8.
  • The current bundle is intended for local MLX runtime use and parity validation.
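The temperature and top_p defaults noted above correspond to standard temperature scaling followed by nucleus (top-p) sampling. The NumPy sketch below shows what top_p=0.8 means in practice; it is a generic illustration of the technique, not the actual S2 Pro sampler.

```python
import numpy as np

def top_p_filter(logits, temperature=0.8, top_p=0.8):
    """Renormalize probabilities over the nucleus: the smallest set of
    tokens whose cumulative probability reaches top_p."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    # Keep tokens until cumulative mass first reaches top_p.
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

Lower temperature sharpens the distribution before the nucleus cut, and lower top_p discards more of the low-probability tail, trading diversity for stability.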

License

Fish Audio Research License — following the upstream license published with fishaudio/s2-pro.

Model size: ~1B parameters, stored as safetensors with BF16 and U32 tensors in MLX format.

