metadata
language:
  - zh
  - en
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
base_model: OpenMOSS-Team/MOSS-TTSD-v1.0
base_model_relation: quantized
tags:
  - mlx
  - tts
  - speech
  - multi-speaker
  - dialogue
  - apple-silicon
  - quantized
  - 8bit

OpenMOSS TTSD — MLX

MLX-native int8 conversion of OpenMOSS TTSD for multi-speaker dialogue generation on Apple Silicon.

Variants

| Path | Precision |
| --- | --- |
| mlx-int8/ | int8 quantized weights |

How to Get Started

Text must include [S1]/[S2] speaker tags. Omitting them produces degraded output.

python scripts/generate/moss_ttsd.py \
  --text "[S1] Watson, I think we should go. [S2] Give me one moment." \
  --output outputs/dialogue.wav
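
Because untagged text degrades output, it can help to check input before invoking the script. The following is a minimal sketch, not part of this repository: `split_turns` is a hypothetical helper, and it assumes tags have exactly the form `[S1]`, `[S2]`, and so on, and that valid input begins with a tag.

```python
import re

# Assumption: speaker tags look like [S1], [S2], ... (inferred from the example above).
_TAG = re.compile(r"\[S(\d+)\]")


def split_turns(text: str) -> list[tuple[str, str]]:
    """Split tagged dialogue into (speaker, utterance) pairs.

    Hypothetical helper for pre-flight validation; raises ValueError
    when the text does not begin with a speaker tag.
    """
    stripped = text.strip()
    if not _TAG.match(stripped):
        raise ValueError("text must begin with a speaker tag such as [S1]")
    # With one capture group, re.split returns
    # ['', '1', ' first utterance ', '2', ' second utterance', ...]
    parts = _TAG.split(stripped)
    turns = []
    for i in range(1, len(parts), 2):
        turns.append((f"S{parts[i]}", parts[i + 1].strip()))
    return turns
```

Calling `split_turns` on the `--text` value before running the script surfaces a missing leading tag as an error rather than as degraded audio.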

Supported modes: generation, continuation, voice_clone, voice_clone_and_continuation.

python scripts/generate/moss_ttsd.py \
  --mode voice_clone \
  --text "[S1] This voice was cloned from the reference." \
  --prompt-audio-speaker1 reference.wav \
  --output outputs/clone.wav
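
When driving the script from Python, the command line can be assembled programmatically. This sketch uses only the flags that appear in the examples above (`--mode`, `--text`, `--prompt-audio-speaker1`, `--output`). `build_command` is a hypothetical helper, and omitting `--mode` for plain generation is an assumption that mirrors the first example; check `--help` for the flags each mode actually requires.

```python
# Mode names as listed in this README.
MODES = (
    "generation",
    "continuation",
    "voice_clone",
    "voice_clone_and_continuation",
)


def build_command(text, output, mode="generation", prompt_audio_speaker1=None):
    """Return an argument list for scripts/generate/moss_ttsd.py.

    Hypothetical helper; it only arranges flags shown in this README.
    """
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["python", "scripts/generate/moss_ttsd.py"]
    # Assumption: generation is the default, as in the first example,
    # so --mode is passed only for the other modes.
    if mode != "generation":
        cmd += ["--mode", mode]
    cmd += ["--text", text]
    if prompt_audio_speaker1 is not None:
        cmd += ["--prompt-audio-speaker1", prompt_audio_speaker1]
    cmd += ["--output", output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.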

Batch JSONL mode is also supported; run python scripts/generate/moss_ttsd.py --help for the available options.

Model Details

Links

License

Apache 2.0 — following the upstream license published with OpenMOSS-Team/MOSS-TTSD-v1.0.