Update model card: S1/S2 speaker tag requirement, modes

f4450f8 verified 6 days ago

1.59 kB

language:
  - zh
  - en
license: apache-2.0
library_name: mlx
pipeline_tag: text-to-speech
base_model: OpenMOSS-Team/MOSS-TTSD-v1.0
base_model_relation: quantized
tags:
  - mlx
  - tts
  - speech
  - multi-speaker
  - dialogue
  - apple-silicon
  - quantized
  - 8bit

OpenMOSS TTSD — MLX

MLX-native int8 conversion of OpenMOSS TTSD for multi-speaker dialogue generation on Apple Silicon.

Variants

Path	Precision
`mlx-int8/`	int8 quantized weights

How to Get Started

Text must include [S1]/[S2] speaker tags. Omitting them produces degraded output.

python scripts/generate/moss_ttsd.py \
  --text "[S1] Watson, I think we should go. [S2] Give me one moment." \
  --output outputs/dialogue.wav

Supported modes: generation, continuation, voice_clone, voice_clone_and_continuation.

python scripts/generate/moss_ttsd.py \
  --mode voice_clone \
  --text "[S1] This voice was cloned from the reference." \
  --prompt-audio-speaker1 reference.wav \
  --output outputs/clone.wav

Batch JSONL mode is also supported — see python scripts/generate/moss_ttsd.py --help.

Model Details

Upstream model: OpenMOSS-Team/MOSS-TTSD-v1.0
Task: multi-speaker text-to-speech and voice cloning
Runtime: MLX on Apple Silicon

License

Apache 2.0 — following the upstream license published with OpenMOSS-Team/MOSS-TTSD-v1.0.

appautomaton
/

openmoss-ttsd-mlx

OpenMOSS TTSD — MLX

Variants

How to Get Started

Model Details

Links

License