tags:
  - mlx
  - text-to-speech
  - kitten-tts
  - kitten_tts

mlx-community/kitten-tts-micro-0.8-8bit

This is the INT8 (MLX 8-bit) conversion of KittenML/kitten-tts-micro-0.8 for MLX.

Usage

pip install -U mlx-audio

python -m mlx_audio.tts.generate --model mlx-community/kitten-tts-micro-0.8-8bit --text "This is a local MLX test voice." --voice "expr-voice-5-m"

Inference Notes

The MLX implementation applies a small amount of end-of-utterance smoothing to prevent abrupt cutoffs. To disable it, pass fade_out_ms=0 and tail_silence_ms=0 to Model.generate().
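To make the effect of the two parameters concrete, here is a minimal NumPy sketch of end-of-utterance smoothing. It is not the code used by the MLX implementation: the function name `smooth_tail`, the linear ramp, and the default values are assumptions chosen for illustration, and the real fade is described as dynamic rather than fixed.

```python
import numpy as np

def smooth_tail(audio, sample_rate, fade_out_ms=20.0, tail_silence_ms=50.0):
    """Fade the end of `audio` linearly to zero, then append silence.

    Hypothetical helper for illustration only; the defaults are not
    taken from the model's implementation.
    """
    audio = np.asarray(audio, dtype=np.float32).copy()

    # Number of samples covered by the fade, capped at the clip length.
    fade_len = min(len(audio), int(sample_rate * fade_out_ms / 1000))
    if fade_len > 0:
        ramp = np.linspace(1.0, 0.0, fade_len, dtype=np.float32)
        audio[-fade_len:] *= ramp

    # Trailing silence so playback does not end on the last voiced sample.
    silence = np.zeros(int(sample_rate * tail_silence_ms / 1000), dtype=np.float32)
    return np.concatenate([audio, silence])
```

With both values set to 0, the sketch returns the waveform unchanged, which mirrors what overriding the parameters is meant to achieve.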

Conversion Notes / Fixes

AdaIN fc.weight orientation was corrected (ONNX stores it as (in, out), even when the matrix is square).
AdaIN Snake alpha parameters are loaded and used for generator resblocks.
ConvTranspose output padding matches the original (right-side pad for output_padding=1).
Phase slice is passed through sin before ISTFT, matching the ONNX graph.
ISTFT uses normalized windowing without phase unwrap (to match original behavior).
Tail trim + dynamic fade-out + tail silence are applied at inference time to avoid a trailing spurt.
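The fc.weight fix in the list above is easy to miss because a square matrix loads without any shape error in either orientation. The NumPy sketch below illustrates the idea under stated assumptions: the helper name `onnx_matmul_weight_to_linear` is hypothetical, and the target layout is the (out, in) convention used by linear layers such as mlx.nn.Linear. It is not the conversion script itself.

```python
import numpy as np

def onnx_matmul_weight_to_linear(w_onnx):
    """Transpose an (in, out) weight into the (out, in) layout.

    Hypothetical helper: the transpose is unconditional because the
    shape of a square matrix cannot reveal which layout it is in.
    """
    w_onnx = np.asarray(w_onnx)
    if w_onnx.ndim != 2:
        raise ValueError("expected a 2-D weight matrix")
    return np.ascontiguousarray(w_onnx.T)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4)).astype(np.float32)       # batch of 2 inputs
w_onnx = rng.standard_normal((4, 4)).astype(np.float32)  # square (in, out) weight

w_linear = onnx_matmul_weight_to_linear(w_onnx)

y_onnx = x @ w_onnx        # what the ONNX graph computes: x @ W
y_linear = x @ w_linear.T  # what an (out, in) linear layer computes: x @ W.T
y_wrong = x @ w_onnx.T     # the silent bug: weight loaded without transposing
```

Here `y_linear` reproduces `y_onnx`, while `y_wrong` has the right shape but different values, which is why the error only shows up as degraded audio.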

Original Model

Refer to the original model card for details: https://huggingface.co/KittenML/kitten-tts-micro-0.8