tags:
  - mlx
  - text-to-speech
  - kitten-tts
  - kitten_tts

mlx-community/kitten-tts-micro-0.8-8bit

This is the 8-bit (INT8) MLX conversion of KittenML/kitten-tts-micro-0.8.

Usage

pip install -U mlx-audio
python -m mlx_audio.tts.generate --model mlx-community/kitten-tts-micro-0.8-8bit --text "This is a local MLX test voice." --voice "expr-voice-5-m"

Inference Notes

The MLX implementation applies a short end-of-utterance smoothing step to prevent abrupt cutoffs. To disable it, pass fade_out_ms=0 and tail_silence_ms=0 to Model.generate().
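To make the two parameters concrete, here is a minimal sketch of what a fade-out followed by tail silence does to a buffer of samples. It is not the mlx-audio implementation: the helper name smooth_tail, the 24 kHz sample rate, the default durations, and the fixed linear ramp are all assumptions chosen for illustration (the actual fade is described as dynamic), and plain Python lists stand in for audio arrays.

```python
def smooth_tail(samples, sample_rate=24000, fade_out_ms=20.0, tail_silence_ms=50.0):
    """Fade out the end of `samples` linearly, then append silence.

    Hypothetical helper for illustration; the sample rate and default
    durations are assumptions, not values taken from mlx-audio.
    """
    out = list(samples)

    # Number of trailing samples covered by the fade, capped at the clip length.
    fade_len = min(len(out), int(sample_rate * fade_out_ms / 1000.0))
    for i in range(fade_len):
        # Gain ramps linearly down and reaches 0.0 on the last sample.
        gain = (fade_len - 1 - i) / fade_len
        out[len(out) - fade_len + i] *= gain

    # Pad with zeros so playback does not stop right at the last sample.
    silence_len = int(sample_rate * tail_silence_ms / 1000.0)
    out.extend([0.0] * silence_len)
    return out


# Setting both durations to 0 leaves the audio unchanged.
clip = [1.0] * 1000
untouched = smooth_tail(clip, fade_out_ms=0, tail_silence_ms=0)
```

With both values at 0 the fade loop never runs and no samples are appended, which mirrors the override described above.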

Conversion Notes / Fixes

  • AdaIN fc.weight orientation was corrected (ONNX stores the weight as (in, out), even when the matrix is square).
  • AdaIN Snake alpha parameters are loaded and used for generator resblocks.
  • ConvTranspose output padding matches the original (right-side pad for output_padding=1).
  • Phase slice is passed through sin before ISTFT, matching the ONNX graph.
  • ISTFT uses normalized windowing without phase unwrap (to match original behavior).
  • Tail trimming, a dynamic fade-out, and tail silence are applied at inference time to avoid a trailing spurt.
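The fc.weight fix in the list above is easy to miss because a square matrix has the same shape in either orientation, so a shape check cannot catch it. The sketch below illustrates the idea with a made-up 2x2 weight and plain Python lists. It assumes the target linear layer expects an (out, in) layout; the helper names and values are hypothetical and are not taken from the conversion script.

```python
def transpose(matrix):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*matrix)]


def linear(x, weight_out_in):
    """Compute y[o] = sum_i weight[o][i] * x[i] for an (out, in) weight."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight_out_in]


# Hypothetical 2x2 weight as exported, laid out as (in, out).
onnx_weight = [[1.0, 2.0],
               [3.0, 4.0]]
x = [1.0, 0.0]

# Reference result for the (in, out) layout: y = x @ W.
expected = [sum(x[i] * onnx_weight[i][o] for i in range(2)) for o in range(2)]

# Loaded as-is: the shapes match, but the values are wrong.
wrong = linear(x, onnx_weight)

# Transposed to (out, in) on load: matches the reference.
right = linear(x, transpose(onnx_weight))
```

Here wrong differs from expected even though no shape error is raised, which is why the orientation has to be fixed explicitly rather than inferred from dimensions.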

Original Model

Refer to the original model card for details: https://huggingface.co/KittenML/kitten-tts-micro-0.8