mlx-indextts2-standard-8bit

This is an IndexTTS2 model converted to MLX for Apple Silicon inference with solar2ain/mlx-indextts.

It was prepared for a local IndexTTS2 optimization project (/Users/vanch/index-tts), whose goal was stable Vietnamese and multilingual TTS on an M3 Max Mac without the memory crashes encountered under PyTorch MPS.

Variant

  • Profile: Standard multilingual
  • Precision / quantization: 8bit
  • Approx local size: 2.8GB
  • Source checkpoint directory during conversion: /Users/vanch/index-tts/checkpoints
  • Conversion detail: Converted with mlx-indextts convert --quantize 8 (see the sketch after this list). In the current upstream implementation this quantizes the GPT component only; S2Mel and BigVGAN remain fp32.
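For reference, a conversion along these lines would have produced this directory. Only convert --quantize 8 comes from the note above; the positional source and output arguments are an assumed CLI shape, not confirmed upstream syntax:

# Hedged sketch: --quantize 8 is documented above; the source/output
# arguments are assumptions about the upstream CLI, not verified flags.
uv run mlx-indextts convert \
  --quantize 8 \
  /Users/vanch/index-tts/checkpoints \
  models/mlx-indextts2-standard-8bit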

Expected Files

The repository root is a ready-to-use MLX IndexTTS2 model directory:

  • gpt.safetensors
  • s2mel.safetensors
  • bigvgan.safetensors
  • vq2emb.safetensors
  • tokenizer.model
  • config.yaml
  • config.json
  • feat1.pt
  • feat2.pt
  • wav2vec2bert_stats.pt
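A quick completeness check over this list (a minimal shell sketch; it assumes the models/mlx-indextts2-standard-8bit download location used in the Usage commands below):

for f in gpt.safetensors s2mel.safetensors bigvgan.safetensors vq2emb.safetensors \
         tokenizer.model config.yaml config.json feat1.pt feat2.pt wav2vec2bert_stats.pt; do
  [ -f "models/mlx-indextts2-standard-8bit/$f" ] || echo "missing: $f"
done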

Usage

Install mlx-indextts, then download this model and run generation:

git clone https://github.com/solar2ain/mlx-indextts.git
cd mlx-indextts
uv sync --extra convert --extra v2

huggingface-cli download vanch007/mlx-indextts2-standard-8bit \
  --local-dir models/mlx-indextts2-standard-8bit \
  --local-dir-use-symlinks False

uv run mlx-indextts generate \
  -m models/mlx-indextts2-standard-8bit \
  -r /path/to/reference_or_speaker.npz \
  -t "Your text here" \
  -o output.wav \
  --memory-limit 24 \
  --diffusion-steps 16

For repeated generation, precompute speaker conditioning first:

uv run mlx-indextts speaker \
  -m models/mlx-indextts2-standard-8bit \
  -r /path/to/reference.wav \
  -o speaker.npz \
  --memory-limit 24
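The resulting speaker.npz then replaces the raw reference audio in generate, skipping reference encoding on every call:

uv run mlx-indextts generate \
  -m models/mlx-indextts2-standard-8bit \
  -r speaker.npz \
  -t "Your text here" \
  -o output.wav \
  --memory-limit 24 \
  --diffusion-steps 16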

Benchmark

Benchmarked on a 128GB unified-memory M3 Max Mac using:

  • mlx-indextts from solar2ain/mlx-indextts
  • precomputed .npz speaker conditioning
  • memory_limit=24GB
  • diffusion_steps=16
  • emotion=calm, emo_alpha=0.6
  • same text set across fp32 / fp16 / 8bit / optimized PyTorch MPS
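A timing loop of this shape reproduces the RTF measurement (a sketch, not the project's actual harness; the texts/ files are illustrative, and afinfo is the macOS audio-info tool used here to read the output duration):

# Hedged sketch of an RTF timing loop; the real harness may differ.
for case in zh_short zh_long en_short en_long; do
  start=$(date +%s)
  uv run mlx-indextts generate \
    -m models/mlx-indextts2-standard-8bit \
    -r speaker.npz \
    -t "$(cat texts/$case.txt)" \
    -o out_$case.wav \
    --memory-limit 24 --diffusion-steps 16
  elapsed=$(( $(date +%s) - start ))
  dur=$(afinfo out_$case.wav | awk '/estimated duration/ {print $3}')
  echo "$case RTF: $(echo "$elapsed / $dur" | bc -l)"
done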

RTF (real-time factor) is synthesis wall-clock time divided by output audio duration, so lower is faster:

Case       fp32 MLX RTF   fp16 MLX RTF   8bit MLX RTF   PyTorch MPS RTF
zh short   1.127          1.538          0.966          1.446
zh long    1.232          1.584          1.035          1.699
en short   1.157          1.462          0.914          2.192
en long    1.193          1.511          0.956          1.783

Summary from the local comparison:

  • 8bit was the fastest MLX route in this test set.
  • fp16 saved disk space but was slower than fp32 for the standard profile.
  • In the Vietnamese cases (not included in the table above), fp16 was slightly faster than fp32, but 8bit was again the fastest.

ASR Validation

ASR validation with local mlx_whisper + whisper-large-v3-turbo found no empty audio, no wrong-language output, and no obviously missing sentences. Chinese long-form ASR showed a minor 她/他 homophone difference (both pronounced tā); English long-form 8-bit ASR showed a minor tense difference.
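A per-file spot check along the same lines (assuming the mlx_whisper CLI from the mlx-whisper package and the mlx-community conversion of whisper-large-v3-turbo):

mlx_whisper output.wav --model mlx-community/whisper-large-v3-turbo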

ASR was used only as an automated sanity check. Final production selection should still include human listening, especially for long-form Vietnamese narration.

Provenance and Scope

This is an MLX conversion for local Apple Silicon inference, not the original PyTorch release. The original implementation and model family come from the upstream IndexTTS / IndexTTS2 project; the MLX runtime used here is solar2ain/mlx-indextts.

The benchmark numbers are environment-specific and should be treated as local M3 Max results, not universal performance guarantees.
