dots-tts-mlx: quantized MLX weights (Apple Silicon)

Ready-to-run MLX weights for rednote-hilab/dots.tts-soar, a 2B continuous-AR flow-matching TTS model with multilingual support (24 languages, same as upstream) and zero-shot voice cloning, quantized for Apple Silicon. Download and run with the dots-tts-mlx runtime; no PyTorch and no conversion step is needed.

Languages: same as upstream dots.tts, all 24 (Arabic, Cantonese, Chinese, Czech, Dutch, English, Finnish, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Romanian, Russian, Spanish, Thai, Turkish, Ukrainian, Vietnamese). The 5-language check under Quality is only the quantization spot-check, not the supported set.

These are converted MLX safetensors with a quantized LLM trunk, not PyTorch checkpoints. They load only with the dots-tts-mlx runtime on Apple Silicon (Metal). For the original PyTorch model, see rednote-hilab/dots.tts-soar.

Variants

Subfolder   Download   vs original (~9 GB)   Use
int4/ ⭐    ~2.4 GB    −73%                  recommended
int8/       ~3.1 GB    −65%                  conservative fallback

Only the Qwen2.5-1.5B LLM trunk (≈70% of the weights) is quantized (group-wise affine, group size 64); the precision-sensitive flow-matching DiT, the BigVGAN vocoder, and the CAM++ speaker encoder stay bf16.
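As a rough illustration of what "group-wise affine, group size 64" means, the sketch below quantizes a flat list of weights in groups of 64, storing one scale and one offset per group. It is a generic pure-Python illustration, not the runtime's or MLX's actual kernel: the function names are hypothetical, and the real implementation packs codes into integers and may round differently.

```python
import random

def quantize_group(values, bits=4):
    """Affine-quantize one group to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # One scale and one offset (lo) are stored per group.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def quantize(weights, group_size=64, bits=4):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def dequantize(groups):
    """Reconstruct approximate weights: w ~= code * scale + offset."""
    restored = []
    for codes, scale, lo in groups:
        restored.extend(code * scale + lo for code in codes)
    return restored

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # synthetic weights
packed = quantize(weights, group_size=64, bits=4)
restored = dequantize(packed)

max_err = max(abs(a - b) for a, b in zip(weights, restored))
max_scale = max(scale for _, scale, _ in packed)
print(f"groups={len(packed)} max_err={max_err:.6f} half_step={max_scale / 2:.6f}")
```

Because each group has its own range, the rounding error of any weight is bounded by half of that group's step size, which is why smaller groups or more bits reduce the error at the cost of extra storage.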

Quality

Quantization showed no measurable quality loss relative to the full-precision MLX build: on a small multilingual acceptance check (EN/DE/ES/FR + Hindi), int8 and int4 showed no transcription-accuracy or voice-similarity regression vs bf16. This is a sanity check, not a dataset-scale benchmark; evaluate on your own content.

Correctness of the port itself is gated per stage against the original PyTorch model (AudioVAE PSNR ≈ 56 dB; attention / DiT / LLM / semantic-encoder cosine similarity ≥ 0.9999); see the runtime repo.
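The two metrics named above can be sketched as follows, to show what a per-stage gate of this kind computes. This is a minimal illustration, not the runtime repo's test code: the function names, the PSNR peak value of 1.0, and the synthetic signals are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two flattened activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def psnr_db(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` is an assumed full-scale value."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)

def stage_passes(reference, test, min_cosine=0.9999):
    """Gate one stage: the ported output must closely match the reference."""
    return cosine_similarity(reference, test) >= min_cosine

# Synthetic stand-ins for a reference output and a ported output.
reference = [math.sin(0.01 * i) for i in range(1000)]
ported = [v + 1e-4 * math.cos(i) for i, v in enumerate(reference)]

print("cosine gate passed:", stage_passes(reference, ported))
print(f"PSNR: {psnr_db(reference, ported):.1f} dB")
```

Cosine similarity is insensitive to overall scale, so it suits intermediate activations, while PSNR measures absolute error against a peak level and suits reconstructed audio.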

Usage

# 1. install the quant-aware runtime (>= v0.2.0)
pip install "git+https://github.com/sb1992/dots-tts-mlx.git@v0.2.0"

# 2. download the variant you want
hf download shraey/dots-tts-mlx --include "int4/*" --local-dir ./dots-tts-mlx-weights

# 3. run (files land in ./dots-tts-mlx-weights/int4/)
dots-tts --model ./dots-tts-mlx-weights/int4 \
    --text "Hello from MLX." --ref-audio reference.wav --language EN \
    --out-path out --out-prefix clone

The runtime auto-detects the quantization block in config.json, so nothing changes at the CLI/API level vs an unquantized directory. Python API and the full flag set: see the runtime repo.
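To see which variant a downloaded directory contains, the quantization block can be read directly from config.json. The sketch below assumes the block is stored under a top-level "quantization" key with "bits" and "group_size" fields; those key names are an assumption, not confirmed by this card, so check the actual file before relying on them.

```python
import json
import tempfile
from pathlib import Path

def read_quantization(model_dir):
    """Return the quantization block from config.json, or None if absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get("quantization")  # assumed key name

def describe(model_dir):
    """Summarize a model directory as a short human-readable string."""
    quant = read_quantization(model_dir)
    if quant is None:
        return "unquantized"
    # "bits" and "group_size" are assumed field names.
    return f"int{quant['bits']} (group size {quant['group_size']})"

# Demo against a throwaway directory with a hypothetical config.json.
with tempfile.TemporaryDirectory() as tmp:
    sample = {"quantization": {"bits": 4, "group_size": 64}}
    (Path(tmp) / "config.json").write_text(json.dumps(sample))
    print(describe(tmp))
```

With a real download, the call would be `describe("./dots-tts-mlx-weights/int4")`.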

  • Memory: runs in ~6 GB with a short (2–3s) reference; the in-context prompt-prefill scales with reference length, so a longer reference raises the peak.
  • Requires: Apple Silicon (MLX is Metal-only), Python ≥ 3.10.
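The requirements above can be checked before installing with a short standard-library script. This is a sketch only: the function name is hypothetical, and it does not check available memory.

```python
import platform
import sys

def check_environment(system=None, machine=None, version=None):
    """Return a list of unmet requirements; an empty list means all checks passed."""
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:2])

    problems = []
    # Apple Silicon Macs report Darwin / arm64 for a native Python build.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"needs Apple Silicon macOS, found {system}/{machine}")
    if version < (3, 10):
        problems.append(f"needs Python >= 3.10, found {version[0]}.{version[1]}")
    return problems

if __name__ == "__main__":
    issues = check_environment()
    print("OK" if not issues else "\n".join(issues))
```

The optional parameters exist only so the check can be exercised with made-up values; called with no arguments it inspects the current interpreter.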

Attribution & licenses

Derivative quantized weights of rednote-hilab/dots.tts-soar (Apache-2.0); you must comply with the upstream license. Components:

  • dots.tts (model · code): Apache-2.0, © the dots.tts team at rednote-hilab.
  • Qwen2.5-1.5B-Base (LLM backbone): Apache-2.0.
  • CAM++ / 3D-Speaker (speaker x-vector encoder): Apache-2.0.
  • BigVGAN (vocoder/decoder architecture style): MIT, © NVIDIA.

MLX port + quantization code: github.com/sb1992/dots-tts-mlx (Apache-2.0).

Responsible use

This performs zero-shot voice cloning: it can reproduce a person's voice from a few seconds of audio. Only clone voices you own or for which you have explicit, informed consent; do not use it for impersonation, fraud, or deception; and disclose AI-generated audio wherever it's shared. See the upstream risks guidance.
