Conversion scripts
These scripts convert the upstream rednote-hilab/dots.tts-soar PyTorch weights into the MLX layout used in this repo, and quantise individual components.
They are research scripts: the source snapshot path and output paths are hardcoded near the top of each file (look for /Users/.../models--rednote-hilab--dots.tts-soar/...). Edit those to your local upstream snapshot and a destination directory before running. All require mlx and run on Apple Silicon (Metal); convert_backbone_dit.py also uses mlx_lm.
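To find the value to paste into those hardcoded paths, a small helper along these lines may be useful. It is a sketch, not part of the scripts: `find_snapshots` is a hypothetical name, and it assumes the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/`) under the default cache root.

```python
from pathlib import Path


def find_snapshots(cache_root, repo_id):
    """Return the snapshot directories for repo_id under a hub cache root.

    Assumes the usual cache layout:
    <cache_root>/models--<org>--<name>/snapshots/<revision>/
    """
    repo_dir = Path(cache_root) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Default cache location; adjust if your cache has been relocated.
    default_cache = Path.home() / ".cache" / "huggingface" / "hub"
    for path in find_snapshots(default_cache, "rednote-hilab/dots.tts-soar"):
        print(path)
```

Each printed directory is a candidate for the source snapshot path; the destination directory is yours to choose.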
Pipeline
| Script | Produces | Notes |
|---|---|---|
| extract_backbone.py | Qwen2 backbone in HF layout | Strips the llm. prefix; no lm_head (tied embeddings) |
| convert_backbone_dit.py | backbone/ (MLX, 4-bit g64) and dit/ (F32) | Backbone via mlx_lm.convert(quantize=True, q_bits=4, q_group_size=64) |
| convert_vocoder.py | vocoder/ | BigVGAN/AudioVAE decoder; Conv weights transposed to MLX OKI layout |
| convert_speaker.py | speaker/ | CAM++ x-vector encoder |
| convert_refpath.py | patch_encoder/ and audiovae_encoder/ | Reference-audio conditioning path |
| convert_heads.py | heads/ | Coordinate / hidden / latent / xvec / EOS projection heads |
| quantize_component.py | quantised component dir | Generic per-component quantiser (below) |
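As an illustration of the first row, the key renaming done by extract_backbone.py could look roughly like the sketch below. This is not the script's code: the function name, the `drop` parameter, and the example key names are assumptions, and placeholder integers stand in for tensors.

```python
def extract_backbone(state_dict, prefix="llm.", drop=("lm_head.",)):
    """Keep keys under `prefix`, strip the prefix, and skip the output head.

    With tied embeddings the output projection reuses the embedding matrix,
    so a separate lm_head entry is not needed in the result.
    """
    out = {}
    for key, tensor in state_dict.items():
        if not key.startswith(prefix):
            continue  # not part of the backbone
        new_key = key[len(prefix):]
        if new_key.startswith(drop):
            continue  # tied to the embeddings, so not exported
        out[new_key] = tensor
    return out


# Illustrative key names only; the real checkpoint keys may differ.
checkpoint = {
    "llm.model.embed_tokens.weight": 1,
    "llm.lm_head.weight": 2,
    "dit.blocks.0.weight": 3,
}
backbone = extract_backbone(checkpoint)
```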
Per-component quantisation
quantize_component.py <src_dir> <dst_dir> <bits> <group_size> quantises every 2D .weight whose in-features are divisible by the group size (matching MLX's Linear eligibility), writing .weight (packed), .scales, .biases, plus a config.json quantization block. Norms (1D), conv (3D) and biases are left in full precision.
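The eligibility rule and the role of the scales and biases can be sketched in plain Python. This is an assumption-laden illustration, not the script: `is_quantisable`, `quantise_group` and `dequantise_group` are hypothetical names, and the min/max affine scheme shown is a simplified version of group-wise quantisation that does no bit packing.

```python
def is_quantisable(name, shape, group_size):
    """True for 2D .weight tensors whose in-features divide by group_size."""
    return (
        name.endswith(".weight")
        and len(shape) == 2
        and shape[1] % group_size == 0  # last dim = in-features of a Linear
    )


def quantise_group(values, bits):
    """Affine-quantise one group: value ~= code * scale + bias."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid divide-by-zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # lo plays the role of the per-group bias


def dequantise_group(codes, scale, bias):
    """Reconstruct approximate values from codes, scale and bias."""
    return [c * scale + bias for c in codes]
```

Under this sketch each group of `group_size` in-features carries one scale and one bias, which is why a quantised layer stores three arrays instead of one.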
The 4bit/ and 8bit/ variants in this repo were built by running it over dit/ and patch_encoder/:
python quantize_component.py dit dit-int4 4 64
python quantize_component.py dit dit-int8 8 64
python quantize_component.py patch_encoder patch_encoder-int4 4 64
python quantize_component.py patch_encoder patch_encoder-int8 8 64
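The four invocations follow one pattern, so they could equally be generated in a loop. The sketch below only builds the argument lists; `build_commands` is a hypothetical helper, and the `<component>-int<bits>` naming is taken from the commands above.

```python
COMPONENTS = ("dit", "patch_encoder")
BITS = (4, 8)
GROUP_SIZE = 64


def build_commands(components=COMPONENTS, bits=BITS, group_size=GROUP_SIZE):
    """Return one argv list per (component, bit-width) pair."""
    return [
        ["python", "quantize_component.py", comp, f"{comp}-int{b}",
         str(b), str(group_size)]
        for comp in components
        for b in bits
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` if you prefer to run the commands from Python rather than a shell.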
The backbone is quantised by convert_backbone_dit.py (4-bit) or quantize_component.py (8-bit). Each variant subfolder is then assembled from the quantised backbone/dit/patch_encoder plus the shared F32 vocoder/speaker/audiovae_encoder/heads and the top-level config files, so it loads standalone.
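The assembly step might be scripted along these lines. This is a sketch under stated assumptions, not the procedure actually used: `assemble_variant` is a hypothetical helper, the mapping from target name to quantised source directory is supplied by the caller, and the list of top-level config files is left to you because this document does not enumerate them.

```python
import shutil
from pathlib import Path

# Components shared in F32 across variants, as listed above.
SHARED = ("vocoder", "speaker", "audiovae_encoder", "heads")


def assemble_variant(root, variant, quantised, config_files=()):
    """Copy quantised and shared component dirs into <root>/<variant>/.

    quantised maps the target name (e.g. "dit") to the source directory
    holding its quantised weights (e.g. "dit-int4"), relative to root.
    """
    root = Path(root)
    dst = root / variant
    dst.mkdir(parents=True, exist_ok=True)
    for name, src in quantised.items():
        shutil.copytree(root / src, dst / name, dirs_exist_ok=True)
    for name in SHARED:
        shutil.copytree(root / name, dst / name, dirs_exist_ok=True)
    for cfg in config_files:
        shutil.copy2(root / cfg, dst / cfg)
    return dst
```

For the 4-bit variant, a call might pass `{"backbone": "backbone", "dit": "dit-int4", "patch_encoder": "patch_encoder-int4"}` as `quantised`, since the backbone from convert_backbone_dit.py is already 4-bit.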