# Conversion scripts

These scripts convert the upstream [rednote-hilab/dots.tts-soar](https://huggingface.co/rednote-hilab/dots.tts-soar) PyTorch weights into the MLX layout used in this repo, and quantise individual components.

They are research scripts: the source snapshot path and output paths are hardcoded near the top of each file (look for `/Users/.../models--rednote-hilab--dots.tts-soar/...`). Edit those to point at your local upstream snapshot and a destination directory before running. All scripts require `mlx` and run on Apple Silicon (Metal); `convert_backbone_dit.py` also uses `mlx_lm`.

## Pipeline

| Script | Produces | Notes |
|---|---|---|
| `extract_backbone.py` | Qwen2 backbone in HF layout | Strips the `llm.` prefix; no `lm_head` (tied embeddings) |
| `convert_backbone_dit.py` | `backbone/` (MLX, 4-bit g64) and `dit/` (F32) | Backbone via `mlx_lm.convert(quantize=True, q_bits=4, q_group_size=64)` |
| `convert_vocoder.py` | `vocoder/` | BigVGAN/AudioVAE decoder; Conv weights transposed to MLX OKI layout |
| `convert_speaker.py` | `speaker/` | CAM++ x-vector encoder |
| `convert_refpath.py` | `patch_encoder/` and `audiovae_encoder/` | Reference-audio conditioning path |
| `convert_heads.py` | `heads/` | Coordinate / hidden / latent / xvec / EOS projection heads |
| `quantize_component.py` | quantised component dir | Generic per-component quantiser (below) |

## Per-component quantisation

`quantize_component.py` quantises every 2D `.weight` whose in-features are divisible by the group size (matching MLX's `Linear` eligibility), writing `.weight` (packed), `.scales`, `.biases`, plus a `config.json` `quantization` block. Norms (1D), conv (3D) and biases are left in full precision.
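
The eligibility rule described above can be sketched as a small predicate over tensor names and shapes. This is only an illustration of the rule as stated, not the code in `quantize_component.py`; the function name `is_quantisable` and the tensor names in the example are hypothetical, and it assumes weights are stored as `(out_features, in_features)`.

```python
def is_quantisable(name: str, shape: tuple, group_size: int = 64) -> bool:
    """Return True if a tensor would be quantised under the stated rule."""
    if not name.endswith(".weight"):
        return False  # biases and other tensors stay in full precision
    if len(shape) != 2:
        return False  # 1D norm weights and 3D conv kernels are skipped
    in_features = shape[-1]  # assumed layout: (out_features, in_features)
    return in_features % group_size == 0


# Hypothetical tensor names and shapes, for illustration only.
tensors = {
    "blocks.0.attn.q_proj.weight": (1024, 1024),  # 2D, 1024 % 64 == 0
    "blocks.0.attn.q_proj.bias": (1024,),         # bias
    "blocks.0.norm.weight": (1024,),              # 1D norm
    "conv_in.weight": (256, 7, 80),               # 3D conv kernel
    "odd_proj.weight": (128, 100),                # 100 % 64 != 0
}
selected = [n for n, s in tensors.items() if is_quantisable(n, s)]
print(selected)
```

Only the first entry passes the check; everything else would be copied through unchanged.
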
The `4bit/` and `8bit/` variants in this repo were built by running `quantize_component.py` over `dit/` and `patch_encoder/`:

```sh
python quantize_component.py dit dit-int4 4 64
python quantize_component.py dit dit-int8 8 64
python quantize_component.py patch_encoder patch_encoder-int4 4 64
python quantize_component.py patch_encoder patch_encoder-int8 8 64
```

The backbone is quantised by `convert_backbone_dit.py` (4-bit) or `quantize_component.py` (8-bit). Each variant subfolder is then assembled from the quantised `backbone`/`dit`/`patch_encoder` plus the shared F32 `vocoder`/`speaker`/`audiovae_encoder`/`heads` and the top-level config files, so that it loads standalone.
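
The assembly step is not scripted in this section, so the following is only a minimal sketch of how a variant folder could be put together by copying directories. The function `assemble_variant`, its parameters, and the file name `config.json` in the usage comment are assumptions made for illustration; the actual directory names and config files should be taken from the repo.

```python
import shutil
from pathlib import Path


def assemble_variant(root: Path, variant: str, quantised: dict,
                     shared: list, configs: list) -> Path:
    """Copy quantised and shared components into root/variant."""
    dest = root / variant
    dest.mkdir(parents=True, exist_ok=True)
    # Quantised components: source dir name -> component name in the variant.
    for src_name, component in quantised.items():
        shutil.copytree(root / src_name, dest / component, dirs_exist_ok=True)
    # Shared full-precision components keep their names.
    for component in shared:
        shutil.copytree(root / component, dest / component, dirs_exist_ok=True)
    # Top-level config files are copied next to the component directories.
    for name in configs:
        shutil.copy2(root / name, dest / name)
    return dest


# Hypothetical usage (directory and file names are assumptions):
# assemble_variant(
#     Path("."), "8bit",
#     quantised={"dit-int8": "dit", "patch_encoder-int8": "patch_encoder"},
#     shared=["vocoder", "speaker", "audiovae_encoder", "heads"],
#     configs=["config.json"],
# )
```

Copying rather than symlinking keeps each variant self-contained, which matches the stated goal that a variant loads standalone.
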