---
license: apache-2.0
tags:
- coreml
- tts
- text-to-speech
- apple
- qwen3
language:
- en
- zh
library_name: coremltools
---

# Qwen3-TTS CoreML

CoreML conversion of [Qwen/Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS) (0.6B) for on-device inference on Apple platforms. Supports English and Chinese text-to-speech synthesis.

## Models

| Model | Description | Size |
|-------|-------------|------|
| `qwen3_tts_lm_prefill_v9` | LM KV-cache prefill (text + speaker conditioning) | ~2.8 GB |
| `qwen3_tts_lm_decode_v10` | Autoregressive LM decode (CB0 codec token generation) | ~1.8 GB |
| `qwen3_tts_cp_prefill` | Code predictor prefill (CB1-15 conditioning) | ~432 MB |
| `qwen3_tts_cp_decode` | Code predictor decode (CB1-15 generation) | ~420 MB |
| `qwen3_tts_decoder_10s` | Audio decoder (16-codebook codes → 24 kHz waveform) | ~436 MB |
| `speaker_embedding_official.npy` | Default speaker embedding (1024-dim) | 4 KB |

**Total: ~5.9 GB**

## Pipeline

```
Text tokens + Speaker embedding
        ↓
LM Prefill (KV cache initialization)
        ↓
LM Decode (CB0 codec tokens, temperature=0.9, top_k=50)
        ↓
Code Predictor Prefill + Decode (CB1-15 per frame)
        ↓
Audio Decoder (16 codebooks → 24 kHz waveform)
        ↓
Silence trimming → Final audio
```

## Key Parameters

- **Sample rate:** 24,000 Hz
- **Codebooks:** 16 (CB0 from LM, CB1-15 from code predictor)
- **Max codec tokens:** 125 frames (~10 s of audio)
- **Sampling:** temperature=0.9, top_k=50 (both CB0 and CB1-15)
- **EOS token ID:** 2150 (in codec logit space)

## Usage

```swift
import FluidAudioTTS

let manager = Qwen3TtsManager()
try await manager.loadFromDirectory(modelDir)

let wav = try await manager.synthesize(
    text: "Hello world",
    tokenIds: [9707, 1879, ...],  // Pre-tokenized with Qwen3 processor
    useSpeaker: true
)
```

See [FluidAudio](https://github.com/FluidInference/FluidAudio) for the full Swift framework.

## Conversion

Converted using [coremltools](https://github.com/apple/coremltools) from the original PyTorch weights. Conversion scripts are in the [mobius](https://github.com/FluidInference/mobius) repository.

## License

**Apache-2.0**, inherited from [Qwen/Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS).

## References

- [Qwen/Qwen3-TTS](https://huggingface.co/Qwen/Qwen3-TTS)
- [FluidAudio](https://github.com/FluidInference/FluidAudio)
- [mobius](https://github.com/FluidInference/mobius)
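
## Appendix: Sampling Sketch

The sampling settings listed under Key Parameters (temperature=0.9, top_k=50, EOS token ID 2150, at most 125 frames) can be illustrated with a small, framework-independent sketch. This is not FluidAudio's implementation: `sampleTopK`, `generateCB0`, and the `nextLogits` closure are hypothetical names introduced here, and stopping on either EOS or the 125-frame limit is an assumption about how those two parameters interact.

```swift
import Foundation

/// Hypothetical helper (not part of FluidAudio): draws one token index from raw
/// logits using temperature scaling followed by top-k filtering.
func sampleTopK<G: RandomNumberGenerator>(
    logits: [Float],
    temperature: Float = 0.9,
    topK: Int = 50,
    using rng: inout G
) -> Int {
    precondition(!logits.isEmpty, "logits must not be empty")
    precondition(temperature > 0, "temperature must be positive")

    // Keep the k highest-scoring candidates (all of them if k exceeds the vocabulary).
    let k = max(1, min(topK, logits.count))
    let candidates = logits.indices
        .sorted { logits[$0] > logits[$1] }
        .prefix(k)

    // Softmax weights over the scaled logits; subtracting the maximum avoids overflow.
    let maxLogit = logits[candidates.first!]
    let weights = candidates.map { exp((logits[$0] - maxLogit) / temperature) }
    let total = weights.reduce(0, +)

    // Inverse-CDF sampling over the unnormalized weights.
    var threshold = Float.random(in: 0..<total, using: &rng)
    for (index, weight) in zip(candidates, weights) {
        threshold -= weight
        if threshold < 0 { return index }
    }
    return candidates.last!  // Guards against floating-point round-off.
}

/// Hypothetical decode loop: `nextLogits` stands in for one LM decode step.
/// Assumes generation stops at the EOS token or after `maxFrames` frames.
func generateCB0(
    maxFrames: Int = 125,
    eosTokenId: Int = 2150,
    nextLogits: (_ previous: [Int]) -> [Float]
) -> [Int] {
    var rng = SystemRandomNumberGenerator()
    var tokens: [Int] = []
    while tokens.count < maxFrames {
        let token = sampleTopK(logits: nextLogits(tokens), using: &rng)
        if token == eosTokenId { break }
        tokens.append(token)
    }
    return tokens
}
```

Calling `sampleTopK` with `topK: 1` reduces to greedy (argmax) decoding, which can be handy when comparing outputs against a reference run. The full sort keeps the sketch short; a partial selection would be cheaper for large vocabularies.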