Plapre Pico CoreML
CoreML conversion of syvai/plapre-pico for on-device Danish text-to-speech on iOS 18+.
Models
| Model | Description | Size |
|---|---|---|
| PlaprePico.mlpackage | LLM with stateful KV cache (fp16, ctx 512) | ~238MB |
| KanadeDecoder.mlpackage | Audio tokens + speaker → mel spectrogram | ~365MB |
| Vocoder.mlpackage | Mel → waveform (F0 + source gen + HiFT + iSTFT baked in) | ~87MB |
| PlaprePico_int8.mlpackage | int8 quantized LLM (comparable quality) | ~120MB |
Performance (iPhone 15 / A16, CPU Only)
| Config | Prefill | Decode | RTF |
|---|---|---|---|
| fp32, ctx 2048, naive Swift | 14 tok/s | 10 tok/s | 2.5x |
| fp16 safe RMSNorm, ctx 512, optimized | 60 tok/s | 50 tok/s | ~0.5x |
The optimized configuration runs at roughly 2x realtime (RTF ~0.5x) on the iPhone 15 CPU. See TRIALS.md for the optimization journey.
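The RTF column lines up with the codec rate listed under Architecture: the Kanade codec uses 25 audio tokens per second of speech, so decoding 50 tok/s produces audio twice as fast as it plays back. A minimal sketch of that arithmetic, assuming RTF means compute time divided by audio duration and that LLM decode dominates (prefill, Kanade and vocoder time are ignored here). The function name is illustrative and not part of this repo.

```python
CODEC_TOKENS_PER_SECOND = 25  # Kanade codec rate from the Architecture section


def real_time_factor(decode_tokens_per_second: float) -> float:
    """Seconds of compute per second of audio, counting LLM decode only."""
    return CODEC_TOKENS_PER_SECOND / decode_tokens_per_second


# Decode rates taken from the two rows of the performance table.
for label, rate in [("fp32 naive", 10), ("fp16 optimized", 50)]:
    rtf = real_time_factor(rate)
    print(f"{label}: RTF {rtf:.1f}x, {1 / rtf:.1f}x realtime")
```

Under these assumptions the two table rows come out at 2.5x and 0.5x, matching the reported figures.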
Architecture
- LLM: LlamaForCausalLM, 30 layers, hidden=576, 9 query / 3 KV heads (GQA), ~127M params
- Vocab: 20,802 tokens: BPE text (0-8001) + Kanade audio codes (8002-20801)
- Audio: Kanade codec at 25 tok/s, 24kHz output
- Speakers: 5 built-in (tor, ida, liv, ask, kaj), 128-dim embeddings
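The vocabulary split and codec rate above can be turned into two small helpers. This is a sketch only: it assumes an audio token maps to a zero-based Kanade code by subtracting the text vocabulary size, which the ranges suggest but the source does not state, and the helper names are hypothetical.

```python
from typing import Optional

TEXT_VOCAB_SIZE = 8002        # BPE text ids 0-8001
VOCAB_SIZE = 20802            # total vocabulary, audio ids 8002-20801
CODEC_TOKENS_PER_SECOND = 25  # Kanade codec rate
SAMPLE_RATE_HZ = 24000        # output sample rate


def audio_code(token_id: int) -> Optional[int]:
    """Return the zero-based Kanade code for an audio token, or None for text.

    Assumption: codes are laid out contiguously after the text ids.
    """
    if not 0 <= token_id < VOCAB_SIZE:
        raise ValueError(f"token id {token_id} is outside the vocabulary")
    if token_id < TEXT_VOCAB_SIZE:
        return None
    return token_id - TEXT_VOCAB_SIZE


def samples_for(num_audio_tokens: int) -> int:
    """Number of waveform samples covered by a run of audio tokens."""
    return num_audio_tokens * SAMPLE_RATE_HZ // CODEC_TOKENS_PER_SECOND


print(VOCAB_SIZE - TEXT_VOCAB_SIZE)  # number of distinct audio codes
print(samples_for(1))                # samples per audio token
```

At 25 tok/s and 24kHz, each audio token accounts for 960 output samples, so the 512-token context bounds how much audio a single generation can cover.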
Adaptations
- fp16-safe RMSNorm: pre-scale by `amax` before squaring to prevent fp16 overflow at layer 4+
- Custom attention: explicit matmul replacing SDPA, split-half RoPE with precomputed tables
- KV cache: one-hot broadcast mask writes + MIL pass that injects `coreml_update_state` ops (`torch.jit.trace` doesn't emit `prim::SetAttr`)
- Kanade: interleaved RoPE, local windowed attention, hardcoded dimensions
- Vocoder: manual STFT/iSTFT via conv1d + matmul, DSP baked into one model
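To illustrate why the RMSNorm pre-scaling matters, here is a minimal pure-Python sketch that emulates fp16 rounding with `struct`. It is not the code used in this repo: the learned weight is omitted, sums are accumulated in higher precision for brevity, and applying `eps` in the scaled domain is an assumption. The identity it relies on is that `x / rms(x)` equals `(x / amax) / rms(x / amax)`.

```python
import math
import struct


def fp16(value: float) -> float:
    """Round to the nearest fp16 value; anything past the fp16 range becomes inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def rms_norm_naive(x, eps=1e-6):
    """Square first: v*v exceeds the fp16 maximum (65504) once |v| is above ~256."""
    squares = [fp16(v * v) for v in x]
    mean = fp16(sum(squares) / len(x))
    rms = fp16(math.sqrt(mean + eps))
    return [fp16(v / rms) for v in x]


def rms_norm_safe(x, eps=1e-6):
    """Divide by the largest magnitude first so no square can exceed 1."""
    amax = max(abs(v) for v in x) or 1.0     # fall back to 1.0 for all-zero input
    scaled = [fp16(v / amax) for v in x]     # every entry now lies in [-1, 1]
    squares = [fp16(s * s) for s in scaled]  # squares are at most 1
    mean = fp16(sum(squares) / len(x))
    rms = fp16(math.sqrt(mean + eps))        # assumption: eps applied after scaling
    return [fp16(s / rms) for s in scaled]


x = [300.0, -150.0, 75.0, 10.0]  # 300 * 300 = 90000 does not fit in fp16
print("naive:", rms_norm_naive(x))
print("safe: ", rms_norm_safe(x))
```

In the naive version the mean of squares becomes `inf`, so every output collapses to zero, while the pre-scaled version stays close to a full-precision reference.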
Usage
cd swift-cli
swift run plapre-cli "Hej, mit navn er Daniel."
# → output.wav
Conversion
pip install -r scripts/requirements.txt
python scripts/build.py # LLM + Kanade + Vocoder
python scripts/build.py --quantize int8 # also produce int8 LLM variant
python scripts/build.py --skip llm # only rebuild audio models
python scripts/build.py --skip audio # only rebuild LLM
Known Limitations
- Compute units: `.cpuOnly` required; GPU/ANE crash with error -14
- Streaming: chunked Kanade decoding works, but no real-time audio streaming yet
License
CC-BY-4.0, following the source model license.