Plapre Pico CoreML

CoreML conversion of syvai/plapre-pico for on-device Danish text-to-speech on iOS 18+.

Models

| Model | Description | Size |
| --- | --- | --- |
| PlaprePico.mlpackage | LLM with stateful KV cache (fp16, ctx 512) | ~238 MB |
| KanadeDecoder.mlpackage | Audio tokens + speaker → mel spectrogram | ~365 MB |
| Vocoder.mlpackage | Mel → waveform (F0 + source generation + HiFT + iSTFT baked in) | ~87 MB |
| PlaprePico_int8.mlpackage | int8-quantized LLM (comparable quality) | ~120 MB |

Performance (iPhone 15 / A16, CPU Only)

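| Config | Prefill | Decode | RTF (compute time / audio duration, lower is better) |
| --- | --- | --- | --- |
| fp32, ctx 2048, naive Swift | 14 tok/s | 10 tok/s | 2.5x |
| fp16 with safe RMSNorm, ctx 512, optimized | 60 tok/s | 50 tok/s | ~0.5x |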

With the optimized configuration, synthesis runs about 2x faster than real time (RTF ≈ 0.5) on the iPhone 15 CPU. See TRIALS.md for the optimization journey.
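The RTF column agrees with a simple back-of-envelope estimate. Each second of audio needs 25 Kanade tokens, so if LLM decoding dominates, RTF ≈ 25 / decode tok/s. The sketch below works through that arithmetic. It assumes one audio token per decode step and treats prefill, Kanade decoding and the vocoder as negligible. `estimated_rtf` is a hypothetical helper.

```python
# Back-of-envelope RTF estimate. Assumptions: one audio token per decode
# step, and decode dominates, so prefill, Kanade and the vocoder are ignored.
AUDIO_TOKENS_PER_SECOND = 25  # Kanade codec rate (see Architecture)


def estimated_rtf(decode_tokens_per_second: float) -> float:
    """Compute seconds per second of audio; below 1.0 is faster than real time."""
    return AUDIO_TOKENS_PER_SECOND / decode_tokens_per_second


for config, decode_tps in [("fp32, ctx 2048, naive Swift", 10), ("fp16, ctx 512, optimized", 50)]:
    print(f"{config}: RTF ~ {estimated_rtf(decode_tps):.2f}")
```

The two estimates (2.50 and 0.50) line up with the measured RTF column.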

Architecture

  • LLM: LlamaForCausalLM, 30 layers, hidden=576, 9 query / 3 KV heads (GQA), ~127M params
  • Vocab: 20,802 tokens, split into BPE text tokens (IDs 0-8001) and Kanade audio codes (IDs 8002-20801)
  • Audio: Kanade codec at 25 tok/s, 24kHz output
  • Speakers: 5 built-in (tor, ida, liv, ask, kaj), 128-dim embeddings
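A few useful quantities follow directly from the bullets above. The Python sketch below derives them under two assumptions: head_dim = hidden / query heads = 64, and the KV cache stores K and V for all 30 layers over the full context. `kv_cache_bytes`, `audio_code` and `audio_seconds` are hypothetical helpers, not part of the repo. Under these assumptions the cache is 90 MiB in the fp32 / ctx-2048 configuration from the performance table and 11.25 MiB in the fp16 / ctx-512 one. Each audio token covers 960 output samples (40 ms).

```python
# Derived architecture numbers (assumptions: head_dim = hidden / query heads,
# K and V cached for every layer over the full context).
HIDDEN = 576
NUM_LAYERS = 30
NUM_Q_HEADS = 9
NUM_KV_HEADS = 3  # GQA: only the 3 KV heads are cached
HEAD_DIM = HIDDEN // NUM_Q_HEADS  # 64

TEXT_IDS = range(0, 8002)       # BPE text tokens 0-8001
AUDIO_IDS = range(8002, 20802)  # Kanade audio codes 8002-20801

AUDIO_TOKEN_RATE = 25   # Kanade tokens per second of audio
SAMPLE_RATE = 24_000    # output sample rate in Hz


def kv_cache_bytes(ctx: int, bytes_per_elem: int) -> int:
    """Total size of the K and V caches across all layers."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * ctx * bytes_per_elem


def audio_code(token_id: int) -> int:
    """Hypothetical helper: map an LLM token ID to a Kanade code index."""
    if token_id not in AUDIO_IDS:
        raise ValueError(f"token {token_id} is not an audio token")
    return token_id - AUDIO_IDS.start


def audio_seconds(num_audio_tokens: int) -> float:
    """Duration of audio produced by a number of Kanade tokens."""
    return num_audio_tokens / AUDIO_TOKEN_RATE


print(len(TEXT_IDS) + len(AUDIO_IDS))       # 20802
print(kv_cache_bytes(2048, 4) / 2**20)      # 90.0 (MiB, fp32, ctx 2048)
print(kv_cache_bytes(512, 2) / 2**20)       # 11.25 (MiB, fp16, ctx 512)
print(SAMPLE_RATE // AUDIO_TOKEN_RATE)      # 960 samples per audio token
```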

Adaptations

  • fp16-safe RMSNorm: divide activations by their absolute max (amax) before squaring, which prevents the fp16 overflow that otherwise occurs from layer 4 onward
  • Custom attention: explicit matmul replacing SDPA, split-half RoPE with precomputed tables
  • KV cache: cache writes use one-hot broadcast masks, and a MIL pass injects coreml_update_state ops (needed because torch.jit.trace doesn't emit prim::SetAttr)
  • Kanade: interleaved RoPE, local windowed attention, hardcoded dimensions
  • Vocoder: manual STFT/iSTFT via conv1d + matmul, DSP baked into one model
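The fp16-safe RMSNorm idea can be illustrated in NumPy. This sketch assumes the standard Llama formulation, y = x / sqrt(mean(x²) + eps) · weight. Dividing each row by its amax first keeps every squared term at or below 1, so nothing exceeds fp16's ~65504 limit. Scaling eps by 1/amax² makes the result mathematically equal to the unscaled formula. That eps handling is an illustrative choice and not necessarily what the conversion script does. `rmsnorm_naive` and `rmsnorm_fp16_safe` are hypothetical names.

```python
import numpy as np


def rmsnorm_naive(x, weight, eps=1e-5):
    # Standard RMSNorm: squares x directly, which overflows fp16 once |x| > ~255.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * weight


def rmsnorm_fp16_safe(x, weight, eps=1e-5):
    # Pre-scale each row by its absolute max so every squared term is <= 1.
    amax = np.max(np.abs(x), axis=-1, keepdims=True)
    amax = np.maximum(amax, np.asarray(1e-4, dtype=x.dtype))  # guard all-zero rows
    y = x / amax
    var = np.mean(y * y, axis=-1, keepdims=True)
    # eps / amax**2 keeps the result equal to the unscaled formula; dividing
    # twice avoids forming amax**2, which could itself overflow fp16.
    eps_scaled = (eps / amax) / amax
    return y / np.sqrt(var + eps_scaled) * weight


rng = np.random.default_rng(0)
x16 = (rng.standard_normal((2, 576)) * 300).astype(np.float16)  # large activations
w16 = np.ones(576, dtype=np.float16)

with np.errstate(over="ignore"):
    naive16 = rmsnorm_naive(x16, w16)  # x*x hits inf, so the output collapses to 0
safe16 = rmsnorm_fp16_safe(x16, w16)
ref32 = rmsnorm_naive(x16.astype(np.float32), w16.astype(np.float32))

print(np.all(naive16 == 0), np.allclose(safe16, ref32, atol=2e-2))
```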
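The split-half RoPE used for the LLM and the interleaved RoPE used for Kanade differ only in which channels are paired before rotation. This NumPy sketch computes the cos/sin tables once and applies both layouts. It checks that interleaved RoPE equals split-half RoPE on de-interleaved channels. base=10000 and head_dim=64 are assumptions for illustration. `rope_tables`, `rope_split_half` and `rope_interleaved` are hypothetical helpers.

```python
import numpy as np


def rope_tables(max_pos, head_dim, base=10000.0):
    # Precomputed once: one angle per (position, channel pair).
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # (head_dim/2,)
    angles = np.outer(np.arange(max_pos), inv_freq)            # (max_pos, head_dim/2)
    return np.cos(angles), np.sin(angles)


def rope_split_half(x, cos, sin):
    # Llama-style: channel i is paired with channel i + head_dim/2.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


def rope_interleaved(x, cos, sin):
    # Interleaved: channel 2i is paired with channel 2i + 1.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x2 * cos + x1 * sin
    return out


seq, head_dim = 8, 64
cos, sin = rope_tables(seq, head_dim)
x = np.random.default_rng(0).standard_normal((seq, head_dim))

# Both layouts rotate the same pairs once the channels are permuted.
perm = np.concatenate([np.arange(0, head_dim, 2), np.arange(1, head_dim, 2)])
inv = np.argsort(perm)
print(np.allclose(rope_interleaved(x, cos, sin), rope_split_half(x[:, perm], cos, sin)[:, inv]))
```

Precomputing the tables turns RoPE into elementwise multiplies and adds at inference time. The two layouts are not interchangeable without the permutation, so each model's layout has to match its trained weights.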
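The KV-cache bullet mentions one-hot broadcast mask writes. This NumPy sketch uses a hypothetical `write_kv` helper and an assumed (kv_heads, ctx, head_dim) layout. Instead of an indexed assignment at position `pos`, it multiplies the cache by a mask that zeroes row `pos`. It then adds the new entry, broadcast through the same one-hot mask. The update therefore uses only elementwise ops on fixed shapes. The stateful Core ML side (the MIL pass and coreml_update_state) is not shown.

```python
import numpy as np


def write_kv(cache, new, pos):
    # cache: (kv_heads, ctx, head_dim); new: (kv_heads, 1, head_dim); pos: current step
    ctx = cache.shape[1]
    onehot = (np.arange(ctx) == pos).astype(cache.dtype)[None, :, None]  # (1, ctx, 1)
    # Keep every row except `pos`, then broadcast `new` into row `pos`.
    return cache * (1 - onehot) + new * onehot


kv_heads, ctx, head_dim = 3, 512, 64
rng = np.random.default_rng(0)
cache = rng.standard_normal((kv_heads, ctx, head_dim)).astype(np.float16)
k_new = rng.standard_normal((kv_heads, 1, head_dim)).astype(np.float16)

updated = write_kv(cache, k_new, pos=5)
expected = cache.copy()
expected[:, 5:6] = k_new  # the equivalent indexed write
print(np.array_equal(updated, expected))
```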
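The Vocoder bullet says STFT and iSTFT are built from conv1d and matmul rather than an FFT op. The NumPy sketch below shows the underlying math as a matrix multiply against a windowed real-DFT basis, followed by an inverse matmul and weighted overlap-add. Framing plus that matmul is equivalent to a strided conv1d whose kernels are the windowed basis rows (kernel size n_fft, stride hop). N_FFT=16, HOP=4 and the periodic Hann window are illustrative assumptions, not the vocoder's actual settings. `dft_bases`, `stft` and `istft` are hypothetical names.

```python
import numpy as np

N_FFT, HOP = 16, 4  # illustrative values only


def dft_bases(n_fft):
    # Real-DFT basis for bins 0..n_fft/2: X = frame @ cos_b.T + 1j * frame @ sin_b.T
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)
    ang = 2 * np.pi * np.outer(k, n) / n_fft  # (bins, n_fft)
    return np.cos(ang), -np.sin(ang)


def stft(x, n_fft=N_FFT, hop=HOP):
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]  # (frames, n_fft)
    cos_b, sin_b = dft_bases(n_fft)
    windowed = frames * window
    return windowed @ cos_b.T, windowed @ sin_b.T  # real, imag: (frames, bins)


def istft(re, im, length, n_fft=N_FFT, hop=HOP):
    window = np.hanning(n_fft + 1)[:-1]
    cos_b, sin_b = dft_bases(n_fft)
    weights = np.full(n_fft // 2 + 1, 2.0)
    weights[0] = weights[-1] = 1.0  # DC and Nyquist appear once in the full spectrum
    frames = ((re * weights) @ cos_b + (im * weights) @ sin_b) / n_fft  # inverse real DFT
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):  # weighted overlap-add
        start = i * hop
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    return np.divide(out, norm, out=np.zeros_like(out), where=norm > 1e-8)


x = np.random.default_rng(0).standard_normal(64)
re, im = stft(x)
y = istft(re, im, len(x))
print(np.allclose(y[1:], x[1:]))  # sample 0 has zero window weight
```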

Usage

cd swift-cli
swift run plapre-cli "Hej, mit navn er Daniel."
# → output.wav

Conversion

pip install -r scripts/requirements.txt
python scripts/build.py                  # LLM + Kanade + Vocoder
python scripts/build.py --quantize int8  # also produce int8 LLM variant
python scripts/build.py --skip llm       # only rebuild audio models
python scripts/build.py --skip audio     # only rebuild LLM

Known Limitations

  • Compute units: .cpuOnly is required; GPU and ANE execution crash with error -14
  • Streaming: chunked Kanade decoding works, but real-time audio streaming is not yet supported

License

CC-BY-4.0, following the source model license.
