GGUF + pure-C++ runtime in CrispASR — Moonshine streaming
#2
by cstr - opened
We've added the streaming Moonshine variants to CrispASR as the moonshine-streaming backend. It is separate from the offline moonshine backend because the encoder topology is different: a sliding-window encoder with a raw-waveform frontend.
src/moonshine_streaming.cpp follows the same approach as the offline Moonshine implementation: a ggml graph for the sliding-window encoder and a KV-cached autoregressive decoder. The companion tokenizer.bin is fetched automatically.
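For anyone unfamiliar with the sliding-window idea, here is a minimal standalone sketch of the kind of attention mask such an encoder implies. It is not taken from src/moonshine_streaming.cpp: the function names, the left/right window parameters, and the sizes used are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT CrispASR code: builds a boolean attention mask in
// which frame i may attend to frame j only if j lies in [i - left, i + right],
// clamped to the valid frame range. Window sizes are illustrative assumptions.
std::vector<std::vector<bool>> sliding_window_mask(std::size_t n_frames,
                                                   std::size_t left,
                                                   std::size_t right) {
    std::vector<std::vector<bool>> mask(n_frames,
                                        std::vector<bool>(n_frames, false));
    for (std::size_t i = 0; i < n_frames; ++i) {
        // Clamp the window without unsigned underflow or overflow.
        const std::size_t lo = (i >= left) ? i - left : 0;
        const std::size_t room = n_frames - 1 - i;  // frames available after i
        const std::size_t hi = (right < room) ? i + right : n_frames - 1;
        for (std::size_t j = lo; j <= hi; ++j) {
            mask[i][j] = true;
        }
    }
    return mask;
}

// Hypothetical helper: number of frames that frame i is allowed to attend to.
std::size_t attended_count(const std::vector<std::vector<bool>>& mask,
                           std::size_t i) {
    assert(i < mask.size());
    std::size_t count = 0;
    for (bool allowed : mask[i]) {
        if (allowed) ++count;
    }
    return count;
}
```

With a mask like this, each frame attends to at most left + right + 1 frames, so the per-frame attention cost stays bounded no matter how long the audio runs. That bound is the general reason a sliding-window encoder suits streaming.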
The new backend gives us a true low-latency streaming path in CrispASR (paired with --mic / --live and our standard VAD/diarisation post-step):
./build/bin/crispasr --backend moonshine-streaming \
-m moonshine-streaming-tiny-q4_k.gguf --mic
Pre-quantised GGUFs (MIT): cstr/moonshine-streaming-tiny-GGUF. Sibling sizes: -small (110M), -medium (245M).
(Offline Moonshine repos: tiny, base, plus ja/ko/zh/ar/vi/uk variants.)