GGUF + pure-C++ runtime in CrispASR: Moonshine streaming

#2
by cstr - opened

We've added the streaming Moonshine variants to CrispASR as the moonshine-streaming backend. It is kept separate from the offline moonshine backend because the encoder topology differs: a sliding-window encoder with a raw-waveform frontend.
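To illustrate what "sliding-window" means for the encoder, here is a minimal sketch of a banded additive attention mask in which each frame attends only to a fixed number of past frames plus an optional lookahead (set right = 0 for a strictly causal window). The function names and the window sizes are hypothetical; they are not taken from moonshine_streaming.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: build an additive attention mask (row-major, n x n)
// where query frame i may attend to key frame j only if
// i - left <= j <= i + right. Allowed entries are 0.0f, blocked entries are
// -inf, so the mask can be added to attention scores before a softmax.
std::vector<float> sliding_window_mask(std::size_t n, std::size_t left, std::size_t right) {
    const float neg_inf = -std::numeric_limits<float>::infinity();
    std::vector<float> mask(n * n, neg_inf);
    for (std::size_t i = 0; i < n; ++i) {
        // Clamp the window to the valid range [0, n).
        const std::size_t lo = (i >= left) ? i - left : 0;
        const std::size_t hi = (i + right < n) ? i + right : n - 1;
        for (std::size_t j = lo; j <= hi; ++j) {
            mask[i * n + j] = 0.0f;
        }
    }
    return mask;
}

// Count how many key frames a given query row can see (handy for sanity checks).
std::size_t visible_keys(const std::vector<float>& mask, std::size_t n, std::size_t row) {
    assert(row < n && mask.size() == n * n);
    std::size_t count = 0;
    for (std::size_t j = 0; j < n; ++j) {
        if (std::isfinite(mask[row * n + j])) ++count;
    }
    return count;
}
```

Because every row sees at most left + right + 1 keys, attention cost per frame stays bounded as the stream grows, which is what makes this kind of encoder usable on live audio.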

src/moonshine_streaming.cpp follows the same approach as the offline Moonshine implementation: a ggml graph for the sliding-window encoder and a KV-cached autoregressive decoder. The companion tokenizer.bin is fetched automatically.
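For readers unfamiliar with KV caching, a minimal single-head sketch of the idea: each decode step appends the new token's key and value to a cache and attends over everything cached so far, so keys and values for earlier tokens are never recomputed. KVCache and attend_step are hypothetical illustration names; the real decoder keeps per-layer caches as ggml tensors, and this plain-C++ version is not its code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-head KV cache, for illustration only.
struct KVCache {
    std::size_t dim;
    std::vector<float> keys;    // n_cached * dim, row-major
    std::vector<float> values;  // n_cached * dim, row-major

    explicit KVCache(std::size_t d) : dim(d) {}
    std::size_t size() const { return keys.size() / dim; }

    // Store this step's key/value so later steps can reuse them.
    void append(const std::vector<float>& k, const std::vector<float>& v) {
        assert(k.size() == dim && v.size() == dim);
        keys.insert(keys.end(), k.begin(), k.end());
        values.insert(values.end(), v.begin(), v.end());
    }
};

// One decode step: append (k, v), then attend with query q over every cached
// position using softmax(q . k_j / sqrt(dim)) weights; returns the weighted
// sum of cached values.
std::vector<float> attend_step(KVCache& cache, const std::vector<float>& q,
                               const std::vector<float>& k, const std::vector<float>& v) {
    cache.append(k, v);
    const std::size_t n = cache.size();
    const std::size_t d = cache.dim;
    assert(q.size() == d);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));

    std::vector<float> scores(n);
    float max_score = std::numeric_limits<float>::lowest();
    for (std::size_t j = 0; j < n; ++j) {
        float s = 0.0f;
        for (std::size_t t = 0; t < d; ++t) s += q[t] * cache.keys[j * d + t];
        scores[j] = s * scale;
        if (scores[j] > max_score) max_score = scores[j];
    }

    // Numerically stable softmax: subtract the max before exponentiating.
    float denom = 0.0f;
    for (float& s : scores) {
        s = std::exp(s - max_score);
        denom += s;
    }

    std::vector<float> out(d, 0.0f);
    for (std::size_t j = 0; j < n; ++j) {
        const float w = scores[j] / denom;
        for (std::size_t t = 0; t < d; ++t) out[t] += w * cache.values[j * d + t];
    }
    return out;
}
```

Per step, the work is one dot product per cached position instead of re-running the projections for the whole prefix, which is why the decoder's cost stays low during streaming.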

This gives CrispASR a true low-latency streaming path, which pairs with --mic / --live and our standard VAD/diarisation post-processing step:

./build/bin/crispasr --backend moonshine-streaming \
    -m moonshine-streaming-tiny-q4_k.gguf --mic
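A live --mic path has to turn capture buffers of arbitrary size into fixed-size pieces for the encoder. A minimal sketch of that buffering, assuming 16 kHz mono float PCM; PcmChunker and the chunk size are hypothetical and are not taken from CrispASR's source.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical chunker: audio arrives in callback buffers of any size, and
// every complete fixed-size chunk is handed to on_chunk (e.g. the encoder).
class PcmChunker {
public:
    using ChunkFn = std::function<void(const std::vector<float>&)>;

    PcmChunker(std::size_t chunk_samples, ChunkFn on_chunk)
        : chunk_samples_(chunk_samples), on_chunk_(std::move(on_chunk)) {
        assert(chunk_samples_ > 0);
    }

    // Append samples, emit every complete chunk, keep the remainder buffered.
    void push(const float* samples, std::size_t n) {
        pending_.insert(pending_.end(), samples, samples + n);
        std::size_t offset = 0;
        while (pending_.size() - offset >= chunk_samples_) {
            const auto first = pending_.begin() + static_cast<std::ptrdiff_t>(offset);
            std::vector<float> chunk(first, first + static_cast<std::ptrdiff_t>(chunk_samples_));
            on_chunk_(chunk);
            offset += chunk_samples_;
        }
        pending_.erase(pending_.begin(), pending_.begin() + static_cast<std::ptrdiff_t>(offset));
    }

    // Number of samples waiting for the next complete chunk.
    std::size_t buffered() const { return pending_.size(); }

private:
    std::size_t chunk_samples_;
    ChunkFn on_chunk_;
    std::vector<float> pending_;
};
```

At 16 kHz, chunk_samples = 1600 would correspond to 100 ms per chunk; that value is only an example of the latency/overhead trade-off, not a CrispASR default.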

Pre-quantised GGUFs (MIT licence): cstr/moonshine-streaming-tiny-GGUF. Sibling repos cover the -small (110M) and -medium (245M) sizes.

(Offline Moonshine repos: tiny, base, plus ja/ko/zh/ar/vi/uk variants.)
