GGUF + pure-C++ runtime in CrispASR — Moonshine streaming

by cstr - opened May 1


We've added the streaming Moonshine variants to CrispASR as the moonshine-streaming backend (separate from the offline moonshine backend because the encoder topology is different — sliding-window + raw-waveform frontend).

src/moonshine_streaming.cpp follows the same approach as the offline Moonshine implementation: a ggml graph for the sliding-window encoder and a KV-cached autoregressive decoder. The companion tokenizer.bin is fetched automatically.
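For readers unfamiliar with the term, here is a minimal sketch of what "sliding-window" means for an attention-based encoder, assuming each frame attends to a fixed number of past frames plus a small lookahead. The function name and the window sizes are hypothetical and are not taken from the CrispASR source or the Moonshine model config.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper, not CrispASR code: returns the inclusive range
// [first, last] of key frames that query frame `i` may attend to.
// `n` is the number of frames available so far; `left` and `right` are
// assumed window sizes (past context and lookahead, in frames).
std::pair<std::size_t, std::size_t>
attention_window(std::size_t i, std::size_t n,
                 std::size_t left, std::size_t right) {
    assert(n > 0 && i < n);
    // Clamp the left edge at the first frame.
    const std::size_t first = (i > left) ? i - left : 0;
    // Clamp the right edge at the newest frame.
    const std::size_t last = std::min(n - 1, i + right);
    return {first, last};
}
```

With a fixed window, the attention cost per frame stays bounded no matter how long the stream runs, and the added latency is bounded by the lookahead.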

This gives us a true low-latency streaming path in CrispASR (paired with --mic / --live and our standard VAD/diarisation post-step):

./build/bin/crispasr --backend moonshine-streaming \
    -m moonshine-streaming-tiny-q4_k.gguf --mic

Pre-quantised GGUFs (MIT): cstr/moonshine-streaming-tiny-GGUF. Sibling sizes: -small (~110M), -medium (~245M).

(Offline Moonshine repos: tiny, base, plus ja/ko/zh/ar/vi/uk variants.)
