Moonshine-base on Qualcomm Hexagon NPU (QHexRT)
Prebuilt QNN context binaries for running UsefulSensors/moonshine-base
speech-to-text on-device on the Qualcomm Hexagon NPU via the QHexRT
runtime. Arch: v81 (SM8850 / Snapdragon 8 Elite Gen 5). Device-validated at WER = 0 vs HF.
A context binary is arch-pinned (the dsp_arch + soc_model are baked in), so these v81/ bins won't load
on another Hexagon arch. Other arches are sibling <arch>/ dirs (re-converted), added to this same repo.
How it runs
Moonshine is an encoder-decoder ASR model that operates on the raw 16 kHz waveform (no mel spectrogram). The pipeline:
- Host runs the 3-conv raw-audio stem (conv1 k127/s64 + tanh → GroupNorm → conv2 k7/s3 + gelu → conv3 k3/s2 + gelu). The conv is HTP-hostile so it stays on the CPU; the encoder graph starts at the post-conv features.
- NPU encoder (bidirectional, partial interleaved RoPE) → per-decoder-layer cross-attention K/V.
- NPU decoder (one autoregressive step: causal self-attn + RoPE in-graph, cross-attn to the cached encoder states, gated SwiGLU; tied lm-head) → tokens, detokenized on the host.
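The stem's kernel and stride values determine how many feature frames a clip produces. The following is a minimal sketch of that arithmetic, assuming unpadded ("valid") 1-D convolutions; the padding mode is not stated in this card, and `conv_out_len` and `stem_frames` are illustrative names, not QHexRT functions. The tanh, GroupNorm and gelu steps do not change the sequence length, so they are omitted.

```python
# Illustrative arithmetic only; not part of QHexRT.
# Assumption: each conv is unpadded, so out = (n - kernel) // stride + 1.

STEM = [(127, 64), (7, 3), (3, 2)]  # (kernel, stride) for conv1, conv2, conv3


def conv_out_len(n: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution over n input steps."""
    if n < kernel:
        return 0
    return (n - kernel) // stride + 1


def stem_frames(num_samples: int) -> int:
    """Number of post-conv feature frames for a clip of num_samples samples."""
    n = num_samples
    for kernel, stride in STEM:
        n = conv_out_len(n, kernel, stride)
    return n


print(stem_frames(10 * 16000))  # 10 s at 16 kHz -> 415
```

Under this assumption a 10 s clip yields 415 frames, which is consistent with the fixed n_audio = 415 graph, and a 5.9 s clip (94,400 samples) yields 244 frames, leaving the rest of the window as padding.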
Variable-length audio is handled on a fixed graph (n_audio = 415, ~10 s window): the host pads/truncates
the features and masks the padding (encoder_mask / cross_mask). Precision fp16 (encoder + decoder).
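A minimal host-side sketch of the pad/truncate-and-mask step described above. The mask convention (1.0 for real frames, 0.0 for padding), the zero padding value, and the name `fit_to_window` are assumptions for illustration; the layout that encoder_mask / cross_mask actually use is defined by the QHexRT host-op and is not documented in this card.

```python
from typing import List, Tuple

N_AUDIO = 415  # fixed frame count of the compiled graph (from this card)


def fit_to_window(
    frames: List[List[float]], n_audio: int = N_AUDIO
) -> Tuple[List[List[float]], List[float]]:
    """Pad or truncate a [T, D] feature list to [n_audio, D] and build a mask.

    Assumed convention: mask[i] == 1.0 for real frames, 0.0 for padding.
    """
    if not frames:
        raise ValueError("need at least one feature frame")
    dim = len(frames[0])
    valid = min(len(frames), n_audio)  # frames beyond the window are dropped
    fitted = [list(f) for f in frames[:valid]]
    fitted += [[0.0] * dim for _ in range(n_audio - valid)]  # zero-pad the tail
    mask = [1.0] * valid + [0.0] * (n_audio - valid)
    return fitted, mask


feats, mask = fit_to_window([[0.5, -0.5]] * 244)  # e.g. a clip shorter than the window
print(len(feats), int(sum(mask)))  # 415 244
```

Keeping the graph shape fixed and expressing the real length only through the mask is what lets a single precompiled context binary serve clips of any length up to the window.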
Files (v81/)
| file | role | size |
|---|---|---|
| moonshine-base.json | QHexRT manifest (declarative run plan; moonshine_transcribe host-op) | ~1 KB |
| moonshinebase_enc_f16.bin | encoder context binary | 44 MB |
| moonshinebase_dec_f16.bin | decoder context binary | 106 MB |
| moonshine_conv_stem.bin | host raw-audio conv-stem weights [c1w,c2w,c2b,c3w,c3b,gnw,gnb] | 14 MB |
| tokenizer.json | SentencePiece-style BPE (byte_fallback; metaspace detok) | 3.8 MB |
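Before pushing to a device it can help to confirm that the download is complete. This is a small sketch that checks for the five files listed above; `missing_files` is a hypothetical helper, and the example path assumes the repo was downloaded into a local `moonshine_base_HNPU` directory.

```python
from pathlib import Path
from typing import List

# File names as listed in the table above.
EXPECTED_FILES = [
    "moonshine-base.json",
    "moonshinebase_enc_f16.bin",
    "moonshinebase_dec_f16.bin",
    "moonshine_conv_stem.bin",
    "tokenizer.json",
]


def missing_files(arch_dir: str) -> List[str]:
    """Return the expected file names that are absent from arch_dir."""
    root = Path(arch_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    gaps = missing_files("moonshine_base_HNPU/v81")
    print("all files present" if not gaps else f"missing: {gaps}")
```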
The QNN runtime libs (libQnnHtp.so / libQnnSystem.so + the v81 HTP skel) come from the QAIRT SDK, not
this repo. The qhx_asr tool comes from a QHexRT build.
Run
hf download runanywhere/moonshine_base_HNPU --local-dir moonshine_base_HNPU
# Windows: adb push from PowerShell with native paths.
adb push moonshine_base_HNPU/v81 /data/local/tmp/wq/moonshine
adb push my_audio_16k_mono.wav /data/local/tmp/wq/moonshine/
adb shell "cd /data/local/tmp/wq && export ADSP_LIBRARY_PATH='/data/local/tmp/wq/dsp;/data/local/tmp/wq;/vendor/dsp/cdsp'; \
LD_LIBRARY_PATH=. ./qhx_asr moonshine/moonshine-base.json libQnnHtp.so libQnnSystem.so moonshine moonshine/my_audio_16k_mono.wav"
Tool arg order is invariant: qhx_asr <manifest> libQnnHtp.so libQnnSystem.so <artifacts_root> <audio16k.wav>.
Input audio must be 16 kHz mono (PCM16 or float32).
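A quick way to check a file against this requirement before pushing it is to read the WAV header with Python's standard `wave` module. This is only a sketch: `wav_problems` is an illustrative name, and the standard-library reader handles integer PCM only, so a float32 WAV (which the tool accepts) is reported here as unreadable rather than validated.

```python
import wave
from typing import List


def wav_problems(path: str) -> List[str]:
    """List reasons a WAV file is not 16 kHz mono PCM16 (empty list = OK)."""
    try:
        with wave.open(path, "rb") as w:
            rate = w.getframerate()
            channels = w.getnchannels()
            width = w.getsampwidth()  # bytes per sample
    except wave.Error as exc:
        # The stdlib reader only understands integer PCM; float32 WAVs land here.
        return [f"not readable as integer PCM WAV: {exc}"]
    problems = []
    if rate != 16000:
        problems.append(f"sample rate is {rate} Hz, expected 16000")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    return problems
```

For example, `wav_problems("my_audio_16k_mono.wav")` returns an empty list when the file is already in the expected PCM16 format.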
Measured (v81, SM8850, soc_model 87, QAIRT 2.47)
| metric | value |
|---|---|
| Parity | WER = 0.0000 vs HF UsefulSensors/moonshine-base (LibriSpeech sample) |
| Latency | 26 tokens in 1101 ms for a 5.9 s clip |
| Precision | fp16 encoder + decoder |
Example: a 5.9 s clip → "Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel." (matches HF exactly).
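The latency row can be turned into per-token and real-time figures with a little arithmetic. The card does not say whether the 1101 ms covers the host stem and encoder as well as the decoder steps, so the per-token value below is an average over whatever that interval includes, not a pure decoder-step time.

```python
tokens = 26          # decoded tokens (from the table above)
elapsed_ms = 1101.0  # reported time
clip_s = 5.9         # audio duration in seconds

ms_per_token = elapsed_ms / tokens
tokens_per_s = tokens / (elapsed_ms / 1000.0)
rtf = (elapsed_ms / 1000.0) / clip_s  # real-time factor; < 1 is faster than real time

print(f"{ms_per_token:.1f} ms/token, {tokens_per_s:.1f} tokens/s, RTF {rtf:.2f}")
# 42.3 ms/token, 23.6 tokens/s, RTF 0.19
```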
Caveats
- v81 only here (arch-pinned). fp16 weights. Audio window ~10 s (n_audio = 415); longer clips are truncated to the window.
- Parity is greedy (temperature 0) vs the HF reference. WER measured on a standard LibriSpeech sample.
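For readers who want to repeat the parity check, word error rate is the word-level edit distance between reference and hypothesis divided by the number of reference words. Below is a minimal sketch; the text normalization used for the measurement in this card (casing, punctuation) is not stated, so this version simply splits on whitespace, and the function names are illustrative.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER = edit distance / number of reference words (whitespace tokens)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


text = "Mr. Quilter is the apostle of the middle classes"
print(wer(text, text))  # 0.0
```

Here the reference would be the HF model's greedy transcript and the hypothesis the on-device transcript, so identical strings give a WER of exactly 0.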
Converted + device-validated with the QHexRT forge pipeline
(recipes/moonshine-base).