whisper-base - QHexRT NPU bundle (Hexagon v79 + v81)
Precompiled Whisper-base ASR for the QHexRT runtime on the Qualcomm Hexagon NPU. Two arch-pinned bundles ship as sibling dirs; pick the one matching your device:
| dir | Hexagon arch | device | encoder path | device-validated |
|---|---|---|---|---|
| v79/ | v79 (SM8750, Galaxy S25) | Snapdragon 8 Elite | AI-Hub in-graph conv | yes: transcript == HF |
| v81/ | v81 (SM8850, soc 87) | Snapdragon 8 Elite Gen 5 | forge host-conv (encoder_features) | yes: WER 0 |
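Picking the bundle can be sketched as a lookup from SoC model to directory. The helper name `pick_bundle` and the mapping table are illustrative, not part of QHexRT; only the two SoC/arch pairs come from the table above.

```python
# Hypothetical helper: map a SoC model string to the matching bundle directory.
BUNDLE_BY_SOC = {
    "SM8750": "v79",  # Snapdragon 8 Elite, Hexagon v79
    "SM8850": "v81",  # Snapdragon 8 Elite Gen 5, Hexagon v81
}


def pick_bundle(soc_model: str) -> str:
    """Return the bundle dir for a SoC model, or raise if none is shipped."""
    key = soc_model.strip().upper()
    if key not in BUNDLE_BY_SOC:
        raise ValueError(f"no precompiled bundle for SoC {soc_model!r}")
    return BUNDLE_BY_SOC[key]
```

On recent Android builds the SoC string may be readable with `adb shell getprop ro.soc.model`; that is an assumption about the device, not something this bundle documents.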
A context binary won't load on another arch: the soc_model and dsp_arch are baked in. The host pipeline
(log-mel, decode loop, detok) is QHexRT's own; the same qhx_asr runs both bundles, branching on the
encoder input (input_features → AI-Hub in-graph conv, encoder_features → host conv stem).
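The branch described above amounts to a check on which input tensor the encoder graph declares. A minimal sketch, assuming the input names are available as a list of strings; `encoder_path` and its return labels are hypothetical and do not reflect qhx_asr's actual internals.

```python
def encoder_path(input_names):
    """Choose the host-side preprocessing from the encoder's input tensor names."""
    names = set(input_names)
    if "encoder_features" in names:
        return "host_conv_stem"   # v81 bundle: host runs the conv stem first
    if "input_features" in names:
        return "in_graph_conv"    # v79 bundle: the graph takes log-mel directly
    raise ValueError(f"unrecognised encoder inputs: {sorted(names)}")
```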
v81/ - Hexagon v81 (SM8850), compiled with forge (QAIRT 2.47)
The encoder's two conv1d+GELU layers are HTP-hostile in-graph, so they run host-side (from
whisper_conv_stem.bin) and the encoder graph starts at the post-conv features (encoder_features [1500,512]).
This is the fix that made Whisper work on v81: an in-graph conv plus a converter dim-reorder had made the
decode loop run zero iterations, and neither problem was visible offline. Encoder and decoder are both fp16.
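For orientation, the host-side stem can be sketched in plain Python. It follows the reference Whisper architecture (two kernel-3 convolutions with padding 1, strides 1 and 2, exact GELU after each, then an added positional embedding), which maps 3000 mel frames to 1500 feature frames. The function names and nested-list tensor layout are assumptions for illustration; the on-disk layout of whisper_conv_stem.bin is not documented here, so loading is left out.

```python
import math


def gelu(x):
    """Exact (erf-based) GELU."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def conv1d_k3(x, w, b, stride):
    """Kernel-3 conv with zero padding 1, followed by GELU.

    x: [c_in][t], w: [c_out][c_in][3], b: [c_out] -> [c_out][t_out]
    """
    c_in, t = len(x), len(x[0])
    t_out = (t + 2 - 3) // stride + 1
    out = []
    for o in range(len(w)):
        row = []
        for i in range(t_out):
            acc = b[o]
            for c in range(c_in):
                for k in range(3):
                    j = i * stride + k - 1  # input index, shifted by the padding
                    if 0 <= j < t:
                        acc += w[o][c][k] * x[c][j]
            row.append(gelu(acc))
        out.append(row)
    return out


def conv_stem(mel, c1w, c1b, c2w, c2b, pos):
    """mel: [n_mels][t] -> features: [t // 2][d_model], with pos added."""
    h = conv1d_k3(mel, c1w, c1b, stride=1)
    h = conv1d_k3(h, c2w, c2b, stride=2)
    # Transpose to [frames][d_model] and add the positional embedding.
    return [[h[d][f] + pos[f][d] for d in range(len(h))]
            for f in range(len(h[0]))]
```

With whisper-base dimensions (80 mels, 3000 frames, d_model 512) the output shape is [1500][512], matching the encoder_features input above. A real implementation would use vectorised kernels rather than Python loops.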
Files (v81/)
| file | role |
|---|---|
| whisper-base.json | QHexRT manifest (asr_transcribe plan; conv_weights + n_mels host params) |
| whisperbase_enc_f16.bin | encoder (post-conv encoder_features → per-layer cross-attn K/V) |
| whisperbase_dec_f16.bin | autoregressive decoder (self+cross attn, in-graph int32 embed + tied lm-head) |
| whisper_conv_stem.bin | host-side conv-stem weights [c1w,c1b,c2w,c2b,pos] (the host computes the conv) |
| whisper_base_mel_filters.bin | HF 80-mel filter bank (host log-mel) |
| tokenizer.json | Whisper tokenizer (vocab 51865) |
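The host log-mel step that consumes the filter bank can be sketched as follows. It uses the reference Whisper normalisation (log10 with a 1e-10 floor, clamp to 8 below the peak, then (x + 4) / 4) and assumes the power spectrogram has already been computed and that the filter bank is laid out as [n_freq][n_mels]. `log_mel` is an illustrative name, not a QHexRT function.

```python
import math


def log_mel(power_frames, filters):
    """power_frames: [n_frames][n_freq], filters: [n_freq][n_mels].

    Returns normalised log-mel features shaped [n_mels][n_frames].
    """
    n_mels = len(filters[0])
    logs = []
    for m in range(n_mels):
        row = []
        for frame in power_frames:
            # Project the power spectrum onto mel band m.
            energy = sum(p * f[m] for p, f in zip(frame, filters))
            row.append(math.log10(max(energy, 1e-10)))
        logs.append(row)
    peak = max(max(row) for row in logs)
    # Limit dynamic range to 8 log10 units below the peak, then rescale.
    return [[(max(v, peak - 8.0) + 4.0) / 4.0 for v in row] for row in logs]
```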
Run (v81/)
huggingface-cli download runanywhere/whisper_base_HNPU --local-dir whisper_base_HNPU
adb push whisper_base_HNPU/v81 /data/local/tmp/wq/whisper # QNN libs + v81 HTP skel come from the QAIRT SDK
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. ADSP_LIBRARY_PATH=. \
./qhx_asr whisper/whisper-base.json libQnnHtp.so libQnnSystem.so whisper whisper/<audio16k>.wav"
Measured (SM8850 / v81, soc 87, QAIRT 2.47)
- WER = 0 vs HF openai/whisper-base on a clean LibriSpeech clip and a 23 s clip (decode runs the full length; the zero-iteration trap is cleared). Encoder cross-KV cosine 1.00000 vs HF (export gate).
- Real-time: about 1.4 s for a 6 s clip (27 tokens).
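For readers reproducing these numbers, word error rate and real-time factor can be computed as sketched below. This is a generic word-level edit-distance WER, not necessarily the exact scoring script used for the figures above; text normalisation (case, punctuation) is omitted and would be needed for a fair comparison.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s
```

With the figures above, `real_time_factor(1.4, 6.0)` is roughly 0.23, so transcription takes about a quarter of the clip's duration.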
v79/ - Hexagon v79 (SM8750, Galaxy S25), Qualcomm AI Hub graphs
Encoder + decoder are Qualcomm AI Hub qnn_context_binary (float/fp16) graphs; the host pipeline is QHexRT's
own. Device-validated: transcription matches the HF openai/whisper-base reference exactly.
Files (v79/)
| file | what |
|---|---|
| whisper-base.json | QHexRT manifest (ASR family, asr_transcribe plan) |
| encoder.bin | AI Hub Whisper encoder (audio mel → 12 cross-attn KV) |
| decoder.bin | AI Hub Whisper decoder (greedy step: ids+mask+self/cross-KV → logits) |
| whisper_base_mel_filters.bin | HF mel filter bank [201,80] f32 (host log-mel) |
| tokenizer.json | Whisper multilingual tokenizer (vocab 51865) |
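The greedy step listed for decoder.bin sits inside a host-side loop along these lines. The sketch assumes a hypothetical `step(ids)` callback that hides mask and KV-cache handling and returns one logits vector; the end-of-text id 50257 and the 448-token text context are the standard values for multilingual Whisper, not values read from this bundle's manifest.

```python
EOT = 50257        # <|endoftext|> in the multilingual Whisper vocabulary
MAX_TOKENS = 448   # Whisper text context length


def greedy_decode(step, prompt, eot=EOT, max_tokens=MAX_TOKENS):
    """Repeatedly take the arg-max token until EOT or the context limit.

    step: hypothetical callable, token ids so far -> logits for the next token.
    Returns only the newly generated ids (prompt excluded).
    """
    ids = list(prompt)
    while len(ids) < max_tokens:
        logits = step(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eot:
            break
        ids.append(next_id)
    return ids[len(prompt):]
```

A loop like this returns an empty list if the very first step already yields EOT, which is one way a decode can silently run zero useful iterations.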
Run (v79/)
adb push whisper_base_HNPU/v79 /data/local/tmp/wq/whisper
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. ADSP_LIBRARY_PATH=. \
./qhx_asr whisper/whisper-base.json libQnnHtp.so libQnnSystem.so whisper whisper/<audio16k>.wav"
Audio: mono WAV (PCM16 or float32); resampled to 16 kHz host-side. Clips ≤ 30 s. No custom op-package needed.
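A quick pre-flight check for the audio constraints might look like this. It relies on Python's standard `wave` module, which only reads integer PCM files, so float32 WAVs are outside what this sketch can inspect; `check_wav` is an illustrative helper, not part of the bundle.

```python
import wave


def check_wav(path, max_seconds=30.0):
    """Return a list of problems for a PCM WAV file (empty list means OK)."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()
        seconds = wf.getnframes() / wf.getframerate()
    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit PCM, found {8 * sample_width}-bit")
    if seconds > max_seconds:
        problems.append(f"clip is {seconds:.1f} s, limit is {max_seconds:.0f} s")
    return problems
```

The sample rate is not checked because resampling to 16 kHz happens host-side.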
Source model: openai/whisper-base.
v81 bundle built + device-validated with QHexRT forge (recipes/whisper-base).