whisper-base — QHexRT NPU bundle (Hexagon v79 + v81)

Precompiled Whisper-base ASR for the QHexRT runtime on the Qualcomm Hexagon NPU. Two arch-pinned bundles ship as sibling dirs — pick the one matching your device:

dir	Hexagon arch	device	encoder path	device-validated
`v79/`	v79 (SM8750, Galaxy S25)	Snapdragon 8 Elite	AI-Hub in-graph conv	✅ transcript == HF
`v81/`	v81 (SM8850, soc 87)	Snapdragon 8 Elite Gen 5	forge host-conv (`encoder_features`)	✅ WER 0

A context binary will not load on another arch, because the soc_model and dsp_arch are baked in. The host pipeline (log-mel, decode loop, detokenization) is QHexRT's own, and the same qhx_asr runs both bundles. It branches on the encoder input name: input_features selects the AI-Hub in-graph conv path, and encoder_features selects the host conv stem.
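The branch on the encoder input name can be pictured as follows. This is a sketch only, not QHexRT's code: `select_encoder_path` and the returned labels are hypothetical, and only the two tensor names come from the text above.

```python
def select_encoder_path(encoder_input_names):
    """Pick the host-side preprocessing path from the encoder graph's input names.

    Hypothetical helper for illustration; the labels it returns are made up.
    """
    names = set(encoder_input_names)
    if "encoder_features" in names:
        # v81 bundle: the host runs the conv stem, the graph takes post-conv features.
        return "host_conv_stem"
    if "input_features" in names:
        # v79 bundle: the graph takes the log-mel directly and runs the conv in-graph.
        return "in_graph_conv"
    raise ValueError(f"unrecognized encoder inputs: {sorted(names)}")
```

Keying on the tensor name rather than on the directory name means a bundle describes itself, so a renamed or relocated directory would still take the right path.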

`v81/` — Hexagon v81 (SM8850), compiled with forge (QAIRT 2.47)

The encoder's two conv1d+GELU layers are HTP-hostile in-graph, so they run host-side (with weights from whisper_conv_stem.bin) and the encoder graph starts at the post-conv features (encoder_features [1500,512]). This is the fix that made Whisper work on v81: an in-graph conv plus a converter dim-reorder had made the decode loop run zero iterations, and neither problem was visible offline. Both the encoder and the decoder are fp16.
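The [1500,512] shape follows from Whisper's standard front end. A short sketch of the arithmetic, assuming the usual openai/whisper-base parameters (16 kHz audio padded to 30 s, hop length 160, two conv1d layers with kernel 3 and padding 1, strides 1 and 2, model width 512). These values are taken from the upstream model, not read from this bundle's manifest.

```python
def conv1d_out_len(n_in: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1-D convolution with dilation 1."""
    return (n_in + 2 * padding - kernel) // stride + 1

# Assumed upstream Whisper-base front-end parameters.
SAMPLE_RATE = 16_000
CHUNK_SECONDS = 30
HOP_LENGTH = 160
D_MODEL = 512

mel_frames = SAMPLE_RATE * CHUNK_SECONDS // HOP_LENGTH            # log-mel frames per chunk
after_conv1 = conv1d_out_len(mel_frames, kernel=3, stride=1, padding=1)
after_conv2 = conv1d_out_len(after_conv1, kernel=3, stride=2, padding=1)

encoder_features_shape = (after_conv2, D_MODEL)
print(encoder_features_shape)  # (1500, 512)
```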

Files (`v81/`)

file	role
`whisper-base.json`	QHexRT manifest (`asr_transcribe` plan; `conv_weights` + `n_mels` host params)
`whisperbase_enc_f16.bin`	encoder (post-conv `encoder_features` → per-layer cross-attn K/V)
`whisperbase_dec_f16.bin`	autoregressive decoder (self+cross attn, in-graph int32 embed + tied lm-head)
`whisper_conv_stem.bin`	host-side conv-stem weights `[c1w,c1b,c2w,c2b,pos]` (the host computes the conv)
`whisper_base_mel_filters.bin`	HF 80-mel filter bank (host log-mel)
`tokenizer.json`	Whisper tokenizer (vocab 51865)

Run (`v81/`)

huggingface-cli download runanywhere/whisper_base_HNPU --local-dir whisper_base_HNPU
adb push whisper_base_HNPU/v81 /data/local/tmp/wq/whisper       # QNN libs + v81 HTP skel come from the QAIRT SDK
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. ADSP_LIBRARY_PATH=. \
  ./qhx_asr whisper/whisper-base.json libQnnHtp.so libQnnSystem.so whisper whisper/<audio16k>.wav"

Measured (SM8850 / v81, soc 87, QAIRT 2.47)

WER = 0 vs HF openai/whisper-base on a clean LibriSpeech clip and a 23 s clip (decode runs the full length — the zero-iteration trap is cleared). Encoder cross-KV cosine 1.00000 vs HF (export gate).
Faster than real time: ≈1.4 s for a 6 s clip (27 tokens), a real-time factor of ≈0.23.
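For reference, word error rate is the word-level edit distance divided by the reference length, so WER 0 means the two transcripts match word for word. A minimal sketch, assuming plain lowercase whitespace tokenization; the text normalization used for the measurement above is not stated, so this is illustrative rather than a reproduction of it.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = cur[j - 1] + 1   # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

An empty hypothesis scores WER 1.0 under this definition, which is how a decode loop that emits no tokens would show up in a device-side check.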

`v79/` — Hexagon v79 (SM8750, Galaxy S25), Qualcomm AI Hub graphs

Encoder + decoder are Qualcomm AI Hub qnn_context_binary (float/fp16) graphs; the host pipeline is QHexRT's own. Device-validated: transcription matches the HF openai/whisper-base reference exactly.

Files (`v79/`)

file	role
`whisper-base.json`	QHexRT manifest (ASR family, `asr_transcribe` plan)
`encoder.bin`	AI Hub Whisper encoder (audio mel → 12 cross-attn KV)
`decoder.bin`	AI Hub Whisper decoder (greedy step: ids+mask+self/cross-KV → logits)
`whisper_base_mel_filters.bin`	HF mel filter bank `[201,80]` f32 (host log-mel)
`tokenizer.json`	Whisper multilingual tokenizer (vocab 51865)
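The "greedy step" role of the decoder can be illustrated with a host-side loop. This is a sketch under stated assumptions, not QHexRT's decode loop: `greedy_decode` and `step` are hypothetical names, `step` stands in for one decoder graph invocation (the real graph also takes the mask and self/cross-KV tensors), and the end-of-text id 50257 and context length 448 are the usual multilingual Whisper values rather than values read from this bundle.

```python
from typing import Callable, List, Sequence

EOT = 50257       # <|endoftext|> in the multilingual Whisper vocab (assumed)
MAX_TOKENS = 448  # Whisper decoder context length (assumed)

def greedy_decode(step: Callable[[Sequence[int]], Sequence[float]],
                  prompt: Sequence[int],
                  max_tokens: int = MAX_TOKENS,
                  eot: int = EOT) -> List[int]:
    """Argmax decoding until end-of-text or the length cap.

    `step` maps the ids so far to logits for the next token.
    Returns only the generated ids, without the prompt.
    """
    ids = list(prompt)
    generated: List[int] = []
    while len(ids) < max_tokens:
        logits = step(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eot:
            break
        ids.append(next_id)
        generated.append(next_id)
    return generated
```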

Run (`v79/`)

adb push whisper_base_HNPU/v79 /data/local/tmp/wq/whisper
adb shell "cd /data/local/tmp/wq && LD_LIBRARY_PATH=. ADSP_LIBRARY_PATH=. \
  ./qhx_asr whisper/whisper-base.json libQnnHtp.so libQnnSystem.so whisper whisper/<audio16k>.wav"

Audio input: mono WAV (PCM16 or float32), resampled to 16 kHz host-side; clips must be ≤ 30 s. No custom op-package is needed. Source model: openai/whisper-base.
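A pre-flight check for these constraints can be sketched with Python's standard `wave` module before pushing a clip to the device. Assumptions: the helper name `check_clip` is hypothetical, and because `wave` only reads PCM files this sketch covers the PCM16 case only (a float32 WAV would need a different reader). The sample rate is reported rather than enforced, since resampling happens host-side.

```python
import wave

MAX_SECONDS = 30.0  # clip length limit stated above

def check_clip(path: str) -> dict:
    """Validate a PCM16 mono WAV of at most 30 s; return its rate and duration."""
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        sample_width = wf.getsampwidth()  # bytes per sample
        rate = wf.getframerate()
        seconds = wf.getnframes() / rate

    if channels != 1:
        raise ValueError(f"expected mono, got {channels} channels")
    if sample_width != 2:
        raise ValueError(f"expected PCM16, got {8 * sample_width}-bit samples")
    if seconds > MAX_SECONDS:
        raise ValueError(f"clip is {seconds:.1f} s, limit is {MAX_SECONDS:.0f} s")
    return {"rate": rate, "seconds": seconds}
```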

v81 bundle built + device-validated with QHexRT forge — recipes/whisper-base.
