qaihm-bot committed
Commit 579a041 · verified · 1 Parent(s): 39f8879

See https://github.com/qualcomm/ai-hub-models/releases/v0.50.0 for changelog.

Files changed (2)
  1. README.md +65 -65
  2. release_assets.json +1 -0
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: automatic-speech-recognition
15
  We have applied w8a16 quantization to significantly enhance performance and efficiency. HuggingFace Whisper-Small ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. This model is based on the transformer architecture and has been optimized for edge inference by replacing Multi-Head Attention (MHA) with Single-Head Attention (SHA) and linear layers with convolutional (conv) layers. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is decoder's latency, where we assume a max decoded length specified below.
16
 
17
  This is based on the implementation of Whisper-Small-Quantized found [here](https://github.com/huggingface/transformers/tree/v4.42.3/src/transformers/models/whisper).
18
- This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/whisper_small_quantized) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
19
 
20
  Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
21
 
@@ -28,40 +28,40 @@ Below are pre-exported model assets ready for deployment.
28
 
29
  | Runtime | Precision | Chipset | SDK Versions | Download |
30
  |---|---|---|---|---|
31
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
32
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x2_elite.zip)
33
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x_elite.zip)
34
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8gen3.zip)
35
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs8550_proxy.zip)
36
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
37
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_7gen4.zip)
38
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcm6690.zip)
39
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs9075.zip)
40
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
41
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x2_elite.zip)
42
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x_elite.zip)
43
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8gen3.zip)
44
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8550_proxy.zip)
45
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip)
46
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
47
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_7gen4.zip)
48
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa7255p.zip)
49
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcm6690.zip)
50
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.49.1/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs9075.zip)
51
 
52
  For more device-specific assets and performance metrics, visit **[Whisper-Small-Quantized on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/whisper_small_quantized)**.
53
 
54
 
55
  ### Option 2: Export with Custom Configurations
56
 
57
- Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/whisper_small_quantized) Python library to compile and export the model with your own:
58
  - Custom weights (e.g., fine-tuned checkpoints)
59
  - Custom input shapes
60
  - Target device and runtime configurations
61
 
62
  This option is ideal if you need to customize the model beyond the default configuration provided here.
63
 
64
- See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/qualcomm/ai-hub-models/tree/v0.49.1/qai_hub_models/models/whisper_small_quantized) for usage instructions.
65
 
66
  ## Model Details
67
 
@@ -75,48 +75,48 @@ See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/qu
75
  ## Performance Summary
76
  | Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
77
  |---|---|---|---|---|---|---
78
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 4.017 ms | 36 - 46 MB | NPU
79
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 3.825 ms | 185 - 185 MB | NPU
80
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 7.953 ms | 185 - 185 MB | NPU
81
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.47 ms | 39 - 48 MB | NPU
82
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.304 ms | 27 - 29 MB | NPU
83
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 9.145 ms | 24 - 57 MB | NPU
84
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 32.401 ms | 28 - 37 MB | NPU
85
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.776 ms | 25 - 37 MB | NPU
86
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.88 ms | 30 - 36 MB | NPU
87
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 3.967 ms | 30 - 41 MB | NPU
88
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 4.249 ms | 30 - 30 MB | NPU
89
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 7.557 ms | 30 - 30 MB | NPU
90
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.203 ms | 19 - 27 MB | NPU
91
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 13.659 ms | 12 - 19 MB | NPU
92
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.187 ms | 30 - 32 MB | NPU
93
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 27.56 ms | 19 - 26 MB | NPU
94
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 8.936 ms | 25 - 60 MB | NPU
95
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 30.781 ms | 30 - 37 MB | NPU
96
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 13.659 ms | 12 - 19 MB | NPU
97
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.793 ms | 28 - 41 MB | NPU
98
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.907 ms | 29 - 36 MB | NPU
99
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 174.548 ms | 63 - 73 MB | NPU
100
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 155.383 ms | 127 - 127 MB | NPU
101
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 267.039 ms | 127 - 127 MB | NPU
102
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 240.522 ms | 63 - 74 MB | NPU
103
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 386.76 ms | 55 - 58 MB | NPU
104
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 255.083 ms | 63 - 66 MB | NPU
105
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 4338.57 ms | 1 - 12 MB | NPU
106
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 200.696 ms | 64 - 75 MB | NPU
107
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 457.646 ms | 56 - 62 MB | NPU
108
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 173.204 ms | 1 - 11 MB | NPU
109
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 154.118 ms | 0 - 0 MB | NPU
110
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 295.657 ms | 0 - 0 MB | NPU
111
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 267.017 ms | 3 - 10 MB | NPU
112
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 515.755 ms | 1 - 9 MB | NPU
113
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 366.478 ms | 1 - 2 MB | NPU
114
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 310.162 ms | 0 - 9 MB | NPU
115
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 292.347 ms | 0 - 29 MB | NPU
116
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 4026.592 ms | 0 - 7 MB | NPU
117
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 515.755 ms | 1 - 9 MB | NPU
118
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 223.174 ms | 1 - 10 MB | NPU
119
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 471.739 ms | 1 - 7 MB | NPU
120
 
121
  ## License
122
  * The license for the original implementation of Whisper-Small-Quantized can be found
 
15
  We have applied w8a16 quantization to significantly enhance performance and efficiency. HuggingFace Whisper-Small ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. This model is based on the transformer architecture and has been optimized for edge inference by replacing Multi-Head Attention (MHA) with Single-Head Attention (SHA) and linear layers with convolutional (conv) layers. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is decoder's latency, where we assume a max decoded length specified below.
16
 
17
  This is based on the implementation of Whisper-Small-Quantized found [here](https://github.com/huggingface/transformers/tree/v4.42.3/src/transformers/models/whisper).
18
+ This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
19
 
20
  Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
21
 
 
28
 
29
  | Runtime | Precision | Chipset | SDK Versions | Download |
30
  |---|---|---|---|---|
31
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
32
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x2_elite.zip)
33
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x_elite.zip)
34
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8gen3.zip)
35
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs8550_proxy.zip)
36
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
37
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_7gen4.zip)
38
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcm6690.zip)
39
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs9075.zip)
40
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
41
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x2_elite.zip)
42
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x_elite.zip)
43
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8gen3.zip)
44
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8550_proxy.zip)
45
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip)
46
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
47
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_7gen4.zip)
48
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa7255p.zip)
49
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcm6690.zip)
50
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs9075.zip)
51
 
52
  For more device-specific assets and performance metrics, visit **[Whisper-Small-Quantized on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/whisper_small_quantized)**.
53
 
54
 
55
  ### Option 2: Export with Custom Configurations
56
 
57
+ Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) Python library to compile and export the model with your own:
58
  - Custom weights (e.g., fine-tuned checkpoints)
59
  - Custom input shapes
60
  - Target device and runtime configurations
61
 
62
  This option is ideal if you need to customize the model beyond the default configuration provided here.
63
 
64
+ See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) for usage instructions.
65
 
66
  ## Model Details
67
 
 
75
  ## Performance Summary
76
  | Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
77
  |---|---|---|---|---|---|---
78
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 4.017 ms | 36 - 46 MB | NPU
79
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 3.825 ms | 185 - 185 MB | NPU
80
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 7.953 ms | 185 - 185 MB | NPU
81
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.47 ms | 39 - 48 MB | NPU
82
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.304 ms | 27 - 29 MB | NPU
83
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 9.145 ms | 24 - 57 MB | NPU
84
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 32.401 ms | 28 - 37 MB | NPU
85
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.776 ms | 25 - 37 MB | NPU
86
+ | decoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.88 ms | 30 - 36 MB | NPU
87
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 3.967 ms | 30 - 41 MB | NPU
88
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 4.249 ms | 30 - 30 MB | NPU
89
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 7.557 ms | 30 - 30 MB | NPU
90
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.203 ms | 19 - 27 MB | NPU
91
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 13.659 ms | 12 - 19 MB | NPU
92
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.187 ms | 30 - 32 MB | NPU
93
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 27.56 ms | 19 - 26 MB | NPU
94
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 8.936 ms | 25 - 60 MB | NPU
95
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 30.781 ms | 30 - 37 MB | NPU
96
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 13.659 ms | 12 - 19 MB | NPU
97
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.793 ms | 28 - 41 MB | NPU
98
+ | decoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.907 ms | 29 - 36 MB | NPU
99
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 174.548 ms | 63 - 73 MB | NPU
100
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 155.383 ms | 127 - 127 MB | NPU
101
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 267.039 ms | 127 - 127 MB | NPU
102
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 240.522 ms | 63 - 74 MB | NPU
103
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 386.76 ms | 55 - 58 MB | NPU
104
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 255.083 ms | 63 - 66 MB | NPU
105
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 4338.57 ms | 1 - 12 MB | NPU
106
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 200.696 ms | 64 - 75 MB | NPU
107
+ | encoder | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 457.646 ms | 56 - 62 MB | NPU
108
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 173.204 ms | 1 - 11 MB | NPU
109
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 154.118 ms | 0 - 0 MB | NPU
110
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 295.657 ms | 0 - 0 MB | NPU
111
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 267.017 ms | 3 - 10 MB | NPU
112
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 515.755 ms | 1 - 9 MB | NPU
113
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 366.478 ms | 1 - 2 MB | NPU
114
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 310.162 ms | 0 - 9 MB | NPU
115
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 292.347 ms | 0 - 29 MB | NPU
116
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 4026.592 ms | 0 - 7 MB | NPU
117
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 515.755 ms | 1 - 9 MB | NPU
118
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 223.174 ms | 1 - 10 MB | NPU
119
+ | encoder | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 471.739 ms | 1 - 7 MB | NPU
120
 
121
  ## License
122
  * The license for the original implementation of Whisper-Small-Quantized can be found
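
The README text in this diff says that time to the first token is the encoder's latency and that each additional token costs one decoder pass. A rough sketch of that arithmetic, using the QNN_CONTEXT_BINARY figures listed above for Snapdragon® 8 Elite Gen 5 Mobile, could look like the following. The function name and the token count of 100 are hypothetical and chosen only for illustration; the actual max decoded length is not shown in this diff.

```python
def estimate_transcription_ms(encoder_ms: float, decoder_ms: float, num_tokens: int) -> float:
    """Estimate end-to-end latency under the README's simplified model.

    Assumption: the first token costs one encoder pass, and every
    additional token costs one decoder pass. Real pipelines also pay for
    audio preprocessing and sampling, which this sketch ignores.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return encoder_ms + (num_tokens - 1) * decoder_ms


# Latencies taken from the Performance Summary rows above
# (QNN_CONTEXT_BINARY, Snapdragon 8 Elite Gen 5 Mobile).
ENCODER_MS = 173.204
DECODER_MS = 3.967

# Hypothetical decoded length, not a value from the README.
estimate = estimate_transcription_ms(ENCODER_MS, DECODER_MS, num_tokens=100)
print(f"Estimated latency for 100 tokens: {estimate:.3f} ms")
```

Under this model the encoder dominates short transcripts, while the decoder term grows linearly with the number of decoded tokens.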
release_assets.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"version":"0.50.0","precisions":{"w8a16":{"chipset_assets":{"qualcomm-snapdragon-8gen3":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8gen3.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8gen3.zip"}},"qualcomm-snapdragon-8-elite-for-galaxy":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip"}},"qualcomm-snapdragon-7gen4":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_7gen4.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_7gen4.zip"}},"qualcomm-snapdragon-8-elite-gen5":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_gen5.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_gen5.zip"}},"qualcomm-snapdragon-x-elite":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x_elite.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x_elite.zip"}},"qualcomm-snapdragon-x2-elite":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x2_elite.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x2_elite.zip"}},"qualcomm-sa7255p":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa7255p.zip"}},"qualcomm-sa8775p":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip"}},"qualcomm-qcm6690":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcm6690.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcm6690.zip"}},"qualcomm-qcs8550-proxy":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8550_proxy.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs8550_proxy.zip"}},"qualcomm-qcs9075":{"qnn_context_binary":{"tool_versions":{"qairt":"2.43.0.260127150333_193827"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs9075.zip"},"precompiled_qnn_onnx":{"tool_versions":{"qairt":"2.42.0.251225135753_193295","onnx_runtime":"1.24.1"},"download_url":"https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs9075.zip"}}}}}}
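
The manifest added in this commit nests each asset under `precisions` → precision → `chipset_assets` → chipset → runtime. A minimal sketch of a lookup over that layout follows. The helper name `find_asset` is hypothetical and not part of any Qualcomm library, and `SAMPLE` is a trimmed excerpt (one chipset only) of the JSON above rather than the full file.

```python
import json

# Trimmed excerpt of release_assets.json; only the SA8775P entry is kept.
SAMPLE = """
{"version": "0.50.0",
 "precisions": {"w8a16": {"chipset_assets": {
   "qualcomm-sa8775p": {"qnn_context_binary": {
     "tool_versions": {"qairt": "2.43.0.260127150333_193827"},
     "download_url": "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.50.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip"}}}}}}
"""


def find_asset(manifest: dict, precision: str, chipset: str, runtime: str):
    """Return the asset entry for one combination, or None if it is not listed.

    Hypothetical helper: it only walks the nesting visible in the manifest
    and makes no network requests.
    """
    chipsets = manifest.get("precisions", {}).get(precision, {}).get("chipset_assets", {})
    return chipsets.get(chipset, {}).get(runtime)


manifest = json.loads(SAMPLE)

asset = find_asset(manifest, "w8a16", "qualcomm-sa8775p", "qnn_context_binary")
print(manifest["version"], asset["tool_versions"]["qairt"])
print(asset["download_url"])

# Not every chipset ships every runtime: SA8775P has no precompiled_qnn_onnx entry.
missing = find_asset(manifest, "w8a16", "qualcomm-sa8775p", "precompiled_qnn_onnx")
print(missing)
```

Returning `None` for a missing combination keeps the caller in charge of the fallback, which matters here because some chipsets (for example SA7255P and SA8775P) list only the `qnn_context_binary` runtime.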