Commit 7a9eaac by qaihm-bot · verified · 1 Parent(s): 871e7da

See https://github.com/qualcomm/ai-hub-models/releases/v0.48.0 for changelog.

Files changed (1)
  1. README.md +65 -61
README.md CHANGED
@@ -15,7 +15,7 @@ pipeline_tag: automatic-speech-recognition
15
  We have applied w8a16 quantization to significantly enhance performance and efficiency. The HuggingFace Whisper-Small ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. This model is based on the transformer architecture and has been optimized for edge inference by replacing Multi-Head Attention (MHA) with Single-Head Attention (SHA) and linear layers with convolutional (conv) layers. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is the decoder's latency, where we assume a max decoded length specified below.
16
 
17
  This is based on the implementation of Whisper-Small-Quantized found [here](https://github.com/huggingface/transformers/tree/v4.42.3/src/transformers/models/whisper).
18
- This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
19
 
20
  Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
21
 
@@ -28,36 +28,40 @@ Below are pre-exported model assets ready for deployment.
28
 
29
  | Runtime | Precision | Chipset | SDK Versions | Download |
30
  |---|---|---|---|---|
31
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x_elite.zip)
32
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8gen3.zip)
33
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs8550_proxy.zip)
34
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x2_elite.zip)
35
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcm6690.zip)
36
- | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs9075.zip)
37
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
38
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_7gen4.zip)
39
- | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
40
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8275_proxy.zip)
41
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8550_proxy.zip)
42
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip)
43
- | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x2_elite.zip)
44
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa7255p.zip)
45
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcm6690.zip)
46
- | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.47.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs9075.zip)
 
 
 
 
47
 
48
  For more device-specific assets and performance metrics, visit **[Whisper-Small-Quantized on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/whisper_small_quantized)**.
49
 
50
 
51
  ### Option 2: Export with Custom Configurations
52
 
53
- Use the [Qualcomm® AI Hub Models](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) Python library to compile and export the model with your own:
54
  - Custom weights (e.g., fine-tuned checkpoints)
55
  - Custom input shapes
56
  - Target device and runtime configurations
57
 
58
  This option is ideal if you need to customize the model beyond the default configuration provided here.
59
 
60
- See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) for usage instructions.
61
 
62
  ## Model Details
63
 
@@ -71,48 +75,48 @@ See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/qu
71
  ## Performance Summary
72
  | Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
73
  |---|---|---|---|---|---|---
74
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 7.991 ms | 185 - 185 MB | NPU
75
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.451 ms | 38 - 50 MB | NPU
76
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.335 ms | 28 - 29 MB | NPU
77
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 9.124 ms | 24 - 57 MB | NPU
78
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 35.618 ms | 28 - 38 MB | NPU
79
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.767 ms | 25 - 37 MB | NPU
80
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.888 ms | 28 - 35 MB | NPU
81
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 3.998 ms | 30 - 40 MB | NPU
82
- | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 3.718 ms | 186 - 186 MB | NPU
83
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 7.662 ms | 30 - 30 MB | NPU
84
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.366 ms | 30 - 39 MB | NPU
85
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 13.482 ms | 29 - 37 MB | NPU
86
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.23 ms | 30 - 33 MB | NPU
87
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 9.233 ms | 19 - 27 MB | NPU
88
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 8.915 ms | 25 - 60 MB | NPU
89
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 31.639 ms | 30 - 37 MB | NPU
90
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 13.482 ms | 29 - 37 MB | NPU
91
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.74 ms | 21 - 35 MB | NPU
92
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.815 ms | 30 - 37 MB | NPU
93
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 3.972 ms | 30 - 40 MB | NPU
94
- | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 4.089 ms | 30 - 30 MB | NPU
95
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 264.922 ms | 127 - 127 MB | NPU
96
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 240.754 ms | 64 - 76 MB | NPU
97
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 334.502 ms | 0 - 129 MB | NPU
98
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 254.756 ms | 63 - 66 MB | NPU
99
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 4091.143 ms | 2 - 12 MB | NPU
100
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 198.385 ms | 63 - 73 MB | NPU
101
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 457.801 ms | 56 - 63 MB | NPU
102
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 175.574 ms | 63 - 73 MB | NPU
103
- | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 154.538 ms | 127 - 127 MB | NPU
104
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 294.934 ms | 0 - 0 MB | NPU
105
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 268.023 ms | 1 - 8 MB | NPU
106
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 517.1 ms | 1 - 9 MB | NPU
107
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 358.638 ms | 1 - 2 MB | NPU
108
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 310.466 ms | 0 - 8 MB | NPU
109
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 291.599 ms | 0 - 29 MB | NPU
110
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 4113.051 ms | 0 - 6 MB | NPU
111
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 517.1 ms | 1 - 9 MB | NPU
112
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 224.015 ms | 1 - 10 MB | NPU
113
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 467.841 ms | 1 - 7 MB | NPU
114
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 187.818 ms | 1 - 10 MB | NPU
115
- | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 153.916 ms | 0 - 0 MB | NPU
116
 
117
  ## License
118
  * The license for the original implementation of Whisper-Small-Quantized can be found
 
15
  We have applied w8a16 quantization to significantly enhance performance and efficiency. The HuggingFace Whisper-Small ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. This model is based on the transformer architecture and has been optimized for edge inference by replacing Multi-Head Attention (MHA) with Single-Head Attention (SHA) and linear layers with convolutional (conv) layers. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is the decoder's latency, where we assume a max decoded length specified below.
16
 
17
  This is based on the implementation of Whisper-Small-Quantized found [here](https://github.com/huggingface/transformers/tree/v4.42.3/src/transformers/models/whisper).
18
+ This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) library to export with custom configurations. More details on model performance across various devices, can be found [here](#performance-summary).
19
 
20
  Qualcomm AI Hub Models uses [Qualcomm AI Hub Workbench](https://workbench.aihub.qualcomm.com) to compile, profile, and evaluate this model. [Sign up](https://myaccount.qualcomm.com/signup) to run these models on a hosted Qualcomm® device.
21
 
 
28
 
29
  | Runtime | Precision | Chipset | SDK Versions | Download |
30
  |---|---|---|---|---|
31
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x2_elite.zip)
32
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_x_elite.zip)
33
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8gen3.zip)
34
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs8550_proxy.zip)
35
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
36
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_7gen4.zip)
37
+ | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
38
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcm6690.zip)
39
+ | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | QAIRT 2.42, ONNX Runtime 1.24.1 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-precompiled_qnn_onnx-w8a16-qualcomm_qcs9075.zip)
40
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x2_elite.zip)
41
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_x_elite.zip)
42
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8gen3.zip)
43
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs8550_proxy.zip)
44
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa8775p.zip)
45
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_for_galaxy.zip)
46
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_7gen4.zip)
47
+ | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_snapdragon_8_elite_gen5.zip)
48
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_sa7255p.zip)
49
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcm6690.zip)
50
+ | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | QAIRT 2.43 | [Download](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/models/whisper_small_quantized/releases/v0.48.0/whisper_small_quantized-qnn_context_binary-w8a16-qualcomm_qcs9075.zip)
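
The download links in the table above appear to share one naming pattern: model id, release tag, lower-cased runtime, precision, and a chipset slug. The sketch below rebuilds a link from those parts. The pattern is inferred from the table rows only and is not a documented API, so treat `asset_url` and its parameters as hypothetical names and prefer the links in the table when in doubt.

```python
# Base path copied from the download links in the table above.
BASE_URL = (
    "https://qaihub-public-assets.s3.us-west-2.amazonaws.com/"
    "qai-hub-models/models"
)


def asset_url(model_id: str, release: str, runtime: str,
              precision: str, chipset_slug: str) -> str:
    """Rebuild a download URL from its parts.

    Assumption: the file name is
    <model_id>-<runtime lower-cased>-<precision>-<chipset_slug>.zip,
    as observed in the table rows. The chipset slug (for example
    "qualcomm_qcs9075") must be copied from an existing link.
    """
    filename = f"{model_id}-{runtime.lower()}-{precision}-{chipset_slug}.zip"
    return f"{BASE_URL}/{model_id}/releases/{release}/{filename}"


url = asset_url(
    model_id="whisper_small_quantized",
    release="v0.48.0",
    runtime="QNN_CONTEXT_BINARY",
    precision="w8a16",
    chipset_slug="qualcomm_qcs9075",
)
print(url)
```

This reproduces the Qualcomm® QCS9075 `QNN_CONTEXT_BINARY` link listed in the table. Whether a given runtime and chipset combination actually exists for a release still has to be checked against the table.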
51
 
52
  For more device-specific assets and performance metrics, visit **[Whisper-Small-Quantized on Qualcomm® AI Hub](https://aihub.qualcomm.com/models/whisper_small_quantized)**.
53
 
54
 
55
  ### Option 2: Export with Custom Configurations
56
 
57
+ Use the [Qualcomm® AI Hub Models](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) Python library to compile and export the model with your own:
58
  - Custom weights (e.g., fine-tuned checkpoints)
59
  - Custom input shapes
60
  - Target device and runtime configurations
61
 
62
  This option is ideal if you need to customize the model beyond the default configuration provided here.
63
 
64
+ See our repository for [Whisper-Small-Quantized on GitHub](https://github.com/qualcomm/ai-hub-models/blob/main/qai_hub_models/models/whisper_small_quantized) for usage instructions.
65
 
66
  ## Model Details
67
 
 
75
  ## Performance Summary
76
  | Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
77
  |---|---|---|---|---|---|---
78
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 3.84 ms | 185 - 185 MB | NPU
79
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 8.162 ms | 185 - 185 MB | NPU
80
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.332 ms | 38 - 45 MB | NPU
81
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.365 ms | 29 - 30 MB | NPU
82
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 9.108 ms | 24 - 57 MB | NPU
83
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 30.98 ms | 29 - 38 MB | NPU
84
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.78 ms | 25 - 37 MB | NPU
85
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.969 ms | 29 - 35 MB | NPU
86
+ | WhisperSmallDecoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 4.004 ms | 30 - 40 MB | NPU
87
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 4.233 ms | 30 - 30 MB | NPU
88
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 7.599 ms | 30 - 30 MB | NPU
89
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 6.43 ms | 30 - 38 MB | NPU
90
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 13.592 ms | 29 - 37 MB | NPU
91
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 8.259 ms | 30 - 32 MB | NPU
92
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 9.279 ms | 30 - 40 MB | NPU
93
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 8.925 ms | 25 - 60 MB | NPU
94
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 30.618 ms | 29 - 36 MB | NPU
95
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 13.592 ms | 29 - 37 MB | NPU
96
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 4.847 ms | 8 - 17 MB | NPU
97
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 10.88 ms | 30 - 37 MB | NPU
98
+ | WhisperSmallDecoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 3.987 ms | 30 - 41 MB | NPU
99
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X2 Elite | 154.62 ms | 127 - 127 MB | NPU
100
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® X Elite | 263.634 ms | 127 - 127 MB | NPU
101
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 244.786 ms | 56 - 62 MB | NPU
102
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 342.463 ms | 0 - 130 MB | NPU
103
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCS9075 | 254.265 ms | 63 - 67 MB | NPU
104
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Qualcomm® QCM6690 | 4224.409 ms | 2 - 12 MB | NPU
105
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 202.351 ms | 63 - 76 MB | NPU
106
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 460.872 ms | 54 - 65 MB | NPU
107
+ | WhisperSmallEncoderQuantizable | PRECOMPILED_QNN_ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 176.227 ms | 63 - 73 MB | NPU
108
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X2 Elite | 155.158 ms | 0 - 0 MB | NPU
109
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® X Elite | 296.792 ms | 0 - 0 MB | NPU
110
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Gen 3 Mobile | 269.514 ms | 1 - 8 MB | NPU
111
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8275 (Proxy) | 516.472 ms | 1 - 9 MB | NPU
112
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS8550 (Proxy) | 366.644 ms | 1 - 2 MB | NPU
113
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA8775P | 311.056 ms | 0 - 9 MB | NPU
114
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCS9075 | 292.146 ms | 0 - 29 MB | NPU
115
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® QCM6690 | 4164.003 ms | 0 - 7 MB | NPU
116
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Qualcomm® SA7255P | 516.472 ms | 1 - 9 MB | NPU
117
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 224.338 ms | 1 - 10 MB | NPU
118
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 7 Gen 4 Mobile | 474.795 ms | 1 - 7 MB | NPU
119
+ | WhisperSmallEncoderQuantizable | QNN_CONTEXT_BINARY | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 174.107 ms | 1 - 10 MB | NPU
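
As a rough illustration of how the encoder and decoder rows combine, the sketch below applies the convention stated in the model description: time to the first token is the encoder latency, and each additional token costs one decoder latency. The function name `estimate_latency_ms` and the token count of 50 are assumptions made for this example. The README's actual maximum decoded length is given in its Model Details section, which is not part of this diff.

```python
def estimate_latency_ms(encoder_ms: float, decoder_ms: float,
                        num_tokens: int) -> float:
    """Estimate the time to produce num_tokens tokens for one audio clip.

    Follows the README's convention: the first token costs the encoder
    latency, and every additional token costs one decoder latency.
    """
    if num_tokens < 1:
        raise ValueError("num_tokens must be at least 1")
    return encoder_ms + (num_tokens - 1) * decoder_ms


# Latencies taken from the QNN_CONTEXT_BINARY rows for
# Snapdragon® 8 Elite Gen 5 Mobile in the table above.
ENCODER_MS = 174.107
DECODER_MS = 3.987

# Hypothetical transcript length, chosen only for illustration.
total = estimate_latency_ms(ENCODER_MS, DECODER_MS, num_tokens=50)
print(f"Estimated latency for 50 tokens: {total:.3f} ms")
```

The estimate ignores audio preprocessing, tokenization, and any host-side overhead, so measured end-to-end times will likely be higher.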
120
 
121
  ## License
122
  * The license for the original implementation of Whisper-Small-Quantized can be found