Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. To export with custom configurations, use the Qualcomm® AI Hub Models library. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.3 | Download |
| QNN_DLC | float | Universal | QAIRT 2.45 | Download |
| TFLITE | float | Universal | | Download |

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Distil-Whisper on GitHub for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

  • Model checkpoint: distil-whisper/distil-small.en
  • Input resolution: 80x3000 (30 seconds of audio)
  • Max decoded sequence length: 200 tokens
  • Number of parameters (encoder): 166M
  • Model size (encoder) (float): 332 MB
  • Number of parameters (decoder): 211M
  • Model size (decoder) (float): 450 MB
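
The figures above can be cross-checked with a little arithmetic. The sketch below assumes the standard Whisper audio front end (16 kHz sampling and a hop of 160 samples between spectrogram frames), which this card does not state explicitly; the function names are illustrative only. It shows how a 30-second chunk yields an 80x3000 input, and how many bytes per parameter the listed float sizes imply.

```python
from __future__ import annotations

# Whisper front-end constants. These come from the original Whisper
# preprocessing and are an assumption here; the card only states 80x3000.
SAMPLE_RATE_HZ = 16_000
HOP_LENGTH = 160       # samples between successive spectrogram frames
N_MELS = 80            # mel bins, the first dimension of the 80x3000 input
CHUNK_SECONDS = 30


def encoder_input_shape(chunk_seconds: int = CHUNK_SECONDS) -> tuple[int, int]:
    """Return (mel bins, frames) for one audio chunk."""
    frames = chunk_seconds * SAMPLE_RATE_HZ // HOP_LENGTH
    return (N_MELS, frames)


def bytes_per_parameter(size_mb: float, params_millions: float) -> float:
    """Approximate storage per parameter, treating 1 MB as 10**6 bytes."""
    return (size_mb * 1e6) / (params_millions * 1e6)


if __name__ == "__main__":
    print(encoder_input_shape())                     # (80, 3000)
    print(bytes_per_parameter(332, 166))             # 2.0
    print(round(bytes_per_parameter(450, 211), 2))   # 2.13
```

The encoder works out to 2 bytes per parameter, which is consistent with 16-bit storage. The decoder ratio is slightly higher, and the card does not say what accounts for the difference.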

Performance Summary

| Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit |
|---|---|---|---|---|---|---|
| decoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.546 | 52 - 375 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite Mobile | 7.178 | 16 - 472 | NPU |
| decoder | ONNX | float | Snapdragon® X2 Elite | 5.083 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® X Elite | 11.41 | 178 - 178 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 8.67 | 52 - 426 | NPU |
| decoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 11.746 | 0 - 184 | NPU |
| decoder | ONNX | float | Qualcomm® QCS9075 | 13.188 | 40 - 82 | NPU |
| decoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.178 | 16 - 472 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.92 | 5 - 504 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 7.295 | 5 - 548 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X2 Elite | 6.086 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® X Elite | 10.901 | 40 - 40 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 8.613 | 0 - 601 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 19.248 | 28 - 525 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 11.477 | 40 - 43 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8775P | 12.828 | 30 - 522 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS9075 | 16.542 | 40 - 86 | NPU |
| decoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 18.114 | 26 - 330 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA7255P | 19.248 | 28 - 525 | NPU |
| decoder | QNN_DLC | float | Qualcomm® SA8295P | 14.11 | 34 - 276 | NPU |
| decoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.295 | 5 - 548 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 5.848 | 4 - 570 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 7.215 | 4 - 572 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 8.619 | 4 - 749 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 19.275 | 4 - 537 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 11.713 | 5 - 7 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8775P | 13.037 | 5 - 537 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS9075 | 16.188 | 0 - 265 | NPU |
| decoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 18.522 | 5 - 466 | NPU |
| decoder | TFLITE | float | Qualcomm® SA7255P | 19.275 | 4 - 537 | NPU |
| decoder | TFLITE | float | Qualcomm® SA8295P | 14.231 | 5 - 297 | NPU |
| decoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 7.215 | 4 - 572 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 49.731 | 80 - 839 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite Mobile | 60.575 | 80 - 734 | NPU |
| encoder | ONNX | float | Snapdragon® X2 Elite | 50.437 | 183 - 183 | NPU |
| encoder | ONNX | float | Snapdragon® X Elite | 123.445 | 182 - 182 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 82.214 | 80 - 1239 | NPU |
| encoder | ONNX | float | Qualcomm® QCS8550 (Proxy) | 118.816 | 0 - 202 | NPU |
| encoder | ONNX | float | Qualcomm® QCS9075 | 150.604 | 79 - 83 | NPU |
| encoder | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 60.575 | 80 - 734 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 58.551 | 1 - 712 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite Mobile | 71.602 | 1 - 692 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X2 Elite | 60.24 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® X Elite | 139.101 | 1 - 1 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 97.219 | 0 - 962 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 437.836 | 1 - 696 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 135.253 | 0 - 7 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8775P | 153.502 | 1 - 687 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS9075 | 170.214 | 1 - 39 | NPU |
| encoder | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 269.206 | 0 - 823 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA7255P | 437.836 | 1 - 696 | NPU |
| encoder | QNN_DLC | float | Qualcomm® SA8295P | 192.976 | 1 - 611 | NPU |
| encoder | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 71.602 | 1 - 692 | NPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 403.997 | 42 - 85 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite Mobile | 409.903 | 40 - 80 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 475.908 | 42 - 184 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 3135.306 | 24 - 69 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 657.568 | 0 - 318 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8775P | 1316.653 | 20 - 64 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS9075 | 1271.896 | 0 - 40 | GPU |
| encoder | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 852.126 | 39 - 193 | GPU |
| encoder | TFLITE | float | Qualcomm® SA7255P | 3135.306 | 24 - 69 | GPU |
| encoder | TFLITE | float | Qualcomm® SA8295P | 671.062 | 38 - 81 | GPU |
| encoder | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 409.903 | 40 - 80 | GPU |
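
The encoder and decoder are profiled separately, so a rough end-to-end estimate has to combine the two. The sketch below rests on assumptions this card does not confirm: that the encoder runs once per 30-second chunk, that each decoder inference produces one token, and that preprocessing, tokenization, and data transfer cost nothing. The function names are hypothetical. It uses the ONNX figures for Snapdragon® 8 Elite Gen 5 Mobile and the 200-token maximum from Model Stats to get an upper bound.

```python
def worst_case_latency_ms(encoder_ms: float, decoder_ms: float,
                          max_tokens: int = 200) -> float:
    """Upper-bound latency for one chunk: one encoder pass plus one
    decoder pass per generated token (assumed, not stated in the card)."""
    return encoder_ms + decoder_ms * max_tokens


def real_time_factor(latency_ms: float, audio_seconds: float = 30.0) -> float:
    """Processing time divided by audio duration; below 1.0 means the
    chunk is transcribed faster than it takes to play."""
    return latency_ms / (audio_seconds * 1000.0)


if __name__ == "__main__":
    # ONNX, float, Snapdragon 8 Elite Gen 5 Mobile (values from the table above)
    total_ms = worst_case_latency_ms(encoder_ms=49.731, decoder_ms=5.546)
    print(f"worst case: {total_ms:.3f} ms")
    print(f"real-time factor: {real_time_factor(total_ms):.4f}")
```

Under these assumptions the decoder dominates the total once more than a handful of tokens are generated, so decoder latency is the more useful column to compare when choosing a runtime. Typical transcripts use fewer than 200 tokens, so real latency should be lower than this bound.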

License

  • The license for the original implementation of Distil-Whisper can be found here.
