Distil-Whisper: Optimized for Qualcomm Devices

Distil-Whisper Small English is a distilled version of Whisper Small, optimized for fast and efficient automatic speech recognition.

This model is based on the implementation of Distil-Whisper found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime	Precision	Chipset	SDK Versions	Download
ONNX	float	Universal	QAIRT 2.45, ONNX Runtime 1.25.0	Download
QNN_DLC	float	Universal	QAIRT 2.45	Download
TFLITE	float	Universal		Download

For more device-specific assets and performance metrics, visit Distil-Whisper on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

Custom weights (e.g., fine-tuned checkpoints)
Custom input shapes
Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Distil-Whisper on GitHub for usage instructions.

Model Details

Model Type: Speech recognition

Model Stats:

Model checkpoint: distil-whisper/distil-small.en
Input resolution: 80x3000 (80 mel bins x 3000 frames, 30 seconds of audio)
Max decoded sequence length: 200 tokens
Number of parameters (encoder): 166M
Model size (encoder) (float): 332 MB
Number of parameters (decoder): 211M
Model size (decoder) (float): 450 MB
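
The 80x3000 input resolution follows from how Whisper-style front ends frame a fixed 30-second window. The sketch below reproduces that arithmetic. The sample rate, hop length, and mel-bin count are the values used by the original Whisper feature extractor and are assumed here, since this card states only the resulting shape. The helper names are hypothetical.

```python
# Front-end constants of the original Whisper feature extractor.
# Assumed here: this model card states only the resulting 80x3000 shape.
SAMPLE_RATE_HZ = 16_000   # audio is resampled to 16 kHz
HOP_LENGTH = 160          # samples between successive frames (10 ms)
N_MELS = 80               # mel filterbank channels
CHUNK_SECONDS = 30        # fixed window length fed to the encoder


def encoder_input_shape(chunk_seconds=CHUNK_SECONDS):
    """Return (n_mels, n_frames) for one fixed-length audio chunk."""
    n_samples = chunk_seconds * SAMPLE_RATE_HZ
    n_frames = n_samples // HOP_LENGTH
    return (N_MELS, n_frames)


def pad_or_trim(samples, chunk_seconds=CHUNK_SECONDS):
    """Zero-pad or truncate a list of samples to exactly one chunk."""
    target = chunk_seconds * SAMPLE_RATE_HZ
    return samples[:target] + [0.0] * max(0, target - len(samples))


print(encoder_input_shape())  # (80, 3000)
```

Shorter clips are padded and longer ones truncated to 480,000 samples, so the encoder always sees the same input shape regardless of the clip length.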

Performance Summary

Model	Runtime	Precision	Chipset	Inference Time (ms)	Peak Memory Range (MB)	Primary Compute Unit
decoder	ONNX	float	Snapdragon® X2 Elite	5.285 ms	40 - 40 MB	NPU
decoder	ONNX	float	Snapdragon® X Elite	10.717 ms	179 - 179 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Gen 3 Mobile	8.795 ms	2 - 489 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Gen 1 Mobile	18.22 ms	50 - 396 MB	NPU
decoder	ONNX	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	11.846 ms	0 - 183 MB	NPU
decoder	ONNX	float	Qualcomm® QCS8450	18.22 ms	50 - 396 MB	NPU
decoder	ONNX	float	Qualcomm® Dragonwing™ IQ-9075	15.344 ms	40 - 82 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Elite Gen 5 Mobile	5.737 ms	2 - 542 MB	NPU
decoder	ONNX	float	Snapdragon® 8 Elite Mobile	7.259 ms	15 - 567 MB	NPU
decoder	ONNX	float	Qualcomm® Dragonwing™ Q-8750	7.259 ms	15 - 567 MB	NPU
decoder	ONNX	float	Qualcomm® Dragonwing™ IQ-X7181	10.717 ms	179 - 179 MB	NPU
decoder	QNN_DLC	float	Snapdragon® X2 Elite	6.077 ms	40 - 40 MB	NPU
decoder	QNN_DLC	float	Snapdragon® X Elite	11.027 ms	40 - 40 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Gen 3 Mobile	8.627 ms	39 - 640 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Gen 1 Mobile	18.132 ms	36 - 338 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS8275	19.086 ms	29 - 526 MB	NPU
decoder	QNN_DLC	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	11.513 ms	40 - 541 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8775P	12.944 ms	30 - 521 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8650P	12.944 ms	30 - 521 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8255P	12.944 ms	30 - 521 MB	NPU
decoder	QNN_DLC	float	Qualcomm® QCS8450	18.132 ms	36 - 338 MB	NPU
decoder	QNN_DLC	float	Qualcomm® Dragonwing™ IQ-9075	12.65 ms	42 - 88 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Elite Gen 5 Mobile	5.928 ms	1 - 502 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA7255P	19.086 ms	29 - 526 MB	NPU
decoder	QNN_DLC	float	Snapdragon® 8 Elite Mobile	7.306 ms	0 - 543 MB	NPU
decoder	QNN_DLC	float	Qualcomm® SA8295P	14.042 ms	18 - 260 MB	NPU
decoder	QNN_DLC	float	Qualcomm® Dragonwing™ Q-8750	7.306 ms	0 - 543 MB	NPU
decoder	QNN_DLC	float	Qualcomm® Dragonwing™ IQ-X7181	11.027 ms	40 - 40 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Gen 3 Mobile	8.707 ms	0 - 743 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Gen 1 Mobile	18.449 ms	5 - 466 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS8275	19.323 ms	5 - 539 MB	NPU
decoder	TFLITE	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	11.475 ms	5 - 8 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8775P	13.02 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8650P	13.02 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8255P	13.02 ms	5 - 538 MB	NPU
decoder	TFLITE	float	Qualcomm® QCS8450	18.449 ms	5 - 466 MB	NPU
decoder	TFLITE	float	Qualcomm® Dragonwing™ IQ-9075	16.364 ms	0 - 265 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Elite Gen 5 Mobile	5.84 ms	4 - 575 MB	NPU
decoder	TFLITE	float	Qualcomm® SA7255P	19.323 ms	5 - 539 MB	NPU
decoder	TFLITE	float	Snapdragon® 8 Elite Mobile	7.226 ms	4 - 574 MB	NPU
decoder	TFLITE	float	Qualcomm® SA8295P	13.952 ms	5 - 297 MB	NPU
decoder	TFLITE	float	Qualcomm® Dragonwing™ Q-8750	7.226 ms	4 - 574 MB	NPU
encoder	ONNX	float	Snapdragon® X2 Elite	60.235 ms	133 - 133 MB	NPU
encoder	ONNX	float	Snapdragon® X Elite	133.488 ms	237 - 237 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Gen 3 Mobile	97.932 ms	81 - 1156 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Gen 1 Mobile	269.405 ms	72 - 1006 MB	NPU
encoder	ONNX	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	129.944 ms	0 - 260 MB	NPU
encoder	ONNX	float	Qualcomm® QCS8450	269.405 ms	72 - 1006 MB	NPU
encoder	ONNX	float	Qualcomm® Dragonwing™ IQ-9075	153.353 ms	81 - 84 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Elite Gen 5 Mobile	57.909 ms	46 - 775 MB	NPU
encoder	ONNX	float	Snapdragon® 8 Elite Mobile	71.501 ms	79 - 785 MB	NPU
encoder	ONNX	float	Qualcomm® Dragonwing™ Q-8750	71.501 ms	79 - 785 MB	NPU
encoder	ONNX	float	Qualcomm® Dragonwing™ IQ-X7181	133.488 ms	237 - 237 MB	NPU
encoder	QNN_DLC	float	Snapdragon® X2 Elite	60.347 ms	1 - 1 MB	NPU
encoder	QNN_DLC	float	Snapdragon® X Elite	139.879 ms	1 - 1 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Gen 3 Mobile	97.308 ms	0 - 964 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Gen 1 Mobile	268.383 ms	0 - 823 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS8275	437.616 ms	1 - 696 MB	NPU
encoder	QNN_DLC	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	135.106 ms	0 - 548 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8775P	154.308 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8650P	154.308 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8255P	154.308 ms	1 - 687 MB	NPU
encoder	QNN_DLC	float	Qualcomm® QCS8450	268.383 ms	0 - 823 MB	NPU
encoder	QNN_DLC	float	Qualcomm® Dragonwing™ IQ-9075	153.453 ms	1 - 39 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Elite Gen 5 Mobile	58.432 ms	0 - 718 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA7255P	437.616 ms	1 - 696 MB	NPU
encoder	QNN_DLC	float	Snapdragon® 8 Elite Mobile	71.17 ms	1 - 691 MB	NPU
encoder	QNN_DLC	float	Qualcomm® SA8295P	193.09 ms	1 - 611 MB	NPU
encoder	QNN_DLC	float	Qualcomm® Dragonwing™ Q-8750	71.17 ms	1 - 691 MB	NPU
encoder	QNN_DLC	float	Qualcomm® Dragonwing™ IQ-X7181	139.879 ms	1 - 1 MB	NPU
encoder	TFLITE	float	Snapdragon® 8 Gen 3 Mobile	475.47 ms	42 - 187 MB	GPU
encoder	TFLITE	float	Snapdragon® 8 Gen 1 Mobile	836.934 ms	41 - 194 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS8275	3113.712 ms	36 - 81 MB	GPU
encoder	TFLITE	float	Qualcomm® Dragonwing™ QCS8550 (Proxy)	653.403 ms	0 - 281 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8775P	1324.874 ms	32 - 75 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8650P	1324.874 ms	32 - 75 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8255P	1324.874 ms	32 - 75 MB	GPU
encoder	TFLITE	float	Qualcomm® QCS8450	836.934 ms	41 - 194 MB	GPU
encoder	TFLITE	float	Qualcomm® Dragonwing™ IQ-9075	1264.937 ms	0 - 40 MB	GPU
encoder	TFLITE	float	Snapdragon® 8 Elite Gen 5 Mobile	359.55 ms	42 - 82 MB	GPU
encoder	TFLITE	float	Qualcomm® SA7255P	3113.712 ms	36 - 81 MB	GPU
encoder	TFLITE	float	Snapdragon® 8 Elite Mobile	400.716 ms	40 - 79 MB	GPU
encoder	TFLITE	float	Qualcomm® SA8295P	672.024 ms	40 - 84 MB	GPU
encoder	TFLITE	float	Qualcomm® Dragonwing™ Q-8750	400.716 ms	40 - 79 MB	GPU
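
The encoder and decoder are profiled separately, so a rough end-to-end figure has to combine the two rows. The sketch below is one way to do that. It assumes the encoder runs once per 30-second chunk and that the listed decoder time is the cost of a single decoding step, repeated once per generated token. Neither assumption is stated in this card, and the function names are hypothetical. Pre- and post-processing time is ignored.

```python
def estimate_latency_ms(encoder_ms, decoder_ms, n_tokens):
    """Rough latency for one chunk: one encoder pass plus one decoder
    pass per generated token (assumption: listed decoder time is per step)."""
    return encoder_ms + decoder_ms * n_tokens


def real_time_factor(latency_ms, audio_seconds=30.0):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return latency_ms / (audio_seconds * 1000.0)


# Snapdragon X2 Elite, ONNX, float: encoder 60.235 ms, decoder 5.285 ms.
# 200 tokens is the maximum decoded sequence length, so this is an upper bound.
worst_case = estimate_latency_ms(60.235, 5.285, n_tokens=200)
print(f"{worst_case:.1f} ms, RTF {real_time_factor(worst_case):.3f}")
```

Under these assumptions the decoder dominates once more than about a dozen tokens are generated, so actual latency depends mostly on transcript length rather than on the encoder figure.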

License

The license for the original implementation of Distil-Whisper can be found here.

References

Community

Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
For questions or feedback, please reach out to us.


Paper

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430, published Nov 1, 2023)