Bert-Base-Uncased-Hf: Optimized for Qualcomm Devices
BERT-Base-Uncased is the base-size, uncased (lowercased-input) variant of BERT, a Transformer encoder pretrained with self-supervised learning to produce language representations. It can be used for masked language modeling and as a backbone for a range of NLP tasks.
This is based on the implementation of Bert-Base-Uncased-Hf found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export the model with custom configurations. More details on model performance across various devices can be found here.
Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.
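To illustrate the masked language modeling objective mentioned in the introduction, the sketch below applies the masking recipe described in the BERT paper. About 15% of token positions are selected. Of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The special-token ids (`[CLS]`=101, `[SEP]`=102, `[MASK]`=103) and the vocabulary size (30522) come from the `bert-base-uncased` vocabulary. The function name `mask_tokens` and the `-100` "ignore" label (a common PyTorch convention) are assumptions for illustration and are not part of this repository.

```python
import random

# Special-token ids and vocabulary size of the bert-base-uncased vocabulary.
CLS_ID, SEP_ID, MASK_ID = 101, 102, 103
VOCAB_SIZE = 30522
IGNORE = -100  # assumed label for positions the loss should skip (PyTorch convention)


def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Return (inputs, labels) after BERT-style 80/10/10 masking.

    Hypothetical helper: [CLS] and [SEP] are never selected for masking.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in (CLS_ID, SEP_ID) or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Only positions with a label other than `-100` contribute to the pretraining loss. The untouched 10% keep the model from assuming that every prediction target is a visible `[MASK]`.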
Getting Started
There are two ways to deploy this model on your device:
Option 1: Download Pre-Exported Models
Below are pre-exported model assets ready for deployment.
| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download |
| ONNX | w8a16 | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download |
| QNN_DLC | float | Universal | QAIRT 2.43 | Download |
| QNN_DLC | w8a16 | Universal | QAIRT 2.43 | Download |
| TFLITE | float | Universal | QAIRT 2.43, TFLite 2.17.0 | Download |
For more device-specific assets and performance metrics, visit Bert-Base-Uncased-Hf on Qualcomm® AI Hub.
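The Precision column distinguishes `float` assets from `w8a16` assets. In `w8a16`, weights are stored as 8-bit integers and activations are kept at 16 bits. You can get a back-of-the-envelope estimate of weight storage from the parameter count listed under Model Stats (110M). This sketch makes some simplifying assumptions:

- 4 bytes per float32 weight, 2 bytes per float16 weight, and 1 byte per 8-bit weight.
- No quantization metadata (scales, offsets).
- No graph or runtime overhead.
- No layers kept at higher precision.

Real file sizes will therefore differ.

```python
# Assumed bytes per stored weight for each precision (dense weights only).
BYTES_PER_WEIGHT = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
}


def weight_storage_mib(num_params, bytes_per_weight):
    """Approximate size of the weights alone, in MiB (2**20 bytes)."""
    return num_params * bytes_per_weight / 2**20


if __name__ == "__main__":
    params = 110e6  # "Number of parameters: 110M" from Model Stats
    for name, nbytes in BYTES_PER_WEIGHT.items():
        print(f"{name:>8}: {weight_storage_mib(params, nbytes):7.1f} MiB")
```

With 110M parameters this gives roughly 420 MiB for float32, in line with the 418 MB float model size under Model Stats, and roughly 105 MiB for 8-bit weights. The 16-bit activations affect runtime memory rather than stored weight size.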
Option 2: Export with Custom Configurations
Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:
- Custom weights (e.g., fine-tuned checkpoints)
- Custom input shapes
- Target device and runtime configurations
This option is ideal if you need to customize the model beyond the default configuration provided here.
For usage instructions, see our Bert-Base-Uncased-Hf repository on GitHub.
Model Details
Model Type: Text generation
Model Stats:
- Model checkpoint: google-bert/bert-base-uncased
- Input shape: 1x384 (batch size 1, sequence length of 384 tokens)
- Number of parameters: 110M
- Model size (float): 418 MB
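The 1x384 input means a batch of one sequence, padded or truncated to exactly 384 token ids. The sketch below shows that fixed-length packing step on already-tokenized ids and produces `input_ids` and an `attention_mask`.

- The special-token ids (`[PAD]`=0, `[CLS]`=101, `[SEP]`=102) come from the `bert-base-uncased` vocabulary.
- The function name `pack_to_fixed_length` is hypothetical.
- This card does not state which input names the exported assets expect, or how many inputs they take. Check both against the downloaded model.

In practice, a tokenizer such as the Hugging Face BERT tokenizer would normally perform this step.

```python
# Special-token ids from the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102
SEQ_LEN = 384  # sequence length of the 1x384 input


def pack_to_fixed_length(token_ids, seq_len=SEQ_LEN):
    """Wrap ids in [CLS] ... [SEP], truncate, and pad to seq_len.

    Returns a batch of one: ([input_ids], [attention_mask]), each 1 x seq_len.
    """
    if seq_len < 2:
        raise ValueError("seq_len must leave room for [CLS] and [SEP]")
    body = list(token_ids)[: seq_len - 2]  # truncate so the special tokens still fit
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)  # 1 = real token, 0 = padding
    pad = seq_len - len(ids)
    ids += [PAD_ID] * pad
    mask += [0] * pad
    return [ids], [mask]
```

Because the compiled model has a fixed 384-token sequence dimension, shorter inputs must be padded and longer ones truncated (or split) before inference. The attention mask tells the model to ignore the padding positions.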
Performance Summary
| Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit |
|---|---|---|---|---|---|---|
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 13.772 ms | 0 - 718 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® X2 Elite | 14.671 ms | 265 - 265 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® X Elite | 31.087 ms | 265 - 265 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 23.588 ms | 0 - 748 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Qualcomm® QCS8550 (Proxy) | 31.393 ms | 0 - 322 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Qualcomm® QCS9075 | 35.899 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 16.783 ms | 0 - 660 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 8.343 ms | 0 - 410 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® X2 Elite | 8.75 ms | 154 - 154 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® X Elite | 20.908 ms | 154 - 154 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 14.824 ms | 0 - 592 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS6490 | 2296.756 ms | 188 - 280 MB | CPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 19.907 ms | 0 - 203 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCS9075 | 20.474 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Qualcomm® QCM6690 | 1212.598 ms | 198 - 212 MB | CPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 10.874 ms | 0 - 417 MB | NPU |
| Bert-Base-Uncased-Hf | ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 1189.815 ms | 205 - 220 MB | CPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 9.467 ms | 0 - 538 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® X2 Elite | 10.614 ms | 1 - 1 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® X Elite | 22.514 ms | 0 - 0 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 16.965 ms | 0 - 584 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 81.5 ms | 0 - 519 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 23.435 ms | 0 - 287 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA8775P | 28.695 ms | 0 - 520 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS9075 | 28.729 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 47.754 ms | 0 - 567 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA7255P | 81.5 ms | 0 - 519 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Qualcomm® SA8295P | 35.989 ms | 0 - 501 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 11.864 ms | 0 - 519 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 5.138 ms | 0 - 414 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® X2 Elite | 6.083 ms | 1 - 1 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® X Elite | 13.941 ms | 0 - 0 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Gen 3 Mobile | 9.14 ms | 0 - 499 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8275 (Proxy) | 30.762 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8550 (Proxy) | 13.264 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® SA8775P | 60.294 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® QCS9075 | 15.641 ms | 0 - 2 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Qualcomm® SA7255P | 30.762 ms | 0 - 408 MB | NPU |
| Bert-Base-Uncased-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 7.284 ms | 0 - 409 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 9.746 ms | 0 - 546 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 17.219 ms | 0 - 597 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 81.834 ms | 0 - 532 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 22.58 ms | 0 - 3 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA8775P | 28.883 ms | 0 - 533 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS9075 | 28.838 ms | 0 - 259 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 32.538 ms | 0 - 565 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA7255P | 81.834 ms | 0 - 532 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Qualcomm® SA8295P | 35.491 ms | 0 - 503 MB | NPU |
| Bert-Base-Uncased-Hf | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 12.214 ms | 0 - 536 MB | NPU |
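To compare rows of the table above, you can turn latency into a speedup ratio or an approximate throughput. The sketch below copies a few QNN_DLC rows from the table. Throughput here is simply 1000 / latency. That assumes back-to-back single-batch inferences and ignores tokenization and other pre- and post-processing. The helper names are illustrative only.

```python
# QNN_DLC inference times (ms) copied from the performance table above.
QNN_DLC_MS = {
    "Snapdragon 8 Elite Gen 5 Mobile": {"float": 9.467, "w8a16": 5.138},
    "Snapdragon 8 Gen 3 Mobile": {"float": 16.965, "w8a16": 9.14},
    "Snapdragon X Elite": {"float": 22.514, "w8a16": 13.941},
}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def throughput_per_s(latency_ms):
    """Approximate single-stream inferences per second."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for chip, t in QNN_DLC_MS.items():
        print(
            f"{chip}: w8a16 is {speedup(t['float'], t['w8a16']):.2f}x faster "
            f"(~{throughput_per_s(t['w8a16']):.0f} inferences/s)"
        )
```

On these rows, w8a16 is roughly 1.6x to 1.9x faster than float. Rows whose Primary Compute Unit is CPU (for example, the ONNX w8a16 entries on Qualcomm® QCS6490) are not comparable with the NPU rows in this way.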
License
- The license for the original implementation of Bert-Base-Uncased-Hf can be found here.
References
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Source Model Implementation
Community
- Join our AI Hub Slack community to collaborate, post questions and learn more about on-device AI.
- For questions or feedback, please reach out to us.
