Mobile-Bert-Uncased-Google: Optimized for Qualcomm Devices

MobileBERT is a lightweight BERT model designed for efficient self-supervised learning of language representations. It can be used for masked language modeling and as a backbone for various NLP tasks.

This model is based on the implementation of Mobile-Bert-Uncased-Google found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export the model with custom configurations. More details on model performance across various devices can be found here.

Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.

Getting Started

There are two ways to deploy this model on your device:

Option 1: Download Pre-Exported Models

Below are pre-exported model assets ready for deployment.

Runtime | Precision | Chipset | SDK Versions | Download
ONNX | float | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download
ONNX | w8a16 | Universal | QAIRT 2.42, ONNX Runtime 1.24.1 | Download
QNN_DLC | float | Universal | QAIRT 2.43 | Download
QNN_DLC | w8a16 | Universal | QAIRT 2.43 | Download
TFLITE | float | Universal | QAIRT 2.43, TFLite 2.17.0 | Download

For more device-specific assets and performance metrics, visit Mobile-Bert-Uncased-Google on Qualcomm® AI Hub.

Option 2: Export with Custom Configurations

Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:

  • Custom weights (e.g., fine-tuned checkpoints)
  • Custom input shapes
  • Target device and runtime configurations

This option is ideal if you need to customize the model beyond the default configuration provided here.

See our repository for Mobile-Bert-Uncased-Google on GitHub for usage instructions.
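As a rough sketch of what such an export might look like, the snippet below assembles a command line for a per-model export script without running it. The module path `qai_hub_models.models.mobile_bert_uncased_google.export`, the `--target-runtime` and `--device` flags, and the device name are assumptions and are not confirmed by this page. The helper `build_export_command` is hypothetical. Check the GitHub usage instructions for the exact names before relying on any of them.

```python
import shlex
import sys

# Assumed module path for the export script; not confirmed by this page.
EXPORT_MODULE = "qai_hub_models.models.mobile_bert_uncased_google.export"


def build_export_command(target_runtime=None, device=None):
    """Return an argv list for the (assumed) export script.

    Both flags are assumptions; omit an argument to leave its flag out.
    """
    cmd = [sys.executable, "-m", EXPORT_MODULE]
    if target_runtime is not None:
        cmd += ["--target-runtime", target_runtime]
    if device is not None:
        cmd += ["--device", device]
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_export_command(target_runtime="tflite", device="Samsung Galaxy S24")
    # Print the command instead of executing it, so nothing is submitted.
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch side-effect free; passing the list to `subprocess.run` would be the natural next step once the names are verified.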

Model Details

Model Type: Text generation

Model Stats:

  • Model checkpoint: mobile_bert_uncased_google
  • Input resolution: 1x384
  • Number of parameters: 25.3M
  • Model size (float): 130 MB
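The 1x384 input resolution suggests a single sequence of 384 token IDs. As a minimal sketch, assuming the standard BERT uncased special-token IDs ([PAD]=0, [CLS]=101, [SEP]=102) and a hypothetical helper `to_fixed_length`, token IDs could be padded or truncated to that fixed shape as follows. The pairing of an ID tensor with an attention mask is also an assumption about the exported model's inputs.

```python
SEQ_LEN = 384  # taken from "Input resolution: 1x384"
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # assumed BERT uncased vocabulary IDs


def to_fixed_length(token_ids, seq_len=SEQ_LEN):
    """Wrap token IDs in [CLS]/[SEP], then truncate or pad to seq_len.

    Returns (input_ids, attention_mask), each a 1 x seq_len nested list.
    """
    body = list(token_ids)[: seq_len - 2]  # leave room for [CLS] and [SEP]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)  # 1 marks real tokens
    pad = seq_len - len(ids)
    ids += [PAD_ID] * pad
    mask += [0] * pad  # 0 marks padding
    return [ids], [mask]


input_ids, attention_mask = to_fixed_length([7592, 2088])  # example IDs only
print(len(input_ids[0]), sum(attention_mask[0]))  # prints: 384 4
```

In practice a tokenizer would produce the token IDs; the sketch only illustrates the fixed-shape requirement.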

Performance Summary

Model | Runtime | Precision | Chipset | Inference Time (ms) | Peak Memory Range (MB) | Primary Compute Unit
Mobile-Bert-Uncased-Google | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 14.096 | 0 - 230 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Snapdragon® X2 Elite | 14.312 | 80 - 80 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Snapdragon® X Elite | 25.782 | 80 - 80 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 20.089 | 0 - 403 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Qualcomm® QCS8550 (Proxy) | 26.339 | 0 - 91 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Qualcomm® QCS9075 | 28.822 | 0 - 3 | NPU
Mobile-Bert-Uncased-Google | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 15.467 | 0 - 206 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 10.877 | 0 - 412 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® X2 Elite | 11.162 | 46 - 46 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® X Elite | 18.306 | 47 - 47 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® 8 Gen 3 Mobile | 12.924 | 0 - 536 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Qualcomm® QCS6490 | 1259.219 | 124 - 140 | CPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Qualcomm® QCS8550 (Proxy) | 17.56 | 0 - 61 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Qualcomm® QCS9075 | 21.229 | 0 - 3 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Qualcomm® QCM6690 | 564.377 | 231 - 263 | CPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 11.608 | 0 - 389 | NPU
Mobile-Bert-Uncased-Google | ONNX | w8a16 | Snapdragon® 7 Gen 4 Mobile | 519.5 | 219 - 250 | CPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 14.022 | 0 - 208 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Snapdragon® X2 Elite | 14.707 | 1 - 1 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Snapdragon® X Elite | 26.062 | 0 - 0 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 20.052 | 0 - 319 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 57.495 | 0 - 205 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 26.454 | 0 - 29 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® SA8775P | 28.682 | 0 - 205 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® QCS9075 | 29.012 | 0 - 2 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 45.987 | 0 - 315 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® SA7255P | 57.495 | 0 - 205 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Qualcomm® SA8295P | 33.831 | 0 - 204 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 15.485 | 0 - 209 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 10.79 | 0 - 356 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Snapdragon® X2 Elite | 11.4 | 1 - 1 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Snapdragon® X Elite | 18.548 | 0 - 0 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Snapdragon® 8 Gen 3 Mobile | 12.699 | 0 - 386 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® QCS8275 (Proxy) | 33.265 | 0 - 364 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® QCS8550 (Proxy) | 17.71 | 0 - 2 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® SA8775P | 18.204 | 0 - 304 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® QCS9075 | 21.083 | 0 - 2 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® QCS8450 (Proxy) | 22.253 | 0 - 384 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® SA7255P | 33.265 | 0 - 364 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Qualcomm® SA8295P | 21.196 | 0 - 368 | NPU
Mobile-Bert-Uncased-Google | QNN_DLC | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 11.605 | 0 - 360 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 13.706 | 0 - 267 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 19.218 | 0 - 355 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 55.12 | 1 - 248 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 25.64 | 0 - 3 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® SA8775P | 27.822 | 0 - 248 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® QCS9075 | 28.158 | 0 - 82 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 44.357 | 0 - 343 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® SA7255P | 55.12 | 1 - 248 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Qualcomm® SA8295P | 32.512 | 0 - 236 | NPU
Mobile-Bert-Uncased-Google | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 14.901 | 0 - 272 | NPU
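
To make the float versus w8a16 comparison concrete, the sketch below computes speedups and an approximate throughput from a few rows copied from the table above. The helper names are illustrative. Throughput is taken as the reciprocal of the reported single-inference latency, which assumes back-to-back sequential inferences and is not a measured figure.

```python
# (runtime, chipset) -> {precision: inference time in ms}, copied from the table.
LATENCY_MS = {
    ("ONNX", "Snapdragon 8 Elite Gen 5 Mobile"): {"float": 14.096, "w8a16": 10.877},
    ("QNN_DLC", "Snapdragon 8 Elite Gen 5 Mobile"): {"float": 14.022, "w8a16": 10.79},
    ("QNN_DLC", "Qualcomm QCS8275 (Proxy)"): {"float": 57.495, "w8a16": 33.265},
}


def speedup(times):
    """Ratio of float latency to w8a16 latency (values above 1 favor w8a16)."""
    return times["float"] / times["w8a16"]


def inferences_per_second(latency_ms):
    """Approximate throughput, assuming sequential inferences."""
    return 1000.0 / latency_ms


for (runtime, chipset), times in LATENCY_MS.items():
    print(
        f"{runtime:8s} {chipset:34s} "
        f"speedup {speedup(times):.2f}x, "
        f"{inferences_per_second(times['w8a16']):.1f} inf/s at w8a16"
    )
```

On these rows the quantized variant is roughly 1.3x faster on the Snapdragon 8 Elite Gen 5 and roughly 1.7x faster on the QCS8275 proxy device.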

License

  • The license for the original implementation of Mobile-Bert-Uncased-Google can be found here.
