library_name: pytorch
license: other
tags:
- backbone
- android
pipeline_tag: text-generation
Albert-Base-V2-Hf: Optimized for Qualcomm Devices
ALBERT is a lightweight BERT model designed for efficient self-supervised learning of language representations. It can be used for masked language modeling and as a backbone for various NLP tasks.
This model is based on the implementation of Albert-Base-V2-Hf found here. This repository contains pre-exported model files optimized for Qualcomm® devices. You can use the Qualcomm® AI Hub Models library to export with custom configurations. More details on model performance across various devices can be found here.
Qualcomm AI Hub Models uses Qualcomm AI Hub Workbench to compile, profile, and evaluate this model. Sign up to run these models on a hosted Qualcomm® device.
Getting Started
There are two ways to deploy this model on your device:
Option 1: Download Pre-Exported Models
Below are pre-exported model assets ready for deployment.
| Runtime | Precision | Chipset | SDK Versions | Download |
|---|---|---|---|---|
| ONNX | float | Universal | QAIRT 2.37, ONNX Runtime 1.23.0 | Download |
| QNN_DLC | float | Universal | QAIRT 2.42 | Download |
| QNN_DLC | w8a16 | Universal | QAIRT 2.42 | Download |
| TFLITE | float | Universal | QAIRT 2.42, TFLite 2.17.0 | Download |
For more device-specific assets and performance metrics, visit Albert-Base-V2-Hf on Qualcomm® AI Hub.
Option 2: Export with Custom Configurations
Use the Qualcomm® AI Hub Models Python library to compile and export the model with your own:
- Custom weights (e.g., fine-tuned checkpoints)
- Custom input shapes
- Target device and runtime configurations
This option is ideal if you need to customize the model beyond the default configuration provided here.
See the Albert-Base-V2-Hf repository on GitHub for usage instructions.
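As a rough illustration of Option 2, the sketch below only assembles a command line for an export script and prints it; it does not run anything. The module path `qai_hub_models.models.albert_base_v2_hf.export`, the `--target-runtime` and `--device` flags, and the device name are assumptions based on how the library is commonly invoked, so confirm them against the GitHub usage instructions or the script's `--help` output before relying on them.

```python
import shlex
import sys

# Assumed module path for this model's export script (not confirmed by this page).
EXPORT_MODULE = "qai_hub_models.models.albert_base_v2_hf.export"


def build_export_command(target_runtime=None, device=None):
    """Assemble (but do not run) an export command as a list of arguments."""
    cmd = [sys.executable, "-m", EXPORT_MODULE]
    if target_runtime:
        # Assumed flag name; the value would be a runtime such as "tflite".
        cmd += ["--target-runtime", target_runtime]
    if device:
        # Assumed flag name; the device string is a placeholder.
        cmd += ["--device", device]
    return cmd


cmd = build_export_command(target_runtime="tflite", device="Samsung Galaxy S24")
print(shlex.join(cmd))
```

Building the argument list separately makes it easy to inspect the command first and then pass it to `subprocess.run(cmd, check=True)` once the flags have been verified.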
Model Details
Model Type: Text generation
Model Stats:
- Model checkpoint: albert/albert-base-v2
- Input shape: 1x384 (batch size x sequence length)
- Number of parameters: 11.8M
- Model size (float): 43.9 MB
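Because the exported model takes a fixed 1x384 input, token sequences have to be truncated or padded to 384 positions before inference. The following is a minimal sketch of that step using plain Python lists. The pad token ID of 0, the right-padding convention, and the example token IDs are assumptions for illustration; check them against the tokenizer shipped with the `albert/albert-base-v2` checkpoint.

```python
SEQ_LEN = 384  # fixed sequence length from the 1x384 entry in Model Stats
PAD_ID = 0     # assumption: pad token ID; verify with your tokenizer


def to_fixed_length(token_ids, seq_len=SEQ_LEN, pad_id=PAD_ID):
    """Truncate or right-pad token IDs to seq_len.

    Returns (input_ids, attention_mask), each with a leading batch
    dimension of 1, so both have shape 1 x seq_len.
    """
    kept = list(token_ids)[:seq_len]
    padding = seq_len - len(kept)
    input_ids = kept + [pad_id] * padding
    # 1 marks real tokens, 0 marks padding positions.
    attention_mask = [1] * len(kept) + [0] * padding
    return [input_ids], [attention_mask]


ids, mask = to_fixed_length([2, 10975, 126, 3])  # hypothetical token IDs
print(len(ids[0]), sum(mask[0]))
```

Note that plain truncation drops whatever sits past position 384, including any trailing separator token; a real tokenizer's truncation option usually handles that for you.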
Performance Summary
| Model | Runtime | Precision | Chipset | Inference Time | Peak Memory Range | Primary Compute Unit |
|---|---|---|---|---|---|---|
| Albert-Base-V2-Hf | ONNX | float | Snapdragon® X Elite | 28.987 ms | 33 - 33 MB | NPU |
| Albert-Base-V2-Hf | ONNX | float | Snapdragon® 8 Gen 3 Mobile | 22.278 ms | 0 - 385 MB | NPU |
| Albert-Base-V2-Hf | ONNX | float | Qualcomm® QCS8550 (Proxy) | 30.096 ms | 0 - 317 MB | NPU |
| Albert-Base-V2-Hf | ONNX | float | Qualcomm® QCS9075 | 32.316 ms | 0 - 3 MB | NPU |
| Albert-Base-V2-Hf | ONNX | float | Snapdragon® 8 Elite For Galaxy Mobile | 17.079 ms | 0 - 326 MB | NPU |
| Albert-Base-V2-Hf | ONNX | float | Snapdragon® 8 Elite Gen 5 Mobile | 14.437 ms | 0 - 335 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Snapdragon® X Elite | 22.358 ms | 0 - 0 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Snapdragon® 8 Gen 3 Mobile | 17.408 ms | 0 - 375 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® QCS8275 (Proxy) | 74.867 ms | 0 - 317 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® QCS8550 (Proxy) | 23.222 ms | 0 - 2 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® SA8775P | 27.658 ms | 0 - 316 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® QCS9075 | 26.841 ms | 0 - 2 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® QCS8450 (Proxy) | 35.81 ms | 0 - 421 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® SA7255P | 74.867 ms | 0 - 317 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Qualcomm® SA8295P | 33.432 ms | 0 - 376 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Snapdragon® 8 Elite For Galaxy Mobile | 11.917 ms | 0 - 386 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | float | Snapdragon® 8 Elite Gen 5 Mobile | 9.823 ms | 0 - 391 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Snapdragon® X Elite | 13.997 ms | 0 - 0 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Gen 3 Mobile | 9.558 ms | 0 - 295 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8275 (Proxy) | 29.59 ms | 0 - 257 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Qualcomm® QCS8550 (Proxy) | 13.542 ms | 0 - 236 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Qualcomm® SA8775P | 13.392 ms | 0 - 249 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Qualcomm® QCS9075 | 15.845 ms | 0 - 2 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Qualcomm® SA7255P | 29.59 ms | 0 - 257 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite For Galaxy Mobile | 7.916 ms | 0 - 248 MB | NPU |
| Albert-Base-V2-Hf | QNN_DLC | w8a16 | Snapdragon® 8 Elite Gen 5 Mobile | 5.495 ms | 0 - 265 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Snapdragon® 8 Gen 3 Mobile | 17.443 ms | 0 - 387 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® QCS8275 (Proxy) | 74.61 ms | 0 - 321 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® QCS8550 (Proxy) | 22.268 ms | 0 - 343 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® SA8775P | 27.317 ms | 0 - 321 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® QCS9075 | 27.14 ms | 0 - 33 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® QCS8450 (Proxy) | 40.475 ms | 0 - 419 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® SA7255P | 74.61 ms | 0 - 321 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Qualcomm® SA8295P | 34.418 ms | 0 - 377 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Snapdragon® 8 Elite For Galaxy Mobile | 12.431 ms | 0 - 391 MB | NPU |
| Albert-Base-V2-Hf | TFLITE | float | Snapdragon® 8 Elite Gen 5 Mobile | 10.065 ms | 0 - 388 MB | NPU |
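To put the table in perspective, the sketch below compares float and w8a16 latencies for the QNN_DLC runtime on three chipsets, using values copied from the rows above. The throughput figure is a simple `1000 / latency` estimate that assumes back-to-back, single-stream inference with no other overhead, so treat it as an upper bound rather than a measured number.

```python
# QNN_DLC latencies in milliseconds, copied from the performance table.
LATENCY_MS = {
    "Snapdragon 8 Elite Gen 5 Mobile": {"float": 9.823, "w8a16": 5.495},
    "Snapdragon 8 Gen 3 Mobile": {"float": 17.408, "w8a16": 9.558},
    "Qualcomm QCS8275 (Proxy)": {"float": 74.867, "w8a16": 29.59},
}


def speedup(float_ms, quant_ms):
    """How many times faster the quantized model is than the float model."""
    return float_ms / quant_ms


def throughput_per_second(latency_ms):
    """Idealized inferences per second for one stream (assumes no overhead)."""
    return 1000.0 / latency_ms


for chipset, ms in LATENCY_MS.items():
    ratio = speedup(ms["float"], ms["w8a16"])
    rate = throughput_per_second(ms["w8a16"])
    print(f"{chipset}: w8a16 is {ratio:.2f}x faster, about {rate:.0f} inferences/s")
```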
License
- The license for the original implementation of Albert-Base-V2-Hf can be found here.
References
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Source Model Implementation
Community
- Join our AI Hub Slack community to collaborate, post questions, and learn more about on-device AI.
- For questions or feedback, please reach out to us.
