---
license: other
tags:
- qualcomm
- qcs9075
- edge-ai
- genie
- qnn-htp
- llm
base_model: meta-llama/Llama-3.2-3B-Instruct
---
# Llama-3.2-3B-Instruct-QCS9075-HTP
This is a pre-compiled version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) optimized for the **Qualcomm QCS9075 SoC** using the **Qualcomm Genie SDK**.
## Model Details
- **Base Model**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Target Hardware**: Qualcomm QCS9075 (IQ-9075 EVK)
- **Backend**: QnnHtp (NPU)
- **Quantization**: W4A16
- **Compilation**: Qualcomm AI Hub (QAIRT 2.42)
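W4A16 denotes 4-bit weights with 16-bit activations. As a rough illustration of what 4-bit weights imply for storage, the sketch below computes raw weight bytes from a parameter count. The helper name `weight_storage_bytes` is hypothetical, and the nominal 3 billion parameters is an assumption read off the model name, not a figure from this repository. The result covers raw weight storage only and is not a prediction of the on-disk size of the compiled artifacts.

```python
def weight_storage_bytes(num_params: int, weight_bits: int = 4) -> float:
    """Raw storage for the weights alone: parameters times bits, divided by 8."""
    if num_params < 0 or weight_bits <= 0:
        raise ValueError("num_params must be >= 0 and weight_bits must be > 0")
    return num_params * weight_bits / 8


if __name__ == "__main__":
    # Assumption: a nominal 3e9 parameters, as suggested by "3B" in the model name.
    nominal_params = 3_000_000_000
    print(f"{weight_storage_bytes(nominal_params) / 1e9:.2f} GB of raw 4-bit weights")
```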
## Performance
| Model | Backend | Throughput | Size |
|-------|---------|------------|------|
| Llama-3.2-3B-Instruct-QCS9075-HTP | QnnHtp (NPU) | ~18.7 TPS on QCS9075 | 2.5 GB |
*TPS = Tokens Per Second (generation speed)*
## Hardware Requirements
- **Device**: Qualcomm IQ-9075 EVK or QCS9075-based device
- **OS**: Ubuntu 22.04 (recommended)
- **SDK**: Qualcomm Genie SDK
- **QAIRT**: Version 2.42 or later
## Usage
### Prerequisites
1. Install the Qualcomm Genie SDK on your QCS9075 device
2. Download all model files from this repository
3. Ensure the QAIRT libraries (version 2.42 or later) are available on the device
### Environment Setup
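For HTP models, the order of entries in `LD_LIBRARY_PATH` is critical. The dynamic linker searches the directories from left to right, so list the QAIRT library directory before the Genie QNN libraries: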
```bash
export LD_LIBRARY_PATH=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs:$LD_LIBRARY_PATH
```
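A misordered path can be caught before launch with a small check. This is a minimal sketch: the helper name `qairt_precedes_genie` is hypothetical, and the two directory strings are copied from the export line above, so adjust them if your installation differs.

```python
import os

# Directories copied from the export line above; adjust for your install.
QAIRT_DIR = "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu"
GENIE_DIR = "/opt/qcom/aistack/genie/qnn/libs"


def qairt_precedes_genie(ld_library_path: str,
                         qairt_dir: str = QAIRT_DIR,
                         genie_dir: str = GENIE_DIR) -> bool:
    """Return True if both directories are listed and QAIRT comes first."""
    # Split on ':' and ignore empty entries and trailing slashes.
    entries = [p.rstrip("/") for p in ld_library_path.split(":") if p]
    if qairt_dir not in entries or genie_dir not in entries:
        return False
    return entries.index(qairt_dir) < entries.index(genie_dir)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if qairt_precedes_genie(current):
        print("LD_LIBRARY_PATH order looks correct")
    else:
        print("Check LD_LIBRARY_PATH: QAIRT must be listed before Genie")
```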
### Configuration
Create a `genie_config.json` file:
```json
{
  "model_path": "/path/to/model/files",
  "backend": "QnnHtp",
  "device": "0"
}
```
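If you prefer to generate the file rather than edit it by hand, something like the following could work. It is only a sketch: the helper name `write_genie_config` is hypothetical, and the three keys are simply those shown in the example above. It makes no claim about the full schema that Genie accepts.

```python
import json
import tempfile
from pathlib import Path


def write_genie_config(model_dir: str, out_path: str = "genie_config.json") -> dict:
    """Write the three-field config from the example above and return it."""
    config = {
        "model_path": model_dir,  # directory holding the downloaded model files
        "backend": "QnnHtp",
        "device": "0",
    }
    Path(out_path).write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Demo: write into a temporary directory so no existing file is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "genie_config.json"
        write_genie_config("/path/to/model/files", str(target))
        print(target.read_text(encoding="utf-8"))
```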
### Running the Model
```bash
# Using the Genie server
python3 /opt/qcom/aistack/genie/examples/server_persistent.py \
  --config genie_config.json \
  --port 8000
```
### Kubernetes Deployment
To deploy on a Kubernetes cluster with QCS9075 nodes, the following Pod manifest illustrates one deployment pattern. It mounts the model files and the Qualcomm libraries from the host:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: genie-llm-server
spec:
  containers:
    - name: genie
      image: your-registry/genie-runtime:latest
      env:
        # QAIRT libraries first, then the Genie QNN libraries.
        - name: LD_LIBRARY_PATH
          value: "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs"
      volumeMounts:
        - name: model-storage
          mountPath: /models
        - name: qcom-libs
          mountPath: /opt/qcom/aistack
  volumes:
    # Model files stored on the node.
    - name: model-storage
      hostPath:
        path: /mnt/models/llama-3.2-3b-instruct-qcs9075-htp
    # Qualcomm AI stack installed on the node.
    - name: qcom-libs
      hostPath:
        path: /opt/qcom/aistack
```
## File Structure
This repository contains:
- Compiled model artifacts (.bin files)
- Configuration files (genie_config.json)
- QNN HTP context binaries
## Benchmarking Notes
- Performance metrics measured on Qualcomm IQ-9075 EVK
- TPS (Tokens Per Second) measured during generation phase
- Results may vary based on prompt length and complexity
- HTP backend utilizes the NPU for acceleration
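To make the TPS definition concrete, the calculation can be sketched as follows. The helper name `generation_tps` is hypothetical, and the numbers in the demo are illustrative rather than a measurement from this repository. The function assumes you have already separated generation time from prompt-processing time.

```python
def generation_tps(generated_tokens: int, generation_seconds: float) -> float:
    """Tokens per second over the generation phase only."""
    if generated_tokens < 0:
        raise ValueError("generated_tokens must be non-negative")
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return generated_tokens / generation_seconds


if __name__ == "__main__":
    # Illustrative values only: 187 tokens generated in 10 seconds.
    print(f"{generation_tps(187, 10.0):.1f} TPS")
```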
## License
This model follows the license of the base model [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). Please refer to the original model card for license details.
## Acknowledgments
- Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Compiled using Qualcomm AI Hub with QAIRT 2.42
- Target hardware: Qualcomm QCS9075 SoC
## Support
For issues related to:
- **Model compilation**: Contact Qualcomm AI Hub support
- **Genie SDK**: Refer to Qualcomm Genie documentation
- **Deployment**: Open an issue in this repository
---
*This model is optimized for edge deployment on Qualcomm QCS9075 devices and may not work on other hardware platforms.*