---
license: other
tags:
- qualcomm
- qcs9075
- edge-ai
- genie
- qnn-htp
- llm
base_model: meta-llama/Llama-3.2-3B-Instruct
---
# Llama-3.2-3B-Instruct-QCS9075-HTP
This is a pre-compiled version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) optimized for the **Qualcomm QCS9075 SoC** using the **Qualcomm Genie SDK**.
## Model Details
- **Base Model**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Target Hardware**: Qualcomm QCS9075 (IQ-9075 EVK)
- **Backend**: QnnHtp (NPU)
- **Quantization**: W4A16
- **Compilation**: Qualcomm AI Hub (QAIRT 2.42)
## Performance
| Model | Backend | Generation speed | Size |
|-------|---------|------------------|------|
| Llama-3.2-3B-Instruct-QCS9075-HTP | QnnHtp (NPU) | ~18.7 TPS on QCS9075 | ~2.5 GB |
*TPS = Tokens Per Second (generation speed)*
## Hardware Requirements
- **Device**: Qualcomm IQ-9075 EVK or QCS9075-based device
- **OS**: Ubuntu 22.04 (recommended)
- **SDK**: Qualcomm Genie SDK
- **QAIRT**: Version 2.42 or later
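A quick check of the CPU architecture and OS on the target device can catch setup mistakes early. This is a minimal sketch under stated assumptions: `is_ubuntu_2204` and `check_platform` are hypothetical helper names. The script requires `aarch64` because the QAIRT libraries used in this card live under `lib/aarch64-linux-gnu`, and it treats a non-22.04 OS as a warning because Ubuntu 22.04 is only recommended. It does not detect the SoC model itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if an os-release file describes Ubuntu 22.04.
# Defaults to /etc/os-release; another path can be passed for testing.
is_ubuntu_2204() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  local id version
  # os-release holds KEY=value lines; values may be double-quoted
  id=$(sed -n 's/^ID=//p' "$file" | tr -d '"')
  version=$(sed -n 's/^VERSION_ID=//p' "$file" | tr -d '"')
  [ "$id" = "ubuntu" ] && [ "$version" = "22.04" ]
}

# Hypothetical helper: fail on a non-aarch64 CPU, only warn on a non-recommended OS.
check_platform() {
  local arch="${1:-$(uname -m)}" os_file="${2:-/etc/os-release}"
  if [ "$arch" != "aarch64" ]; then
    echo "error: expected aarch64, got $arch" >&2
    return 1
  fi
  if ! is_ubuntu_2204 "$os_file"; then
    echo "warning: Ubuntu 22.04 is the recommended OS" >&2
  fi
  return 0
}
```

On the device, call `check_platform` with no arguments. The optional arguments exist only so the logic can be exercised elsewhere.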
## Usage
### Prerequisites
1. Install the Qualcomm Genie SDK on your QCS9075 device
2. Download all model files from this repository
3. Ensure QAIRT 2.42 libraries are available
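Before launching anything, you can confirm that the downloaded files and the QAIRT libraries are in place. The following is a sketch under assumptions: `check_prereqs` is a hypothetical name. It treats "model files present" as "at least one `.bin` artifact in the model directory" and "QAIRT available" as "the library directory exists". The paths you pass (for example the QAIRT directory from the environment setup below) must match your installation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: model directory with .bin artifacts,
# plus an existing QAIRT library directory.
check_prereqs() {
  local model_dir="$1" qairt_lib_dir="$2" ok=0 has_bin=1 f
  # Portable glob test: the loop body only sees real files if the glob matched
  for f in "$model_dir"/*.bin; do
    if [ -e "$f" ]; then has_bin=0; break; fi
  done
  if [ "$has_bin" -ne 0 ]; then
    echo "missing: no .bin files in $model_dir" >&2
    ok=1
  fi
  if [ ! -d "$qairt_lib_dir" ]; then
    echo "missing: QAIRT library directory $qairt_lib_dir" >&2
    ok=1
  fi
  return "$ok"
}
```

Example usage: `check_prereqs /path/to/model/files /opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu && echo ready`.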
### Environment Setup
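For HTP models, the order of the `LD_LIBRARY_PATH` entries is critical. The dynamic loader searches these directories from left to right, so the QAIRT 2.42 library directory must come before the Genie QNN library directory: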
```bash
export LD_LIBRARY_PATH=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs:$LD_LIBRARY_PATH
```
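Because a later `export` or a shell profile can silently reorder or prepend entries, it can help to verify the ordering in the current shell. This is a minimal sketch. `path_precedes` is a hypothetical helper that succeeds only when both directories are present and the first one appears earlier in the colon-separated list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FIRST appears before SECOND in a
# colon-separated path list (both must be present).
path_precedes() {
  local list="$1" first="$2" second="$3" i=0 pos_first="" pos_second="" entry
  local IFS=':'
  for entry in $list; do
    i=$((i + 1))
    # Record the position of the first occurrence of each directory
    if [ -z "$pos_first" ] && [ "$entry" = "$first" ]; then pos_first=$i; fi
    if [ -z "$pos_second" ] && [ "$entry" = "$second" ]; then pos_second=$i; fi
  done
  [ -n "$pos_first" ] && [ -n "$pos_second" ] && [ "$pos_first" -lt "$pos_second" ]
}

# Example usage (paths taken from the export above):
# QAIRT_LIB=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu
# GENIE_LIB=/opt/qcom/aistack/genie/qnn/libs
# path_precedes "$LD_LIBRARY_PATH" "$QAIRT_LIB" "$GENIE_LIB" || echo "wrong order" >&2
```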
### Configuration
Create a `genie_config.json` file and set `model_path` to the directory containing the downloaded model files:
```json
{
  "model_path": "/path/to/model/files",
  "backend": "QnnHtp",
  "device": "0"
}
```
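When scripting deployments, you can generate the configuration from the model directory instead of editing it by hand. This is a sketch under assumptions: `write_genie_config` is a hypothetical helper that writes exactly the three keys from the example above. It does not describe the full Genie configuration schema, and it assumes the path contains no double quotes or backslashes, since those would need JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the minimal genie_config.json shown above.
# Assumes MODEL_DIR contains no characters that need JSON escaping.
write_genie_config() {
  local model_dir="$1" out="${2:-genie_config.json}"
  if [ ! -d "$model_dir" ]; then
    echo "error: model directory $model_dir does not exist" >&2
    return 1
  fi
  cat > "$out" <<EOF
{
  "model_path": "$model_dir",
  "backend": "QnnHtp",
  "device": "0"
}
EOF
}
```

Example usage: `write_genie_config /path/to/model/files genie_config.json`.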
### Running the Model
```bash
# Start the persistent Genie server example on port 8000
python3 /opt/qcom/aistack/genie/examples/server_persistent.py \
  --config genie_config.json \
  --port 8000
```
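Loading the context binaries onto the NPU can take a while, so a client may need to wait before the port accepts connections. This is a minimal readiness sketch: `wait_for_port` is a hypothetical helper that only checks whether a TCP connection to the port succeeds. It makes no assumption about the server's request API. It relies on bash's `/dev/tcp` redirection, so it needs bash rather than a plain POSIX `sh`.

```bash
#!/usr/bin/env bash
# Hypothetical readiness check: poll until HOST:PORT accepts a TCP connection,
# or give up after TIMEOUT seconds (default 60). Requires bash for /dev/tcp.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Open a connection in a subshell; it is closed when the subshell exits
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}
```

For example, start the server in the background, then run `wait_for_port 127.0.0.1 8000 120` before sending requests.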
### Kubernetes Deployment
To deploy on a Kubernetes cluster with QCS9075 nodes, the following Pod manifest can serve as a starting point. It mounts the model files and the Qualcomm AI stack from the host node:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: genie-llm-server
spec:
  containers:
    - name: genie
      image: your-registry/genie-runtime:latest
      env:
        # QAIRT 2.42 libraries first, so the loader finds them before the Genie QNN libs
        - name: LD_LIBRARY_PATH
          value: "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs"
      volumeMounts:
        # Compiled model artifacts from the host
        - name: model-storage
          mountPath: /models
        # QAIRT and Genie installation from the host
        - name: qcom-libs
          mountPath: /opt/qcom/aistack
  volumes:
    - name: model-storage
      hostPath:
        path: /mnt/models/llama-3.2-3b-instruct-qcs9075-htp
        type: Directory  # fail the mount if the directory is missing on the node
    - name: qcom-libs
      hostPath:
        path: /opt/qcom/aistack
        type: Directory
```
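A common pitfall with `env` values in YAML is nesting quotes (for example `"'...'"`). This leaves literal quote characters in the variable, which the loader then treats as part of the directory names. The following sketch uses a hypothetical helper, `validate_lib_path`, which checks that every entry in a colon-separated library path is an absolute path to an existing directory. You could run it from a shell inside the Pod, for example one opened with `kubectl exec -it genie-llm-server -- sh`, which assumes the image ships a shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: every entry must be an absolute path to an existing
# directory. Entries with stray quote characters fail one of the two checks.
validate_lib_path() {
  local list="$1" entry bad=0
  local IFS=':'
  for entry in $list; do
    case "$entry" in
      /*) ;;
      *) echo "not an absolute path: $entry" >&2; bad=1; continue ;;
    esac
    if [ ! -d "$entry" ]; then
      echo "no such directory: $entry" >&2
      bad=1
    fi
  done
  return "$bad"
}

# Example usage inside the container:
# validate_lib_path "$LD_LIBRARY_PATH" && echo "LD_LIBRARY_PATH looks sane"
```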
## File Structure
This repository contains:
- Compiled model artifacts (.bin files)
- Configuration files (genie_config.json)
- QNN HTP context binaries
## Benchmarking Notes
- Performance metrics measured on Qualcomm IQ-9075 EVK
- TPS (Tokens Per Second) measured during generation phase
- Results may vary based on prompt length and complexity
- HTP backend utilizes the NPU for acceleration
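The TPS figure is a simple ratio: the number of generated tokens divided by the time spent in the generation phase. This is a small sketch of that arithmetic. `tps` is a hypothetical helper, and the numbers in the usage example are illustrative, not measurements from this model card.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tokens per second = generated tokens / generation-phase seconds,
# printed with one decimal place. Fails (no output) if the duration is not positive.
tps() {
  local tokens="$1" gen_seconds="$2"
  awk -v n="$tokens" -v t="$gen_seconds" 'BEGIN {
    if (t <= 0) { exit 1 }
    printf "%.1f\n", n / t
  }'
}
```

For example, `tps 187 10` prints `18.7`, the same rate as the table above. Because only the generation phase is timed, prompt processing time does not lower the figure.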
## License
This model follows the license of the base model [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). Please refer to the original model card for license details.
## Acknowledgments
- Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Compiled using Qualcomm AI Hub with QAIRT 2.42
- Target hardware: Qualcomm QCS9075 SoC
## Support
For issues related to:
- **Model compilation**: Contact Qualcomm AI Hub support
- **Genie SDK**: Refer to Qualcomm Genie documentation
- **Deployment**: Open an issue in this repository
---
*This model is optimized for edge deployment on Qualcomm QCS9075 devices and may not work on other hardware platforms.*