---
license: other
tags:
  - qualcomm
  - qcs9075
  - edge-ai
  - genie
  - qnn-htp
  - llm
base_model: meta-llama/Llama-3.2-3B-Instruct
---

# Llama-3.2-3B-Instruct-QCS9075-HTP

This is a pre-compiled version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) optimized for the **Qualcomm QCS9075 SoC** using the **Qualcomm Genie SDK**.

## Model Details

- **Base Model**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Target Hardware**: Qualcomm QCS9075 (IQ-9075 EVK)
- **Backend**: QnnHtp (NPU)
- **Quantization**: W4A16
- **Compilation**: Qualcomm AI Hub (QAIRT 2.42)

## Performance

| Model | Backend | Throughput | Size |
|-------|---------|------------|------|
| Llama-3.2-3B-Instruct-QCS9075-HTP | QnnHtp (NPU) | ~18.7 TPS on QCS9075 | 2.5 GB |

*TPS = Tokens Per Second (generation speed)*

## Hardware Requirements

- **Device**: Qualcomm IQ-9075 EVK or QCS9075-based device
- **OS**: Ubuntu 22.04 (recommended)
- **SDK**: Qualcomm Genie SDK
- **QAIRT**: Version 2.42 or later

## Usage

### Prerequisites

1. Install the Qualcomm Genie SDK on your QCS9075 device
2. Download all model files from this repository
3. Ensure QAIRT 2.42 libraries are available

### Environment Setup

For HTP models, the order of entries in `LD_LIBRARY_PATH` is critical. List the QAIRT library directory before the Genie QNN libraries:

```bash
export LD_LIBRARY_PATH=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs:$LD_LIBRARY_PATH
```
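Because a wrong ordering is easy to miss, a small preflight check can help before launching the server. The following is a minimal Python sketch, assuming the order in the export line above (QAIRT first, then Genie) is the intended one. The function name `qairt_precedes_genie` is hypothetical and not part of any Qualcomm SDK.

```python
import os

# Paths copied from the export line above; adjust them if your install differs.
QAIRT_LIB = "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu"
GENIE_LIB = "/opt/qcom/aistack/genie/qnn/libs"


def qairt_precedes_genie(ld_library_path: str) -> bool:
    """Return True if both directories are listed and QAIRT comes first."""
    # Split on ':' and ignore empty entries and trailing slashes.
    entries = [e.rstrip("/") for e in ld_library_path.split(":") if e]
    if QAIRT_LIB not in entries or GENIE_LIB not in entries:
        return False
    return entries.index(QAIRT_LIB) < entries.index(GENIE_LIB)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    if qairt_precedes_genie(current):
        print("LD_LIBRARY_PATH ordering looks fine")
    else:
        print("Check LD_LIBRARY_PATH: QAIRT libs should precede Genie libs")
```

The check only inspects the environment variable; it does not verify that the libraries themselves exist or load.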

### Configuration

Create a `genie_config.json` file:

```json
{
  "model_path": "/path/to/model/files",
  "backend": "QnnHtp",
  "device": "0"
}
```
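To catch typos in the config before starting the server, it can be loaded and sanity-checked first. This is a minimal Python sketch that assumes the three keys in the example above are the ones required; the actual Genie schema may accept or require other fields. `load_config` and `REQUIRED_KEYS` are hypothetical names used only for illustration.

```python
import json
from pathlib import Path

# Keys taken from the example config above (an assumption, not a full schema).
REQUIRED_KEYS = ("model_path", "backend", "device")


def load_config(path):
    """Load the JSON config and raise ValueError if it looks incomplete."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # This model card targets the HTP backend, so flag anything else.
    if config["backend"] != "QnnHtp":
        raise ValueError(f"expected backend 'QnnHtp', got {config['backend']!r}")
    return config
```

Calling `load_config("genie_config.json")` returns the parsed dictionary or raises with a message naming the problem.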

### Running the Model

```bash
# Using the Genie server
python3 /opt/qcom/aistack/genie/examples/server_persistent.py \
  --config genie_config.json \
  --port 8000
```

### Kubernetes Deployment

To deploy on a Kubernetes cluster with QCS9075 nodes, a Pod spec along these lines can serve as a starting point:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: genie-llm-server
spec:
  containers:
  - name: genie
    image: your-registry/genie-runtime:latest
    env:
    - name: LD_LIBRARY_PATH
      value: "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs"
    volumeMounts:
    - name: model-storage
      mountPath: /models
    - name: qcom-libs
      mountPath: /opt/qcom/aistack
  volumes:
  - name: model-storage
    hostPath:
      path: /mnt/models/llama-3.2-3b-instruct-qcs9075-htp
  - name: qcom-libs
    hostPath:
      path: /opt/qcom/aistack
```

## File Structure

This repository contains:
- Compiled model artifacts (.bin files)
- Configuration files (genie_config.json)
- QNN HTP context binaries
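After downloading, it may be worth confirming that the expected files actually landed in the model directory. The sketch below assumes the artifacts sit at the top level of the directory and that the config is named `genie_config.json`, as listed above. `summarize_model_dir` is a hypothetical helper.

```python
from pathlib import Path


def summarize_model_dir(model_dir):
    """Report whether the config exists and which .bin files are present."""
    root = Path(model_dir)
    # Only top-level .bin files are considered (assumed layout).
    bins = sorted(p for p in root.glob("*.bin") if p.is_file())
    return {
        "has_config": (root / "genie_config.json").is_file(),
        "bin_files": [p.name for p in bins],
        "total_bin_bytes": sum(p.stat().st_size for p in bins),
    }
```

Comparing `total_bin_bytes` against the size listed in the Performance table gives a rough indication that the download is complete.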

## Benchmarking Notes

- Performance metrics measured on Qualcomm IQ-9075 EVK
- TPS (Tokens Per Second) measured during generation phase
- Results may vary based on prompt length and complexity
- HTP backend utilizes the NPU for acceleration
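For reference, generation-phase TPS is the number of generated tokens divided by the time spent generating them, with prompt processing excluded. The sketch below shows that arithmetic and uses the ~18.7 TPS figure to estimate how long a response might take. This card does not describe how the timing was captured, so the sketch assumes the generation time has already been measured separately; both function names are hypothetical.

```python
def generation_tps(generated_tokens, generation_seconds):
    """Tokens per second over the generation phase only."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return generated_tokens / generation_seconds


def estimate_seconds(tokens_to_generate, tps=18.7):
    """Rough generation time at a given throughput (default from this card)."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens_to_generate / tps


if __name__ == "__main__":
    # 187 tokens generated in 10 s corresponds to 18.7 TPS.
    print(round(generation_tps(187, 10.0), 1))
    # At ~18.7 TPS, a 256-token reply takes roughly 13.7 s.
    print(round(estimate_seconds(256), 1))
```

The estimate ignores prompt-processing time, so end-to-end latency for long prompts will be higher.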

## License

This model follows the license of the base model [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). Please refer to the original model card for license details.

## Acknowledgments

- Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Compiled using Qualcomm AI Hub with QAIRT 2.42
- Target hardware: Qualcomm QCS9075 SoC

## Support

For issues related to:
- **Model compilation**: Contact Qualcomm AI Hub support
- **Genie SDK**: Refer to Qualcomm Genie documentation
- **Deployment**: Open an issue in this repository

---

*This model is optimized for edge deployment on Qualcomm QCS9075 devices and may not work on other hardware platforms.*