| --- |
| license: other |
| tags: |
| - qualcomm |
| - qcs9075 |
| - edge-ai |
| - genie |
| - qnn-htp |
| - llm |
| base_model: meta-llama/Llama-3.2-3B-Instruct |
| --- |
| |
| # Llama-3.2-3B-Instruct-QCS9075-HTP |
|
|
| This is a pre-compiled version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) optimized for the **Qualcomm QCS9075 SoC** using the **Qualcomm Genie SDK**. |
|
|
| ## Model Details |
|
|
| - **Base Model**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
| - **Target Hardware**: Qualcomm QCS9075 (IQ-9075 EVK) |
| - **Backend**: QnnHtp (NPU) |
| - **Quantization**: W4A16 |
| - **Compilation**: Qualcomm AI Hub (QAIRT 2.42) |
|
|
| ## Performance |
|
|
| | Model | Backend | Performance | Size | |
| |-------|---------|-------------|------| |
| | Llama-3.2-3B-Instruct-QCS9075-HTP | QnnHtp (NPU) | ~18.7 TPS on QCS9075 | 2.5G | |
|
|
| *TPS = Tokens Per Second (generation speed)* |
|
|
| ## Hardware Requirements |
|
|
| - **Device**: Qualcomm IQ-9075 EVK or QCS9075-based device |
| - **OS**: Ubuntu 22.04 (recommended) |
| - **SDK**: Qualcomm Genie SDK |
| - **QAIRT**: Version 2.42 or later |
|
|
| ## Usage |
|
|
| ### Prerequisites |
|
|
| 1. Install the Qualcomm Genie SDK on your QCS9075 device |
| 2. Download all model files from this repository |
| 3. Ensure QAIRT 2.42 libraries are available |
|
|
| ### Environment Setup |
|
|
| For HTP models, the order of entries in `LD_LIBRARY_PATH` is critical. List the QAIRT library directory before the Genie QNN library directory: |
|
|
| ```bash |
| export LD_LIBRARY_PATH=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs:$LD_LIBRARY_PATH |
| ``` |
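
A quick way to sanity-check the ordering before launching is sketched below. It assumes the two directories from the export above and that the order shown there (QAIRT first, Genie second) is the required one. The helper name `qairt_precedes_genie` is hypothetical and not part of any Qualcomm tooling.

```python
import os

# Directories taken from the export above; adjust them to your installation.
QAIRT_LIB = "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu"
GENIE_LIB = "/opt/qcom/aistack/genie/qnn/libs"


def qairt_precedes_genie(ld_library_path: str) -> bool:
    """Return True if both directories are listed and QAIRT comes first.

    Hypothetical helper: it only inspects the string, it does not load libraries.
    """
    # Drop empty entries (for example from a trailing ':') and trailing slashes.
    entries = [p.rstrip("/") for p in ld_library_path.split(":") if p]
    if QAIRT_LIB not in entries or GENIE_LIB not in entries:
        return False
    return entries.index(QAIRT_LIB) < entries.index(GENIE_LIB)


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print("ordering OK" if qairt_precedes_genie(current) else "ordering problem")
```

If the script reports a problem, re-run the export in the same shell that starts the server, since the variable is not inherited by shells that are already open.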
|
|
| ### Configuration |
|
|
| Create a `genie_config.json` file: |
|
|
| ```json |
| { |
|   "model_path": "/path/to/model/files", |
|   "backend": "QnnHtp", |
|   "device": "0" |
| } |
| ``` |
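
If you prefer to generate the file instead of writing it by hand, a minimal sketch follows. It emits only the three keys shown above. The function name `write_genie_config` and its parameters are illustrative assumptions, not part of the Genie SDK.

```python
import json
from pathlib import Path


def write_genie_config(model_dir: str, out_path: str = "genie_config.json") -> dict:
    """Write a config with the three keys shown above and return it.

    Hypothetical helper: the key names and values mirror the example in this
    model card; nothing here is validated against the Genie SDK.
    """
    config = {
        "model_path": model_dir,  # directory holding the downloaded model files
        "backend": "QnnHtp",
        "device": "0",
    }
    Path(out_path).write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `write_genie_config("/path/to/model/files")` writes `genie_config.json` to the current directory.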
|
|
| ### Running the Model |
|
|
| ```bash |
| # Using the Genie server |
| python3 /opt/qcom/aistack/genie/examples/server_persistent.py \ |
| --config genie_config.json \ |
| --port 8000 |
| ``` |
|
|
| ### Kubernetes Deployment |
|
|
| To deploy on a Kubernetes cluster with QCS9075 nodes, use a Pod manifest along these lines: |
|
|
| ```yaml |
| apiVersion: v1 |
| kind: Pod |
| metadata: |
|   name: genie-llm-server |
| spec: |
|   containers: |
|     - name: genie |
|       image: your-registry/genie-runtime:latest |
|       env: |
|         - name: LD_LIBRARY_PATH |
|           value: "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs" |
|       volumeMounts: |
|         - name: model-storage |
|           mountPath: /models |
|         - name: qcom-libs |
|           mountPath: /opt/qcom/aistack |
|   volumes: |
|     - name: model-storage |
|       hostPath: |
|         path: /mnt/models/llama-3.2-3b-instruct-qcs9075-htp |
|     - name: qcom-libs |
|       hostPath: |
|         path: /opt/qcom/aistack |
| ``` |
|
|
| ## File Structure |
|
|
| This repository contains: |
| - Compiled model artifacts (.bin files) |
| - Configuration files (genie_config.json) |
| - QNN HTP context binaries |
| |
| ## Benchmarking Notes |
| |
| - Performance metrics measured on Qualcomm IQ-9075 EVK |
| - TPS (Tokens Per Second) measured during generation phase |
| - Results may vary based on prompt length and complexity |
| - HTP backend utilizes the NPU for acceleration |
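
To compare your own runs against the figure above, the sketch below computes TPS as generated tokens divided by wall-clock generation time. It assumes prompt processing is excluded, since this card does not spell out the exact measurement procedure. The function name `tokens_per_second` is hypothetical.

```python
def tokens_per_second(generated_tokens: int, generation_seconds: float) -> float:
    """Return generation throughput in tokens per second.

    Assumes `generation_seconds` covers only the generation phase,
    not prompt processing or model loading.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return generated_tokens / generation_seconds
```

As an illustration with made-up numbers (not a measurement), 187 tokens generated in 10 seconds gives `tokens_per_second(187, 10.0)`, which is 18.7 TPS.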
| |
| ## License |
| |
| This model follows the license of the base model [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). Please refer to the original model card for license details. |
| |
| ## Acknowledgments |
| |
| - Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) |
| - Compiled using Qualcomm AI Hub with QAIRT 2.42 |
| - Target hardware: Qualcomm QCS9075 SoC |
| |
| ## Support |
| |
| For issues related to: |
| - **Model compilation**: Contact Qualcomm AI Hub support |
| - **Genie SDK**: Refer to Qualcomm Genie documentation |
| - **Deployment**: Open an issue in this repository |
| |
| --- |
| |
| *This model is optimized for edge deployment on Qualcomm QCS9075 devices and may not work on other hardware platforms.* |
| |