---
license: other
tags:
- qualcomm
- qcs9075
- edge-ai
- genie
- qnn-htp
- llm
base_model: meta-llama/Llama-3.2-3B-Instruct
---

# Llama-3.2-3B-Instruct-QCS9075-HTP

This is a pre-compiled version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) optimized for the **Qualcomm QCS9075 SoC** using the **Qualcomm Genie SDK**.

## Model Details

- **Base Model**: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- **Target Hardware**: Qualcomm QCS9075 (IQ-9075 EVK)
- **Backend**: QnnHtp (NPU)
- **Quantization**: W4A16
- **Compilation**: Qualcomm AI Hub (QAIRT 2.42)

## Performance

| Model | Backend | Performance | Size |
|-------|---------|-------------|------|
| Llama-3.2-3B-Instruct-QCS9075-HTP | QnnHtp (NPU) | ~18.7 TPS on QCS9075 | 2.5G |

*TPS = Tokens Per Second (generation speed)*

## Hardware Requirements

- **Device**: Qualcomm IQ-9075 EVK or QCS9075-based device
- **OS**: Ubuntu 22.04 (recommended)
- **SDK**: Qualcomm Genie SDK
- **QAIRT**: Version 2.42 or later

## Usage

### Prerequisites

1. Install the Qualcomm Genie SDK on your QCS9075 device
2. Download all model files from this repository
3. Ensure QAIRT 2.42 libraries are available

### Environment Setup

For HTP models, the order of entries in `LD_LIBRARY_PATH` is critical. In the example below, the QAIRT library directory is listed before the Genie QNN library directory, and both precede any existing entries:

```bash
export LD_LIBRARY_PATH=/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs:$LD_LIBRARY_PATH
```

### Configuration

Create a `genie_config.json` file:

```json
{
  "model_path": "/path/to/model/files",
  "backend": "QnnHtp",
  "device": "0"
}
```

### Running the Model

```bash
# Using the Genie server
python3 /opt/qcom/aistack/genie/examples/server_persistent.py \
  --config genie_config.json \
  --port 8000
```

### Kubernetes Deployment

For Kubernetes clusters with QCS9075 nodes, the following Pod spec shows one deployment pattern. The model files and the Qualcomm libraries are mounted from the host, and `LD_LIBRARY_PATH` uses the same ordering as in Environment Setup:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: genie-llm-server
spec:
  containers:
  - name: genie
    image: your-registry/genie-runtime:latest
    env:
    - name: LD_LIBRARY_PATH
      # Plain double quotes only: nested single quotes would become part of the path.
      value: "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu:/opt/qcom/aistack/genie/qnn/libs"
    volumeMounts:
    - name: model-storage
      mountPath: /models
    - name: qcom-libs
      mountPath: /opt/qcom/aistack
  volumes:
  - name: model-storage
    hostPath:
      path: /mnt/models/llama-3.2-3b-instruct-qcs9075-htp
  - name: qcom-libs
    hostPath:
      path: /opt/qcom/aistack
```

## File Structure

This repository contains:

- Compiled model artifacts (.bin files)
- Configuration files (genie_config.json)
- QNN HTP context binaries

## Benchmarking Notes

- Performance metrics were measured on the Qualcomm IQ-9075 EVK
- TPS (Tokens Per Second) was measured during the generation phase
- Results may vary with prompt length and complexity
- The HTP backend uses the NPU for acceleration

## License

This model follows the license of the base model [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct). Please refer to the original model card for license details.

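
## Pre-flight Check Sketch

The Environment Setup and Configuration steps in the Usage section can be sketched as a small Python pre-flight script. This is a minimal sketch and not part of the Genie SDK: the function names `library_order_ok`, `build_config`, and `write_config` are hypothetical, the two library paths are copied from the `export` line in this card, and the config keys are the three shown in the Configuration section. It assumes that correct ordering means the QAIRT directory appears before the Genie QNN directory, as in that `export` line.

```python
import json
import os
from pathlib import Path

# Paths copied from the Environment Setup section of this card.
QAIRT_LIB = "/opt/qcom/aistack/qairt/2.42.0.250923/lib/aarch64-linux-gnu"
GENIE_LIB = "/opt/qcom/aistack/genie/qnn/libs"


def library_order_ok(ld_library_path: str,
                     first: str = QAIRT_LIB,
                     second: str = GENIE_LIB) -> bool:
    """Return True if both directories are listed and `first` precedes `second`."""
    entries = [e.rstrip("/") for e in ld_library_path.split(":") if e]
    if first not in entries or second not in entries:
        return False
    return entries.index(first) < entries.index(second)


def build_config(model_path: str) -> dict:
    """Build the three-key config shown in the Configuration section."""
    return {"model_path": model_path, "backend": "QnnHtp", "device": "0"}


def write_config(model_path: str, out_file: str = "genie_config.json") -> Path:
    """Write the config as JSON and return the path that was written."""
    out = Path(out_file)
    out.write_text(json.dumps(build_config(model_path), indent=2) + "\n")
    return out


if __name__ == "__main__":
    ok = library_order_ok(os.environ.get("LD_LIBRARY_PATH", ""))
    print("LD_LIBRARY_PATH order:", "ok" if ok else "check ordering")
    # Placeholder path from the card; replace with the real model directory.
    print(json.dumps(build_config("/path/to/model/files"), indent=2))
```

Running the script on the device before starting the server reports whether the two directories appear in the expected order. Calling `write_config("/models")` would produce a `genie_config.json` in the current directory that points at the `/models` mount used in the Kubernetes example.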
## Acknowledgments

- Base model: [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
- Compiled using Qualcomm AI Hub with QAIRT 2.42
- Target hardware: Qualcomm QCS9075 SoC

## Support

For issues related to:

- **Model compilation**: Contact Qualcomm AI Hub support
- **Genie SDK**: Refer to Qualcomm Genie documentation
- **Deployment**: Open an issue in this repository

---

*This model is optimized for edge deployment on Qualcomm QCS9075 devices and may not work on other hardware platforms.*