RunAnywhere Genie NPU Models

Pre-compiled QNN context binaries for the Qualcomm Genie SDK, optimized for NPU inference on Snapdragon 8 Elite.

Models

Model                  Params  Quantization  Size     NPU Performance
Llama 3.2 1B Instruct  1B      W4            ~1.0 GB  ~23 tok/s on S25
Qwen 2.5 7B Instruct   7B      W8A16         ~3.9 GB  ~5-8 tok/s on S25

Usage

These models are designed for the RunAnywhere SDK with the Genie NPU backend.

Download

# Llama 3.2 1B (recommended for fast inference)
wget https://huggingface.co/runanywhere/genie-npu-models/resolve/main/llama-3.2-1b-instruct-genie-w4.tar.gz

# Qwen 2.5 7B (higher quality, slower)
wget https://huggingface.co/runanywhere/genie-npu-models/resolve/main/qwen2.5-7b-instruct-genie-w8a16.tar.gz
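Once downloaded, each archive should be unpacked into its own directory so the Genie config, tokenizer, and context-binary parts stay together. A minimal sketch (the `models/` layout is illustrative, not required by the SDK):

```shell
# Unpack every downloaded bundle into its own directory under ./models
mkdir -p models
for archive in *.tar.gz; do
  [ -e "$archive" ] || continue        # skip cleanly if no archives are present
  dir="models/${archive%.tar.gz}"      # e.g. models/llama-3.2-1b-instruct-genie-w4
  mkdir -p "$dir"
  tar -xzf "$archive" -C "$dir"
done
```

Keeping one directory per bundle avoids the two models' `genie_config.json` and tokenizer files overwriting each other.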

Bundle Contents

Each .tar.gz archive contains:

  • genie_config.json -- Genie SDK configuration
  • htp_backend_ext_config.json -- HTP backend config (Snapdragon 8 Elite, v79)
  • tokenizer.json -- Model tokenizer
  • *_part_N_of_M.bin -- QNN context binary parts
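After unpacking, a quick sanity check can confirm that the required files are present and that no context-binary part is missing before handing the directory to the SDK. A minimal sketch (the validation rules are inferred from the file list above; the function name is illustrative):

```python
import re
from pathlib import Path

# Files every bundle is expected to contain, per the list above
REQUIRED = ["genie_config.json", "htp_backend_ext_config.json", "tokenizer.json"]
# Matches the *_part_N_of_M.bin naming scheme of the QNN context binary parts
PART_RE = re.compile(r".*_part_(\d+)_of_(\d+)\.bin$")

def validate_bundle(bundle_dir):
    """Return (missing_files, parts_complete) for an unpacked Genie NPU bundle."""
    d = Path(bundle_dir)
    missing = [name for name in REQUIRED if not (d / name).is_file()]
    parts, totals = set(), set()
    for f in d.glob("*.bin"):
        m = PART_RE.match(f.name)
        if m:
            parts.add(int(m.group(1)))
            totals.add(int(m.group(2)))
    if len(totals) != 1:  # no parts found, or inconsistent "of M" counts
        return missing, False
    total = totals.pop()
    return missing, parts == set(range(1, total + 1))
```

A non-empty `missing` list or `parts_complete == False` usually indicates a truncated or partially extracted download worth re-fetching.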

Requirements

  • Qualcomm Snapdragon 8 Elite (or compatible) device
  • QAIRT SDK 2.42.0 runtime libraries
  • RunAnywhere SDK with Genie backend AAR

Compilation Details

Models were compiled using Qualcomm AI Hub with:

  • QAIRT SDK: 2.42.0
  • Target: Snapdragon 8 Elite QRD (soc_model=69, dsp_arch=v79)
  • Context length: 4096 tokens