# RunAnywhere Genie NPU Models
Pre-compiled QNN context binaries for Qualcomm Genie SDK, optimized for Snapdragon 8 Elite NPU inference.
## Models
| Model | Params | Quantization | Size | NPU Performance |
|---|---|---|---|---|
| Llama 3.2 1B Instruct | 1B | W4 | ~1.0 GB | ~23 tok/s on S25 |
| Qwen 2.5 7B Instruct | 7B | W8A16 | ~3.9 GB | ~5-8 tok/s on S25 |
## Usage
These models are designed for use with the RunAnywhere SDK's Genie NPU backend.
### Download
```shell
# Llama 3.2 1B (recommended for fast inference)
wget https://huggingface.co/runanywhere/genie-npu-models/resolve/main/llama-3.2-1b-instruct-genie-w4.tar.gz

# Qwen 2.5 7B (higher quality, slower)
wget https://huggingface.co/runanywhere/genie-npu-models/resolve/main/qwen2.5-7b-instruct-genie-w8a16.tar.gz
```
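After downloading, the archive is unpacked with standard `tar`. A minimal sketch; the `models/llama-3.2-1b` target directory is an arbitrary choice, and the first two lines only fabricate a tiny stand-in archive so the sketch runs without the real ~1 GB download (skip them and use the tar.gz fetched above):

```shell
# Stand-in archive so this sketch is self-contained; with a real
# download, start at the mkdir line instead.
printf '{}' > genie_config.json
tar -czf llama-3.2-1b-instruct-genie-w4.tar.gz genie_config.json

# Unpack the bundle into a per-model directory.
mkdir -p models/llama-3.2-1b
tar -xzf llama-3.2-1b-instruct-genie-w4.tar.gz -C models/llama-3.2-1b
ls models/llama-3.2-1b
```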
## Bundle Contents
Each tar.gz contains:
- `genie_config.json` -- Genie SDK configuration
- `htp_backend_ext_config.json` -- HTP backend config (Snapdragon 8 Elite, v79)
- `tokenizer.json` -- model tokenizer
- `*_part_N_of_M.bin` -- QNN context binary parts
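A quick sanity check that an extracted bundle is complete can be scripted from the file list above. A sketch under assumptions: `bundle-demo` and the `weights_part_*` filenames are placeholders (the first few lines fabricate a dummy bundle so the check is self-contained; point `BUNDLE` at your extracted directory instead):

```shell
# Fabricated dummy bundle -- replace BUNDLE with your extraction dir.
BUNDLE=./bundle-demo
mkdir -p "$BUNDLE"
touch "$BUNDLE"/genie_config.json "$BUNDLE"/htp_backend_ext_config.json \
      "$BUNDLE"/tokenizer.json \
      "$BUNDLE"/weights_part_1_of_2.bin "$BUNDLE"/weights_part_2_of_2.bin

# Every bundle must carry the two configs and the tokenizer...
for f in genie_config.json htp_backend_ext_config.json tokenizer.json; do
  [ -f "$BUNDLE/$f" ] || { echo "missing $f"; exit 1; }
done
# ...plus at least one QNN context binary part.
ls "$BUNDLE"/*_part_*_of_*.bin > /dev/null || { echo "missing context binary parts"; exit 1; }
echo "bundle OK"
```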
## Requirements
- Qualcomm Snapdragon 8 Elite (or compatible) device
- QAIRT SDK 2.42.0 runtime libraries
- RunAnywhere SDK with Genie backend AAR
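The first requirement can be spot-checked from a connected device via its SoC identifier. A hedged sketch assuming `adb` is on `PATH` and the device exposes the `ro.soc.model` system property (present on recent Android builds; the exact value string varies by chipset):

```shell
# Read the SoC model over adb; degrades gracefully when adb or the
# device is unavailable, leaving SOC empty.
if command -v adb >/dev/null 2>&1; then
  SOC=$(adb shell getprop ro.soc.model 2>/dev/null || true)
else
  SOC=""
fi
echo "${SOC:-unknown SoC (adb/device not available)}"
```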
## Compilation Details
Models were compiled using Qualcomm AI Hub with:
- QAIRT SDK: 2.42.0
- Target: Snapdragon 8 Elite QRD (soc_model=69, dsp_arch=v79)
- Context length: 4096 tokens