
OpenVLA-OFT LIBERO — QNN HTP (FP16, aarch64)

This repository contains QNN HTP backend model artifacts for running OpenVLA-OFT on a Qualcomm Snapdragon device with the Hexagon Tensor Processor (HTP / DSP).

All three sub-models are converted from FP32 ONNX to FP16 QNN and compiled into aarch64 shared libraries (.so) for the aarch64-oe-linux-gcc11.2 target.

CPU (FP32) counterpart: xpuenabler/OpenVLA-OFT-LIBERO-130-QNN-CPU


Repository Structure

.
├── README.md
├── conversion/
│   ├── make_llm_v9.py              # Patch llm_backbone_v7.onnx → v9 (required before LLM conversion)
│   ├── convert_ve_htp.sh           # Vision Encoder: ONNX → QNN FP16 → libvision_encoder_sliced_htp.so
│   ├── convert_ah_htp.sh           # Action Head:    ONNX → QNN FP16 → libaction_head_htp.so
│   └── convert_llm_v9_htp.sh       # LLM Backbone:   ONNX → QNN FP16 → libllm_backbone_htp.so
│
├── onnx/
│   ├── vision_encoder.onnx         # VE ONNX (self-contained, 894 KB)
│   ├── action_head.onnx            # AH ONNX (self-contained, 4 KB)
│   ├── llm_backbone_v9.onnx        # LLM ONNX header (2.3 MB); requires external data files below
│   ├── embed_tokens.weight         # LLM external data (502 MB)
│   └── onnx__MatMul_*/             # LLM external data (~25 GB total, 289 files)
│
└── qnn_models/
    ├── libvision_encoder_sliced_htp.so   # VE compiled HTP library  (1.5 GB, aarch64)
    ├── libaction_head_htp.so             # AH compiled HTP library  (251 MB, aarch64)
    └── libllm_backbone_htp.so            # LLM compiled HTP library (13 GB, aarch64)

Prerequisites

Hardware / OS

  • Build host: Linux x86-64 (Ubuntu 20.04 tested)
  • Target device: Qualcomm SoC with HTP (Snapdragon 8 Gen 2 / 3 / Elite, etc.)

Software

Component                      Version / Path
Qualcomm AI Runtime (QNN) SDK  2.43.0.260128
Python (QNN converter env)     3.10 (miniforge3/envs/qnn)
LLVM / clang++                 17.0.6 (cross-compile to aarch64)
GCC aarch64 toolchain          11.2.1 (aarch64-none-linux-gnu)
ld.bfd (aarch64)               bundled with GCC toolchain

Python packages (qnn conda env)

onnx
onnxruntime
qti.aisw (from QNN SDK lib/python)

Required directories

QNN_ROOT=/path/to/qairt/2.43.0.260128
QENV=/path/to/miniforge3/envs/qnn
LLVM17=/path/to/llvm17/bin               # clang++ 17 with aarch64 target
AARCH64_PREFIX=/path/to/arm_gcc11/bin/aarch64-none-linux-gnu-
SYSROOT=/path/to/arm_gcc11/aarch64-none-linux-gnu/libc
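Because every conversion script assumes these variables point at real installations, a pre-flight check can save a long failed run. The helper below is a hypothetical convenience (not part of this repo's scripts); the variable names match the list above:

```python
import os

# Toolchain variables the conversion scripts expect (see list above).
REQUIRED_VARS = ["QNN_ROOT", "QENV", "LLVM17", "AARCH64_PREFIX", "SYSROOT"]

def check_toolchain_env(env=os.environ):
    """Return a list of problems; an empty list means the env looks usable."""
    problems = []
    for var in REQUIRED_VARS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif var == "AARCH64_PREFIX":
            # AARCH64_PREFIX is a file-name prefix (ends in "-"), so only
            # its containing directory can be checked for existence.
            if not os.path.isdir(os.path.dirname(value)):
                problems.append(f"{var} directory does not exist: {value}")
        elif not os.path.isdir(value):
            problems.append(f"{var} does not exist: {value}")
    return problems

if __name__ == "__main__":
    for p in check_toolchain_env():
        print("WARNING:", p)
```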

Environment Setup

1. Set up QNN Python environment

# Activate the QNN conda env
conda activate qnn   # Python 3.10

# Add QNN Python bindings to PYTHONPATH
export QNN_ROOT=/path/to/qairt/2.43.0.260128
export DEPS=/path/to/qnn_py310_deps          # extra QNN runtime deps

export LD_LIBRARY_PATH=$QNN_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
export PYTHONPATH=$DEPS:$QNN_ROOT/lib/python
export PYTHONNOUSERSITE=1   # prevents user-site package conflicts
export TMPDIR=/tmp/qnn_tmp  # set to a large-capacity directory
mkdir -p $TMPDIR

2. Verify toolchain

# LLVM17 clang++ must support aarch64 target
clang++ --version          # β†’ clang version 17.x
clang++ -print-targets | grep aarch64  # β†’ aarch64

# aarch64-none-linux-gnu-g++ and ld.bfd must exist
aarch64-none-linux-gnu-g++ --version   # β†’ gcc 11.2.1
aarch64-none-linux-gnu-ld.bfd --version

3. Prepare LLM ONNX (v9 patch)

The LLM backbone ONNX must be patched from the original v7 before conversion:

# Requires: llm_backbone_v7.onnx + external data files in the same directory
cd /path/to/onnx_models_fp32/
PYTHONNOUSERSITE=1 python3.10 make_llm_v9.py
# Produces: llm_backbone_v9.onnx

make_llm_v9.py is provided in conversion/.
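Since the LLM ONNX header is useless without its external data, it can help to verify the data files are present before launching the (long) conversion. This is a hypothetical sanity check, not part of make_llm_v9.py; the file names and count follow the repository layout above:

```python
import glob
import os

def llm_external_data_ok(model_dir, expected_matmul_files=289):
    """Check that the LLM external-data files sit next to the .onnx header.

    Returns a list of missing items; an empty list means all data was found.
    The expected file names (embed_tokens.weight, onnx__MatMul_*) and the
    289-file count are taken from this repo's onnx/ directory.
    """
    missing = []
    if not os.path.isfile(os.path.join(model_dir, "embed_tokens.weight")):
        missing.append("embed_tokens.weight")
    n = len(glob.glob(os.path.join(model_dir, "onnx__MatMul_*")))
    if n < expected_matmul_files:
        missing.append(f"onnx__MatMul_* ({n}/{expected_matmul_files} present)")
    return missing
```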


Conversion

Edit the path variables at the top of each script to match your environment, then run:

Vision Encoder

bash conversion/convert_ve_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libvision_encoder_sliced_htp.so
# Runtime: ~3 min

Key parameters:

  • Input: pixel_values shape 1,6,224,224
  • --float_bitwidth 16 (FP16 for HTP)

Action Head

bash conversion/convert_ah_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libaction_head_htp.so
# Runtime: ~1 min

Key parameters:

  • Input: hidden_states shape 1,440,4096 (Note: ONNX model stores this as [1,440,4096], not [1,4096,440])
  • --float_bitwidth 16

LLM Backbone

bash conversion/convert_llm_v9_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libllm_backbone_htp.so
# Runtime: qnn-onnx-converter ~5 min, compilation ~30 min (13 GB binary)

Key parameters:

  • Inputs: input_ids_no_stop 1,184, patch_features 1,256,4096, attention_mask 1,184
  • --float_bitwidth 16

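The three conversions differ only in their input declarations. As an illustration (the scripts in conversion/ remain the source of truth, and the paths here are placeholders), the qnn-onnx-converter argument lists can be derived from the shapes documented above:

```python
import os

# Input names and shapes per sub-model, as documented in this README.
INPUTS = {
    "vision_encoder": [("pixel_values", "1,6,224,224")],
    "action_head": [("hidden_states", "1,440,4096")],
    "llm_backbone_v9": [
        ("input_ids_no_stop", "1,184"),
        ("patch_features", "1,256,4096"),
        ("attention_mask", "1,184"),
    ],
}

def converter_args(model, onnx_dir="/path/to/onnx", work="/path/to/work"):
    """Build the qnn-onnx-converter argument list for one sub-model."""
    args = ["--input_network", os.path.join(onnx_dir, model + ".onnx")]
    for name, shape in INPUTS[model]:
        args += ["--input_dim", name, shape]
    args += ["--output_path", os.path.join(work, model),
             "--float_bitwidth", "16"]
    return args
```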
Step-by-Step Conversion Detail

Each script performs the same three stages:

Stage 1 β€” ONNX β†’ QNN source (qnn-onnx-converter)

$QENV/bin/python3.10 $QNN_ROOT/bin/x86_64-linux-clang/qnn-onnx-converter \
  --input_network <model>.onnx \
  --input_dim <name> <shape> \
  --output_path $WORK/<model> \
  --float_bitwidth 16

Outputs: <model> (C++ source, no extension) + <model>.bin (weight tarball).

Note: The converter produces the source file without a .cpp extension. The scripts copy it to <model>.cpp before compilation.

Stage 2 β€” Build directory setup

# Copy SDK helper files into jni/
SDK_JNI=$QNN_ROOT/share/QNN/converter/jni
cp $SDK_JNI/QnnModel.cpp $SDK_JNI/QnnModel.hpp ... $JNI/
cp $WORK/<model>.cpp $JNI/

# Copy + patch SDK Makefile
cp $QNN_ROOT/share/QNN/converter/Makefile.oe-linux-aarch64-gcc11.2 $BUILD/
# Patch: replace gnu++20 with c++11 + add -Wno-c99-designator
sed -i 's/-std=gnu++20 -D_LINUX_OE_SOURCE -fPIC -Wl,-lstdc++/-std=c++11 -Wno-c99-designator/' \
  $BUILD/Makefile.oe-linux-aarch64-gcc11.2
# LLM only: add -nostartfiles to LDFLAGS
sed -i 's/LDFLAGS += -shared -s/LDFLAGS += -shared -s -nostartfiles/' \
  $BUILD/Makefile.oe-linux-aarch64-gcc11.2

Stage 3 β€” Compile to aarch64 shared library

# VE / AH (standard linker)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
           --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN"

# LLM (bfd linker required for 13 GB binary)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
           --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN \
           -fuse-ld=${AARCH64_PREFIX}ld.bfd"

make -C $BUILD -f Makefile.oe-linux-aarch64-gcc11.2 \
  CXX="$CXX_CROSS" \
  TARGET_PREFIX=${AARCH64_PREFIX} \
  QNN_SDK_ROOT=$QNN_ROOT \
  QNN_MODEL_LIB_NAME=lib<model>_htp

Technical Notes

Why FP16 for HTP?

The Hexagon Tensor Processor natively accelerates FP16 operations. FP32 support on HTP is limited; using --float_bitwidth 16 is recommended for maximum HTP utilization.
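The accuracy cost of FP16 is bounded by its 11-bit significand and ~65504 max value. A quick stdlib illustration (not QNN-specific) using Python's half-precision struct format 'e':

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (binary16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16(65504.0))  # 65504.0 -- largest finite FP16 value, exact
print(to_fp16(2048.0))   # 2048.0  -- integers up to 2048 are exact
print(to_fp16(2049.0))   # 2048.0  -- above 2048, FP16 steps by 2
```

Activations and weights that stay within this range (as they typically do after normalization) lose little accuracy, which is why FP16 is the standard precision target for HTP.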

qnn-model-lib-generator cannot be used for large models

qnn-model-lib-generator cannot compile the LLM backbone (13 GB binary) because it spawns a subprocess that cannot find the required cross-compiler toolchain environment variables (${TARGET_PREFIX}g++, ${SDKTARGETSYSROOT}). Manual Makefile-based compilation is used instead.

C++ standard: gnu++20 β†’ c++11

The QNN SDK's Makefile.oe-linux-aarch64-gcc11.2 uses -std=gnu++20. However, QNN-generated C++ code mixes designated and non-designated initializers, which C++20 rejects (its designated initializers are stricter than the C99 form that clang accepts as an extension in older standard modes). We patch the Makefile to use -std=c++11 -Wno-c99-designator.

aarch64 large model relocation issue (LLM only)

Compiling a 13 GB shared library with -mcmodel=large produces R_AARCH64_MOVW_UA relocations in .data.rel.ro that neither LLD nor gold can satisfy. The fix is to use the GNU BFD linker (aarch64-none-linux-gnu-ld.bfd) with -nostartfiles:

-fuse-ld=/path/to/aarch64-none-linux-gnu-ld.bfd -nostartfiles

This matches the approach used for the CPU x86-64 build where -fuse-ld=bfd -nostartfiles was needed.

PYTHONNOUSERSITE=1 is mandatory

Without PYTHONNOUSERSITE=1, a user-level numpy installation (Python 3.12 ABI) conflicts with the conda env's Python 3.10/3.11, causing an ImportError when the QNN converter tries to import onnx.
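PYTHONNOUSERSITE=1 works by disabling the per-user site-packages directory entirely, which is visible via site.ENABLE_USER_SITE. A small stdlib demonstration (not part of the conversion scripts):

```python
import os
import subprocess
import sys

def user_site_enabled(extra_env):
    """Report site.ENABLE_USER_SITE in a fresh interpreter with extra env vars."""
    env = dict(os.environ, **extra_env)
    out = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(user_site_enabled({"PYTHONNOUSERSITE": "1"}))  # False
```

With user site-packages disabled, only the conda env's packages are importable, so the converter picks up the matching-ABI numpy.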


Model Information

Sub-model        ONNX Input(s)                         Precision  .so size
Vision Encoder   pixel_values [1,6,224,224]            FP16       1.5 GB
Action Head      hidden_states [1,440,4096]            FP16       251 MB
LLM Backbone     input_ids_no_stop [1,184],            FP16       13 GB
                 patch_features [1,256,4096],
                 attention_mask [1,184]

Base model: openvla/openvla-oft-libero
QNN SDK: Qualcomm AI Runtime 2.43.0.260128
Target: aarch64-oe-linux-gcc11.2 (Snapdragon HTP)
