
OpenVLA-OFT LIBERO — QNN HTP (FP16, aarch64)

This repository contains QNN HTP backend model artifacts for running OpenVLA-OFT on a Qualcomm Snapdragon device with the Hexagon Tensor Processor (HTP / DSP).

All three sub-models are converted from FP32 ONNX to FP16 QNN and compiled into aarch64 shared libraries (.so) for the aarch64-oe-linux-gcc11.2 target.

CPU (FP32) counterpart: xpuenabler/OpenVLA-OFT-LIBERO-130-QNN-CPU


Repository Structure

.
├── README.md
├── conversion/
│   ├── make_llm_v9.py              # Patch llm_backbone_v7.onnx → v9 (required before LLM conversion)
│   ├── convert_ve_htp.sh           # Vision Encoder: ONNX → QNN FP16 → libvision_encoder_sliced_htp.so
│   ├── convert_ah_htp.sh           # Action Head:    ONNX → QNN FP16 → libaction_head_htp.so
│   └── convert_llm_v9_htp.sh       # LLM Backbone:   ONNX → QNN FP16 → libllm_backbone_htp.so
│
├── onnx/
│   ├── vision_encoder.onnx         # VE ONNX (self-contained, 894 KB)
│   ├── action_head.onnx            # AH ONNX (self-contained, 4 KB)
│   ├── llm_backbone_v9.onnx        # LLM ONNX header (2.3 MB); requires external data files below
│   ├── embed_tokens.weight         # LLM external data (502 MB)
│   └── onnx__MatMul_*/             # LLM external data (~25 GB total, 289 files)
│
└── qnn_models/
    ├── libvision_encoder_sliced_htp.so   # VE compiled HTP library  (1.5 GB, aarch64)
    ├── libaction_head_htp.so             # AH compiled HTP library  (251 MB, aarch64)
    └── libllm_backbone_htp.so            # LLM compiled HTP library (13 GB, aarch64)

Prerequisites

Hardware / OS

  • Build host: Linux x86-64 (Ubuntu 20.04 tested)
  • Target device: Qualcomm SoC with HTP (Snapdragon 8 Gen 2 / 3 / Elite, etc.)

Software

Component                      Version / Path
Qualcomm AI Runtime (QNN) SDK  2.43.0.260128
Python (QNN converter env)     3.10 (miniforge3/envs/qnn)
LLVM / clang++                 17.0.6 (cross-compile to aarch64)
GCC aarch64 toolchain          11.2.1 (aarch64-none-linux-gnu)
ld.bfd (aarch64)               bundled with GCC toolchain

Python packages (qnn conda env)

onnx
onnxruntime
qti.aisw (from QNN SDK lib/python)

Required directories

QNN_ROOT=/path/to/qairt/2.43.0.260128
QENV=/path/to/miniforge3/envs/qnn
LLVM17=/path/to/llvm17/bin               # clang++ 17 with aarch64 target
AARCH64_PREFIX=/path/to/arm_gcc11/bin/aarch64-none-linux-gnu-
SYSROOT=/path/to/arm_gcc11/aarch64-none-linux-gnu/libc
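Because every conversion script assumes these variables point at real installations, a pre-flight check can save a long failed run. The helper below is a hypothetical convenience (not part of this repo's scripts); the variable names match the list above:

```python
import os

# Toolchain variables the conversion scripts expect (see list above).
REQUIRED_VARS = ["QNN_ROOT", "QENV", "LLVM17", "AARCH64_PREFIX", "SYSROOT"]

def check_toolchain_env(env=os.environ):
    """Return a list of problems; an empty list means the env looks usable."""
    problems = []
    for var in REQUIRED_VARS:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif var == "AARCH64_PREFIX":
            # AARCH64_PREFIX is a file-name prefix (ends in "-"), so only
            # its containing directory can be checked for existence.
            if not os.path.isdir(os.path.dirname(value)):
                problems.append(f"{var} directory does not exist: {value}")
        elif not os.path.isdir(value):
            problems.append(f"{var} does not exist: {value}")
    return problems

if __name__ == "__main__":
    for p in check_toolchain_env():
        print("WARNING:", p)
```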

Environment Setup

1. Set up QNN Python environment

# Activate the QNN conda env
conda activate qnn   # Python 3.10

# Add QNN Python bindings to PYTHONPATH
export QNN_ROOT=/path/to/qairt/2.43.0.260128
export DEPS=/path/to/qnn_py310_deps          # extra QNN runtime deps

export LD_LIBRARY_PATH=$QNN_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
export PYTHONPATH=$DEPS:$QNN_ROOT/lib/python
export PYTHONNOUSERSITE=1   # prevents user-site package conflicts
export TMPDIR=/tmp/qnn_tmp  # set to a large-capacity directory
mkdir -p $TMPDIR

2. Verify toolchain

# LLVM17 clang++ must support aarch64 target
clang++ --version          # β†’ clang version 17.x
clang++ -print-targets | grep aarch64  # β†’ aarch64

# aarch64-none-linux-gnu-g++ and ld.bfd must exist
aarch64-none-linux-gnu-g++ --version   # β†’ gcc 11.2.1
aarch64-none-linux-gnu-ld.bfd --version

3. Prepare LLM ONNX (v9 patch)

The LLM backbone ONNX must be patched from the original v7 before conversion:

# Requires: llm_backbone_v7.onnx + external data files in the same directory
cd /path/to/onnx_models_fp32/
PYTHONNOUSERSITE=1 python3.10 make_llm_v9.py
# Produces: llm_backbone_v9.onnx

make_llm_v9.py is provided in conversion/.
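Since the LLM ONNX header is useless without its external data, it can help to verify the data files are present before launching the (long) conversion. This is a hypothetical sanity check, not part of make_llm_v9.py; the file names and count follow the repository layout above:

```python
import glob
import os

def llm_external_data_ok(model_dir, expected_matmul_files=289):
    """Check that the LLM external-data files sit next to the .onnx header.

    Returns a list of missing items; an empty list means all data was found.
    The expected file names (embed_tokens.weight, onnx__MatMul_*) and the
    289-file count are taken from this repo's onnx/ directory.
    """
    missing = []
    if not os.path.isfile(os.path.join(model_dir, "embed_tokens.weight")):
        missing.append("embed_tokens.weight")
    n = len(glob.glob(os.path.join(model_dir, "onnx__MatMul_*")))
    if n < expected_matmul_files:
        missing.append(f"onnx__MatMul_* ({n}/{expected_matmul_files} present)")
    return missing
```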


Conversion

Edit the path variables at the top of each script to match your environment, then run:

Vision Encoder

bash conversion/convert_ve_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libvision_encoder_sliced_htp.so
# Runtime: ~3 min

Key parameters:

  • Input: pixel_values shape 1,6,224,224
  • --float_bitwidth 16 (FP16 for HTP)

Action Head

bash conversion/convert_ah_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libaction_head_htp.so
# Runtime: ~1 min

Key parameters:

  • Input: hidden_states shape 1,440,4096 (Note: ONNX model stores this as [1,440,4096], not [1,4096,440])
  • --float_bitwidth 16

LLM Backbone

bash conversion/convert_llm_v9_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libllm_backbone_htp.so
# Runtime: qnn-onnx-converter ~5 min, compilation ~30 min (13 GB binary)

Key parameters:

  • Inputs: input_ids_no_stop 1,184, patch_features 1,256,4096, attention_mask 1,184
  • --float_bitwidth 16

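The three conversions differ only in their input declarations. As an illustration (the scripts in conversion/ remain the source of truth, and the paths here are placeholders), the qnn-onnx-converter argument lists can be derived from the shapes documented above:

```python
import os

# Input names and shapes per sub-model, as documented in this README.
INPUTS = {
    "vision_encoder": [("pixel_values", "1,6,224,224")],
    "action_head": [("hidden_states", "1,440,4096")],
    "llm_backbone_v9": [
        ("input_ids_no_stop", "1,184"),
        ("patch_features", "1,256,4096"),
        ("attention_mask", "1,184"),
    ],
}

def converter_args(model, onnx_dir="/path/to/onnx", work="/path/to/work"):
    """Build the qnn-onnx-converter argument list for one sub-model."""
    args = ["--input_network", os.path.join(onnx_dir, model + ".onnx")]
    for name, shape in INPUTS[model]:
        args += ["--input_dim", name, shape]
    args += ["--output_path", os.path.join(work, model),
             "--float_bitwidth", "16"]
    return args
```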
Step-by-Step Conversion Detail

Each script performs the same three stages:

Stage 1 β€” ONNX β†’ QNN source (qnn-onnx-converter)

$QENV/bin/python3.10 $QNN_ROOT/bin/x86_64-linux-clang/qnn-onnx-converter \
  --input_network <model>.onnx \
  --input_dim <name> <shape> \
  --output_path $WORK/<model> \
  --float_bitwidth 16

Outputs: <model> (C++ source, no extension) + <model>.bin (weight tarball).

Note: The converter produces the source file without a .cpp extension. The scripts copy it to <model>.cpp before compilation.

Stage 2 β€” Build directory setup

# Copy SDK helper files into jni/
SDK_JNI=$QNN_ROOT/share/QNN/converter/jni
cp $SDK_JNI/QnnModel.cpp $SDK_JNI/QnnModel.hpp ... $JNI/
cp $WORK/<model>.cpp $JNI/

# Copy + patch SDK Makefile
cp $QNN_ROOT/share/QNN/converter/Makefile.oe-linux-aarch64-gcc11.2 $BUILD/
# Patch: replace gnu++20 with c++11 + add -Wno-c99-designator
sed -i 's/-std=gnu++20 -D_LINUX_OE_SOURCE -fPIC -Wl,-lstdc++/-std=c++11 -Wno-c99-designator/' \
  $BUILD/Makefile.oe-linux-aarch64-gcc11.2
# LLM only: add -nostartfiles to LDFLAGS
sed -i 's/LDFLAGS += -shared -s/LDFLAGS += -shared -s -nostartfiles/' \
  $BUILD/Makefile.oe-linux-aarch64-gcc11.2

Stage 3 β€” Compile to aarch64 shared library

# VE / AH (standard linker)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
           --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN"

# LLM (bfd linker required for 13 GB binary)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
           --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN \
           -fuse-ld=${AARCH64_PREFIX}ld.bfd"

make -C $BUILD -f Makefile.oe-linux-aarch64-gcc11.2 \
  CXX="$CXX_CROSS" \
  TARGET_PREFIX=${AARCH64_PREFIX} \
  QNN_SDK_ROOT=$QNN_ROOT \
  QNN_MODEL_LIB_NAME=lib<model>_htp

Technical Notes

Why FP16 for HTP?

The Hexagon Tensor Processor natively accelerates FP16 operations. FP32 support on HTP is limited; using --float_bitwidth 16 is recommended for maximum HTP utilization.
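The accuracy cost of FP16 is bounded by its 11-bit significand and ~65504 max value. A quick stdlib illustration (not QNN-specific) using Python's half-precision struct format 'e':

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE 754 half precision (binary16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(to_fp16(65504.0))  # 65504.0 -- largest finite FP16 value, exact
print(to_fp16(2048.0))   # 2048.0  -- integers up to 2048 are exact
print(to_fp16(2049.0))   # 2048.0  -- above 2048, FP16 steps by 2
```

Activations and weights that stay within this range (as they typically do after normalization) lose little accuracy, which is why FP16 is the standard precision target for HTP.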

qnn-model-lib-generator cannot be used for large models

qnn-model-lib-generator cannot compile the LLM backbone (13 GB binary) because it spawns a subprocess that cannot find the required cross-compiler toolchain environment variables (${TARGET_PREFIX}g++, ${SDKTARGETSYSROOT}). Manual Makefile-based compilation is used instead.

C++ standard: gnu++20 β†’ c++11

The QNN SDK's Makefile.oe-linux-aarch64-gcc11.2 uses -std=gnu++20. However, QNN-generated C++ code mixes designated and non-designated initializers, which C++20 rejects (its designated initializers are stricter than the C99 form that clang accepts as an extension in older standard modes). We patch the Makefile to use -std=c++11 -Wno-c99-designator.

aarch64 large model relocation issue (LLM only)

Compiling a 13 GB shared library with -mcmodel=large produces R_AARCH64_MOVW_UA relocations in .data.rel.ro that neither LLD nor gold can satisfy. The fix is to use the GNU BFD linker (aarch64-none-linux-gnu-ld.bfd) with -nostartfiles:

-fuse-ld=/path/to/aarch64-none-linux-gnu-ld.bfd -nostartfiles

This matches the approach used for the CPU x86-64 build where -fuse-ld=bfd -nostartfiles was needed.

PYTHONNOUSERSITE=1 is mandatory

Without PYTHONNOUSERSITE=1, a user-level numpy installation (Python 3.12 ABI) conflicts with the conda env's Python 3.10/3.11, causing an ImportError when the QNN converter tries to import onnx.
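PYTHONNOUSERSITE=1 works by disabling the per-user site-packages directory entirely, which is visible via site.ENABLE_USER_SITE. A small stdlib demonstration (not part of the conversion scripts):

```python
import os
import subprocess
import sys

def user_site_enabled(extra_env):
    """Report site.ENABLE_USER_SITE in a fresh interpreter with extra env vars."""
    env = dict(os.environ, **extra_env)
    out = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(user_site_enabled({"PYTHONNOUSERSITE": "1"}))  # False
```

With user site-packages disabled, only the conda env's packages are importable, so the converter picks up the matching-ABI numpy.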


Model Information

Sub-model        ONNX Input(s)                         Precision  .so size
Vision Encoder   pixel_values [1,6,224,224]            FP16       1.5 GB
Action Head      hidden_states [1,440,4096]            FP16       251 MB
LLM Backbone     input_ids_no_stop [1,184],            FP16       13 GB
                 patch_features [1,256,4096],
                 attention_mask [1,184]

Base model: openvla/openvla-oft-libero
QNN SDK: Qualcomm AI Runtime 2.43.0.260128
Target: aarch64-oe-linux-gcc11.2 (Snapdragon HTP)
