# OpenVLA-OFT LIBERO → QNN HTP (FP16, aarch64)
This repository contains QNN HTP backend model artifacts for running OpenVLA-OFT on a Qualcomm Snapdragon device with the Hexagon Tensor Processor (HTP / DSP).
All three sub-models are converted from FP32 ONNX to FP16 QNN and compiled into aarch64 shared libraries (`.so`) for the `aarch64-oe-linux-gcc11.2` target.

CPU (FP32) counterpart: `xpuenabler/OpenVLA-OFT-LIBERO-130-QNN-CPU`
## Repository Structure

```text
.
├── README.md
├── conversion/
│   ├── make_llm_v9.py          # Patch llm_backbone_v7.onnx → v9 (required before LLM conversion)
│   ├── convert_ve_htp.sh       # Vision Encoder: ONNX → QNN FP16 → libvision_encoder_sliced_htp.so
│   ├── convert_ah_htp.sh       # Action Head:    ONNX → QNN FP16 → libaction_head_htp.so
│   └── convert_llm_v9_htp.sh   # LLM Backbone:   ONNX → QNN FP16 → libllm_backbone_htp.so
│
├── onnx/
│   ├── vision_encoder.onnx     # VE ONNX (self-contained, 894 KB)
│   ├── action_head.onnx        # AH ONNX (self-contained, 4 KB)
│   ├── llm_backbone_v9.onnx    # LLM ONNX header (2.3 MB); requires external data files below
│   ├── embed_tokens.weight     # LLM external data (502 MB)
│   └── onnx__MatMul_*/         # LLM external data (~25 GB total, 289 files)
│
└── qnn_models/
    ├── libvision_encoder_sliced_htp.so  # VE compiled HTP library (1.5 GB, aarch64)
    ├── libaction_head_htp.so            # AH compiled HTP library (251 MB, aarch64)
    └── libllm_backbone_htp.so           # LLM compiled HTP library (13 GB, aarch64)
```
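Before running any of the conversion scripts, it can be worth sanity-checking that the expected inputs are in place. A minimal sketch (the `missing_files` helper is illustrative and not part of this repo; file names are taken from the tree above):

```python
from pathlib import Path

# Files the conversion scripts expect (names from the repository tree above).
REQUIRED = {
    "conversion/make_llm_v9.py",
    "conversion/convert_ve_htp.sh",
    "conversion/convert_ah_htp.sh",
    "conversion/convert_llm_v9_htp.sh",
    "onnx/vision_encoder.onnx",
    "onnx/action_head.onnx",
}

def missing_files(repo_root: str) -> list[str]:
    """Return the required paths that are absent under repo_root."""
    root = Path(repo_root)
    return sorted(p for p in REQUIRED if not (root / p).exists())
```

Running `missing_files(".")` from a fresh checkout should return an empty list before you start converting.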
## Prerequisites

### Hardware / OS

- Build host: Linux x86-64 (Ubuntu 20.04 tested)
- Target device: Qualcomm SoC with HTP (Snapdragon 8 Gen 2 / 3 / Elite, etc.)
### Software
| Component | Version / Path |
|---|---|
| Qualcomm AI Runtime (QNN) SDK | 2.43.0.260128 |
| Python (QNN converter env) | 3.10 (miniforge3/envs/qnn) |
| LLVM / clang++ | 17.0.6 (cross-compile to aarch64) |
| GCC aarch64 toolchain | 11.2.1 (aarch64-none-linux-gnu) |
| ld.bfd (aarch64) | bundled with GCC toolchain |
### Python packages (qnn conda env)

- `onnx`
- `onnxruntime`
- `qti.aisw` (from the QNN SDK `lib/python`)
### Required directories

```shell
QNN_ROOT=/path/to/qairt/2.43.0.260128
QENV=/path/to/miniforge3/envs/qnn
LLVM17=/path/to/llvm17/bin            # clang++ 17 with aarch64 target
AARCH64_PREFIX=/path/to/arm_gcc11/bin/aarch64-none-linux-gnu-
SYSROOT=/path/to/arm_gcc11/aarch64-none-linux-gnu/libc
```
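As a small illustrative check (the `check_env` helper is hypothetical and not shipped with the repo), the five path variables above can be validated before launching any script:

```python
import os

# Path variables the conversion scripts expect (names from this README).
REQUIRED_VARS = ["QNN_ROOT", "QENV", "LLVM17", "AARCH64_PREFIX", "SYSROOT"]

def check_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    if env is None:
        env = os.environ
    return [v for v in REQUIRED_VARS if not env.get(v)]
```

An empty return value means all five variables are set; anything else lists what still needs exporting.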
## Environment Setup

### 1. Set up the QNN Python environment

```shell
# Activate the QNN conda env
conda activate qnn   # Python 3.10

# Add QNN Python bindings to PYTHONPATH
export QNN_ROOT=/path/to/qairt/2.43.0.260128
export DEPS=/path/to/qnn_py310_deps   # extra QNN runtime deps
export LD_LIBRARY_PATH=$QNN_ROOT/lib/x86_64-linux-clang:$LD_LIBRARY_PATH
export PYTHONPATH=$DEPS:$QNN_ROOT/lib/python
export PYTHONNOUSERSITE=1    # prevents user-site package conflicts
export TMPDIR=/tmp/qnn_tmp   # set to a large-capacity directory
mkdir -p $TMPDIR
```
### 2. Verify the toolchain

```shell
# LLVM 17 clang++ must support the aarch64 target
clang++ --version                        # → clang version 17.x
clang++ -print-targets | grep aarch64    # → aarch64

# aarch64-none-linux-gnu-g++ and ld.bfd must exist
aarch64-none-linux-gnu-g++ --version     # → gcc 11.2.1
aarch64-none-linux-gnu-ld.bfd --version
```
### 3. Prepare the LLM ONNX (v9 patch)

The LLM backbone ONNX must be patched from the original v7 before conversion:

```shell
# Requires: llm_backbone_v7.onnx + external data files in the same directory
cd /path/to/onnx_models_fp32/
PYTHONNOUSERSITE=1 python3.10 make_llm_v9.py
# Produces: llm_backbone_v9.onnx
```

`make_llm_v9.py` is provided in `conversion/`.
## Conversion

Edit the path variables at the top of each script to match your environment, then run:
### Vision Encoder

```shell
bash conversion/convert_ve_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libvision_encoder_sliced_htp.so
# Runtime: ~3 min
```

Key parameters:

- Input: `pixel_values`, shape `1,6,224,224`
- `--float_bitwidth 16` (FP16 for HTP)
### Action Head

```shell
bash conversion/convert_ah_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libaction_head_htp.so
# Runtime: ~1 min
```

Key parameters:

- Input: `hidden_states`, shape `1,440,4096` (note: the ONNX model stores this as `[1,440,4096]`, not `[1,4096,440]`)
- `--float_bitwidth 16`
### LLM Backbone

```shell
bash conversion/convert_llm_v9_htp.sh
# Output: $WORK/model_lib_aarch64/libs/aarch64-oe-linux-gcc11.2/libllm_backbone_htp.so
# Runtime: qnn-onnx-converter ~5 min, compilation ~30 min (13 GB binary)
```

Key parameters:

- Inputs: `input_ids_no_stop` `1,184`, `patch_features` `1,256,4096`, `attention_mask` `1,184`
- `--float_bitwidth 16`
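As a rough illustration of what `--float_bitwidth 16` buys, FP16 halves per-tensor storage relative to FP32. The sketch below (shapes copied from the parameter lists above; the helper name is hypothetical) computes the FP16 byte footprint of the floating-point inputs:

```python
from math import prod

# Floating-point input shapes from the parameter lists above.
# (input_ids_no_stop and attention_mask are integer tensors, so they
# are unaffected by --float_bitwidth and omitted here.)
SHAPES = {
    "pixel_values":   (1, 6, 224, 224),   # Vision Encoder
    "hidden_states":  (1, 440, 4096),     # Action Head
    "patch_features": (1, 256, 4096),     # LLM Backbone
}

def fp16_bytes(shape) -> int:
    """Bytes needed to hold one FP16 tensor of this shape (2 bytes/element)."""
    return prod(shape) * 2
```

For example, `fp16_bytes(SHAPES["pixel_values"])` gives 602,112 bytes, half of what the same tensor would occupy in FP32.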
## Step-by-Step Conversion Detail

Each script performs the same three stages.
### Stage 1: ONNX → QNN source (`qnn-onnx-converter`)

```shell
$QENV/bin/python3.10 $QNN_ROOT/bin/x86_64-linux-clang/qnn-onnx-converter \
    --input_network <model>.onnx \
    --input_dim <name> <shape> \
    --output_path $WORK/<model> \
    --float_bitwidth 16
```

Outputs: `<model>` (C++ source, no extension) and `<model>.bin` (weight tarball).

Note: the converter produces the source file without a `.cpp` extension. The scripts copy it to `<model>.cpp` before compilation.
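The copy step described in the note above can be sketched as follows (a Python stand-in for what the shell scripts do; names and paths are illustrative):

```python
import shutil
from pathlib import Path

def stage_converter_source(work_dir: str, model: str) -> Path:
    """qnn-onnx-converter emits the C++ source with no extension;
    copy it to <model>.cpp so the Makefile build can pick it up."""
    src = Path(work_dir) / model        # e.g. $WORK/llm_backbone (no extension)
    dst = src.with_suffix(".cpp")       # e.g. $WORK/llm_backbone.cpp
    shutil.copyfile(src, dst)
    return dst
```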
### Stage 2: Build directory setup

```shell
# Copy SDK helper files into jni/
SDK_JNI=$QNN_ROOT/share/QNN/converter/jni
cp $SDK_JNI/QnnModel.cpp $SDK_JNI/QnnModel.hpp ... $JNI/
cp $WORK/<model>.cpp $JNI/

# Copy + patch the SDK Makefile
cp $QNN_ROOT/share/QNN/converter/Makefile.oe-linux-aarch64-gcc11.2 $BUILD/

# Patch: replace gnu++20 with c++11 + add -Wno-c99-designator
sed -i 's/-std=gnu++20 -D_LINUX_OE_SOURCE -fPIC -Wl,-lstdc++/-std=c++11 -Wno-c99-designator/' \
    $BUILD/Makefile.oe-linux-aarch64-gcc11.2

# LLM only: add -nostartfiles to LDFLAGS
sed -i 's/LDFLAGS += -shared -s/LDFLAGS += -shared -s -nostartfiles/' \
    $BUILD/Makefile.oe-linux-aarch64-gcc11.2
```
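The two `sed` substitutions above amount to plain string replacement. Expressed as a Python sketch (a hypothetical helper using the same literal patterns as the `sed` commands):

```python
def patch_makefile(text: str, llm: bool = False) -> str:
    """Apply the same substitutions as the sed commands above."""
    # Replace gnu++20 flags with c++11 + -Wno-c99-designator
    text = text.replace(
        "-std=gnu++20 -D_LINUX_OE_SOURCE -fPIC -Wl,-lstdc++",
        "-std=c++11 -Wno-c99-designator",
    )
    if llm:  # the 13 GB LLM build additionally needs -nostartfiles
        text = text.replace(
            "LDFLAGS += -shared -s",
            "LDFLAGS += -shared -s -nostartfiles",
        )
    return text
```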
### Stage 3: Compile to an aarch64 shared library

```shell
# VE / AH (standard linker)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
    --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN"

# LLM (bfd linker required for the 13 GB binary)
CXX_CROSS="$LLVM17/clang++ --target=aarch64-linux-gnu \
    --sysroot=$SYSROOT --gcc-toolchain=$GCC_TOOLCHAIN \
    -fuse-ld=${AARCH64_PREFIX}ld.bfd"

make -C $BUILD -f Makefile.oe-linux-aarch64-gcc11.2 \
    CXX="$CXX_CROSS" \
    TARGET_PREFIX=${AARCH64_PREFIX} \
    QNN_SDK_ROOT=$QNN_ROOT \
    QNN_MODEL_LIB_NAME=lib<model>_htp
```
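After cross-compiling, it can be worth confirming that the produced `.so` really targets aarch64. A minimal stdlib-only sketch that inspects the ELF header (`EM_AARCH64 = 183` per the ELF specification; the helper itself is not part of this repo):

```python
import struct

EM_AARCH64 = 183  # e_machine value for aarch64 in the ELF specification

def is_aarch64_elf(path: str) -> bool:
    """True if the file is an ELF object whose e_machine is aarch64."""
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20 or header[:4] != b"\x7fELF":
        return False
    # e_machine is a little-endian u16 at offset 18 (for LSB ELF files)
    (machine,) = struct.unpack_from("<H", header, 18)
    return machine == EM_AARCH64
```

An x86-64 build accidentally dropped into `qnn_models/` would fail this check (its `e_machine` is 62, not 183).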
## Technical Notes

### Why FP16 for HTP?

The Hexagon Tensor Processor natively accelerates FP16 operations, while its FP32 support is limited. Using `--float_bitwidth 16` is recommended for maximum HTP utilization.
### qnn-model-lib-generator cannot be used for large models

`qnn-model-lib-generator` cannot compile the LLM backbone (13 GB binary) because it spawns a subprocess that cannot find the required cross-compiler toolchain environment variables (`${TARGET_PREFIX}g++`, `${SDKTARGETSYSROOT}`). Manual Makefile-based compilation is used instead.
### C++ standard: gnu++20 → c++11

The QNN SDK's `Makefile.oe-linux-aarch64-gcc11.2` uses `-std=gnu++20`. However, the QNN-generated C++ code mixes designated and non-designated initializers in the same aggregate, which C++20's restricted designated-initializer rules reject. We patch the Makefile to `-std=c++11 -Wno-c99-designator`, so clang accepts the designators as a C99 extension without warnings.
### aarch64 large-model relocation issue (LLM only)

Compiling a 13 GB shared library with `-mcmodel=large` produces `R_AARCH64_MOVW_UA` relocations in `.data.rel.ro` that neither LLD nor gold can satisfy. The fix is to use the GNU BFD linker (`aarch64-none-linux-gnu-ld.bfd`) together with `-nostartfiles`:

```shell
-fuse-ld=/path/to/aarch64-none-linux-gnu-ld.bfd -nostartfiles
```

This matches the approach used for the CPU x86-64 build, where `-fuse-ld=bfd -nostartfiles` was needed.
### PYTHONNOUSERSITE=1 is mandatory

Without `PYTHONNOUSERSITE=1`, a user-level numpy installation (Python 3.12 ABI) conflicts with the conda env's Python 3.10/3.11, causing an `ImportError` when the QNN converter tries to import `onnx`.
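A quick illustrative way to check whether the per-user site-packages directory is leaking into the interpreter (a hypothetical helper, not part of the repo):

```python
import site
import sys

def user_site_active() -> bool:
    """True if the per-user site-packages dir is on sys.path,
    i.e. PYTHONNOUSERSITE=1 was NOT in effect at interpreter startup."""
    return site.ENABLE_USER_SITE is True and site.getusersitepackages() in sys.path
```

If this returns `True` inside the qnn conda env, re-export `PYTHONNOUSERSITE=1` and restart the shell before running the converter.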
## Model Information

| Sub-model | ONNX Input(s) | Precision | .so size |
|---|---|---|---|
| Vision Encoder | `pixel_values` `[1,6,224,224]` | FP16 | 1.5 GB |
| Action Head | `hidden_states` `[1,440,4096]` | FP16 | 251 MB |
| LLM Backbone | `input_ids_no_stop` `[1,184]`, `patch_features` `[1,256,4096]`, `attention_mask` `[1,184]` | FP16 | 13 GB |
- Base model: `openvla/openvla-oft-libero`
- QNN SDK: Qualcomm AI Runtime 2.43.0.260128
- Target: `aarch64-oe-linux-gcc11.2` (Snapdragon HTP)