LaMa-Dilated: Hexagon NPU bundle (QHexRT, v79 + v81) for image inpainting / object removal
On-device "Clean Up": erase an object/person from a photo, fully on the Qualcomm Hexagon NPU, no cloud.
qualcomm/LaMa-Dilated (an FFC-ResNet inpainting CNN) compiled to run with
QHexRT via the qhx_inpaint tool. This is the first image → image
bundle (alongside the LLM/VLM/ASR/TTS/embedding *_HNPU repos).
Arch-pinned, multi-arch: each <arch>/ dir is a context binary baked for that target. v79/ = Hexagon v79
(Snapdragon 8 Elite / SM8750, soc_model 69). v81/ = Hexagon v81 (Snapdragon 8 Elite Gen 5 / SM8850, soc_model 87).
Pick the dir matching your device.
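As a minimal sketch of that selection step, the soc_model values listed above can be mapped to bundle directories. The helper name `bundle_dir_for` is hypothetical and not part of QHexRT; how you obtain the soc_model value for a device is left out because the source does not describe it.

```python
# Mapping taken from the bundle description; the helper itself is illustrative.
SOC_MODEL_TO_ARCH = {
    69: "v79",  # Snapdragon 8 Elite / SM8750
    87: "v81",  # Snapdragon 8 Elite Gen 5 / SM8850
}


def bundle_dir_for(soc_model: int) -> str:
    """Return the bundle sub-directory for a given soc_model id."""
    try:
        return SOC_MODEL_TO_ARCH[soc_model]
    except KeyError:
        raise ValueError(
            f"no prebuilt context binary for soc_model {soc_model}"
        ) from None


print(bundle_dir_for(69))
```

Failing loudly on an unknown soc_model matches the arch-pinned nature of the bundle: a context binary built for one target is not expected to load on another.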
What it does
Given an RGB image + a binary mask (white = the region to erase), it fills the masked region in one forward pass (no diffusion loop) and returns the inpainted image. The region outside the mask is preserved exactly (the model composites internally). Use it for object/person removal, watermark/text removal, and defect repair.
How it was built
LaMa's Fast Fourier Convolution sounds NPU-hostile, but the FFT is linear, so at a fixed 512×512 it collapses
to plain Conv/matmul: the exported graph has zero FFT/DFT ops and runs entirely on the NPU (no custom
ops). The v79 binary is Qualcomm's prebuilt QNN DLC finalized to a v79 context bin (fp16); the v81 build is described in the v81 section below. The image+mask preproc and
the RGB8 output run host-side in the inpaint host-op.
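To illustrate why a fixed-size FFT needs no dedicated op, the sketch below builds the DFT matrix for a small length and checks that one matrix-vector product reproduces a recursive FFT. This is a generic 1-D illustration, not the exporter's actual lowering; the function names and the length 8 are assumptions chosen for readability.

```python
import cmath


def dft_matrix(n):
    """n x n matrix W with W[k][t] = exp(-2*pi*i*k*t/n); constant once n is fixed."""
    return [[cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)]
            for k in range(n)]


def matvec(m, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def fft(x):
    """Recursive radix-2 FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


n = 8
x = [float(t % 3) - 1.0 for t in range(n)]  # arbitrary deterministic signal
W = dft_matrix(n)                           # precomputed once for the fixed size
max_err = max(abs(a - b) for a, b in zip(matvec(W, x), fft(x)))
print("max |matmul - fft| =", max_err)
```

Because `W` depends only on the length, it can be baked into the graph as a weight tensor at export time, which is consistent with the exported graph containing no FFT/DFT ops.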
Files (v79/)
| file | role |
|---|---|
| lama-dilated.json | QHexRT manifest (family inpaint, host-op inpaint) |
| lama_inpaint_f16.bin | the LaMa-Dilated context binary (fp16, NHWC, ~98 MB) |
No tokenizer/embed (image → image). The QNN runtime libs come from the QAIRT SDK, not this repo.
Run
hf download runanywhere/lama_dilated_HNPU --local-dir lama
adb push lama/v79 /data/local/tmp/wq/lama # PowerShell + native paths on Windows
adb push photo.png mask.png /data/local/tmp/wq/lama/
adb shell "cd /data/local/tmp/wq && export ADSP_LIBRARY_PATH='/data/local/tmp/wq;/vendor/dsp/cdsp'; \
LD_LIBRARY_PATH=. ./qhx_inpaint lama/lama-dilated.json libQnnHtp.so libQnnSystem.so lama \
photo.png mask.png out.png"
The mask is a grayscale PNG: white (255) = erase, black (0) = keep. Inputs are resized to 512×512; the output is 512×512.
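If no image editor is at hand, a mask in this format can be produced with the Python standard library alone. The sketch below encodes an 8-bit grayscale PNG with a white rectangle on a black background; the function name, the 512×512 size, and the box coordinates are illustrative assumptions, not requirements of qhx_inpaint.

```python
import struct
import zlib


def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, type, data, CRC32 over type+data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def rect_mask_png(width, height, box):
    """Encode a grayscale PNG: 255 inside box=(x0, y0, x1, y1), 0 elsewhere."""
    x0, y0, x1, y1 = box
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # filter type 0 (None) precedes every scanline
        for x in range(width):
            inside = x0 <= x < x1 and y0 <= y < y1
            rows.append(255 if inside else 0)
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(rows)))
            + _chunk(b"IEND", b""))


png = rect_mask_png(512, 512, (180, 200, 330, 400))
print(len(png), "bytes")
```

Writing `png` to `mask.png` with `open("mask.png", "wb")` gives a file that can be pushed alongside the photo as in the Run commands.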
Measured (Samsung S25 / SM8750 / Hexagon v79)
- PSNR 65.9 dB (raw graph) / 53.1 dB end-to-end (qhx_inpaint, 8-bit PNG) vs the onnxruntime fp32 reference; the object is correctly removed and the region outside the mask is preserved exactly.
- ~84 ms end-to-end @ 512×512 (NPU graph ~67 ms + host load/resize/PNG-encode).
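For readers who want to reproduce a comparison of this kind, here is a minimal PSNR sketch over flat sequences of pixel values. It assumes 8-bit data with a peak of 255; the source does not state which peak value or value range was used for the figures above, so treat this as the textbook definition rather than the exact measurement script.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(ref) != len(test):
        raise ValueError("inputs must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    # Identical inputs have zero error, so PSNR is unbounded.
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


reference = [0, 128, 255, 64]
candidate = [0, 129, 255, 64]  # one sample off by one level
print(f"{psnr(reference, candidate):.2f} dB")
```

Rounding to 8 bits by itself caps the achievable PSNR against a float reference, which is one plausible reason the end-to-end figure is lower than the raw-graph figure.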
Caveats (honest)
- Fixed 512×512. Inputs are resized to 512 and the output is 512; a full-resolution paste-back is host glue not included here.
- Weights are "Dilated CelebA-HQ" (face-centric). General object removal on arbitrary scenes is better served by the Places/Big-LaMa checkpoint (not included).
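The paste-back mentioned in the first caveat could look roughly like the sketch below: upscale the model output to the original resolution and copy it in only where the full-resolution mask is white. The function names, the nearest-neighbour resize, and the threshold of 128 are assumptions made for a dependency-free illustration; a real pipeline would more likely use bilinear or bicubic resampling and feather the mask edge.

```python
def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize of a row-major image (list of rows of pixels)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def paste_back(original, mask, inpainted_small):
    """Take upscaled inpainted pixels where mask >= 128, else keep the original."""
    h, w = len(original), len(original[0])
    up = resize_nearest(inpainted_small, w, h)
    return [[up[y][x] if mask[y][x] >= 128 else original[y][x]
             for x in range(w)]
            for y in range(h)]


# Tiny stand-ins: a 4x4 "photo", a 2x2 "model output", one masked pixel.
original = [[(10, 10, 10)] * 4 for _ in range(4)]
mask = [[0] * 4 for _ in range(4)]
mask[1][2] = 255
small = [[(200, 200, 200)] * 2 for _ in range(2)]

result = paste_back(original, mask, small)
print(result[1][2], result[0][0])
```

Pixels outside the mask are taken from the original image untouched, so only the erased region pays the cost of the 512-pixel working resolution.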
License
Apache-2.0 (the upstream advimman/lama / qualcomm/LaMa-Dilated, © Samsung Research); commercially usable.
v81 (SM8850 / soc_model 87)
The v81/ bundle is device-validated on SM8850: a central-masked dahlia photo inpaints cleanly. The region
outside the mask is preserved bit-exact (mean|diff|=0.00, per-channel corr 1.0000 vs the input) and the masked
region is plausibly filled, identical behavior to v79 (which gated at PSNR 65.9 dB vs the onnxruntime fp32 reference
on the same graph). Graph time is ~49 ms.
Because Qualcomm's published DLC is a QAIRT-2.45 artifact that the 2.47 toolchain cannot load, the v81 bin is
re-converted from Qualcomm's float ONNX with qairt-converter (f16 weights, f32 NHWC I/O preserved to match
the qhx_inpaint host-op) + the {"O":3,"vtcm_mb":8,"dlbc":1} HTP graph-config (required on v81). The graph is
numerically identical (zero FFT ops; same FFC-ResNet).