LaMa-Dilated — Hexagon NPU bundle (QHexRT, v79 + v81) — image inpainting / object removal

On-device "Clean Up": erase an object or person from a photo entirely on the Qualcomm Hexagon NPU, with no cloud round trip. The model is qualcomm/LaMa-Dilated (an FFC-ResNet inpainting CNN), compiled to run with QHexRT via the qhx_inpaint tool. This is the first image → image bundle, alongside the LLM/VLM/ASR/TTS/embedding *_HNPU repos.

Arch-pinned, multi-arch: each <arch>/ dir holds a context binary built for that target. v79/ = Hexagon v79 (Snapdragon 8 Elite / SM8750, soc_model 69). v81/ = Hexagon v81 (Snapdragon 8 Elite Gen 5 / SM8850, soc_model 87). Pick the dir matching your device.
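The arch mapping above can be captured in a small lookup. This is a minimal sketch: `arch_dir_for` is a hypothetical helper (not part of QHexRT or qhx_inpaint), and it assumes the caller already knows the device's soc_model id.

```python
# soc_model -> bundle directory, using only the two targets listed above.
SOC_MODEL_TO_ARCH_DIR = {
    69: "v79",  # Snapdragon 8 Elite / SM8750, Hexagon v79
    87: "v81",  # Snapdragon 8 Elite Gen 5 / SM8850, Hexagon v81
}


def arch_dir_for(soc_model: int) -> str:
    """Return the bundle subdirectory for a soc_model id (hypothetical helper)."""
    try:
        return SOC_MODEL_TO_ARCH_DIR[soc_model]
    except KeyError:
        # Context binaries are arch-pinned, so there is no safe fallback.
        raise ValueError(f"no prebuilt context binary for soc_model {soc_model}") from None
```

Failing loudly on an unknown id is deliberate here, since a context binary built for one Hexagon arch is not expected to load on another.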

What it does

Given an RGB image + a binary mask (white = the region to erase), it fills the masked region in one forward pass (no diffusion loop) and returns the painted image. The region outside the mask is preserved exactly (the model composites internally). Use it for object/person removal, watermark/text removal, and defect repair.

How it was built

LaMa's Fast Fourier Convolution sounds NPU-hostile, but at a fixed 512×512 the FFT is a fixed linear map and collapses to plain Conv/matmul: the exported graph has zero FFT/DFT ops and runs entirely on the NPU (no custom ops). The v79 bundle is Qualcomm's prebuilt QNN DLC finalized to a v79 context bin (fp16). The image+mask preprocessing and the RGB8 output conversion run host-side in the inpaint host-op.
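To illustrate why a fixed-size FFT needs no dedicated op, here is a toy 1-D, 8-point sketch in plain Python: the transform is a constant matrix times the input. The names `dft_matrix` and `matvec` are illustrative only; how Qualcomm's export actually lowers the 2-D transform inside the FFC blocks is not described in this repo.

```python
import cmath


def dft_matrix(n):
    """n x n DFT matrix W with W[k][m] = exp(-2*pi*i*k*m / n)."""
    return [
        [cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)]
        for k in range(n)
    ]


def matvec(w, x):
    """Plain matrix-vector product, the only operation needed once n is fixed."""
    return [sum(w_km * x_m for w_km, x_m in zip(row, x)) for row in w]


# W depends only on the size, so it can be precomputed and stored as weights.
W = dft_matrix(8)
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
X = matvec(W, x)  # the discrete Fourier transform of x, computed as a matmul
```

Because `W` is constant for a given input size, a converter can fold it into ordinary weights, which is consistent with the exported graph containing only Conv/matmul ops.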

Files (`v79/`)

| file | role |
| --- | --- |
| `lama-dilated.json` | QHexRT manifest (family `inpaint`, host-op `inpaint`) |
| `lama_inpaint_f16.bin` | the LaMa-Dilated context binary (fp16, NHWC, ~98 MB) |

No tokenizer/embed (image → image). The QNN runtime libs come from the QAIRT SDK, not this repo.

Run

hf download runanywhere/lama_dilated_HNPU --local-dir lama
adb push lama/v79 /data/local/tmp/wq/lama          # PowerShell + native paths on Windows
adb push photo.png mask.png /data/local/tmp/wq/lama/
adb shell "cd /data/local/tmp/wq && export ADSP_LIBRARY_PATH='/data/local/tmp/wq;/vendor/dsp/cdsp'; \
  LD_LIBRARY_PATH=. ./qhx_inpaint lama/lama-dilated.json libQnnHtp.so libQnnSystem.so lama \
  photo.png mask.png out.png"

The mask is a grayscale PNG: white (255) = erase, black (0) = keep. Inputs are resized to 512×512; the output is 512×512.

Measured (Samsung S25 / SM8750 / Hexagon v79)

PSNR 65.9 dB (raw graph) / 53.1 dB end-to-end (qhx_inpaint, 8-bit PNG) vs the onnxruntime fp32 reference. The object is correctly removed and the region outside the mask is preserved exactly.
~84 ms end-to-end @ 512×512 (NPU graph ~67 ms + host load/resize/PNG-encode).
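For readers who want to reproduce a PSNR comparison against their own reference output, a minimal sketch follows. It assumes flat sequences of 8-bit pixel values and a peak of 255 computed over the whole image; the exact peak value and region used for the figures above are not stated here, so treat those as assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With a peak of 255, an average error of one grey level per pixel gives roughly 48 dB, which puts the 53.1 dB end-to-end figure in the sub-one-level range expected from 8-bit quantization.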

Caveats (honest)

Fixed 512×512. Inputs are resized to 512 and the output is 512; a full-resolution paste-back is host glue not included here.
Weights are "Dilated CelebA-HQ" (face-centric). General object removal on arbitrary scenes is better served by the Places/Big-LaMa checkpoint (not included).
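The full-resolution paste-back mentioned in the first caveat could look roughly like the sketch below. It is not part of this bundle: `resize_nearest` and `paste_back` are hypothetical helpers, images are plain lists of rows, the mask is assumed to be at the original resolution, and the threshold of 128 is an assumption.

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grid (list of rows) to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def paste_back(original, mask, inpainted, threshold=128):
    """Keep original pixels where the mask is black; use the upscaled
    model output where the mask is white (>= threshold)."""
    h, w = len(original), len(original[0])
    filled = resize_nearest(inpainted, h, w)  # model output back at full size
    return [
        [filled[y][x] if mask[y][x] >= threshold else original[y][x]
         for x in range(w)]
        for y in range(h)
    ]
```

Nearest-neighbour keeps the sketch dependency-free; a real pipeline would more likely use bilinear or bicubic upscaling and feather the mask edge to hide the resolution seam.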

License

Apache-2.0 (the upstream advimman/lama / qualcomm/LaMa-Dilated, © Samsung Research) — commercially usable.

v81 (SM8850 / soc_model 87)

The v81/ bundle is device-validated on SM8850: a dahlia photo with a central mask inpaints cleanly. The region outside the mask is preserved bit-exact (mean|diff|=0.00, per-channel corr 1.0000 vs the input) and the masked region is plausibly filled, matching the behavior of v79 (which gated at PSNR 65.9 dB vs the onnxruntime fp32 reference on the same graph). Graph time is ~49 ms.

Because Qualcomm's published DLC is a QAIRT-2.45 artifact that the 2.47 toolchain cannot load, the v81 bin is re-converted from Qualcomm's float ONNX with qairt-converter (f16 weights; f32 NHWC I/O preserved to match the qhx_inpaint host-op) and built with the {"O":3,"vtcm_mb":8,"dlbc":1} HTP graph-config, which is required on v81. The graph is numerically identical (zero FFT ops; same FFC-ResNet).
