NAFNet-SIDD-width32: LiteRT (on-device image denoising, fully on GPU)

NAFNet (Nonlinear Activation Free Network, ECCV 2022) is an image restoration network, here converted to LiteRT and running fully on the CompiledModel GPU backend (ML Drift) on Android. This is the SIDD-width32 variant, which targets real-image denoising. NAFNet is a U-Net of NAFBlocks with no activation functions (SimpleGate is a channel split followed by an element-wise multiply), so the whole network runs as a clean CNN on the GPU.
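
As a rough illustration of the SimpleGate described above, here is a minimal NumPy sketch. It assumes NCHW layout with the split on the channel axis; the function name `simple_gate` is chosen for this example and is not taken from the converted model.

```python
import numpy as np

def simple_gate(x: np.ndarray) -> np.ndarray:
    """Split the channel axis (axis 1, NCHW) in half and multiply the halves."""
    a, b = np.split(x, 2, axis=1)
    return a * b

# Toy input: 1 image, 4 channels, 2x2 pixels.
x = np.arange(1 * 4 * 2 * 2, dtype=np.float32).reshape(1, 4, 2, 2)
y = simple_gate(x)
print(y.shape)  # (1, 2, 2, 2): the channel count is halved
```

The gate contains only a split and a multiply, which is why the network needs no ReLU- or GELU-style activation ops.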

NAFNet-SIDD: noisy input | denoised (on-device LiteRT GPU)

On-device (Pixel 8a, Tensor G3; verified)

nodes on GPU: 2179 / 2179 LITERT_CL (full residency)
inference: ~46 ms (256×256)
size: 62.5 MB (fp16)
accuracy: device output == PyTorch (corr 0.999999); re-authoring is numerically exact

image[1,3,256,256] (RGB [0,1]) → [GPU: NAFNet U-Net] → denoised[1,3,256,256]
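
The input and output layout above implies a small amount of pre- and post-processing around the model call. The following is a sketch only, assuming the caller already holds a 256×256 RGB image as an HWC uint8 array; the helper names `to_model_input` and `to_image` are hypothetical and not part of any LiteRT API.

```python
import numpy as np

def to_model_input(rgb_u8: np.ndarray) -> np.ndarray:
    """HWC uint8 RGB -> [1, 3, H, W] float32 in [0, 1]."""
    x = rgb_u8.astype(np.float32) / 255.0
    return np.ascontiguousarray(np.transpose(x, (2, 0, 1))[np.newaxis, ...])

def to_image(y: np.ndarray) -> np.ndarray:
    """[1, 3, H, W] float output -> HWC uint8 RGB, clipped to [0, 1] first."""
    y = np.clip(y[0], 0.0, 1.0)
    return (np.transpose(y, (1, 2, 0)) * 255.0 + 0.5).astype(np.uint8)

# Synthetic stand-in for a 256x256 photo.
img = np.random.default_rng(0).integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
x = to_model_input(img)
restored = to_image(x)  # identity round trip, no model involved
print(x.shape, x.dtype)
```

Clipping before the uint8 conversion matters because a denoiser's output can land slightly outside [0, 1].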

How it converts (litert-torch)

Pure CNN (no activations). The conversion uses three numerically exact re-authorings, the headline being SafeLayerNorm. NAFNet's residual stream grows large (|x| ≈ 175 at the bottleneck), so the LayerNorm channel reductions Σ_c x and Σ_c (x−μ)² (~15M) overflow fp16 (max 65504) on the Mali delegate, which computes in fp16 regardless of the model dtype; the overflow shows up as a grid artifact. Doing the reductions in a down-scaled x/S domain (S=128) and rescaling is exact and fp16-safe. The other two re-authorings are the Simplified Channel Attention AdaptiveAvgPool2d(1) → mean(3).mean(2), and the upsample Conv2d(1×1)+PixelShuffle(2) → depth-to-space ZeroStuffConvT2d.
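
To make the SafeLayerNorm argument concrete, the sketch below checks two things in NumPy: that the channel sum of squares exceeds the fp16 maximum at this activation scale but not after dividing by S, and that normalizing in the x/S domain is algebraically identical to the plain LayerNorm. It is an illustration under assumptions, not the converted model's code: the epsilon value, the 512-channel synthetic tensor, and the function names are chosen for this example, and the affine weight and bias are omitted.

```python
import numpy as np

EPS = 1e-6   # LayerNorm epsilon; assumed value, not stated in the text
S = 128.0    # down-scaling factor quoted in the text
FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

def layernorm_plain(x):
    """Channel-wise LayerNorm over axis 1 (NCHW), affine terms omitted."""
    mu = x.mean(axis=1, keepdims=True)
    var = ((x - mu) ** 2).mean(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + EPS)

def layernorm_scaled(x):
    """Same normalization with the reductions done on x / S."""
    xs = x / S
    mu_s = xs.mean(axis=1, keepdims=True)
    d = xs - mu_s
    var_s = (d ** 2).mean(axis=1, keepdims=True)
    # (x - mu) / sqrt(var + eps) == d / sqrt(var_s + eps / S**2)
    return d / np.sqrt(var_s + EPS / S ** 2)

def channel_sum_sq(x):
    """Largest per-pixel value of sum_c (x - mu)^2, computed in float64."""
    mu = x.mean(axis=1, keepdims=True)
    return float(((x - mu) ** 2).sum(axis=1).max())

rng = np.random.default_rng(0)
x = rng.uniform(-175.0, 175.0, size=(1, 512, 4, 4))  # synthetic activations

raw_sum = channel_sum_sq(x)         # millions: does not fit in fp16
scaled_sum = channel_sum_sq(x / S)  # reduced by S**2 = 16384
print(raw_sum > FP16_MAX, scaled_sum < FP16_MAX)  # True True
print(np.allclose(layernorm_plain(x), layernorm_scaled(x), rtol=0, atol=1e-9))  # True
```

Dividing by S shrinks the sum of squares by S² = 16384, which brings a value in the millions down to a few hundred. One caveat for a real fp16 implementation: EPS / S**2 is smaller than the smallest fp16 subnormal, so epsilon has to be handled separately there, and the text does not say how.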

Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 1.0.
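
The text does not say how the corr figures are computed. One plausible reading is the Pearson correlation between the flattened output tensors, sketched below under that assumption. The two arrays are synthetic stand-ins (a random reference and its fp16-rounded copy), not real model outputs.

```python
import numpy as np

def pearson_corr(a, b) -> float:
    """Pearson correlation between two tensors, flattened and computed in float64."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

rng = np.random.default_rng(0)
reference = rng.random((1, 3, 256, 256))                      # stand-in for the PyTorch output
candidate = reference.astype(np.float16).astype(np.float64)  # stand-in for a device output
corr = pearson_corr(reference, candidate)
print(f"corr = {corr:.6f}")
```

Correlation is insensitive to a constant offset or gain, so a max-absolute-error check alongside it catches mismatches that corr alone would hide.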

A complete Android sample (image picker + before/after) is in the official google-ai-edge/litert-samples repo under compiled_model_api/image_restoration (push this .tflite in place of the deblur model).

License

MIT. Upstream: megvii-research/NAFNet; weights NAFNet-SIDD-width32.
