--- license: mit library_name: LiteRT pipeline_tag: image-to-image tags: - litert - tflite - on-device - android - gpu - image-restoration - deblurring - nafnet base_model: megvii-research/NAFNet --- # NAFNet-GoPro-width32 — LiteRT (on-device image deblur, fully-GPU) [NAFNet](https://github.com/megvii-research/NAFNet) (Nonlinear Activation Free Network, ECCV 2022) image restoration, converted to **LiteRT** and running **fully on the `CompiledModel` GPU** (ML Drift) on Android. NAFNet is a U-Net of **NAFBlocks** with **no activation functions at all** (SimpleGate = channel-split multiply), so the whole network is a clean CNN on the GPU delegate. This is the **GoPro-width32** variant — motion deblur. ![NAFNet — blurry input | restored (on-device LiteRT GPU)](samples/sample.png) ## On-device (Pixel 8a, Tensor G3 — verified) | | | |---|---| | nodes on GPU | **2179 / 2179** LITERT_CL (full residency) | | inference | **~42 ms** (256×256) | | size | 38 MB (fp16) | | accuracy | device output **== PyTorch (corr 1.000000)** — re-authoring is numerically exact | ``` image[1,3,256,256] (RGB [0,1]) →[GPU: NAFNet U-Net]→ restored[1,3,256,256] ``` ## Usage (Android, LiteRT CompiledModel) ```kotlin val model = CompiledModel.create(modelPath, CompiledModel.Options(Accelerator.GPU), null) val input = model.createInputBuffers(); val output = model.createOutputBuffers() input[0].writeFloat(chw) // [1,3,256,256] RGB in [0,1], NCHW model.run(input, output) val restored = output[0].readFloat() // [1,3,256,256] in [0,1] ``` A complete Android sample (image picker + before/after) is in the official [google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples) repo under `compiled_model_api/image_restoration`. ## How it converts (litert-torch) NAFNet is fully convolutional (any size that is a multiple of 16; exported here at 256×256). Three numerically-exact GPU re-authorings: 1. **`LayerNorm2d` → fp16-safe channel LayerNorm.** NAFNet's residual stream grows large (|x|≈175 at the bottleneck), so the LayerNorm channel reductions `Σ_c x` and `Σ_c (x−μ)²` (~15M) **overflow fp16 (max 65504)** on the Mali delegate (which computes in fp16 regardless of the model dtype) → a grid artifact. Doing the reductions in a down-scaled `x/S` domain (S=128) and rescaling is numerically exact and fp16-safe. 2. **Simplified Channel Attention `AdaptiveAvgPool2d(1)` → `mean(3).mean(2)`** (two single-axis means). 3. **Upsample `Conv2d(1×1)+PixelShuffle(2)` → Conv2d + depth-to-space `ZeroStuffConvT2d`**. Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr **1.0**, device-vs-torch corr **1.0**. ## License [MIT](https://github.com/megvii-research/NAFNet/blob/main/LICENSE). Upstream: [megvii-research/NAFNet](https://github.com/megvii-research/NAFNet). Original weights: NAFNet-GoPro-width32 from the official release.