NAFNet-GoPro-width32 LiteRT fp16 (fully-GPU deblur, Pixel 8a corr 1.0)

Browse files

Files changed (4) hide show

.gitattributes +1 -0
README.md +72 -0
nafnet_fp16.tflite +3 -0
samples/sample.png +3 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+samples/sample.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED Viewed

	@@ -0,0 +1,72 @@

+---
+license: mit
+library_name: LiteRT
+pipeline_tag: image-to-image
+tags:
+  - litert
+  - tflite
+  - on-device
+  - android
+  - gpu
+  - image-restoration
+  - deblurring
+  - nafnet
+base_model: megvii-research/NAFNet
+---
+# NAFNet-GoPro-width32 — LiteRT (on-device image deblur, fully-GPU)
+[NAFNet](https://github.com/megvii-research/NAFNet) (Nonlinear Activation Free Network, ECCV 2022) image
+restoration, converted to **LiteRT** and running **fully on the `CompiledModel` GPU** (ML Drift) on Android.
+NAFNet is a U-Net of **NAFBlocks** with **no activation functions at all** (SimpleGate = channel-split
+multiply), so the whole network is a clean CNN on the GPU delegate. This is the **GoPro-width32** variant —
+motion deblur.
+![NAFNet — blurry input | restored (on-device LiteRT GPU)](samples/sample.png)
+## On-device (Pixel 8a, Tensor G3 — verified)
+| | |
+|---|---|
+| nodes on GPU | **2179 / 2179** LITERT_CL (full residency) |
+| inference | **~42 ms** (256×256) |
+| size | 38 MB (fp16) |
+| accuracy | device output **== PyTorch (corr 1.000000)** — re-authoring is numerically exact |
+```
+image[1,3,256,256] (RGB [0,1]) →[GPU: NAFNet U-Net]→ restored[1,3,256,256]
+```
+## Usage (Android, LiteRT CompiledModel)
+```kotlin
+val model = CompiledModel.create(modelPath, CompiledModel.Options(Accelerator.GPU), null)
+val input = model.createInputBuffers(); val output = model.createOutputBuffers()
+input[0].writeFloat(chw)             // [1,3,256,256] RGB in [0,1], NCHW
+model.run(input, output)
+val restored = output[0].readFloat()  // [1,3,256,256] in [0,1]
+```
+A complete Android sample (image picker + before/after) is in the official
+[google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples) repo under
+`compiled_model_api/image_restoration`.
+## How it converts (litert-torch)
+NAFNet is fully convolutional (any size that is a multiple of 16; exported here at 256×256). Three
+numerically-exact GPU re-authorings:
+1. **`LayerNorm2d` → fp16-safe channel LayerNorm.** NAFNet's residual stream grows large (|x|≈175 at the
+   bottleneck), so the LayerNorm channel reductions `Σ_c x` and `Σ_c (x−μ)²` (~15M) **overflow fp16 (max
+   65504)** on the Mali delegate (which computes in fp16 regardless of the model dtype) → a grid artifact.
+   Doing the reductions in a down-scaled `x/S` domain (S=128) and rescaling is numerically exact and fp16-safe.
+2. **Simplified Channel Attention `AdaptiveAvgPool2d(1)` → `mean(3).mean(2)`** (two single-axis means).
+3. **Upsample `Conv2d(1×1)+PixelShuffle(2)` → Conv2d + depth-to-space `ZeroStuffConvT2d`**.
+Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr **1.0**, device-vs-torch corr **1.0**.
+## License
+[MIT](https://github.com/megvii-research/NAFNet/blob/main/LICENSE). Upstream:
+[megvii-research/NAFNet](https://github.com/megvii-research/NAFNet). Original weights:
+NAFNet-GoPro-width32 from the official release.

nafnet_fp16.tflite ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e749782f7c8a6e462ba7b457724edd089216fa3a25f9d5f83d0caaae7158507
+size 38314240

samples/sample.png ADDED Viewed

Git LFS Details

SHA256: 192d6749081a2c547bba4944f8ded9466955aeb007c4862f5ecf5e189d9c7ac8
Pointer size: 131 Bytes
Size of remote file: 152 kB