---
license: apache-2.0
library_name: litert
pipeline_tag: image-segmentation
base_model: xuebinqin/U-2-Net
tags:
- litert
- tflite
- on-device
- android
- background-removal
- salient-object-detection
- image-matting
- u2net
---

# U²-Net: LiteRT (TFLite) GPU, FP16

On-device [LiteRT](https://ai.google.dev/edge/litert) (`.tflite`) conversion of **[U²-Net](https://github.com/xuebinqin/U-2-Net)** for salient-object segmentation and **background removal**. U²-Net is a nested U-structure ("U-net of U-nets", a pure CNN) that predicts a single-channel saliency mask; using that mask as the alpha channel cuts the subject out of its background.

The model runs **fully on the LiteRT `CompiledModel` GPU accelerator** (ML Drift): every op is GPU-native, with no CPU fallback and no Flex ops. Because the network is a pure CNN, it converts with [`litert-torch`](https://github.com/google-ai-edge/ai-edge-torch) **with no custom rewrites**.

## Files

| File | Size | Description |
|------|------|-------------|
| `u2net_fp16.tflite` | 88 MB | float16 weights, GPU-compatible |

## I/O

- **Input**: `[1, 3, 320, 320]` float32, **NCHW**, RGB. Preprocessing: resize to 320×320, divide by the per-image max, then apply ImageNet normalization (`mean = [0.485, 0.456, 0.406]`, `std = [0.229, 0.224, 0.225]`).
- **Output**: `[1, 1, 320, 320]` saliency mask in `[0, 1]` (sigmoid). Resize it to the original image size and use it as the foreground alpha.

## Usage (Android, LiteRT CompiledModel)

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Load the model from the app's assets and compile it for the GPU accelerator.
val model = CompiledModel.create(
    context.assets, "u2net_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null
)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()

inputs[0].writeFloat(nchwFloatArray)   // preprocessed input, [1,3,320,320]
model.run(inputs, outputs)
val mask = outputs[0].readFloat()      // [1,1,320,320] in [0,1]
```

A complete Android sample (live camera + gallery background removal) is available in [google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples).
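
The preprocessing and mask handling listed under I/O can be sketched in plain Kotlin as follows. This is a minimal illustration, not the code used by the sample app: the helper names `toNchw` and `applyMask` are hypothetical, the pixels are assumed to be packed ARGB integers already resized to 320×320 (for example from `Bitmap.getPixels`), and the guard against an all-black frame is an added assumption.

```kotlin
// Sketch only: these helpers are hypothetical and not part of LiteRT.
const val SIZE = 320
val MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
val STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/** Converts size x size packed ARGB pixels (row-major) to a normalized NCHW float array. */
fun toNchw(argb: IntArray, size: Int = SIZE): FloatArray {
    require(argb.size == size * size) { "expected ${size * size} pixels" }
    val plane = size * size

    // Per-image max over all R, G and B values.
    var maxInt = 0
    for (p in argb) {
        val r = (p shr 16) and 0xFF
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        maxInt = maxOf(maxInt, maxOf(r, g, b))
    }
    // Assumption: fall back to 1 for an all-black image to avoid dividing by zero.
    val scale = if (maxInt == 0) 1f else maxInt.toFloat()

    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = argb[i]
        val r = ((p shr 16) and 0xFF) / scale
        val g = ((p shr 8) and 0xFF) / scale
        val b = (p and 0xFF) / scale
        // NCHW layout: the full R plane, then the G plane, then the B plane.
        out[i] = (r - MEAN[0]) / STD[0]
        out[plane + i] = (g - MEAN[1]) / STD[1]
        out[2 * plane + i] = (b - MEAN[2]) / STD[2]
    }
    return out
}

/** Writes a mask in [0, 1] into the alpha channel; mask and pixels must share one resolution. */
fun applyMask(argb: IntArray, mask: FloatArray): IntArray {
    require(argb.size == mask.size) { "mask and image sizes differ" }
    return IntArray(argb.size) { i ->
        val alpha = (mask[i].coerceIn(0f, 1f) * 255f + 0.5f).toInt()
        (alpha shl 24) or (argb[i] and 0x00FFFFFF)
    }
}
```

With these helpers, `toNchw(pixels)` would produce the array passed to `writeFloat`, and `applyMask(pixels, mask)` would produce the cut-out at model resolution. Resizing the mask to the original image size before applying it is left out of the sketch.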

## Performance

- ~147 ms / frame on the GPU of a Pixel 8a (Tensor G3, Mali).

## Conversion notes

Converted with `litert-torch` (full U2NET, 44M params) and float16-quantized with `ai-edge-quantizer`. Verified: all ops are GPU-native, and output correlation against the PyTorch reference is 1.0 in FP32 and ~0.9999 for the FP16 build.

## License & attribution

- License: **Apache-2.0** (© the U²-Net authors, [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net/blob/master/LICENSE)).
- This is a format conversion of the official U²-Net weights with no architectural changes; all credit goes to the original authors.