---
license: apache-2.0
library_name: litert
pipeline_tag: image-segmentation
base_model: xuebinqin/U-2-Net
tags:
  - litert
  - tflite
  - on-device
  - android
  - background-removal
  - salient-object-detection
  - image-matting
  - u2net
---

# U²-Net — LiteRT (TFLite) GPU, FP16

On-device [LiteRT](https://ai.google.dev/edge/litert) (`.tflite`) conversion of
**[U²-Net](https://github.com/xuebinqin/U-2-Net)** for salient-object segmentation /
**background removal**. U²-Net is a nested U-structure ("U-net of U-nets", a pure CNN)
that predicts a single-channel saliency mask; the foreground is composited onto
transparency to cut the subject out of its background.

The model runs **fully on the LiteRT `CompiledModel` GPU accelerator** (ML Drift):
every op is GPU-native, with no CPU fallback and no Flex ops. Because U²-Net is a pure
CNN, it converts with [`litert-torch`](https://github.com/google-ai-edge/ai-edge-torch)
**without any custom rewrites**.

## Files

| File | Size | Description |
|------|------|-------------|
| `u2net_fp16.tflite` | 88 MB | float16 weights, GPU-compatible |

## I/O

- **Input**: `[1, 3, 320, 320]` float32, **NCHW**, RGB. Preprocessing: resize to 320×320,
  divide the pixel values by the per-image maximum, then apply ImageNet normalization
  (`mean = [0.485, 0.456, 0.406]`, `std = [0.229, 0.224, 0.225]`).
- **Output**: `[1, 1, 320, 320]` float32 saliency mask in `[0, 1]` (sigmoid). Resize the
  mask to the original image's resolution and use it as the foreground alpha.
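
The preprocessing above can be sketched in plain Kotlin as follows. This is a minimal illustration, not the code used by the sample app: the function name `toNchwInput` is hypothetical, the image is assumed to be already resized to 320×320 and supplied as packed ARGB pixels (for example from `Bitmap.getPixels`), and the per-image maximum is assumed to be taken over all three colour channels, as in the original U²-Net data loader.

```kotlin
// ImageNet statistics quoted in the I/O section (R, G, B order).
private val MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
private val STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/**
 * Converts packed ARGB pixels (already resized to size x size) into a planar
 * NCHW float array: all R values, then all G values, then all B values.
 * Hypothetical helper; the name and signature are not part of LiteRT.
 */
fun toNchwInput(argb: IntArray, size: Int = 320): FloatArray {
    require(argb.size == size * size) { "expected ${size * size} pixels, got ${argb.size}" }
    val plane = size * size

    // Per-image maximum over all colour channels (assumption, see lead-in).
    var maxValue = 0
    for (p in argb) {
        val r = (p shr 16) and 0xFF
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        maxValue = maxOf(maxValue, maxOf(r, g, b))
    }
    // Guard against division by zero on an all-black image.
    val scale = if (maxValue == 0) 1f else maxValue.toFloat()

    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = argb[i]
        val r = ((p shr 16) and 0xFF) / scale
        val g = ((p shr 8) and 0xFF) / scale
        val b = (p and 0xFF) / scale
        out[i] = (r - MEAN[0]) / STD[0]
        out[plane + i] = (g - MEAN[1]) / STD[1]
        out[2 * plane + i] = (b - MEAN[2]) / STD[2]
    }
    return out
}
```

The returned array has `3 * 320 * 320` elements and can be written directly into the model's single input buffer.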

## Usage (Android, LiteRT CompiledModel)

```kotlin
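import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Load the model from the app's assets and compile it for the GPU accelerator.
val model = CompiledModel.create(
    context.assets, "u2net_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null
)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(nchwFloatArray)   // preprocessed input, [1,3,320,320]
model.run(inputs, outputs)
val mask = outputs[0].readFloat()      // flat FloatArray, [1,1,320,320] in [0,1]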
```
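
To turn the returned mask into a cutout, it has to be resized to the original image and written into the alpha channel. The sketch below shows one way to do that in plain Kotlin. The helper name `applyMaskAsAlpha` is hypothetical, bilinear interpolation is an assumption (the model card does not say which resampling method to use), and the pixels are treated as packed, non-premultiplied ARGB.

```kotlin
/**
 * Resamples a square saliency mask to width x height with bilinear
 * interpolation and stores it as the alpha channel of the given pixels.
 * Hypothetical helper; not part of LiteRT.
 */
fun applyMaskAsAlpha(
    argb: IntArray, width: Int, height: Int,
    mask: FloatArray, maskSize: Int = 320
): IntArray {
    require(argb.size == width * height) { "pixel count does not match width * height" }
    require(mask.size == maskSize * maskSize) { "mask must be maskSize x maskSize" }

    val out = IntArray(argb.size)
    for (y in 0 until height) {
        // Map the pixel centre into mask coordinates and clamp to the mask bounds.
        val fy = ((y + 0.5f) * maskSize / height - 0.5f).coerceIn(0f, maskSize - 1f)
        val y0 = fy.toInt()
        val y1 = minOf(y0 + 1, maskSize - 1)
        val wy = fy - y0
        for (x in 0 until width) {
            val fx = ((x + 0.5f) * maskSize / width - 0.5f).coerceIn(0f, maskSize - 1f)
            val x0 = fx.toInt()
            val x1 = minOf(x0 + 1, maskSize - 1)
            val wx = fx - x0

            // Interpolate along x on the two neighbouring rows, then along y.
            val top = mask[y0 * maskSize + x0] * (1 - wx) + mask[y0 * maskSize + x1] * wx
            val bottom = mask[y1 * maskSize + x0] * (1 - wx) + mask[y1 * maskSize + x1] * wx
            val value = (top * (1 - wy) + bottom * wy).coerceIn(0f, 1f)

            // Replace the existing alpha byte with the mask value, keep RGB as is.
            val alpha = (value * 255f + 0.5f).toInt()
            out[y * width + x] = (alpha shl 24) or (argb[y * width + x] and 0x00FFFFFF)
        }
    }
    return out
}
```

Keeping the soft mask values as alpha, rather than thresholding them, preserves smooth edges around hair and other fine detail.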

A complete Android sample (live camera + gallery background removal) is available in
[google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples).

## Performance

- ~147 ms per frame on the Pixel 8a GPU (Tensor G3, Mali).

## Conversion notes

Converted with `litert-torch` (full U2NET, 44M params) and float16-quantized with
`ai-edge-quantizer`. Verified: all ops are GPU-native, and the output correlation against
the FP32 PyTorch reference is 1.0 for the FP32 conversion and ~0.9999 for the FP16 build.
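
For readers who want to run a similar check on device, the sketch below computes a correlation between two flattened masks. It assumes the Pearson correlation coefficient; the conversion notes do not state which metric was used, and the function name `pearsonCorrelation` is hypothetical.

```kotlin
import kotlin.math.sqrt

/**
 * Pearson correlation between two equally sized float arrays, for example
 * a reference mask and the mask produced by the converted model.
 * Returns NaN if either array is constant (zero variance).
 */
fun pearsonCorrelation(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size && a.isNotEmpty()) { "arrays must be non-empty and equally sized" }

    // First pass: means, accumulated in double precision.
    var sumA = 0.0
    var sumB = 0.0
    for (i in a.indices) {
        sumA += a[i]
        sumB += b[i]
    }
    val meanA = sumA / a.size
    val meanB = sumB / b.size

    // Second pass: covariance and variances around the means.
    var cov = 0.0
    var varA = 0.0
    var varB = 0.0
    for (i in a.indices) {
        val da = a[i] - meanA
        val db = b[i] - meanB
        cov += da * db
        varA += da * da
        varB += db * db
    }
    return cov / sqrt(varA * varB)
}
```

A value close to 1.0 indicates that the two masks agree up to a positive linear scaling, so it is worth pairing this check with a maximum absolute difference when exact values matter.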

## License & attribution

- License: **Apache-2.0** (© the U²-Net authors,
  [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net/blob/master/LICENSE)).
- This is a format conversion of the official U²-Net weights (no architectural changes);
  all credit to the original authors.