---
license: apache-2.0
library_name: litert
pipeline_tag: image-segmentation
base_model: xuebinqin/U-2-Net
tags:
- litert
- tflite
- on-device
- android
- background-removal
- salient-object-detection
- image-matting
- u2net
---
# U²-Net — LiteRT (TFLite) GPU, FP16
On-device [LiteRT](https://ai.google.dev/edge/litert) (`.tflite`) conversion of
**[U²-Net](https://github.com/xuebinqin/U-2-Net)** for salient-object segmentation /
**background removal**. U²-Net is a nested U-structure ("U-net of U-nets", a pure CNN)
that predicts a single-channel saliency mask; the foreground is composited onto
transparency to cut the subject out of its background.
The model runs **fully on the LiteRT `CompiledModel` GPU accelerator** (ML Drift):
every op is GPU-native, with no CPU fallback and no Flex ops. Because the network is a
pure CNN, it converts with
[`litert-torch`](https://github.com/google-ai-edge/ai-edge-torch) **with no custom
rewrites**.
## Files
| File | Size | Description |
|------|------|-------------|
| `u2net_fp16.tflite` | 88 MB | float16 weights, GPU-compatible |
## I/O
- **Input**: `[1, 3, 320, 320]` float32, **NCHW**, RGB. Preprocessing: resize to 320×320,
  divide by the per-image maximum pixel value, then apply ImageNet normalization
  (`mean = [0.485, 0.456, 0.406]`, `std = [0.229, 0.224, 0.225]`).
- **Output**: `[1, 1, 320, 320]` saliency mask in `[0, 1]` (sigmoid). Upscale it to the
  original image size and use it as the foreground alpha.
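The preprocessing steps above could be sketched as follows. This is a minimal, framework-free illustration, not the sample app's code: the function name `toNchwInput`, the constant names, and the packed-ARGB `IntArray` input (for example from `Bitmap.getPixels` after resizing to 320×320) are assumptions introduced here. The maximum is taken over all pixels and channels of the image.

```kotlin
// Hypothetical helper; names and the ARGB-int input layout are assumptions.
val INPUT_SIZE = 320
val IMAGENET_MEAN = floatArrayOf(0.485f, 0.456f, 0.406f)
val IMAGENET_STD = floatArrayOf(0.229f, 0.224f, 0.225f)

/** Converts 320x320 packed ARGB pixels (already resized) to a [1, 3, 320, 320] NCHW array. */
fun toNchwInput(argb: IntArray): FloatArray {
    val plane = INPUT_SIZE * INPUT_SIZE
    require(argb.size == plane) { "expected $plane pixels, got ${argb.size}" }

    // Per-image maximum over every pixel and channel.
    var maxVal = 0
    for (p in argb) {
        val r = (p shr 16) and 0xFF
        val g = (p shr 8) and 0xFF
        val b = p and 0xFF
        maxVal = maxOf(maxVal, maxOf(r, g, b))
    }
    // Guard against division by zero on an all-black image.
    val scale = if (maxVal == 0) 1f else maxVal.toFloat()

    val out = FloatArray(3 * plane)
    for (i in 0 until plane) {
        val p = argb[i]
        val r = ((p shr 16) and 0xFF) / scale
        val g = ((p shr 8) and 0xFF) / scale
        val b = (p and 0xFF) / scale
        // NCHW: the full R plane, then the G plane, then the B plane.
        out[i] = (r - IMAGENET_MEAN[0]) / IMAGENET_STD[0]
        out[plane + i] = (g - IMAGENET_MEAN[1]) / IMAGENET_STD[1]
        out[2 * plane + i] = (b - IMAGENET_MEAN[2]) / IMAGENET_STD[2]
    }
    return out
}
```

The returned array has `3 * 320 * 320` elements and can be written directly into the model's single input buffer.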
## Usage (Android, LiteRT CompiledModel)
```kotlin
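import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Load the model from the app's assets and compile it for the GPU accelerator.
val model = CompiledModel.create(
    context.assets, "u2net_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null
)

// Allocate the input and output buffers once and reuse them across frames.
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()

inputs[0].writeFloat(nchwFloatArray)   // preprocessed image, [1,3,320,320]
model.run(inputs, outputs)
val mask = outputs[0].readFloat()      // saliency mask, [1,1,320,320] in [0,1]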
```
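Once the `mask` from the snippet above is available, the cut-out step might look like the sketch below. It is an illustration under stated assumptions: `applyMaskAsAlpha` is a hypothetical helper, the image is assumed to be packed non-premultiplied ARGB, and nearest-neighbour upscaling is used for brevity (bilinear interpolation would give smoother edges).

```kotlin
/**
 * Hypothetical helper: upscales a maskSize x maskSize saliency mask to width x height
 * with nearest-neighbour sampling and writes it into the alpha channel of the pixels.
 */
fun applyMaskAsAlpha(
    argb: IntArray, width: Int, height: Int,
    mask: FloatArray, maskSize: Int = 320
): IntArray {
    require(argb.size == width * height) { "pixel count does not match width * height" }
    require(mask.size == maskSize * maskSize) { "mask must be maskSize x maskSize" }

    val out = IntArray(argb.size)
    for (y in 0 until height) {
        val my = y * maskSize / height          // nearest mask row (floor)
        for (x in 0 until width) {
            val mx = x * maskSize / width       // nearest mask column (floor)
            val m = mask[my * maskSize + mx].coerceIn(0f, 1f)
            val alpha = (m * 255f + 0.5f).toInt()
            // Keep the original RGB and replace only the alpha byte.
            out[y * width + x] = (alpha shl 24) or (argb[y * width + x] and 0x00FFFFFF)
        }
    }
    return out
}
```

Pixels where the mask is near 0 become transparent, and pixels where it is near 1 stay opaque.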
A complete Android sample (live camera + gallery background removal) is available in
[google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples).
## Performance
- ~147 ms / frame on a Pixel 8a (Tensor G3, Mali) GPU.
## Conversion notes
Converted with `litert-torch` (full U2NET, 44M params) and float16-quantized with
`ai-edge-quantizer`. Verified: all ops are GPU-native, and the output correlation with the
PyTorch reference is 1.0 for the FP32 conversion and ~0.9999 for the FP16 build.
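The notes above do not say which correlation metric was used. Assuming it is the Pearson correlation between the flattened output masks, a comparable check could be sketched as follows (`pearsonCorrelation` is a hypothetical helper, not part of the conversion tooling):

```kotlin
/**
 * Hypothetical helper: Pearson correlation between two flattened masks,
 * e.g. a reference output and the on-device output for the same input.
 * Returns NaN if either input is constant (zero variance).
 */
fun pearsonCorrelation(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size && a.isNotEmpty()) { "inputs must be non-empty and equal in length" }
    val meanA = a.average()
    val meanB = b.average()
    var cov = 0.0
    var varA = 0.0
    var varB = 0.0
    for (i in a.indices) {
        val da = a[i] - meanA
        val db = b[i] - meanB
        cov += da * db
        varA += da * da
        varB += db * db
    }
    return cov / kotlin.math.sqrt(varA * varB)
}
```

A value close to 1.0 indicates that the two masks agree up to a positive scale and offset, so it is a useful sanity check rather than a guarantee of identical outputs.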
## License & attribution
- License: **Apache-2.0** (© the U²-Net authors,
[xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net/blob/master/LICENSE)).
- This is a format conversion of the official U²-Net weights (no architectural changes);
all credit to the original authors.