mlboydaisuke/U-2-Net-LiteRT

---
license: apache-2.0
library_name: litert
pipeline_tag: image-segmentation
base_model: xuebinqin/U-2-Net
tags:
- litert
- tflite
- on-device
- android
- background-removal
- salient-object-detection
- image-matting
- u2net
---

# U²-Net: LiteRT (TFLite) GPU, FP16

On-device [LiteRT](https://ai.google.dev/edge/litert) (`.tflite`) conversion of
[U²-Net](https://github.com/xuebinqin/U-2-Net) for salient-object segmentation and
background removal. U²-Net is a nested U-structure (a "U-net of U-nets", and a pure CNN)
that predicts a single-channel saliency mask. Using that mask as an alpha channel
composites the foreground onto transparency, cutting the subject out of its background.

The model runs entirely on the LiteRT `CompiledModel` GPU accelerator (ML Drift): every
op is GPU-native, with no CPU fallback and no Flex ops. Because U²-Net is a pure CNN, it
converts with [`litert-torch`](https://github.com/google-ai-edge/ai-edge-torch) **without
any custom rewrites**.

## Files

| File | Size | Description |
|------|------|-------------|
| `u2net_fp16.tflite` | 88 MB | float16 weights, GPU-compatible |
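The 88 MB size is consistent with the parameter count given under Conversion notes (44M) at 2 bytes per float16 weight. This rough check ignores the comparatively small overhead of the graph definition and metadata:

$$
44 \times 10^{6}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}} = 88 \times 10^{6}\ \text{bytes} \approx 88\ \text{MB}
$$

By the same arithmetic, an FP32 export would be roughly twice as large (about 176 MB).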

## I/O

- Input: `[1, 3, 320, 320]` float32, NCHW, RGB. Preprocessing: resize to 320×320,
  divide by the per-image maximum value, then apply ImageNet normalization
  (`mean = [0.485, 0.456, 0.406]`, `std = [0.229, 0.224, 0.225]`).
- Output: `[1, 1, 320, 320]` saliency mask with values in `[0, 1]` (sigmoid output).
  Upscale it to the original image size and use it as the foreground alpha.
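The preprocessing steps above can be sketched in plain Kotlin as follows. This is a minimal sketch, assuming the source image arrives as packed ARGB_8888 pixels (for example from Android's `Bitmap.getPixels`); `preprocess` is a hypothetical helper, not part of LiteRT, and bilinear resizing is an assumption, since the reference PyTorch pipeline may resample slightly differently.

```kotlin
// Hypothetical helper (not part of LiteRT): turns ARGB_8888 pixels, e.g. from
// Bitmap.getPixels, into the [1, 3, size, size] NCHW float32 input.
fun preprocess(argb: IntArray, width: Int, height: Int, size: Int = 320): FloatArray {
    require(argb.size == width * height) { "pixel count does not match width * height" }
    val mean = floatArrayOf(0.485f, 0.456f, 0.406f)
    val std = floatArrayOf(0.229f, 0.224f, 0.225f)
    // Channel c of an ARGB int (0 -> R, 1 -> G, 2 -> B) as a float in [0, 255].
    fun channel(px: Int, c: Int): Float = ((px shr (16 - 8 * c)) and 0xFF).toFloat()

    // 1. Bilinear resize to size x size, stored HWC (assumed resampling method).
    val rgb = FloatArray(size * size * 3)
    for (y in 0 until size) {
        // Map the output pixel center back to source coordinates, clamped to the image.
        val sy = ((y + 0.5f) * height / size - 0.5f).coerceIn(0f, (height - 1).toFloat())
        val y0 = sy.toInt()
        val y1 = minOf(y0 + 1, height - 1)
        val fy = sy - y0
        for (x in 0 until size) {
            val sx = ((x + 0.5f) * width / size - 0.5f).coerceIn(0f, (width - 1).toFloat())
            val x0 = sx.toInt()
            val x1 = minOf(x0 + 1, width - 1)
            val fx = sx - x0
            for (c in 0 until 3) {
                val top = channel(argb[y0 * width + x0], c) * (1 - fx) + channel(argb[y0 * width + x1], c) * fx
                val bottom = channel(argb[y1 * width + x0], c) * (1 - fx) + channel(argb[y1 * width + x1], c) * fx
                rgb[(y * size + x) * 3 + c] = top * (1 - fy) + bottom * fy
            }
        }
    }
    // 2. Divide by the per-image max; fall back to 1 for an all-black image.
    val max = rgb.maxOrNull()?.takeIf { it > 0f } ?: 1f
    // 3. ImageNet-normalize each channel while transposing HWC -> NCHW.
    val plane = size * size
    val out = FloatArray(3 * plane)
    for (c in 0 until 3) {
        for (i in 0 until plane) {
            out[c * plane + i] = (rgb[i * 3 + c] / max - mean[c]) / std[c]
        }
    }
    return out
}
```

The returned array has the `[1, 3, 320, 320]` layout expected by `inputs[0].writeFloat(...)` in the usage example. The zero-max guard is an addition of this sketch so that an all-black frame yields finite values instead of NaNs.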

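## Usage (Android, LiteRT CompiledModel)

```kotlin
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// `context` is an Android Context; `nchwFloatArray` holds the preprocessed input.
val model = CompiledModel.create(
    context.assets,
    "u2net_fp16.tflite",               // model file bundled in the app's assets
    CompiledModel.Options(Accelerator.GPU),
    null,                              // no custom Environment
)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(nchwFloatArray)   // [1, 3, 320, 320]
model.run(inputs, outputs)
val mask = outputs[0].readFloat()      // [1, 1, 320, 320], values in [0, 1]

// Release native resources once inference is finished.
inputs.forEach { it.close() }
outputs.forEach { it.close() }
model.close()
```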
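The remaining step, upscaling `mask` to the original image size and using it as alpha, could look like the sketch below. It assumes the original image is available as ARGB_8888 pixels with straight (non-premultiplied) alpha; `applyMaskAsAlpha` is a hypothetical helper, and bilinear upscaling is a choice made here, not something this card specifies.

```kotlin
// Hypothetical helper: bilinearly upscales the [1, 1, size, size] mask to
// width x height and writes it into the alpha channel of ARGB_8888 pixels.
fun applyMaskAsAlpha(mask: FloatArray, argb: IntArray, width: Int, height: Int, size: Int = 320): IntArray {
    require(mask.size == size * size) { "mask must contain size * size values" }
    require(argb.size == width * height) { "pixel count does not match width * height" }
    val out = IntArray(argb.size)
    for (y in 0 until height) {
        // Map each output pixel center into mask coordinates, clamped to the mask.
        val sy = ((y + 0.5f) * size / height - 0.5f).coerceIn(0f, (size - 1).toFloat())
        val y0 = sy.toInt()
        val y1 = minOf(y0 + 1, size - 1)
        val fy = sy - y0
        for (x in 0 until width) {
            val sx = ((x + 0.5f) * size / width - 0.5f).coerceIn(0f, (size - 1).toFloat())
            val x0 = sx.toInt()
            val x1 = minOf(x0 + 1, size - 1)
            val fx = sx - x0
            val top = mask[y0 * size + x0] * (1 - fx) + mask[y0 * size + x1] * fx
            val bottom = mask[y1 * size + x0] * (1 - fx) + mask[y1 * size + x1] * fx
            // Convert the interpolated saliency to an 8-bit alpha (rounded).
            val alpha = ((top * (1 - fy) + bottom * fy).coerceIn(0f, 1f) * 255f + 0.5f).toInt()
            // Keep RGB, replace alpha: low-saliency (background) pixels become transparent.
            out[y * width + x] = (alpha shl 24) or (argb[y * width + x] and 0x00FFFFFF)
        }
    }
    return out
}
```

For example, pass the pixels read with `bitmap.getPixels(...)` together with `bitmap.width` and `bitmap.height`, then write the result into a mutable ARGB_8888 bitmap with `Bitmap.setPixels` to get the cut-out subject.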

A complete Android sample (live camera + gallery background removal) is available in
[google-ai-edge/litert-samples](https://github.com/google-ai-edge/litert-samples).

## Performance

- ~147 ms per frame on the GPU of a Pixel 8a (Tensor G3, Mali).
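The card does not say exactly what the ~147 ms figure covers (for example, whether preprocessing is included). To collect a comparable number on another device, one minimal approach is to time only inference with a warm-up-then-median helper. `medianLatencyMs` below is a hypothetical sketch, not a LiteRT API.

```kotlin
// Hypothetical timing helper: runs `block` `warmup` times and discards those
// timings, then returns the median wall-clock latency of `runs` calls in ms.
fun medianLatencyMs(warmup: Int = 3, runs: Int = 20, block: () -> Unit): Double {
    require(warmup >= 0 && runs > 0) { "warmup must be >= 0 and runs > 0" }
    repeat(warmup) { block() }
    val samples = DoubleArray(runs) {
        val start = System.nanoTime()
        block()
        (System.nanoTime() - start) / 1e6 // nanoseconds -> milliseconds
    }
    samples.sort()
    return if (runs % 2 == 1) samples[runs / 2]
    else (samples[runs / 2 - 1] + samples[runs / 2]) / 2
}
```

For example, `medianLatencyMs { model.run(inputs, outputs) }` with the buffers from the usage example. Warm-up calls are discarded because early invocations can include one-time setup costs, and the median is less sensitive than the mean to occasional scheduling spikes.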

## Conversion notes

Converted with `litert-torch` (full U2NET model, 44M parameters), then quantized to float16
with `ai-edge-quantizer`. Verification: all ops are GPU-native, and the output correlation
with the PyTorch reference is 1.0 for the FP32 conversion and ~0.9999 for the FP16 build.
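The card does not name the correlation metric. Assuming it is the Pearson correlation over the flattened `[1, 1, 320, 320]` masks, i.e. $r = \operatorname{cov}(a, b) / (\sigma_a \sigma_b)$, the check can be sketched as follows; `pearson` is a hypothetical helper, and the two arrays would come from the PyTorch reference and from `outputs[0].readFloat()`.

```kotlin
import kotlin.math.sqrt

// Pearson correlation between two equally sized, flattened output tensors.
// Returns NaN if either input is constant (zero variance).
fun pearson(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size && a.isNotEmpty()) { "arrays must be non-empty and equal in size" }
    val meanA = a.average()
    val meanB = b.average()
    var cov = 0.0
    var varA = 0.0
    var varB = 0.0
    for (i in a.indices) {
        val da = a[i] - meanA
        val db = b[i] - meanB
        cov += da * db
        varA += da * da
        varB += db * db
    }
    return cov / sqrt(varA * varB)
}
```

Correlation is insensitive to a uniform scale or offset between the two masks, so a value of ~0.9999 says the FP16 mask has the same spatial structure as the reference; a per-pixel error metric such as max absolute difference would complement it.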

## License & attribution

- License: Apache-2.0 (© the U²-Net authors,
  [xuebinqin/U-2-Net](https://github.com/xuebinqin/U-2-Net/blob/master/LICENSE)).
- This is a format conversion of the official U²-Net weights (no architectural changes);
  all credit goes to the original authors.