mlboydaisuke/yolox-tiny-litert

	---
	license: apache-2.0
	library_name: litert
	pipeline_tag: object-detection
	tags:
	- object-detection
	- yolox
	- litert
	- tflite
	- on-device
	- gpu
	---

	# YOLOX-Tiny — LiteRT (CompiledModel GPU)

	Megvii YOLOX-Tiny (COCO, Apache-2.0) re-authored as a GPU-native LiteRT `.tflite` via the
	official `litert_torch` path (no onnx2tf). FP16, 10.4 MB, input 416×416.

	Verified on a Pixel 8a: the whole graph runs on the GPU delegate (full LITERT_CL residency,
	zero CPU fallback) and the GPU output matches the CPU/PyTorch reference (corr ≥ 0.999).
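
A parity check of this kind can be sketched as follows. It assumes "corr" means the Pearson correlation over the flattened output tensor, which the card does not spell out; `output_corr` is a hypothetical helper name, and the FP16 round-trip only stands in for a real GPU run.

```python
import numpy as np

def output_corr(reference, candidate):
    """Pearson correlation between two output tensors, flattened (assumed metric)."""
    a = np.asarray(reference, dtype=np.float64).ravel()
    b = np.asarray(candidate, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

# Stand-in data: a random "reference" output and an FP16-rounded copy of it.
rng = np.random.default_rng(0)
ref_out = rng.standard_normal((1, 3549, 85)).astype(np.float32)
gpu_like_out = ref_out.astype(np.float16).astype(np.float32)
print(output_corr(ref_out, gpu_like_out) >= 0.999)
```

In practice `ref_out` would come from the CPU or PyTorch model and `gpu_like_out` from the GPU run on the same input.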

	## Why this is GPU-clean

	YOLOX is a pure CNN, but its Focus stem (stride-2 space-to-depth slicing) lowers to
	`GATHER_ND`, which the GPU delegate rejects. Here the Focus and the 3×3 conv that follows it are
	folded into a single, numerically exact 6×6 stride-2 conv, so the graph has **zero
	GATHER/GATHER_ND/TopK/Cast ops and no >4D tensors**. SiLU activations lower to LOGISTIC+MUL.
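
The fold can be checked numerically with a small NumPy sketch. This is an illustration, not the conversion script: it assumes the Focus concatenates the four pixel-offset slices in the order top-left, bottom-left, top-right, bottom-right, that the 3×3 conv uses zero padding 1, and it leaves out bias and batch norm. `conv2d`, `focus`, and `fold_focus_conv` are hypothetical helper names.

```python
import numpy as np

# (dy, dx) pixel offsets of the four Focus slices; the order is an assumption.
OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1)]

def conv2d(x, weight, stride, pad):
    """Naive zero-padded cross-correlation. x: (C, H, W), weight: (O, C, k, k)."""
    out_ch, _, k, _ = weight.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out_h = (x.shape[1] + 2 * pad - k) // stride + 1
    out_w = (x.shape[2] + 2 * pad - k) // stride + 1
    y = np.zeros((out_ch, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            y[:, i, j] = np.tensordot(weight, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y

def focus(x):
    """Space-to-depth: (C, H, W) -> (4C, H/2, W/2) via strided slicing."""
    return np.concatenate([x[:, dy::2, dx::2] for dy, dx in OFFSETS], axis=0)

def fold_focus_conv(w3):
    """Scatter a (O, 4C, 3, 3) kernel into an equivalent (O, C, 6, 6) kernel."""
    out_ch, c4, _, _ = w3.shape
    c = c4 // 4
    w6 = np.zeros((out_ch, c, 6, 6))
    for g, (dy, dx) in enumerate(OFFSETS):
        # Tap (u, v) of slice g reads pixel (2u + dy, 2v + dx) of the 6x6 window.
        w6[:, :, dy::2, dx::2] = w3[:, g * c:(g + 1) * c]
    return w6

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w3 = rng.standard_normal((4, 12, 3, 3))
ref = conv2d(focus(x), w3, stride=1, pad=1)               # Focus + 3x3 conv
folded = conv2d(x, fold_focus_conv(w3), stride=2, pad=2)  # single 6x6 stride-2 conv
print(np.abs(ref - folded).max())
```

Padding the half-resolution Focus output by 1 corresponds to padding the full-resolution input by 2, which is why the folded conv uses `pad=2`.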

	## I/O

	- Input `images` `[1, 416, 416, 3]` NHWC, BGR, 0–255, no normalization (YOLOX letterbox:
	uniform-scale to fit, pad bottom/right with gray 114).
	- Output `[1, 3549, 85]` raw heads, anchor-major (3549 = 52² + 26² + 13² grid cells for strides
	8, 16, 32). `85 = 4 box (cx, cy, w, h; raw, in grid units) + 1 obj + 80 class`. The obj and class
	scores already have sigmoid applied; boxes are not decoded.
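
The input preparation described above can be sketched in NumPy. To stay dependency-free this uses a nearest-neighbour resize; a real pipeline would more likely use a bilinear resize from an image library. The `float32` input dtype and the helper name `letterbox` are assumptions, not taken from the model file.

```python
import numpy as np

def letterbox(img_bgr, size=416, pad_value=114):
    """Scale an HxWx3 BGR image to fit size x size, pad bottom/right with gray.

    Returns the [1, size, size, 3] NHWC tensor (0-255, no normalization) and
    the scale ratio needed later to map boxes back to the original image.
    """
    h, w = img_bgr.shape[:2]
    ratio = min(size / h, size / w)
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Nearest-neighbour resize via index lookup (a stand-in for bilinear).
    rows = np.minimum((np.arange(new_h) / ratio).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    resized = img_bgr[rows][:, cols]
    canvas = np.full((size, size, 3), pad_value, dtype=np.float32)
    canvas[:new_h, :new_w] = resized  # image sits in the top-left corner
    return canvas[None], ratio

# Example: a wide 208x832 frame is halved to 104x416 and padded below.
frame = np.full((208, 832, 3), 7, dtype=np.uint8)
tensor, ratio = letterbox(frame)
print(tensor.shape, ratio)
```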

	## Host-side decode (kept out of the graph for GPU-cleanliness)

	For anchor `i` at grid `(gx,gy)` with `stride ∈ {8,16,32}`:
	`cx=(raw_cx+gx)*stride`, `cy=(raw_cy+gy)*stride`, `w=exp(raw_w)*stride`, `h=exp(raw_h)*stride`;
	`score = obj * max_class`; then per-class NMS. Divide boxes by the letterbox ratio to map them back
	to original-image coordinates. Reference Kotlin and Python decoders are in the sample linked below.
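
As a rough illustration of these steps (not the reference decoder from the sample), the NumPy sketch below assumes anchors are ordered stride 8, then 16, then 32, row-major within each grid. The score and IoU thresholds are hypothetical defaults, and `make_grid`, `nms`, and `decode` are illustrative names.

```python
import numpy as np

def make_grid(size=416, strides=(8, 16, 32)):
    """Per-anchor grid x, grid y and stride, in the assumed anchor order."""
    gxs, gys, ss = [], [], []
    for s in strides:
        n = size // s
        gy, gx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
        gxs.append(gx.ravel())
        gys.append(gy.ravel())
        ss.append(np.full(n * n, s))
    return np.concatenate(gxs), np.concatenate(gys), np.concatenate(ss)

def nms(boxes, scores, iou_thr):
    """Greedy NMS on xyxy boxes; returns kept indices, highest score first."""
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(i)
        iw = np.minimum(boxes[i, 2], boxes[rest, 2]) - np.maximum(boxes[i, 0], boxes[rest, 0])
        ih = np.minimum(boxes[i, 3], boxes[rest, 3]) - np.maximum(boxes[i, 1], boxes[rest, 1])
        inter = np.clip(iw, 0, None) * np.clip(ih, 0, None)
        iou = inter / (area[i] + area[rest] - inter)
        order = rest[iou <= iou_thr]
    return keep

def decode(raw, ratio, score_thr=0.3, iou_thr=0.45):
    """raw: [1, 3549, 85] model output -> list of (class_id, score, xyxy box)."""
    pred = raw[0]
    gx, gy, stride = make_grid()
    cx = (pred[:, 0] + gx) * stride
    cy = (pred[:, 1] + gy) * stride
    w = np.exp(pred[:, 2]) * stride
    h = np.exp(pred[:, 3]) * stride
    # Corner format, mapped back to original-image pixels via the letterbox ratio.
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1) / ratio
    cls = pred[:, 5:].argmax(axis=1)
    scores = pred[:, 4] * pred[:, 5:].max(axis=1)
    mask = scores >= score_thr
    boxes, scores, cls = boxes[mask], scores[mask], cls[mask]
    detections = []
    for c in np.unique(cls):  # per-class NMS
        idx = np.where(cls == c)[0]
        for k in nms(boxes[idx], scores[idx], iou_thr):
            detections.append((int(c), float(scores[idx][k]), boxes[idx][k]))
    return detections
```

`decode` takes the same `ratio` that was used to letterbox the input, so the returned boxes are already in original-image pixels.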

	## Performance

	COCO val2017 AP 32.8 (FP32 reference). Real-time on Pixel 8a GPU.

	## Training data & PII

	Trained by Megvii on COCO 2017 (train2017), a public academic object-detection dataset
	(Creative Commons). COCO images contain people as one of the 80 object categories; no names,
	identities, or other personal attributes are modeled or output — the model emits only class id +
	box. No additional or private data was used. Weights are the official Megvii release; only the op
	graph was re-authored for GPU (weights unchanged).

	## Sample app + conversion script

	Android sample (CompiledModel GPU, Kotlin decode + NMS) and the `litert_torch` conversion script:
	https://github.com/google-ai-edge/litert-samples (compiled_model_api/object_detection)