---
license: apache-2.0
library_name: litert
pipeline_tag: object-detection
tags:
- object-detection
- yolox
- litert
- tflite
- on-device
- gpu
---
# YOLOX-Tiny — LiteRT (CompiledModel GPU)
Megvii **YOLOX-Tiny** (COCO, Apache-2.0) re-authored to a **GPU-native** LiteRT `.tflite` via the
official **litert_torch** path (no onnx2tf). FP16, **10.4 MB**, input **416×416**.
Verified on a Pixel 8a: the whole graph runs on the GPU delegate (full **LITERT_CL residency**,
zero CPU fallback) and the GPU output matches the CPU/PyTorch reference (corr ≥ 0.999).
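A GPU-versus-reference check like the one quoted above can be sketched by flattening both outputs and correlating them. This assumes the figure is a Pearson correlation, which the text does not state; `pearson` and the sample values below are illustrative and are not taken from the actual verification run.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences of floats."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical flattened outputs: a reference run and a slightly perturbed run.
reference = [0.10, 0.52, 0.91, 0.33, 0.75, 0.04]
gpu_fp16 = [0.1001, 0.5197, 0.9104, 0.3302, 0.7496, 0.0401]
corr = pearson(reference, gpu_fp16)
```

In practice the two sequences would be the flattened `[1, 3549, 85]` tensors from the GPU run and the reference run on the same input.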
## Why this is GPU-clean
YOLOX is a pure CNN, but its **Focus stem** (stride-2 space-to-depth slicing) lowers to
`GATHER_ND`, which the GPU delegate rejects. Here the Focus + its following 3×3 conv are folded
into a single, numerically exact **6×6 stride-2 conv**, so the graph has
**zero GATHER/GATHER_ND/TopK/Cast** ops and **no >4D tensors**. Activations (SiLU) lower to LOGISTIC+MUL.
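The fold can be checked on a toy case. The sketch below is not the conversion script: it uses one input channel and one output channel, plain Python lists, and assumes the Focus planes are concatenated in the order top-left, bottom-left, top-right, bottom-right. Under those assumptions, each 3×3 tap `(ky, kx)` of plane `(dy, dx)` lands at position `(2*ky + dy, 2*kx + dx)` of a 6×6 kernel applied with stride 2 and padding 2.

```python
import random

OFFSETS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (dy, dx) per Focus plane (assumed order)

def focus(image):
    """Space-to-depth: split an HxW image into four (H/2)x(W/2) planes."""
    return [[row[dx::2] for row in image[dy::2]] for dy, dx in OFFSETS]

def conv2d(planes, weights, k, stride, pad):
    """Naive zero-padded conv: many input planes, one output plane."""
    h, w = len(planes[0]), len(planes[0][0])
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for c, plane in enumerate(planes):
                for u in range(k):
                    for v in range(k):
                        yy, xx = i * stride + u - pad, j * stride + v - pad
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += weights[c][u][v] * plane[yy][xx]
            out[i][j] = acc
    return out

def fold_focus_conv(w3):
    """Scatter a 4-plane 3x3 kernel into one 6x6 kernel over the full-res image."""
    w6 = [[0.0] * 6 for _ in range(6)]
    for g, (dy, dx) in enumerate(OFFSETS):
        for ky in range(3):
            for kx in range(3):
                w6[2 * ky + dy][2 * kx + dx] = w3[g][ky][kx]
    return w6

random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]
w3 = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(4)]

reference = conv2d(focus(image), w3, k=3, stride=1, pad=1)         # Focus + 3x3 conv
fused = conv2d([image], [fold_focus_conv(w3)], k=6, stride=2, pad=2)  # single 6x6 conv
max_err = max(abs(a - b) for ra, rb in zip(reference, fused) for a, b in zip(ra, rb))
```

Both paths sum the same products, so `max_err` differs from zero only by floating-point summation order. With real weights the same scatter is repeated per input and output channel.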
## I/O
- **Input** `images` `[1, 416, 416, 3]` NHWC, **BGR, 0–255, no normalization** (YOLOX letterbox:
uniform-scale to fit, pad bottom/right with gray 114).
- **Output** `[1, 3549, 85]` raw heads, anchor-major, where 3549 = 52² + 26² + 13² grid cells
for strides 8, 16, 32. `85 = 4 box (cx, cy, w, h, grid units) + 1 obj + 80 class`.
obj/class are already sigmoid'd; boxes are **not** decoded.
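The input letterbox can be sketched as follows. This is a minimal pure-Python illustration, assuming the image is a nested list of `[B, G, R]` pixels and using nearest-neighbour sampling for brevity; a real pipeline would more likely use a bilinear resize from an image library. The function name `letterbox` and its parameters are hypothetical.

```python
def letterbox(image, dst=416, pad_value=114):
    """Uniformly scale `image` to fit dst x dst, anchored top-left, gray-padded.

    image: list of rows, each row a list of [B, G, R] values in 0-255.
    Returns (canvas, ratio); ratio is needed later to map boxes back.
    """
    src_h, src_w = len(image), len(image[0])
    ratio = min(dst / src_h, dst / src_w)
    new_h, new_w = int(src_h * ratio), int(src_w * ratio)

    # Start from an all-gray canvas so the bottom/right padding is already in place.
    canvas = [[[pad_value] * 3 for _ in range(dst)] for _ in range(dst)]
    for y in range(new_h):
        src_y = min(int(y / ratio), src_h - 1)
        for x in range(new_w):
            src_x = min(int(x / ratio), src_w - 1)
            canvas[y][x] = list(image[src_y][src_x])  # no normalization, BGR kept
    return canvas, ratio

# Tiny 2x4 BGR image, upscaled by 104x to 208x416 and padded below.
tiny = [[[10 * r + c, 0, 255] for c in range(4)] for r in range(2)]
canvas, ratio = letterbox(tiny)
```

The returned `ratio` is the value that decoded boxes are later divided by.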
## Host-side decode (kept out of the graph for GPU-cleanliness)
For anchor `i` at grid `(gx,gy)` with `stride ∈ {8,16,32}`:
`cx=(raw_cx+gx)*stride`, `cy=(raw_cy+gy)*stride`, `w=exp(raw_w)*stride`, `h=exp(raw_h)*stride`;
`score = obj * max_class`; then per-class NMS. Divide box coordinates by the letterbox ratio to map
them back to the source image. Reference Kotlin and Python decoders are in the sample linked below.
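The decode steps above can be sketched in plain Python. This is not the sample app's code: it assumes anchors are ordered stride by stride (8, 16, 32) and row-major within each grid, and the thresholds `score_thr=0.25` and `iou_thr=0.45` are placeholder values chosen for illustration.

```python
import math

STRIDES = (8, 16, 32)

def build_grid(input_size=416, strides=STRIDES):
    """Return one (gx, gy, stride) per anchor: stride-major, then row-major."""
    grid = []
    for stride in strides:
        cells = input_size // stride
        grid.extend((gx, gy, stride) for gy in range(cells) for gx in range(cells))
    return grid

def decode(raw, ratio, score_thr=0.25, input_size=416):
    """raw: rows of 85 floats. Returns [x1, y1, x2, y2, score, class_id] in source pixels."""
    detections = []
    for (gx, gy, stride), row in zip(build_grid(input_size), raw):
        class_scores = row[5:]
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = row[4] * class_scores[class_id]
        if score < score_thr:
            continue
        cx, cy = (row[0] + gx) * stride, (row[1] + gy) * stride
        w, h = math.exp(row[2]) * stride, math.exp(row[3]) * stride
        # Center/size -> corners, then undo the letterbox scale.
        detections.append([(cx - w / 2) / ratio, (cy - h / 2) / ratio,
                           (cx + w / 2) / ratio, (cy + h / 2) / ratio, score, class_id])
    return detections

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2, ...] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_per_class(detections, iou_thr=0.45):
    """Greedy NMS; a box is only suppressed by a kept box of the same class."""
    kept = []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        if all(det[5] != k[5] or iou(det, k) <= iou_thr for k in kept):
            kept.append(det)
    return kept
```

A typical call would be `nms_per_class(decode(output[0], ratio))`, where `output[0]` holds the 3549 rows from the model and `ratio` is the letterbox scale used for that image.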
## Performance
COCO val2017 AP **32.8** (FP32 reference). Real-time on Pixel 8a GPU.
## Training data & PII
Trained by Megvii on **COCO 2017** (train2017), a public academic object-detection dataset
(Creative Commons). COCO images contain people as one of the 80 object categories; no names,
identities, or other personal attributes are modeled or output — the model emits only class id +
box. No additional or private data was used. Weights are the official Megvii release; only the op
graph was re-authored for GPU (weights unchanged).
## Sample app + conversion script
Android sample (CompiledModel GPU, Kotlin decode + NMS) and the `litert_torch` conversion script:
https://github.com/google-ai-edge/litert-samples (compiled_model_api/object_detection)