Instructions to use mlboydaisuke/yolox-tiny-litert with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- LiteRT
How to use mlboydaisuke/yolox-tiny-litert with LiteRT:
# No code snippets available yet for this library. # To use this model, check the repository files and the library's documentation. # Want to help? PRs adding snippets are welcome at: # https://github.com/huggingface/huggingface.js
- Notebooks
- Google Colab
- Kaggle
| license: apache-2.0 | |
| library_name: litert | |
| pipeline_tag: object-detection | |
| tags: | |
| - object-detection | |
| - yolox | |
| - litert | |
| - tflite | |
| - on-device | |
| - gpu | |
| # YOLOX-Tiny — LiteRT (CompiledModel GPU) | |
| Megvii **YOLOX-Tiny** (COCO, Apache-2.0) re-authored to a **GPU-native** LiteRT `.tflite` via the | |
| official **litert_torch** path (no onnx2tf). FP16, **10.4 MB**, input **416×416**. | |
| Verified on a Pixel 8a: the whole graph runs on the GPU delegate (full **LITERT_CL residency**, | |
| zero CPU fallback) and the GPU output matches the CPU/PyTorch reference (corr ≥ 0.999). | |
| ## Why this is GPU-clean | |
| YOLOX is a pure CNN, but its **Focus stem** (stride-2 space-to-depth slicing) lowers to | |
| `GATHER_ND`, which the GPU delegate rejects. Here the Focus + its following 3×3 conv are folded | |
| into a single, numerically-exact **6×6 stride-2 conv**, so the graph has **zero GATHER/GATHER_ND/ | |
| TopK/Cast** ops and **no >4D tensors**. Activations (SiLU) lower to LOGISTIC+MUL. | |
| ## I/O | |
| - **Input** `images` `[1, 416, 416, 3]` NHWC, **BGR, 0–255, no normalization** (YOLOX letterbox: | |
| uniform-scale to fit, pad bottom/right with gray 114). | |
| - **Output** `[1, 3549, 85]` raw heads, anchor-major. `85 = 4 box (cx,cy,w,h, grid units) + 1 obj | |
| + 80 class`. obj/class are already sigmoid'd; boxes are **not** decoded. | |
| ## Host-side decode (kept out of the graph for GPU-cleanliness) | |
| For anchor `i` at grid `(gx,gy)` with `stride ∈ {8,16,32}`: | |
| `cx=(raw_cx+gx)*stride`, `cy=(raw_cy+gy)*stride`, `w=exp(raw_w)*stride`, `h=exp(raw_h)*stride`; | |
| `score = obj * max_class`; then per-class NMS. Divide boxes by the letterbox ratio to map back. | |
| Reference Kotlin + Python decode in the sample below. | |
| ## Performance | |
| COCO val2017 AP **32.8** (FP32 reference). Real-time on Pixel 8a GPU. | |
| ## Training data & PII | |
| Trained by Megvii on **COCO 2017** (train2017), a public academic object-detection dataset | |
| (Creative Commons). COCO images contain people as one of the 80 object categories; no names, | |
| identities, or other personal attributes are modeled or output — the model emits only class id + | |
| box. No additional or private data was used. Weights are the official Megvii release; only the op | |
| graph was re-authored for GPU (weights unchanged). | |
| ## Sample app + conversion script | |
| Android sample (CompiledModel GPU, Kotlin decode + NMS) and the `litert_torch` conversion script: | |
| https://github.com/google-ai-edge/litert-samples (compiled_model_api/object_detection) | |