RTMPose-s — LiteRT (on-device real-time 2D human pose, fully-GPU)
RTMPose (mmpose; CSPNeXt backbone + RTMCC/SimCC head) is a top-down 2D human pose model, here converted to LiteRT and running entirely on the CompiledModel GPU backend (ML Drift) on Android. It estimates 17 COCO keypoints for a single centered person. RTMPose is a state-of-the-art real-time pose model, and this conversion is device-verified end-to-end.
On-device (Pixel 8a, Tensor G3 — verified)
| metric | value |
|---|---|
| nodes on GPU | 256 / 256 LITERT_CL (full residency) |
| inference | ~4 ms (256×192) |
| size | 11.1 MB (fp16) |
| accuracy | device-vs-PyTorch SimCC corr 0.999, keypoints within 0.3 px (max 1 px) |
image[1,3,256,192] (ImageNet 0-255 norm) →[GPU: CSPNeXt + RTMCC]→ simcc_x[1,17,384], simcc_y[1,17,512]
The SimCC head emits two 1D distributions per keypoint; argmax over the bins (÷ split=2) gives the pixel x/y.
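The decode step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the upstream mmpose codec: the function name `decode_simcc` is hypothetical, and the only facts used are the shapes and split factor given in this card (384 = 192 × 2 bins along x, 512 = 256 × 2 bins along y).

```python
import numpy as np

def decode_simcc(simcc_x, simcc_y, split=2.0):
    """Turn SimCC outputs into pixel coordinates.

    simcc_x: [K, 384] scores over x bins, simcc_y: [K, 512] scores over y bins.
    Returns a [K, 2] array of (x, y) in the 192x256 input crop.
    (Hypothetical helper name; split=2 is taken from this model card.)
    """
    kx = simcc_x.argmax(-1) / split  # bin index -> pixel x
    ky = simcc_y.argmax(-1) / split  # bin index -> pixel y
    return np.stack([kx, ky], axis=-1)

# Synthetic example: every keypoint peaks at x-bin 100 and y-bin 301.
sx = np.zeros((17, 384), np.float32)
sy = np.zeros((17, 512), np.float32)
sx[:, 100] = 1.0
sy[:, 301] = 1.0
keypoints = decode_simcc(sx, sy)  # shape [17, 2]
```

Because each pixel is covered by two bins, the decoded coordinates have half-pixel resolution.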
Minimal usage
Android (Kotlin, CompiledModel GPU)
val model = CompiledModel.create(context.assets, "rtmpose_s_fp16.tflite",
CompiledModel.Options(Accelerator.GPU), null)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(chw) // [1,3,256,192] mmpose mean/std (0-255 RGB), NCHW
model.run(inputs, outputs)
val simccX = outputs[0].readFloat() // [1,17,384]
val simccY = outputs[1].readFloat() // [1,17,512]; keypoint = argmax / 2
Python (desktop verification)
import numpy as np
from PIL import Image
from ai_edge_litert.interpreter import Interpreter
# mmpose normalization constants for 0-255 RGB input
MEAN = np.array([123.675, 116.28, 103.53], np.float32)
STD = np.array([58.395, 57.12, 57.375], np.float32)
img = Image.open("person.jpg").convert("RGB").resize((192, 256)) # centered subject crop
x = ((np.asarray(img, np.float32) - MEAN) / STD).transpose(2, 0, 1)[None]
it = Interpreter(model_path="rtmpose_s_fp16.tflite"); it.allocate_tensors()
it.set_tensor(it.get_input_details()[0]["index"], x); it.invoke()
od = it.get_output_details()
sx, sy = (it.get_tensor(o["index"])[0] for o in od) # [17,384], [17,512]
if sx.shape[-1] != 384: sx, sy = sy, sx # identify by bin count
kx, ky = sx.argmax(-1) / 2.0, sy.argmax(-1) / 2.0 # 17 keypoints, px in 192x256
for i, (a, b) in enumerate(zip(kx, ky)):
print(f"kp{i}: ({a:.1f}, {b:.1f})")
How it converts (litert-torch) — two numerically-exact re-authorings
Both are on-device-only Mali issues: they pass the desktop op-check and report full LITERT_CL residency, yet the device output was wrong until fixed (residency ≠ correctness):
- ScaleNorm (RMS norm) fp16 overflow → all-zero head. The RTMCC ScaleNorm input reaches |x| ≈ 274, so its channel sum Σ x² ≈ 3.6M overflows fp16 (max 65504) on the Mali delegate, which reduces in fp16 even for an fp32 graph. The result is norm = ∞ → x/∞ = 0, and the whole head collapses to zero. Fix: scale x down by S=64 before squaring, then rescale (math-identical): a SafeRMSNorm.
- GAU attention act@act BMM → broadcast-reduce. The Gated Attention Unit's q@kᵀ and kernel@v are activation×activation batch-matmuls that the Mali delegate mis-computes; at K=17 tokens the exact replacement is (q[:,:,None,:]·k[:,None,:,:]).sum(-1).
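Both re-authorings can be illustrated on the desktop with NumPy. The sketch below is not the conversion code itself: the function names are hypothetical, the learned gain of ScaleNorm is omitted, `eps` is an assumed value, the ScaleNorm form x / (‖x‖ · d^-½) is assumed, and forcing the reduction to `float16` only emulates the overflow described above rather than reproducing the Mali delegate. The `kernel@v` variant is an analogous rewrite assumed here; the card only spells out the `q@kᵀ` form.

```python
import numpy as np

def scale_norm_fp16(x, eps=1e-5):
    """Naive ScaleNorm with the sum of squares reduced in fp16 (emulated)."""
    x16 = x.astype(np.float16)
    with np.errstate(over="ignore", invalid="ignore"):
        sq_sum = np.sum(x16 * x16, axis=-1, keepdims=True, dtype=np.float16)
        norm = np.sqrt(sq_sum) * np.float16(x.shape[-1] ** -0.5)
    # When sq_sum overflows, norm is inf and every output becomes 0.
    return x16 / np.maximum(norm, np.float16(eps))

def safe_scale_norm_fp16(x, s=64.0, eps=1e-5):
    """Same math, but x is divided by s before squaring and norm is rescaled."""
    x16 = x.astype(np.float16)
    xs = x16 / np.float16(s)  # keeps the squares small
    sq_sum = np.sum(xs * xs, axis=-1, keepdims=True, dtype=np.float16)
    norm = np.sqrt(sq_sum) * np.float16(s) * np.float16(x.shape[-1] ** -0.5)
    return x16 / np.maximum(norm, np.float16(eps))

def qk_broadcast_reduce(q, k):
    """q @ k^T for q, k of shape [B, K, D] without a batch-matmul op."""
    return (q[:, :, None, :] * k[:, None, :, :]).sum(-1)  # [B, K, K]

def kernel_v_broadcast_reduce(kernel, v):
    """kernel @ v for kernel [B, K, K] and v [B, K, E] (assumed analogue)."""
    return (kernel[:, :, :, None] * v[:, None, :, :]).sum(2)  # [B, K, E]
```

The intermediate products are 4D tensors of shape [B, K, K, D], which stays cheap at K=17 tokens and within the ≤4D limit; for long sequences this rewrite would be far more memory-hungry than a real batch matmul.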
Result: banned ops NONE, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.999.
Preprocessing
Center-crop to 3:4, resize to 192×256, ImageNet 0-255 normalize (mean [123.675, 116.28, 103.53], std [58.395, 57.12, 57.375]), NCHW planar. Top-down — expects one roughly-centered person.
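The preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `center_crop_box` and `preprocess` are hypothetical, 3:4 is read as width:height (matching the 192×256 input), and the resize uses Pillow's default resampling filter because the card does not name one.

```python
import numpy as np
from PIL import Image

# Normalization constants from this card (0-255 RGB).
MEAN = np.array([123.675, 116.28, 103.53], np.float32)
STD = np.array([58.395, 57.12, 57.375], np.float32)

def center_crop_box(width, height, aspect=3 / 4):
    """Largest centered (left, top, right, bottom) box with width/height == aspect."""
    if width / height > aspect:  # image too wide: trim left and right
        new_w, new_h = round(height * aspect), height
    else:                        # image too tall: trim top and bottom
        new_w, new_h = width, round(width / aspect)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h

def preprocess(img):
    """PIL image -> (float32 tensor [1, 3, 256, 192] in NCHW, crop box used)."""
    box = center_crop_box(*img.size)
    crop = img.convert("RGB").crop(box).resize((192, 256))  # (width, height)
    x = (np.asarray(crop, np.float32) - MEAN) / STD          # HWC, normalized
    x = np.ascontiguousarray(x.transpose(2, 0, 1)[None])     # -> NCHW planar
    return x, box
```

Returning the crop box alongside the tensor makes it possible to map decoded keypoints back into the original image later.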
License
Apache-2.0. Upstream: open-mmlab/mmpose RTMPose-s.
