RTMPose-Hand: LiteRT (on-device 21-keypoint hand pose, fully on GPU)

RTMPose (mmpose, CSPNeXt + RTMCC/SimCC head) hand pose, converted to LiteRT and running fully on the CompiledModel GPU (ML Drift) on Android. It predicts the 21 standard hand keypoints (wrist + 4 joints × 5 fingers) for a single centered hand.

Figure: RTMPose-Hand, input | hand skeleton (on-device LiteRT GPU)

On-device (Pixel 8a, Tensor G3, verified)

nodes on GPU: 333 / 333 LITERT_CL (full residency)
inference:    ~4 ms (256×256)
size:         28 MB (fp16)
accuracy:     device-vs-PyTorch SimCC corr 0.999, 21/21 keypoints
image[1,3,256,256] (ImageNet 0-255) → [GPU: CSPNeXt + RTMCC] → simcc_x[1,21,512], simcc_y[1,21,512]

How it converts (litert-torch)

Same recipe as the rest of the RTMPose family (two rewrites, both numerically exact; no PixelShuffle rewrite is needed since there's no neck):

  1. ScaleNorm (RMS) → SafeRMSNorm: fixes the fp16 overflow that zeroes out the head output (x is scaled down by S=64 before squaring).
  2. GAU act@act BMM → broadcast-multiply + reduce-sum.
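
The two rewrites above can be illustrated with a minimal NumPy sketch. This is not the conversion code itself: the function names `rms_norm_naive`, `rms_norm_safe`, and `bmm_as_mul_sum` are hypothetical, the learned gain of ScaleNorm is omitted, and only S=64 comes from the recipe. It shows why scaling before squaring leaves the normalized result unchanged, since x / rms(x) = (x / S) / rms(x / S), and why a batched matmul equals a broadcast multiply followed by a sum over the shared axis.

```python
import numpy as np

S = np.float16(64.0)  # scale-down factor from the recipe above


def rms_norm_naive(x, eps=1e-5):
    # Squares x directly. In fp16 any |x| above ~255 squares to inf,
    # so rms becomes inf and the output collapses to zero.
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True))
    return x / np.maximum(rms, x.dtype.type(eps))


def rms_norm_safe(x, eps=1e-5):
    # Divide by S first so the squares stay inside the fp16 range.
    # The S factors cancel between numerator and denominator.
    # Note: eps is applied to the scaled rms here (an assumption of this sketch).
    xs = x / S
    rms = np.sqrt(np.mean(np.square(xs), axis=-1, keepdims=True))
    return xs / np.maximum(rms, x.dtype.type(eps))


def bmm_as_mul_sum(a, b):
    # a: [B, M, K], b: [B, K, N] -> [B, M, N]
    # Broadcast to [B, M, K, N] (a 4D intermediate), then reduce over K.
    return (a[:, :, :, None] * b[:, None, :, :]).sum(axis=2)


if __name__ == "__main__":
    x = np.linspace(200, 400, 256, dtype=np.float16)[None, None, :]
    with np.errstate(over="ignore", invalid="ignore"):
        print("naive max |y|:", np.abs(rms_norm_naive(x)).max())
    print("safe  max |y|:", np.abs(rms_norm_safe(x)).max())
```

The broadcast form trades one matmul for a larger intermediate tensor, which is acceptable here because the operands are small (21 keypoints) and the intermediate stays at 4 dimensions.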

Result: no banned ops, all tensors ≤4D, tflite-vs-torch corr 1.0, device-vs-torch corr 0.999.
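
One way such agreement figures could be computed is sketched below. It assumes that "corr" means the Pearson correlation over the flattened SimCC outputs and that a keypoint "matches" when its argmax bins agree within a tolerance; the card does not spell out either definition, and the names `simcc_corr` and `matching_keypoints` are hypothetical.

```python
import numpy as np


def simcc_corr(ref, test):
    # Pearson correlation between two output tensors, flattened.
    a = np.asarray(ref, dtype=np.float64).ravel()
    b = np.asarray(test, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])


def matching_keypoints(ref_x, ref_y, test_x, test_y, tol_bins=1):
    # Inputs are [1, K, bins] SimCC tensors. A keypoint counts as matching
    # when both its x and y argmax bins differ by at most tol_bins
    # (tol_bins is an assumed tolerance, not a value from the card).
    dx = np.abs(ref_x[0].argmax(-1) - test_x[0].argmax(-1))
    dy = np.abs(ref_y[0].argmax(-1) - test_y[0].argmax(-1))
    return int(np.sum((dx <= tol_bins) & (dy <= tol_bins)))
```

Here `ref` would be the PyTorch output and `test` the TFLite or on-device output for the same input image.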

Preprocessing

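Center-crop to square, resize to 256×256, normalize with the ImageNet mean/std on the 0-255 scale, and lay out as NCHW. The model is top-down and expects one centered hand. To decode, take the SimCC argmax and divide by the split ratio (2) to get pixel coordinates.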
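
The steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the shipped pipeline: the mean/std values are the standard ImageNet statistics on the 0-255 scale, RGB channel order is assumed, nearest-neighbour resizing stands in for whatever interpolation the app uses, and `preprocess`, `decode_simcc`, and `to_image_coords` are hypothetical helper names.

```python
import numpy as np

# Standard ImageNet statistics on the 0-255 scale (RGB order assumed).
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)
SIZE = 256    # model input side
SPLIT = 2.0   # SimCC split ratio: 512 bins cover 256 pixels


def preprocess(rgb):
    """HxWx3 uint8 image -> (float32 [1,3,256,256], (left, top, scale))."""
    h, w = rgb.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    crop = rgb[top:top + side, left:left + side]
    # Nearest-neighbour resize keeps the sketch dependency-free.
    idx = np.arange(SIZE) * side // SIZE
    resized = crop[idx][:, idx].astype(np.float32)
    norm = (resized - MEAN) / STD
    nchw = np.ascontiguousarray(norm.transpose(2, 0, 1)[None])
    return nchw, (left, top, side / SIZE)


def decode_simcc(simcc_x, simcc_y):
    """[1,21,512] outputs -> [21,2] (x, y) in 256x256 input pixels."""
    x = simcc_x[0].argmax(axis=-1) / SPLIT
    y = simcc_y[0].argmax(axis=-1) / SPLIT
    return np.stack([x, y], axis=-1)


def to_image_coords(points, crop_info):
    """Map input-space points back to the original image."""
    left, top, scale = crop_info
    return points * scale + np.array([left, top], dtype=np.float64)
```

A typical flow would call `preprocess` on a frame, run the model to get `simcc_x` and `simcc_y`, then pass the result of `decode_simcc` to `to_image_coords` to draw the skeleton on the original frame.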

License

Apache-2.0. Upstream: open-mmlab/mmpose RTMPose-Hand.
