lightweight-OpenPose: LiteRT (TFLite) GPU, FP16

On-device LiteRT (.tflite) conversion of lightweight-OpenPose for human pose estimation. The model is a MobileNet-based heatmap network; it outputs keypoint heatmaps only and the keypoint decode (argmax) is done in app code.

The model runs fully on the LiteRT CompiledModel GPU accelerator (ML Drift): every op is GPU-native, no CPU fallback. Converted with litert-torch with no patches.

Why heatmaps-only: MoveNet's official .tflite bakes the keypoint decode into the graph (GATHER_ND), which the GPU delegate can't run, so it only partially offloads to the GPU. Keeping the graph pure-conv and decoding in app code keeps it 100% on the GPU.

Files

File                    Precision       Size
pose_256_fp16.tflite    fp16 weights    ~8.3 MB
pose_256.tflite         fp32            ~16.4 MB

I/O

  • Input: [1, 256, 256, 3] float32, NHWC, RGB, normalized (px - 128) / 256.
  • Output: [1, 32, 32, 19] float32, NHWC, keypoint heatmaps (18 body keypoints + background). Argmax each of the 18 keypoint channels over the 32 x 32 grid to get the normalized keypoint locations; connect them into a skeleton.
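
The argmax decode described for the output tensor could look roughly like the sketch below. It assumes the heatmaps arrive as a flat FloatArray in NHWC order, as in the usage snippet. The names Keypoint and decodeKeypoints are hypothetical, and reporting the centre of the winning grid cell is an assumption, not something this card specifies.

```kotlin
// Hypothetical helper types and names; only the tensor layout comes from this card.
data class Keypoint(val x: Float, val y: Float, val score: Float)

const val GRID = 32            // heatmap height and width
const val CHANNELS = 19        // 18 body keypoints + 1 background channel
const val NUM_KEYPOINTS = 18   // the background channel is skipped

/**
 * Decodes a flat NHWC heatmap buffer of shape [1, 32, 32, 19] by taking the
 * argmax of each keypoint channel over the 32 x 32 grid. Coordinates are
 * normalized to [0, 1) and measured at the centre of the winning cell
 * (an assumption; the card only says "normalized keypoint locations").
 */
fun decodeKeypoints(heatmaps: FloatArray): List<Keypoint> {
    require(heatmaps.size == GRID * GRID * CHANNELS) {
        "expected ${GRID * GRID * CHANNELS} floats, got ${heatmaps.size}"
    }
    return List(NUM_KEYPOINTS) { k ->
        var bestCell = 0
        var bestScore = Float.NEGATIVE_INFINITY
        for (cell in 0 until GRID * GRID) {
            // NHWC: the channel index varies fastest within each grid cell.
            val v = heatmaps[cell * CHANNELS + k]
            if (v > bestScore) {
                bestScore = v
                bestCell = cell
            }
        }
        val row = bestCell / GRID
        val col = bestCell % GRID
        Keypoint((col + 0.5f) / GRID, (row + 0.5f) / GRID, bestScore)
    }
}
```

The returned score is the raw heatmap peak, which app code may want to threshold before drawing a keypoint.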

Keypoint order (18): nose, neck, r-shoulder, r-elbow, r-wrist, l-shoulder, l-elbow, l-wrist, r-hip, r-knee, r-ankle, l-hip, l-knee, l-ankle, r-eye, l-eye, r-ear, l-ear.
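
For connecting the keypoints into a skeleton, one possible edge list over the order above is sketched here. The pairs are an assumption based on anatomical adjacency and are not taken from this card. The names KEYPOINT_NAMES, SKELETON_EDGES and visibleEdges, and the 0.1 score threshold, are hypothetical.

```kotlin
// Indices follow the keypoint order listed in this card.
val KEYPOINT_NAMES = listOf(
    "nose", "neck", "r-shoulder", "r-elbow", "r-wrist",
    "l-shoulder", "l-elbow", "l-wrist", "r-hip", "r-knee", "r-ankle",
    "l-hip", "l-knee", "l-ankle", "r-eye", "l-eye", "r-ear", "l-ear"
)

// Assumed limb list (pairs of indices into KEYPOINT_NAMES); not from the card.
val SKELETON_EDGES = listOf(
    1 to 2, 2 to 3, 3 to 4,        // neck, right shoulder, elbow, wrist
    1 to 5, 5 to 6, 6 to 7,        // neck, left shoulder, elbow, wrist
    1 to 8, 8 to 9, 9 to 10,       // neck, right hip, knee, ankle
    1 to 11, 11 to 12, 12 to 13,   // neck, left hip, knee, ankle
    1 to 0,                        // neck to nose
    0 to 14, 14 to 16,             // nose, right eye, right ear
    0 to 15, 15 to 17              // nose, left eye, left ear
)

/** Returns the edges whose two endpoints both score at least minScore. */
fun visibleEdges(scores: FloatArray, minScore: Float = 0.1f): List<Pair<Int, Int>> {
    require(scores.size == KEYPOINT_NAMES.size) { "expected one score per keypoint" }
    return SKELETON_EDGES.filter { (a, b) -> scores[a] >= minScore && scores[b] >= minScore }
}
```

Filtering by score keeps the overlay from drawing limbs to keypoints the model did not actually find.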

Ops

CONV_2D x41, DEPTHWISE_CONV_2D x14, TRANSPOSE x14, EXP x6, SUB x6,
GREATER_EQUAL x6, SELECT x6, ADD x6, PAD x3, CONCATENATION x1

(The ELU activations lower to EXP/SUB/GREATER_EQUAL/SELECT, all GPU-supported.) No GATHER_ND, no Flex/Custom.
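
As a rough illustration of that lowering, the scalar sketch below writes ELU as the same op sequence. It assumes alpha = 1, which would be consistent with the absence of a MUL op in the list above but is not stated in this card. The function name eluLowered is hypothetical.

```kotlin
import kotlin.math.exp

/**
 * ELU written the way the op list suggests, assuming alpha = 1:
 * SELECT(GREATER_EQUAL(x, 0), x, SUB(EXP(x), 1)).
 */
fun eluLowered(x: Float): Float {
    val expMinusOne = exp(x) - 1f    // EXP followed by SUB
    val nonNegative = x >= 0f        // GREATER_EQUAL
    return if (nonNegative) x else expMinusOne   // SELECT
}
```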

On-device (Pixel 8a, verified)

The fp16 model compiles to 158 / 158 nodes on the LiteRT GPU delegate (LITERT_CL): full GPU residency, no CPU fallback.

Usage (Android, LiteRT CompiledModel)

val model = CompiledModel.create(
    context.assets, "pose_256_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null
)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
inputs[0].writeFloat(rgbNormalized)    // [1,256,256,3], (px-128)/256
model.run(inputs, outputs)
val heatmaps = outputs[0].readFloat()  // [1,32,32,19] -> argmax per keypoint in app code
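
The rgbNormalized array passed to writeFloat is not defined in the snippet. A minimal sketch of one way to build it follows, assuming the frame has already been resized to 256 x 256 and its pixels are available as packed ARGB ints (for example from Bitmap.getPixels). The function name normalizeRgb and its size parameter are hypothetical.

```kotlin
/**
 * Converts packed ARGB pixels into the flat NHWC float input:
 * R, G, B per pixel, each scaled as (px - 128) / 256. Alpha is dropped.
 */
fun normalizeRgb(argbPixels: IntArray, size: Int = 256): FloatArray {
    require(argbPixels.size == size * size) {
        "expected ${size * size} pixels, got ${argbPixels.size}"
    }
    val out = FloatArray(size * size * 3)
    for (i in argbPixels.indices) {
        val p = argbPixels[i]
        out[i * 3] = (((p shr 16) and 0xFF) - 128) / 256f      // red
        out[i * 3 + 1] = (((p shr 8) and 0xFF) - 128) / 256f   // green
        out[i * 3 + 2] = ((p and 0xFF) - 128) / 256f           // blue
    }
    return out
}
```

With this scaling, the input values fall in the range [-0.5, 0.5).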

A complete Android sample (camera + gallery, skeleton overlay) is available in google-ai-edge/litert-samples.

License & attribution
