RTMPose-Face (WFLW) — LiteRT (on-device 98-point face alignment, fully-GPU)

RTMPose (mmpose) face alignment trained on WFLW, converted to LiteRT and running entirely on the GPU through the CompiledModel API (ML Drift backend) on Android. It outputs 98 dense facial landmarks (contour, eyebrows, eyes, nose, mouth, pupils), the dense complement to a 5-point face detector.

On-device (Pixel 8a, Tensor G3; verified)

metric	value
nodes on GPU	333 / 333 on LITERT_CL (full GPU residency)
inference	~4 ms per 256×256 face crop
size	33.6 MB (fp16)
accuracy	SimCC output correlation, device vs. PyTorch: 0.9995 (all 98 landmarks)

face[1,3,256,256] (mmpose mean/std) →[GPU: RTMPose-m]→ simcc_x[1,98,512], simcc_y[1,98,512]

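output[0] = simcc_x, output[1] = simcc_y. Each landmark coordinate is the argmax over its 1D SimCC vector divided by 2: the 512 bins cover 256 pixels (split ratio 2), so the result is in 256×256 crop pixels.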
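The SimCC decoding described above can be sketched in NumPy as follows. This is a minimal sketch assuming the batch dimension has already been dropped (arrays of shape [98, 512]). The helper name `decode_simcc` is hypothetical, and the per-landmark confidence (the smaller of the two 1D peak values, similar in spirit to mmpose's SimCC decoding) is an assumption, not something the exported model produces.

```python
import numpy as np

SPLIT_RATIO = 2.0  # 512 SimCC bins per 256-px axis


def decode_simcc(simcc_x, simcc_y, split_ratio=SPLIT_RATIO):
    """Decode per-landmark 1D SimCC vectors into pixel coordinates.

    simcc_x, simcc_y: [K, 512] arrays (batch dimension already removed).
    Returns (keypoints [K, 2] in 256x256 crop pixels, scores [K]).
    """
    x = simcc_x.argmax(axis=-1) / split_ratio
    y = simcc_y.argmax(axis=-1) / split_ratio
    keypoints = np.stack([x, y], axis=-1).astype(np.float32)
    # Assumed confidence (hypothetical): the weaker of the two 1D peaks.
    scores = np.minimum(simcc_x.max(axis=-1), simcc_y.max(axis=-1))
    return keypoints, scores


# Usage with synthetic outputs: landmark 0 peaks at x-bin 101 and y-bin 300.
sx = np.zeros((98, 512), np.float32)
sy = np.zeros((98, 512), np.float32)
sx[0, 101] = 1.0
sy[0, 300] = 2.0
kps, scores = decode_simcc(sx, sy)
print(kps[0], scores[0])
```

Because the bins are half-pixel wide, plain argmax already gives 0.5 px resolution in the 256×256 crop; sub-bin refinement is not needed for a first integration.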

Minimal usage

Android (Kotlin, CompiledModel GPU)

val model = CompiledModel.create(context.assets, "rtm_face_fp16.tflite",
    CompiledModel.Options(Accelerator.GPU), null)
val inputs = model.createInputBuffers()
val outputs = model.createOutputBuffers()
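inputs[0].writeFloat(chw)              // chw: FloatArray(3*256*256), mmpose mean/std on 0-255 RGB, NCHW order
model.run(inputs, outputs)
val simccX = outputs[0].readFloat()    // flat FloatArray of 98*512 values ([1,98,512], row-major)
val simccY = outputs[1].readFloat()    // same layout; landmark k: x = argmax(simccX[k*512 until (k+1)*512]) / 2, y likewise from simccY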

Python (desktop verification)

import numpy as np
from PIL import Image
from ai_edge_litert.interpreter import Interpreter

# mmpose normalization constants (RGB, 0-255 scale)
MEAN = np.array([123.675, 116.28, 103.53], np.float32)
STD  = np.array([58.395, 57.12, 57.375], np.float32)

img = Image.open("face.jpg").convert("RGB").resize((256, 256))  # centered face crop
x = ((np.asarray(img, np.float32) - MEAN) / STD).transpose(2, 0, 1)[None]  # HWC -> [1,3,256,256]
x = np.ascontiguousarray(x)                                       # transpose yields a strided view

it = Interpreter(model_path="rtm_face_fp16.tflite")
it.allocate_tensors()
it.set_tensor(it.get_input_details()[0]["index"], x)
it.invoke()

od = it.get_output_details()                                     # output 0 = simcc_x, 1 = simcc_y
sx = it.get_tensor(od[0]["index"])[0]                             # simcc_x [98,512]
sy = it.get_tensor(od[1]["index"])[0]                             # simcc_y [98,512]
kx, ky = sx.argmax(-1) / 2.0, sy.argmax(-1) / 2.0                 # 98 keypoints, px in 256x256
for i, (a, b) in enumerate(zip(kx, ky)):
    print(f"kp{i}: ({a:.1f}, {b:.1f})")

How it converts (litert-torch) — the RTMPose recipe, unchanged

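This is the same model family as the human-pose RTMPose; only the config and checkpoint change (to WFLW). The two Mali-specific fixes, needed only on device, carry over without modification: ScaleNorm is replaced by SafeRMSNorm, and the GAU's activation-by-activation batched matmul (act@act BMM, both operands computed at runtime) is rewritten as a broadcast multiply followed by a sum-reduce. Result: no banned ops, all tensors at most 4-D, TFLite-vs-PyTorch output correlation 1.0, and device-vs-PyTorch correlation 0.9995.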
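The two rewrites can be checked numerically with a NumPy sketch. `scale_norm` follows mmpose's ScaleNorm formula (x divided by the clamped, d^-0.5-scaled L2 norm, times a gain). The SafeRMSNorm used in the actual conversion is not spelled out here, so `safe_rms_norm` below is a hypothetical stand-in that computes the same value as an RMS after pre-scaling by the per-row max |x|, one common way to keep squared terms small in fp16. `bmm_broadcast_reduce` shows the general identity behind the BMM rewrite; the real conversion applies it to PyTorch tensors, not NumPy arrays.

```python
import numpy as np


def scale_norm(x, g=1.0, eps=1e-5):
    # mmpose-style ScaleNorm: x / clamp(||x||_2 * d**-0.5, min=eps) * g
    d = x.shape[-1]
    norm = np.linalg.norm(x, axis=-1, keepdims=True) * d ** -0.5
    return x / np.maximum(norm, eps) * g


def safe_rms_norm(x, g=1.0, eps=1e-5):
    # Hypothetical SafeRMSNorm: same value, computed as an RMS after dividing
    # by the per-row max |x| so the squared terms stay small (fp16-friendly).
    m = np.maximum(np.abs(x).max(axis=-1, keepdims=True), eps)
    rms = m * np.sqrt(np.mean((x / m) ** 2, axis=-1, keepdims=True))
    return x / np.maximum(rms, eps) * g


def bmm(a, b):
    # Reference batched matmul: [B, N, K] @ [B, K, E] -> [B, N, E]
    return a @ b


def bmm_broadcast_reduce(a, b):
    # Same product without a BMM op: [B,N,K,1] * [B,1,K,E] -> [B,N,K,E],
    # then sum over K. Every intermediate stays 4-D.
    return (a[:, :, :, None] * b[:, None, :, :]).sum(axis=2)


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64, 256)).astype(np.float32)
q = rng.standard_normal((1, 64, 128)).astype(np.float32)
k = rng.standard_normal((1, 64, 128)).astype(np.float32)
attn = bmm(q, k.transpose(0, 2, 1))  # [1, 64, 64] q @ k^T
attn_br = bmm_broadcast_reduce(q, k.transpose(0, 2, 1))
print(np.abs(scale_norm(x) - safe_rms_norm(x)).max(), np.abs(attn - attn_br).max())
```

The broadcast-reduce form trades one BMM for an elementwise multiply and a reduction over a 4-D intermediate, which costs more memory but avoids the runtime-operand matmul path.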

Preprocessing

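Crop a square region centered on the face, resize it to 256×256, normalize with the mmpose mean/std (RGB, 0-255 scale), and lay it out as NCHW. The model expects the face roughly centered in the crop.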
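A minimal sketch of building the square crop from a face detector box and mapping the decoded landmarks back to original-image pixels. The helper names and the 1.25 padding margin are hypothetical choices for illustration, not values taken from this model's config; half-pixel offsets are ignored.

```python
import numpy as np

INPUT_SIZE = 256


def square_face_box(x0, y0, w, h, padding=1.25):
    """Square crop (left, top, side) centered on a face box; padding is an assumed margin."""
    cx, cy = x0 + w / 2.0, y0 + h / 2.0
    side = max(w, h) * padding
    return cx - side / 2.0, cy - side / 2.0, side


def crop_to_image(kps_crop, box, input_size=INPUT_SIZE):
    """Map keypoints [K, 2] from 256x256 crop pixels back to original-image pixels."""
    left, top, side = box
    return kps_crop * (side / input_size) + np.array([left, top])


box = square_face_box(100, 50, 80, 100)  # -> (77.5, 37.5, 125.0)
center = crop_to_image(np.array([[128.0, 128.0]]), box)
print(box, center)
```

The crop center (128, 128) maps back to the face-box center (140, 100). A padded box can extend past the image border, so the image should be padded before cropping and resizing (not shown).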

License

Apache-2.0. Upstream: open-mmlab/mmpose; dataset WFLW.
