ViTPose base-simple — MLX

ViTPose (vitpose-base-simple) converted to MLX for on-device human pose estimation on Apple Silicon. Weights are float16.

Built for MLXPose, a native MLX Swift implementation of ViTPose. The Swift forward pass has been verified numerically against the Hugging Face reference: heatmaps differ by at most 1.5e-6 (maximum absolute difference), and decoded keypoints differ by at most 3e-5 px.
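The max|Δ| figure is a maximum absolute element-wise difference between two outputs. As a minimal sketch of that metric (not the actual MLXPose verification script), the helper below compares two flattened heatmaps in plain Python. The function name, the sample values, and the tolerance are all illustrative assumptions.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Hypothetical flattened heatmap values from a reference and a port.
reference = [0.10, 0.85, 0.02]
candidate = [0.10, 0.85, 0.02]

TOLERANCE = 1e-5  # assumed threshold for illustration, not from the model card
assert max_abs_diff(reference, candidate) <= TOLERANCE
```

In practice the same comparison would run over the full 17×64×48 heatmap tensor from each implementation, given the same preprocessed input.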

Backbone: plain ViT-base (12 layers, dim 768), patch 16, input 256×192.
Head: simple decoder → 17 COCO keypoint heatmaps (64×48).
Conversion: convert_vitpose_to_mlx.py.
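The sizes above are related by simple arithmetic: a 256×192 input cut into 16-pixel patches gives a 16×12 token grid, and the 64×48 heatmaps are 4× smaller than the input. The sketch below works through those numbers and shows a naive argmax decode of one heatmap back to input pixel coordinates. It assumes non-overlapping patches and a plain stride multiply. It is not the decoding used by MLXPose or the Hugging Face reference, which may apply sub-pixel refinement and map coordinates back to the person bounding box. `decode_argmax` is a hypothetical helper.

```python
# Values taken from the model card.
INPUT_H, INPUT_W = 256, 192
PATCH = 16
HEATMAP_H, HEATMAP_W = 64, 48
NUM_KEYPOINTS = 17

# Token grid seen by the ViT backbone (assuming non-overlapping patches).
grid_h, grid_w = INPUT_H // PATCH, INPUT_W // PATCH  # 16 x 12
num_tokens = grid_h * grid_w                          # 192

# Ratio between input resolution and heatmap resolution.
stride = INPUT_H // HEATMAP_H                         # 4
assert stride == INPUT_W // HEATMAP_W


def decode_argmax(heatmap):
    """Return (x, y, score) in input pixels for the peak of one heatmap.

    `heatmap` is a list of HEATMAP_H rows, each holding HEATMAP_W floats.
    """
    best_score, best_x, best_y = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_score:
                best_score, best_x, best_y = value, x, y
    # Naive scaling: ignores half-pixel offsets and sub-pixel refinement.
    return best_x * stride, best_y * stride, best_score
```

Calling `decode_argmax` once per channel yields 17 `(x, y, score)` triples, one per COCO keypoint.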

Files

weights.safetensors — MLX float16 weights.
config.json — original ViTPose config.
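To check what a safetensors file contains without loading any tensors, its JSON header can be read directly: the format starts with an 8-byte little-endian length, followed by a JSON object that describes each tensor's dtype, shape, and byte offsets. The sketch below uses only the Python standard library. The helper names are illustrative and not part of MLXPose or the conversion script.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor table from a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def dtype_counts(header):
    """Count tensors per dtype string (for example "F16" or "F32")."""
    counts = {}
    for entry in header.values():
        counts[entry["dtype"]] = counts.get(entry["dtype"], 0) + 1
    return counts


# Example usage, assuming weights.safetensors is in the working directory:
#   header = read_safetensors_header("weights.safetensors")
#   print(dtype_counts(header))
```

If every tensor was stored as float16, the only key in the result should be `"F16"`.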

License

Apache-2.0. Pretrained weights derive from COCO/MPII training data — review dataset terms for your use case.

Base model

usyd-community/vitpose-base-simple