ViTPose base-simple (MLX)

ViTPose (vitpose-base-simple) converted to MLX for on-device human pose estimation on Apple Silicon. Weights are float16.

Built for MLXPose, a native MLX Swift ViTPose implementation. The Swift forward pass is numerically verified against the Hugging Face reference (heatmaps: max|Δ| = 1.5e-6; decoded keypoints: max error 3e-5 px).
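
The max|Δ| figure above is a maximum absolute elementwise difference between two outputs. A minimal sketch of that metric in plain Python follows; the helper names `flatten` and `max_abs_diff` are hypothetical, and this is not the verification script used for MLXPose.

```python
def flatten(x):
    """Yield scalars from arbitrarily nested lists or tuples."""
    if isinstance(x, (list, tuple)):
        for item in x:
            yield from flatten(item)
    else:
        yield float(x)


def max_abs_diff(a, b):
    """Return max|a - b| over all elements of two same-sized nested sequences."""
    fa, fb = list(flatten(a)), list(flatten(b))
    if len(fa) != len(fb):
        raise ValueError("inputs differ in element count")
    return max((abs(x - y) for x, y in zip(fa, fb)), default=0.0)


# Toy values for illustration only (not real model outputs).
reference = [[0.0, 0.5], [1.0, 0.25]]
candidate = [[0.0, 0.5], [1.0, 0.250001]]
delta = max_abs_diff(reference, candidate)
```

In practice the two inputs would be the reference heatmaps and the heatmaps from the port, converted to nested lists or compared with an array library.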

  • Backbone: plain ViT-base (12 layers, dim 768), patch 16, input 256×192.
  • Head: simple decoder → 17 COCO keypoint heatmaps (64×48).
  • Conversion: convert_vitpose_to_mlx.py.
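
The shapes listed above fit together as follows: a 256×192 input with patch size 16 gives a 16×12 token grid (192 tokens), and each 64×48 heatmap is 4× that grid per side. The sketch below checks this arithmetic and decodes one heatmap with a plain argmax. It assumes height × width ordering, and the function names are hypothetical; the actual MLXPose and Hugging Face decoders may apply sub-pixel refinement and box rescaling that this sketch omits.

```python
PATCH = 16
INPUT_H, INPUT_W = 256, 192      # assumed height x width
HEATMAP_H, HEATMAP_W = 64, 48    # assumed height x width


def token_grid(height, width, patch):
    """Return the (rows, cols) of patch tokens for an image of the given size."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch, width // patch


def decode_keypoint(heatmap, input_h=INPUT_H, input_w=INPUT_W):
    """Plain argmax decode: return (x, y, score) in input-pixel coordinates."""
    hm_h, hm_w = len(heatmap), len(heatmap[0])
    best_score, best_row, best_col = heatmap[0][0], 0, 0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best_score:
                best_score, best_row, best_col = value, r, c
    # Scale heatmap cell indices up to the model input resolution.
    x = best_col * (input_w / hm_w)
    y = best_row * (input_h / hm_h)
    return x, y, best_score


rows, cols = token_grid(INPUT_H, INPUT_W, PATCH)

# Synthetic heatmap with a single peak at row 10, column 20.
heatmap = [[0.0] * HEATMAP_W for _ in range(HEATMAP_H)]
heatmap[10][20] = 1.0
keypoint = decode_keypoint(heatmap)
```

A full pose result would repeat the decode for each of the 17 heatmaps and then map the coordinates back to the original image crop.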

Files

  • weights.safetensors: MLX float16 weights.
  • config.json: original ViTPose config.
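
To confirm what a downloaded weights file contains without loading any tensors, the safetensors header can be read with the standard library alone. The format begins with an 8-byte little-endian header length followed by a JSON header that maps tensor names to `dtype`, `shape`, and `data_offsets`. The sketch below relies only on that layout; the function names are hypothetical and the tensor names in this repository's file are not assumed.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor table of a safetensors file, without tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize_dtypes(header):
    """Count tensors per dtype string, e.g. {"F16": 120}."""
    counts = {}
    for info in header.values():
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts
```

Calling `summarize_dtypes(read_safetensors_header("weights.safetensors"))` on the downloaded file should report only `F16` entries if every tensor is stored as float16. With MLX installed, `mlx.core.load` can read the same file into a dictionary of arrays.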

License

Apache-2.0. Pretrained weights derive from COCO/MPII training data; review the dataset terms for your use case.

