pi0.5 – 4-bit Quantized MLX Weights

4-bit quantized weights for lerobot/pi05_base converted to Apple MLX format.

Runs on Apple Silicon (M1/M2/M3) using ~2.4 GB of RAM. The model loads in ~6s; inference takes ~2s per action chunk.

Architecture

  • PaliGemma 2B VLM (SigLIP + Gemma 2B) + Gemma 300M action expert
  • Flow-matching policy: 10-step forward Euler denoising (see the sketch after this list)
  • Output: action chunk [B, 50, 32] (batch, 50-step horizon, 32-dim actions)
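
The denoising loop is a plain ODE integration. Here is a minimal sketch of the 10-step forward Euler scheme over the action chunk, where velocity_fn stands in for the conditioned action-expert forward pass (the name and signature are assumptions, not mlx_pi05's API):

import mlx.core as mx

def euler_sample(velocity_fn, batch=1, horizon=50, action_dim=32, num_steps=10):
    # Start from Gaussian noise at t=0 and integrate the learned velocity
    # field forward to t=1, where x becomes the action chunk.
    x = mx.random.normal((batch, horizon, action_dim))
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)  # forward Euler: x_{t+dt} = x_t + dt * v(x_t, t)
    return x  # [batch, horizon, action_dim] == [B, 50, 32]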

Usage

from huggingface_hub import hf_hub_download
import numpy as np
import mlx.core as mx

from mlx_pi05.load import load_model

# Download quantized weights (~2.6 GB, one-time)
npz_path = hf_hub_download("mohan007/pi05-mlx-4bit", "pi05_mlx_4bit.npz")

# Load with mlx_pi05
model = load_model(quantized_path=npz_path, quantize=True)
model.eval()

# Run inference on dummy inputs (replace with a real image and tokenized prompt)
image_mlx = mx.array(np.zeros((1, 3, 224, 224), dtype=np.float32))
lang_mlx  = mx.array(np.array([[1, 2, 3, 4, 5]], dtype=np.int32))
actions   = model.sample_actions(image_mlx, lang_mlx)  # [1, 50, 32]
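
The zeros above are placeholders. A sketch of turning a real image into the expected [1, 3, 224, 224] float32 tensor follows; the [-1, 1] scaling matches the usual SigLIP convention but is an assumption here, so check mlx_pi05's preprocessing for the exact scheme:

from PIL import Image
import numpy as np
import mlx.core as mx

def load_image(path, size=224):
    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 127.5 - 1.0  # HWC, scaled to [-1, 1]
    arr = arr.transpose(2, 0, 1)[None]                     # -> [1, 3, 224, 224]
    return mx.array(arr)

image_mlx = load_image("frame.png")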

Quantization

  • Gemma 2B + expert layers: 4-bit (group_size=64; see the sketch after this list)
  • SigLIP kept in float16 (its fc2 input dim of 4304 is not divisible by the group size)
  • Total: ~2.4 GB vs ~7.2 GB float16
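
MLX supports this kind of selective scheme through nn.quantize's class_predicate. A sketch of the idea, assuming only Linear layers whose input dim divides the group size get quantized (this predicate is illustrative, not mlx_pi05's actual code):

import mlx.nn as nn

def quantizable(path, module):
    # Quantize only Linear layers whose input dim (weight.shape[1]) is a
    # multiple of the group size; SigLIP's fc2 (input dim 4304) fails this
    # check and stays in float16.
    return isinstance(module, nn.Linear) and module.weight.shape[1] % 64 == 0

nn.quantize(model, group_size=64, bits=4, class_predicate=quantizable)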

Source

Converted from the original float32 safetensors of lerobot/pi05_base.
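
A hypothetical sketch of the conversion path (not the exact script used for this repo): mx.load reads safetensors directly, after which the weights can be downcast, quantized as in the section above, and saved as a single .npz. The file paths below are placeholders.

import mlx.core as mx

weights = mx.load("pi05_base/model.safetensors")  # dict[str, mx.array], float32
weights = {k: v.astype(mx.float16) for k, v in weights.items()}
mx.savez("pi05_mlx_fp16.npz", **weights)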
