
	---
	license: other
	tags:
	- mlx
	- robotics
	- pi0.5
	- quantized
	- apple-silicon
	---

	# pi0.5 — 4-bit Quantized MLX Weights

	4-bit quantized weights for [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base) converted to Apple MLX format.

	Runs on Apple Silicon (M1/M2/M3) in about 2.4 GB of memory. The model loads in ~6 s, and inference takes ~2 s per action chunk.

	## Architecture
	- PaliGemma 3B VLM (SigLIP vision encoder + Gemma 2B language model) + Gemma 300M action expert
	- Flow-matching policy: 10-step forward Euler denoising
	- Output: action chunk of shape [B, 50, 32] (batch size, 50 time steps, 32 action dimensions)
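The 10-step forward Euler denoising loop can be sketched in plain NumPy as follows. This is a minimal illustration, not the code used by this model: `euler_sample` and `toy_velocity` are hypothetical names, the toy velocity field stands in for the action expert, and integrating from t=1 (noise) down to t=0 (actions) is an assumed time convention.

```python
import numpy as np

def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (actions) with forward Euler.

    The time direction is an assumption made for this sketch.
    """
    x = noise
    dt = -1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i / num_steps  # 1.0, 0.9, ..., 0.1
        x = x + dt * velocity_fn(x, t)
    return x

# Toy setup: a known target chunk and Gaussian noise, both shaped [B, 50, 32].
rng = np.random.default_rng(0)
target = rng.normal(size=(1, 50, 32))
noise = rng.normal(size=(1, 50, 32))

def toy_velocity(x, t):
    # Hypothetical stand-in for the action expert. On the straight path
    # x_t = t * noise + (1 - t) * target this equals (noise - target).
    return (x - target) / t

actions = euler_sample(toy_velocity, noise, num_steps=10)
print(actions.shape)
print(np.allclose(actions, target))
```

Because the toy path is a straight line, ten Euler steps recover the target exactly (up to rounding). A learned velocity field is only approximately straight, so the step count trades accuracy against latency.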

	## Usage

	```python
	import numpy as np
	import mlx.core as mx
	from huggingface_hub import hf_hub_download
	from mlx_pi05.load import load_model

	# Download the quantized weights (~2.6 GB, one-time; cached afterwards)
	npz_path = hf_hub_download("mohan007/pi05-mlx-4bit", "pi05_mlx_4bit.npz")

	# Load the model with mlx_pi05 and switch to inference mode
	model = load_model(quantized_path=npz_path, quantize=True)
	model.eval()

	# Run inference on dummy inputs: one 224x224 RGB image and five token IDs
	image_mlx = mx.array(np.zeros((1, 3, 224, 224), dtype=np.float32))
	lang_mlx = mx.array(np.array([[1, 2, 3, 4, 5]], dtype=np.int32))
	actions = model.sample_actions(image_mlx, lang_mlx)  # [1, 50, 32]
	```

	## Quantization
	- Gemma 2B + action expert layers: 4-bit (group_size=64)
	- SigLIP: kept in float16, because the fc2 input dimension (4304) is not divisible by the group size of 64
	- Total size: ~2.4 GB, versus ~7.2 GB in float16
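The group-size constraint above can be illustrated with a small NumPy sketch of group-wise 4-bit affine quantization. This is a generic min/max scheme written for illustration; it is not MLX's actual implementation, which packs values differently. All function names are hypothetical, the width 2048 is only an example, and `effective_bits` assumes one float16 scale and one float16 offset stored per group.

```python
import numpy as np

GROUP_SIZE = 64
BITS = 4

def can_group_quantize(in_dim, group_size=GROUP_SIZE):
    """A row can be split into equal groups only if in_dim is a multiple of group_size."""
    return in_dim % group_size == 0

def quantize_groups(w, group_size=GROUP_SIZE, bits=BITS):
    """Quantize each group of `group_size` weights to integers in [0, 2**bits - 1]."""
    out_dim, in_dim = w.shape
    if not can_group_quantize(in_dim, group_size):
        raise ValueError(f"input dim {in_dim} is not divisible by {group_size}")
    levels = 2**bits - 1
    groups = w.reshape(out_dim, in_dim // group_size, group_size)
    w_min = groups.min(axis=-1, keepdims=True)
    w_max = groups.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_groups(q, scale, w_min):
    """Map the integers back to approximate float weights."""
    groups = q * scale + w_min
    return groups.reshape(groups.shape[0], -1)

def effective_bits(bits=BITS, group_size=GROUP_SIZE, overhead_bits=32):
    # Assumption: a float16 scale plus a float16 offset (32 bits) per group.
    return bits + overhead_bits / group_size

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 2048))  # illustrative layer width
q, scale, w_min = quantize_groups(w)
w_hat = dequantize_groups(q, scale, w_min)

print(can_group_quantize(2048), can_group_quantize(4304))
print(float(np.abs(w - w_hat).max()), effective_bits())
```

Since 4304 = 67 × 64 + 16, the last 16 inputs of the SigLIP fc2 layer would not fill a group, which is consistent with leaving that tower in float16. Under the stated overhead assumption, each quantized weight costs about 4.5 bits rather than 4.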

	## Source
	Converted from the original float32 safetensors of [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base).