| license: other | |
| tags: | |
| - mlx | |
| - robotics | |
| - pi0.5 | |
| - quantized | |
| - apple-silicon | |
| # pi0.5 — 4-bit Quantized MLX Weights | |
| 4-bit quantized weights for [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base) converted to Apple MLX format. | |
| Runs on **Apple Silicon (M1/M2/M3)** in about 2.4 GB of RAM. The model loads in about 6 s, and inference takes about 2 s per action chunk. | |
| ## Architecture | |
| - PaliGemma 2B VLM (SigLIP + Gemma 2B) + Gemma 300M action expert | |
| - Flow-matching policy: 10-step Forward Euler denoising | |
| - Output: action chunk [B, 50, 32] | |
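The 10-step Forward Euler denoising listed above can be sketched in plain Python. This is a minimal illustration, not the model's code: it assumes the time convention in which sampling starts from noise at t = 1 and ends at the action chunk at t = 0, and it replaces the learned network with a hypothetical `toy_velocity` field whose exact solution is known. The names `euler_sample`, `toy_velocity`, and `target` are illustrative only, and a single chunk of shape [50, 32] is flattened into a list for simplicity.

```python
import random

HORIZON, ACTION_DIM = 50, 32  # one action chunk: 50 steps x 32 dims


def euler_sample(velocity, noise, num_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) down to t=0."""
    dt = -1.0 / num_steps  # negative step: time runs from 1 to 0 (assumed convention)
    x = list(noise)
    for i in range(num_steps):
        t = 1.0 + i * dt  # current time: 1.0, 0.9, ..., 0.1
        v = velocity(x, t)
        # Forward Euler update: x <- x + dt * v(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for the learned velocity network.
# On the straight path x_t = t * noise + (1 - t) * target,
# the velocity is (x_t - target) / t.
target = [0.5] * (HORIZON * ACTION_DIM)


def toy_velocity(x, t):
    return [(xi - ti) / t for xi, ti in zip(x, target)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(HORIZON * ACTION_DIM)]
actions = euler_sample(toy_velocity, noise, num_steps=10)
```

Because the toy path is a straight line, Euler integration recovers `target` up to floating-point error. With a learned velocity field the result is only an approximation, and the step count trades accuracy against inference time.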
| ## Usage | |
| ```python | |
| import mlx.core as mx | |
| import numpy as np | |
| from huggingface_hub import hf_hub_download | |
| from mlx_pi05.load import load_model | |
| # Download quantized weights (~2.6 GB, one-time) | |
| npz_path = hf_hub_download("mohan007/pi05-mlx-4bit", "pi05_mlx_4bit.npz") | |
| # Load the model with mlx_pi05 | |
| model = load_model(quantized_path=npz_path, quantize=True) | |
| model.eval() | |
| # Run inference on placeholder inputs: a blank 224x224 image and dummy token IDs | |
| image_mlx = mx.array(np.zeros((1, 3, 224, 224), dtype=np.float32)) | |
| lang_mlx = mx.array(np.array([[1, 2, 3, 4, 5]], dtype=np.int32)) | |
| actions = model.sample_actions(image_mlx, lang_mlx)  # shape: [1, 50, 32] | |
| ``` | |
| ## Quantization | |
| - Gemma 2B + expert layers: 4-bit (group_size=64) | |
| - SigLIP kept in float16 (its fc2 input dimension, 4304, is not divisible by the group size of 64) | |
| - Total: ~2.4 GB vs ~7.2 GB float16 | |
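To illustrate why the group size matters, here is a minimal sketch of group-wise 4-bit affine quantization in plain Python. It is an assumption-laden illustration, not MLX's implementation: the helpers `can_quantize`, `quantize_group`, and `dequantize_group` are hypothetical, and the scheme shown (one scale and one offset per group of 64 values) is only one common way to do it.

```python
GROUP_SIZE = 64
BITS = 4


def can_quantize(input_dim, group_size=GROUP_SIZE):
    """A row splits into whole groups only if its length is a multiple of group_size."""
    return input_dim % group_size == 0


def quantize_group(values, bits=BITS):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1          # 15 steps for 4 bits
    scale = (hi - lo) / levels or 1.0  # avoid a zero scale for constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from codes plus the per-group scale and offset."""
    return [lo + c * scale for c in codes]


# 4304 leaves a remainder of 16, so the last group would be incomplete.
remainder = 4304 % GROUP_SIZE

group = [i / 63.0 - 0.5 for i in range(GROUP_SIZE)]  # 64 evenly spaced sample weights
codes, scale, lo = quantize_group(group)
restored = dequantize_group(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

In this sketch the reconstruction error of each value is bounded by half the group's scale, which is why smaller groups (more scales) generally give lower error at the cost of extra storage.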
| ## Source | |
| Converted from the original float32 safetensors of [lerobot/pi05_base](https://huggingface.co/lerobot/pi05_base). | |