folding_pi05 — π0.5 fine-tune for towel folding
π0.5 vision-language-action policy fine-tuned for autonomous towel folding on a 6-DoF SO-101 follower arm with a single wrist camera. A strong alternative to the diffusion-transformer policy larsvandorp/folding_dit.
The repo root holds the step-9000 checkpoint (our best), so from_pretrained("larsvandorp/folding_pi05") loads it directly.
Notes
- Full fine-tune from lerobot/pi05_base (PaliGemma backbone + action expert, ~3B params), vision encoder unfrozen.
- NVIDIA GPU only: too heavy for Mac MPS at 30 Hz.
- π0.5 instantiates PaliGemma at load time, so the Hugging Face account running it must have accepted the access terms of the gated model at https://huggingface.co/google/paligemma-3b-pt-224.
Run it
uv venv --python 3.12 .venv
GIT_LFS_SKIP_SMUDGE=1 uv pip install --python .venv/bin/python \
"lerobot[pi] @ git+https://github.com/LarsvanDorp/lerobot.git@dinov3"
.venv/bin/lerobot-rollout \
--strategy.type=base \
--robot.type=so101_follower --robot.port=/dev/ttyACM0 --robot.id=my_follower \
--robot.cameras="{wrist: {type: opencv, index_or_path: <cam-index>, width: 800, height: 600, fps: 30, fourcc: MJPG}}" \
--policy.path=larsvandorp/folding_pi05 \
--policy.device=cuda --inference.type=sync \
--task="fold the towel" --duration=60
Note the fourcc: MJPG in the camera config (needed on the lab Linux PC). We run without --interpolation_multiplier.
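If the rollout is launched from a script rather than typed by hand, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repo: the helper names `camera_config` and `build_rollout_argv` are hypothetical, and the flags are copied verbatim from the command above with only the camera index, port, and duration made configurable.

```python
import shlex


def camera_config(cam_index, width=800, height=600, fps=30, fourcc="MJPG"):
    """Return the inline camera mapping passed to --robot.cameras.

    Hypothetical helper: it only reproduces the string used in the command above.
    """
    return (
        "{wrist: {type: opencv, "
        f"index_or_path: {cam_index}, width: {width}, height: {height}, "
        f"fps: {fps}, fourcc: {fourcc}}}}}"
    )


def build_rollout_argv(cam_index, port="/dev/ttyACM0", duration=60):
    """Build the argument list for the lerobot-rollout call shown above."""
    return [
        ".venv/bin/lerobot-rollout",
        "--strategy.type=base",
        "--robot.type=so101_follower",
        f"--robot.port={port}",
        "--robot.id=my_follower",
        f"--robot.cameras={camera_config(cam_index)}",
        "--policy.path=larsvandorp/folding_pi05",
        "--policy.device=cuda",
        "--inference.type=sync",
        "--task=fold the towel",
        f"--duration={duration}",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; pass the list to subprocess.run to execute it.
    print(shlex.join(build_rollout_argv(cam_index=0)))
```

Passing the list to `subprocess.run` avoids shell quoting problems with the braces in the camera mapping. Note that `--interpolation_multiplier` is deliberately absent, matching how we run it.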
Training data
larsvandorp/magic_soup — the filtered SO-101 towel-folding set (bad episodes removed: high mean |Δa|, or no fold in the last frame).
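The two filter criteria can be illustrated with a short sketch. This is an assumption about how such a filter might look, not the script used to build the dataset: the function names, the `max_mean_delta` threshold, and the `fold_in_last_frame` flag (however it was obtained, for example by manual labeling) are all hypothetical.

```python
def mean_abs_action_delta(actions):
    """Mean of |a[t+1] - a[t]| over all consecutive timesteps and joints.

    `actions` is a list of per-timestep action vectors (lists of floats).
    """
    total, count = 0.0, 0
    for prev, curr in zip(actions, actions[1:]):
        for p, c in zip(prev, curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def keep_episode(actions, fold_in_last_frame, max_mean_delta):
    """Keep an episode only if it ends folded and its actions are not too jerky.

    max_mean_delta is a hypothetical threshold; the value used for the
    dataset is not stated here.
    """
    return fold_in_last_frame and mean_abs_action_delta(actions) <= max_mean_delta


if __name__ == "__main__":
    smooth = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
    jerky = [[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]]
    print(keep_episode(smooth, True, max_mean_delta=0.5))
    print(keep_episode(jerky, True, max_mean_delta=0.5))
```

A high mean |Δa| indicates jerky or noisy teleoperation, which is why such episodes are treated as bad demonstrations.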
Model tree for larsvandorp/folding_pi05
Base model
lerobot/pi05_base