folding_pi05 — π0.5 fine-tune for towel folding

π0.5 vision-language-action policy fine-tuned for autonomous towel folding on a 6-DoF SO-101 follower arm with a single wrist camera. A strong alternative to the diffusion-transformer policy larsvandorp/folding_dit.

The repo root holds the step-9000 checkpoint (our best), so from_pretrained("larsvandorp/folding_pi05") loads it directly.

Notes

Run it

uv venv --python 3.12 .venv
GIT_LFS_SKIP_SMUDGE=1 uv pip install --python .venv/bin/python \
  "lerobot[pi] @ git+https://github.com/LarsvanDorp/lerobot.git@dinov3"

.venv/bin/lerobot-rollout \
  --strategy.type=base \
  --robot.type=so101_follower --robot.port=/dev/ttyACM0 --robot.id=my_follower \
  --robot.cameras="{wrist: {type: opencv, index_or_path: <cam-index>, width: 800, height: 600, fps: 30, fourcc: MJPG}}" \
  --policy.path=larsvandorp/folding_pi05 \
  --policy.device=cuda --inference.type=sync \
  --task="fold the towel" --duration=60

Note the fourcc: MJPG entry in the camera config; it was needed on the lab Linux PC. We run without --interpolation_multiplier.
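The rollout command leaves <cam-index> as a placeholder. On Linux, OpenCV's V4L2 backend usually maps an integer index N to the device node /dev/videoN, so listing those nodes gives the candidate values. Below is a minimal sketch under that assumption; the function name list_video_indices is hypothetical and not part of lerobot. Many USB cameras expose two nodes (one of them metadata-only), so you may need to try more than one index.

```shell
#!/usr/bin/env bash
# Sketch: list candidate OpenCV camera indices on Linux.
# Assumption: OpenCV's V4L2 backend maps index N to /dev/videoN.
set -euo pipefail

list_video_indices() {
  local dev_dir="${1:-/dev}"   # overridable so the function can be tested
  local dev
  for dev in "$dev_dir"/video*; do
    [ -e "$dev" ] || continue  # glob did not match: no video devices
    echo "${dev##*/video}"     # strip everything up to ".../video", keep N
  done | sort -n
}

# Print one candidate index per line, e.g. 0 and 2 for a single USB camera.
list_video_indices
```

Once you have a candidate N, v4l2-ctl --device=/dev/videoN --list-formats-ext (from v4l-utils) lists the pixel formats and resolutions the camera offers, which is a quick way to confirm that MJPG at 800x600 is available before setting fourcc: MJPG.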

Training data

larsvandorp/magic_soup: the filtered SO-101 towel-folding dataset. Bad episodes were removed, namely those with a high mean |Δa| or with no fold in the last frame.
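The first filter criterion, a high mean |Δa|, can be illustrated with a small sketch. It is not the magic_soup pipeline and rests on stated assumptions: it reads a hypothetical CSV export with one row per frame (episode_index, frame_index, then one column per action dimension, sorted by episode and frame), takes |Δa| as the per-joint absolute change between consecutive frames, averages it over all joints and frames of an episode, and flags episodes above a threshold you choose. The function name flag_jerky_episodes, the CSV layout and the threshold are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: flag episodes whose mean |Δa| exceeds a threshold.
# Assumed input: CSV with a header row, columns
#   episode_index,frame_index,a0,a1,...  (rows sorted by episode, then frame)
set -euo pipefail

flag_jerky_episodes() {
  local csv="$1" threshold="$2"
  awk -F, -v thr="$threshold" '
    NR == 1 { next }                          # skip the header row
    {
      ep = $1
      # Same episode as the previous row: accumulate |a_t - a_(t-1)| per joint.
      if (NR > 2 && ep == prev_ep) {
        for (j = 3; j <= NF; j++) {
          d = $j - prev[j]
          if (d < 0) d = -d
          sum[ep] += d
          cnt[ep]++
        }
      }
      for (j = 3; j <= NF; j++) prev[j] = $j  # remember this frame actions
      prev_ep = ep
      seen[ep] = 1
    }
    END {
      # Print "episode mean_abs_delta" for every episode above the threshold.
      for (ep in seen) {
        m = (cnt[ep] > 0) ? sum[ep] / cnt[ep] : 0
        if (m > thr) printf "%s %.4f\n", ep, m
      }
    }' "$csv" | sort -n
}
```

For example, flag_jerky_episodes actions.csv 0.05 prints one line per flagged episode with its mean |Δa|. The threshold is arbitrary here; in practice it would be picked by looking at the distribution of per-episode means. The second criterion (no fold in the last frame) needs a visual check and is not covered by this sketch.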

Safetensors · Model size: 4B params · Tensor type: F32, BF16