# towel-folding-pi05
A pi0.5 vision-language-action policy fine-tuned to fold a towel with the SO-101 follower arm and a single wrist-mounted camera. Trained on 97 teleop demonstrations (~26 k frames after trimming) recorded with the LeRobot framework.
## Hardware this expects
- Robot: SO-101 follower (5-DOF + gripper).
- Camera: one wrist-mounted RGB camera, dataset key `observation.images.wrist`, captured at 1280 × 720, 30 fps, MJPG. Other resolutions/keys will work but will need preprocessing tweaks.
- Task language prompt: `"Fold towel"` (the only string the policy was trained on).
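If your camera outputs a different resolution, one simple preprocessing tweak is to resize frames to the trained shape before they reach the policy. Below is a minimal nearest-neighbour resize sketch in plain NumPy, purely illustrative; a real pipeline would more likely use OpenCV or torchvision:

```python
import numpy as np

def resize_nearest(frame: np.ndarray, out_h: int = 720, out_w: int = 1280) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, 3) uint8 frame via index mapping."""
    in_h, in_w = frame.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source col for each output col
    return frame[rows[:, None], cols[None, :]]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(resize_nearest(frame).shape)  # (720, 1280, 3)
```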
## Quick install
The policy uses a patched fork of `transformers` that LeRobot bundles. Install LeRobot from `main` with the `[pi]` extra; do not use `pip install lerobot`, since the PyPI release is months behind and is v2-only.
```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install "lerobot[pi]@git+https://github.com/huggingface/lerobot.git"
```
You also need access to the PaliGemma backbone:
- Accept the gated license at https://huggingface.co/google/paligemma-3b-pt-224 (one click, on the same HF account whose token you'll use).
- Generate a read token at https://huggingface.co/settings/tokens.
- Log in: `hf auth login --token "hf_yourtoken"` (or `huggingface-cli login` on older versions).
## Loading the policy
```python
from lerobot.policies.pi05.modeling_pi05 import PI05Policy

policy = PI05Policy.from_pretrained("ChinLR/towel-folding-pi05")
policy.eval()
policy.to("cuda")  # or "mps" / "cpu"
```
This pulls all 7 files in the repo:

- `model.safetensors`: fine-tuned weights (9.35 GB).
- `config.json`, `train_config.json`: architecture and training config.
- `policy_preprocessor.json` + `*_normalizer_processor.safetensors`: input pipeline (image resize/normalize, state/action normalization with q01/q99 stats from the training dataset).
- `policy_postprocessor.json` + `*_unnormalizer_processor.safetensors`: output pipeline (un-normalize predicted actions back to SO-101 joint-space units).
Loading any of the above outside `from_pretrained` is not recommended: the pre- and post-processors are baked into how the model was trained, and skipping them produces garbage actions.
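To make that warning concrete: quantile normalization maps each joint dimension so that its q01/q99 range from the training data spans a fixed interval, and the unnormalizer inverts the mapping. The sketch below illustrates the general idea only; the exact transform the bundled processors apply may differ:

```python
import numpy as np

def normalize(x, q01, q99):
    """Map raw values so that [q01, q99] spans [-1, 1]."""
    return 2.0 * (x - q01) / (q99 - q01) - 1.0

def unnormalize(x, q01, q99):
    """Inverse mapping: [-1, 1] back to raw joint-space units."""
    return (x + 1.0) / 2.0 * (q99 - q01) + q01

# Made-up stats for one joint, for illustration only.
q01, q99 = np.array([-90.0]), np.array([90.0])
raw = np.array([45.0])
assert np.allclose(unnormalize(normalize(raw, q01, q99), q01, q99), raw)
```

A policy trained on normalized targets emits values in the normalized range; without the unnormalizer those numbers are meaningless as joint commands, which is why skipping the processors produces garbage actions.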
## Running inference on the SO-101
A minimal control loop — adapt as needed for your stack. Run at 30 Hz to match the training distribution.
```python
import torch

from lerobot.cameras.opencv import OpenCVCameraConfig
from lerobot.policies.pi05.modeling_pi05 import PI05Policy
from lerobot.robots.so101_follower import SO101Follower, SO101FollowerConfig

policy = PI05Policy.from_pretrained("ChinLR/towel-folding-pi05").eval().to("cuda")

robot = SO101Follower(SO101FollowerConfig(
    port="/dev/tty.usbmodemXXXX",  # your follower port
    id="followerbot",
    cameras={
        "wrist": OpenCVCameraConfig(
            index_or_path=0,
            width=1280, height=720, fps=30, fourcc="MJPG",
        ),
    },
))
robot.connect()

task = "Fold towel"
try:
    while True:
        obs = robot.get_observation()
        # obs is a dict of torch tensors keyed by what the dataset used:
        #   observation.state        -> shape (6,) joint positions
        #   observation.images.wrist -> shape (3, H, W) uint8 RGB
        with torch.inference_mode():
            action = policy.select_action({**obs, "task": task})  # shape (6,)
        robot.send_action(action.cpu().numpy())
finally:
    robot.disconnect()
```
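The loop above runs as fast as inference allows; to hold it at the 30 Hz training rate you can pace it explicitly. A minimal pacing helper, illustrative and not part of LeRobot's API:

```python
import time

class RateLimiter:
    """Sleep just enough to keep a loop at a fixed frequency."""
    def __init__(self, hz: float):
        self.period = 1.0 / hz
        self.next_tick = time.perf_counter()

    def sleep(self):
        self.next_tick += self.period
        delay = self.next_tick - time.perf_counter()
        if delay > 0:
            time.sleep(delay)
        else:
            # Fell behind (slow inference step): reset instead of bursting.
            self.next_tick = time.perf_counter()

rate = RateLimiter(30.0)
start = time.perf_counter()
for _ in range(10):
    rate.sleep()  # in the real loop, call once per control step
elapsed = time.perf_counter() - start  # ~10/30 s for 10 ticks
```

In the control loop, a `rate.sleep()` at the end of each iteration keeps observation sampling close to the 30 fps the demonstrations were recorded at.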
`policy.select_action` predicts a chunk of future actions internally and returns one per call. Reset the action queue between episodes:

```python
policy.reset()
```
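Conceptually, the policy caches a predicted chunk, pops one action per call, and refills the cache when it runs dry; `reset()` clears the cache so a new episode does not replay stale actions. A toy model of that behaviour (not LeRobot's actual implementation; chunk size 5 is arbitrary here):

```python
from collections import deque

class ChunkedPolicy:
    """Toy policy: predicts chunks of 5 'actions', serves them one at a time."""
    CHUNK = 5

    def __init__(self):
        self._queue = deque()
        self._t = 0  # stand-in for running inference on the current observation

    def _predict_chunk(self):
        chunk = [self._t + i for i in range(self.CHUNK)]
        self._t += self.CHUNK
        return chunk

    def select_action(self):
        if not self._queue:                      # cache empty: run the model
            self._queue.extend(self._predict_chunk())
        return self._queue.popleft()             # one action per call

    def reset(self):
        self._queue.clear()                      # drop leftover actions

p = ChunkedPolicy()
actions = [p.select_action() for _ in range(7)]  # one full chunk, then a refill
p.reset()  # leftover actions from the old episode are discarded
```

Forgetting `reset()` means the first few actions of a new episode come from a chunk predicted on the previous episode's observations.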
## Training details
- Base model: `lerobot/pi05_base` (Physical Intelligence pi0.5, ~4 B params, PaliGemma vision-language tower + action expert).
- Dataset: 97 episodes filtered from 152 recorded demos using a manual `fold_score ≥ 3` quality grade, then trimmed to remove idle lead-in/tail (~26 k useful frames).
- Optimizer: AdamW, default LeRobot pi0.5 LR schedule.
- Compute: single A100, batch size 4, bfloat16, gradient checkpointing on.
- Schedule: 30 000 steps (~5 h 41 min on A100), ~0.63 s/step.
- Loss trajectory: 0.343 (step 0) → 0.017 (step 30 000), no divergence.
- Reproducibility: all hyperparameters in `train_config.json`.
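For a rough sense of data coverage, the numbers above imply around four to five passes over the dataset (treating one batch element as one frame, which is an approximation since pi0.5 trains on action chunks):

```python
frames = 26_000      # ~usable frames after trimming
batch_size = 4
steps = 30_000

steps_per_epoch = frames / batch_size  # 6500 steps to see every frame once
epochs = steps / steps_per_epoch       # ~4.6 passes over the data
print(round(epochs, 1))  # 4.6
```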
## Limitations
- Single task: only knows `"Fold towel"`.
- Single camera view: trained only with a wrist cam; will not generalize to a side view.
- Trained on one specific towel under one lighting condition: expect degradation with very different fabrics, sizes, or lighting.
- No closed-loop recovery training: large disturbances mid-episode may put the policy in OOD states.
- Not RL-tuned, not safety-bounded — wrap with appropriate joint/torque limits for your hardware.
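A minimal way to bound commands before they reach the arm is to clamp each joint to a safe range just before `send_action`. The limit values below are placeholders for illustration, not SO-101 specifications; substitute the ranges calibrated for your hardware:

```python
import numpy as np

# Placeholder per-joint limits -- NOT real SO-101 values; use your own calibration.
JOINT_MIN = np.array([-110.0, -100.0, -100.0, -100.0, -160.0, 0.0])
JOINT_MAX = np.array([110.0, 100.0, 100.0, 100.0, 160.0, 100.0])

def clamp_action(action: np.ndarray) -> np.ndarray:
    """Clip a 6-D action into the allowed joint range before sending it."""
    return np.clip(action, JOINT_MIN, JOINT_MAX)
```

In the control loop this would replace the raw send, e.g. `robot.send_action(clamp_action(action.cpu().numpy()))`; it bounds positions only, so torque/current limits still need to be enforced at the motor controller level.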
## Acknowledgements
- Physical Intelligence for pi0.5.
- HuggingFace LeRobot team for the base checkpoint, training stack, and SO-101 driver.
- ETH Zürich Euler cluster for compute.