Pi0-FAST — Piper Robot Multi-Stage Curriculum Fine-Tune
Fine-tuned Pi0-FAST on a Piper 6-DOF robot arm for pick-and-place manipulation, using a 3-stage curriculum fine-tuning strategy across two observation modalities: teleop (direct joint control) and ego (egocentric hand video).
Model Description
- Base model: Pi0-FAST (PaliGemma Gemma-2B backbone + LoRA, autoregressive decoding of FAST-tokenized actions)
- Task: Pick-and-place with language intent conditioning
- Action space: 7-dimensional (6 joints + gripper), delta joint targets, action horizon of 10 steps
- Observation: Single RGB camera (480×640) + joint state [7] + language prompt
- Training hardware: NVIDIA A100-SXM4-80GB
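To make the action format concrete, here is a minimal sketch of turning one predicted chunk of delta joint targets into absolute targets. It rests on two assumptions the model card does not confirm: every delta is relative to the joint state at the start of the chunk (not to the previous step), and all seven dimensions, including the gripper, are deltas. The function name `deltas_to_absolute` is hypothetical.

```python
from typing import List, Sequence

HORIZON = 10     # action chunk length from the model card
ACTION_DIM = 7   # 6 joints + gripper


def deltas_to_absolute(
    state: Sequence[float], deltas: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Add the chunk-start joint state to every delta in the chunk.

    Assumption: deltas are relative to `state`, not cumulative.
    """
    if len(state) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} state values, got {len(state)}")
    absolute = []
    for step in deltas:
        if len(step) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} action values, got {len(step)}")
        absolute.append([s + d for s, d in zip(state, step)])
    return absolute


# Example: a constant +0.1 delta on every dimension for the whole horizon.
chunk = deltas_to_absolute([1.0] * ACTION_DIM, [[0.1] * ACTION_DIM] * HORIZON)
```

If the deltas are instead cumulative (each relative to the previous step), a running sum over the chunk would replace the per-step addition.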
Training Strategy
3-stage curriculum fine-tuning:
| Stage | Dataset | Mixture (by frames) | LR | Steps |
|---|---|---|---|---|
| 1 | Teleop only | 100% Teleop | 5e-5 | 1000 |
| 2 | Mixed | 60% Teleop / 40% Ego | 2e-5 | 1500 |
| 3 | Ego-heavy | 28% Teleop / 72% Ego | 1e-5 | 1000 |
Total: 3500 steps, ~70 min wall-clock on A100.
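The schedule in the table can be written down as plain data, which makes the step total easy to check. This is only an illustrative sketch: the `Stage` class, the stage names, and `stage_at` are hypothetical, and the stages were logged as separate runs, so the global step index used here is for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    teleop_frac: float  # share of training frames drawn from teleop
    ego_frac: float     # share of training frames drawn from ego
    lr: float
    steps: int


# Values copied from the training strategy table.
CURRICULUM = [
    Stage("stage1_teleop", 1.00, 0.00, 5e-5, 1000),
    Stage("stage2_mixed", 0.60, 0.40, 2e-5, 1500),
    Stage("stage3_ego_heavy", 0.28, 0.72, 1e-5, 1000),
]


def stage_at(global_step: int) -> Stage:
    """Return the stage that owns a 0-indexed global step."""
    start = 0
    for stage in CURRICULUM:
        if start <= global_step < start + stage.steps:
            return stage
        start += stage.steps
    raise ValueError(f"step {global_step} is outside the curriculum")


total_steps = sum(stage.steps for stage in CURRICULUM)
print(total_steps)
```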
Dataset
| Source | Episodes | Frames | Notes |
|---|---|---|---|
| Teleop | 116 | ~80K | Direct joint teleoperation, 20fps |
| Ego (QA pass) | 1,448 | ~54K | Egocentric hand video, 10fps, HaMeR QA gated |
| Ego (all) | 1,800 | ~94K | Includes 352 QA-failed episodes used in Stage 3 |
Language intents follow the pattern teleop/ego {approach, manipulate, transition} pick and place: a source tag, a phase, and the task.
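As a rough illustration, the intent pattern expands to six prompt strings. The exact string format (space-separated, lowercase) is an assumption, and `build_intents` is a hypothetical helper, not part of the training code.

```python
from itertools import product
from typing import List

SOURCES = ("teleop", "ego")
PHASES = ("approach", "manipulate", "transition")
TASK = "pick and place"


def build_intents() -> List[str]:
    """Enumerate every source/phase combination for the single task.

    Assumption: the prompt is the three parts joined by spaces.
    """
    return [f"{source} {phase} {TASK}" for source, phase in product(SOURCES, PHASES)]


intents = build_intents()
for intent in intents:
    print(intent)
```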
Loss Curves
| Stage | Initial Loss | Final Loss |
|---|---|---|
| Base model | 11.64 | — |
| Stage 1 | 11.64 | 3.17 |
| Stage 2 | 5.40 | 2.55 |
| Stage 3 | 3.45 | 2.00 |
Total loss reduction from the base model's starting loss: 82.8% (11.64 → 2.00).
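The headline figure and the size of the stage-boundary jumps can be reproduced from the table values. This is a small arithmetic check; the variable and function names are illustrative.

```python
# (initial_loss, final_loss, steps) per stage, copied from the tables above.
STAGES = {
    "stage1": (11.64, 3.17, 1000),
    "stage2": (5.40, 2.55, 1500),
    "stage3": (3.45, 2.00, 1000),
}


def relative_reduction(initial: float, final: float) -> float:
    """Fraction of the initial loss that was removed."""
    return (initial - final) / initial


# Base-model loss at the start of Stage 1 versus the final Stage 3 loss.
total_reduction = relative_reduction(STAGES["stage1"][0], STAGES["stage3"][1])

# Loss increase when the data mixture changes between stages.
boundary_jumps = {
    "stage1->stage2": STAGES["stage2"][0] - STAGES["stage1"][1],
    "stage2->stage3": STAGES["stage3"][0] - STAGES["stage2"][1],
}

# Average absolute loss drop per training step within each stage.
per_step_drop = {name: (i - f) / n for name, (i, f, n) in STAGES.items()}

print(f"total reduction: {total_reduction:.1%}")
```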
WandB runs:
- Stage 1: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/d5ivmueq
- Stage 2: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/484apxj6
- Stage 3: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/45s3me56
Key Findings
- Frame count, not episode count, is the right unit for mixture design: teleop episodes average ~694 frames versus ~37 for ego, so episode-count ratios are misleading.
- Loss jumps at stage boundaries were transient: both domain shifts (teleop→mixed, mixed→ego-heavy) caused initial loss spikes that recovered within 100 steps, which is consistent with no catastrophic forgetting.
- QA-failed ego data did not destabilize Stage 3: including all 1,800 ego episodes (352 of them QA-failed) caused no training instability under the decayed LR.
- The curriculum kept the final stage productive: Stage 3 still cut loss by about 42% (3.45 → 2.00) in 1,000 steps at the lowest learning rate (1e-5), building on the Stage 1+2 foundation. By the loss table, the largest per-step drop was in Stage 1.
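The first finding is easy to check with the approximate counts from the dataset table. The sketch below compares the teleop share measured by episodes and by frames; it assumes the mix is all teleop data plus the QA-passed ego data, which lines up with the 60/40 row but is not stated explicitly.

```python
# Approximate counts from the dataset table.
TELEOP = {"episodes": 116, "frames": 80_000}
EGO_QA = {"episodes": 1_448, "frames": 54_000}


def share(a: float, b: float) -> float:
    """Fraction of the combined total contributed by `a`."""
    return a / (a + b)


teleop_by_episodes = share(TELEOP["episodes"], EGO_QA["episodes"])
teleop_by_frames = share(TELEOP["frames"], EGO_QA["frames"])

print(f"teleop share by episodes: {teleop_by_episodes:.1%}")
print(f"teleop share by frames:   {teleop_by_frames:.1%}")
```

Measured by episodes, teleop looks like a small minority of the data, while by frames it is the majority, which is why the mixture column is specified in frames.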
Checkpoint
The checkpoint in this repo is the final Stage 3 model (step 999), i.e. the fully curriculum-trained, ego-heavy model. It is stored in Orbax format.
Usage
This checkpoint is compatible with the OpenPI training and inference framework.
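from openpi.policies import policy_config
from openpi.training import config as _config
# Assumption: the training config is registered under the name "piper_stage3_ego"
# (inferred from the checkpoint path: checkpoints/<config_name>/<exp_name>/<step>).
config = _config.get_config("piper_stage3_ego")
# Load the Stage 3 checkpoint (step 999)
policy = policy_config.create_trained_policy(
    config, "checkpoints/piper_stage3_ego/piper_stage3_ego/999"
)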
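Once a policy is loaded, each control step needs an observation with the three inputs listed in the model description. The sketch below only shows how such a dictionary might be packed. The key names are placeholders and an assumption, since the real keys are defined by the input transforms of the training config, and `make_observation` is a hypothetical helper.

```python
from typing import Any, Dict, Sequence

STATE_DIM = 7  # joint state size from the model description


def make_observation(
    image: Any, joint_state: Sequence[float], prompt: str
) -> Dict[str, Any]:
    """Pack one control-step observation.

    Assumption: the key names below are placeholders, not the keys this
    checkpoint's config necessarily expects.
    """
    if len(joint_state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} joint values, got {len(joint_state)}")
    return {
        "observation/image": image,              # RGB frame, 480x640 per the model card
        "observation/state": list(joint_state),  # 6 joints + gripper
        "prompt": prompt,                        # language intent
    }


# With a loaded OpenPI policy, inference would then look roughly like:
#   action_chunk = policy.infer(make_observation(frame, state, prompt))["actions"]
```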
Citation
If you use this work, please cite the original pi0 paper, on which Pi0-FAST builds:
@article{black2024pi0,
title={pi0: A Vision-Language-Action Flow Model for General Robot Control},
author={Black, Kevin and others},
year={2024}
}