Pi0-FAST — Piper Robot Multi-Stage Curriculum Fine-Tune

Pi0-FAST fine-tuned on a Piper 6-DOF robot arm for pick-and-place manipulation, using a 3-stage curriculum over two data sources: teleop (direct joint teleoperation of the arm) and ego (egocentric hand video).

Model Description

Base model: Pi0-FAST (PaliGemma backbone with Gemma-2B, actions decoded autoregressively as FAST tokens), fine-tuned with LoRA
Task: Pick-and-place with language intent conditioning
Action space: 7-DOF (6 joints + gripper), delta joint targets, horizon=10
Observation: Single RGB camera (480×640) + joint state [7] + language prompt
Training hardware: NVIDIA A100-SXM4-80GB
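
As a rough illustration of the action format above, the sketch below turns one predicted chunk of delta joint targets (10 steps × 7 channels) into absolute targets. It assumes each delta is relative to the joint state at the start of the chunk and that the gripper channel is already absolute. Neither convention is stated in this card, and `deltas_to_absolute` is a hypothetical helper, not part of OpenPI.

```python
HORIZON = 10     # action horizon from the model description
ACTION_DIM = 7   # 6 joints + gripper

# Assumption: the 6 joint channels are deltas, the gripper (last) is absolute.
DEFAULT_DELTA_MASK = (True,) * 6 + (False,)


def deltas_to_absolute(state, delta_chunk, delta_mask=DEFAULT_DELTA_MASK):
    """Convert a [10][7] chunk of delta actions into absolute targets.

    Assumption (not confirmed by this card): every delta is relative to the
    joint state observed at the start of the chunk.
    """
    if len(state) != ACTION_DIM or len(delta_chunk) != HORIZON:
        raise ValueError("expected state [7] and action chunk [10][7]")
    return [
        # Add the starting state only on channels marked as deltas.
        [a + s if is_delta else a for a, s, is_delta in zip(step, state, delta_mask)]
        for step in delta_chunk
    ]
```

If the robot driver expects cumulative deltas or a delta gripper command instead, the mask and the reference state would need to change accordingly.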

Training Strategy

3-stage curriculum fine-tuning:

Stage	Dataset	Mixture (by frames)	LR	Steps
1	Teleop only	100% Teleop	5e-5	1000
2	Mixed	60% Teleop / 40% Ego	2e-5	1500
3	Ego-heavy	28% Teleop / 72% Ego	1e-5	1000

Total: 3500 steps, ~70 min wall-clock on A100.
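
The schedule above can be written down as plain data, which makes the step total and the mixture fractions easy to check. This is a minimal sketch: the `Stage` class, the stage names, and `stage_at` are hypothetical, and since each stage was logged as its own run, the global step index is only a convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    teleop_frac: float  # fraction of training frames drawn from teleop
    ego_frac: float     # fraction of training frames drawn from ego
    lr: float
    steps: int


# Values copied from the training strategy table.
STAGES = [
    Stage("teleop_only", 1.00, 0.00, 5e-5, 1000),
    Stage("mixed", 0.60, 0.40, 2e-5, 1500),
    Stage("ego_heavy", 0.28, 0.72, 1e-5, 1000),
]


def stage_at(global_step):
    """Return the stage active at a 0-based global step."""
    start = 0
    for stage in STAGES:
        if global_step < start + stage.steps:
            return stage
        start += stage.steps
    raise ValueError("step is past the end of the curriculum")


total_steps = sum(stage.steps for stage in STAGES)
```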

Dataset

Source	Episodes	Frames	Notes
Teleop	116	~80K	Direct joint teleoperation, 20fps
Ego (QA pass)	1,448	~54K	Egocentric hand video, 10fps, HaMeR QA gated
Ego (all)	1,800	~94K	Includes 352 QA-failed episodes used in Stage 3
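
To make the gap between episode counts and frame counts concrete, the sketch below computes average episode length and each source's share of episodes versus frames, using the rounded counts from the table. Pooling teleop with QA-passed ego gives roughly a 60/40 split by frames, while the same pool is over 90% ego by episode count.

```python
# Approximate counts from the dataset table (frame counts are rounded).
SOURCES = {
    "teleop": {"episodes": 116, "frames": 80_000},
    "ego_qa_pass": {"episodes": 1_448, "frames": 54_000},
}


def shares(sources, key):
    """Fraction of the pooled total contributed by each source for `key`."""
    total = sum(s[key] for s in sources.values())
    return {name: s[key] / total for name, s in sources.items()}


frames_per_episode = {n: s["frames"] / s["episodes"] for n, s in SOURCES.items()}
episode_share = shares(SOURCES, "episodes")
frame_share = shares(SOURCES, "frames")

for name in SOURCES:
    print(
        f"{name}: {frames_per_episode[name]:.0f} frames/episode, "
        f"{episode_share[name]:.1%} of episodes, {frame_share[name]:.1%} of frames"
    )
```

Because the frame counts are rounded, the teleop average comes out near 690 frames per episode rather than exactly matching a figure computed from unrounded counts.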

Language intents: each prompt pairs a source tag (teleop or ego) with a phase (approach, manipulate, or transition) and the task "pick and place".
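
The intent scheme above can be enumerated as a small grid of source and phase. The exact prompt string used in training is not given here, so the `"<source> <phase> <task>"` template and the `build_prompts` helper below are assumptions for illustration only.

```python
from itertools import product

SOURCES = ("teleop", "ego")
PHASES = ("approach", "manipulate", "transition")
TASK = "pick and place"


def build_prompts():
    """Enumerate every source/phase combination as a prompt string."""
    # Hypothetical template: "<source> <phase> <task>".
    return [f"{source} {phase} {TASK}" for source, phase in product(SOURCES, PHASES)]


prompts = build_prompts()
```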

Loss Curves

Stage	Initial Loss	Final Loss
Base model	11.64	—
Stage 1	11.64	3.17
Stage 2	5.40	2.55
Stage 3	3.45	2.00

82.8% total loss reduction from base model cold start.
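
The headline figure follows directly from the table: (11.64 − 2.00) / 11.64 ≈ 0.828. The short sketch below reproduces it, along with the relative reduction within each stage and the size of the loss jump at each stage boundary. Variable names are illustrative.

```python
BASE_LOSS = 11.64

# (initial, final) training loss per stage, from the loss table.
STAGE_LOSSES = {
    "stage1": (11.64, 3.17),
    "stage2": (5.40, 2.55),
    "stage3": (3.45, 2.00),
}


def relative_reduction(initial, final):
    """Fractional drop from `initial` to `final`."""
    return (initial - final) / initial


total_reduction = relative_reduction(BASE_LOSS, STAGE_LOSSES["stage3"][1])
per_stage = {name: relative_reduction(*pair) for name, pair in STAGE_LOSSES.items()}

# Jump at each boundary: next stage's initial loss minus previous final loss.
names = list(STAGE_LOSSES)
boundary_jumps = {
    f"{prev}->{nxt}": STAGE_LOSSES[nxt][0] - STAGE_LOSSES[prev][1]
    for prev, nxt in zip(names, names[1:])
}
```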

WandB runs:

Stage 1: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/d5ivmueq
Stage 2: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/484apxj6
Stage 3: https://wandb.ai/kavinrajkr60-dsfsd/openpi/runs/45s3me56

Key Findings

Frame count is the right unit for mixture design, not episode count: teleop episodes average ~694 frames, while QA-passed ego episodes average ~37 frames (~54K frames over 1,448 episodes), so episode-count ratios are misleading.
Stage boundary loss jumps are healthy: both domain shifts (teleop→mixed, mixed→ego-heavy) caused initial loss spikes that recovered within 100 steps, which suggests, though training loss alone cannot confirm, that there was no catastrophic forgetting.
QA-failed ego data did not hurt Stage 3: including all 1,800 ego episodes (352 of them QA-failed) caused no training instability at the decayed LR (1e-5).
Curriculum keeps late-stage learning productive: even at the lowest LR (1e-5), Stage 3 lowered loss from 3.45 to 2.00 (about 42%) in 1000 steps, building on the Stage 1+2 foundation. In absolute terms, Stage 1 had the largest per-step drop (11.64 to 3.17).
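
One way to hit a target mixture by frames is to sample an episode with a weight proportional to its frame count within its source, scaled by that source's target fraction, and then pick a frame uniformly inside it. The sketch below shows this for the Stage 3 mix. It is not the sampler used for these runs: `episode_weights` and the toy episode list are hypothetical.

```python
import random


def episode_weights(episodes, target_frame_mix):
    """Per-episode sampling weights that realize a target mixture by frames.

    episodes: list of (source, num_frames) tuples.
    target_frame_mix: dict mapping source -> desired fraction of frames.
    """
    frames_per_source = {}
    for source, num_frames in episodes:
        frames_per_source[source] = frames_per_source.get(source, 0) + num_frames
    # Weight = source's target fraction times the episode's share of that
    # source's frames, so weights within a source sum to its target fraction.
    return [
        target_frame_mix[source] * num_frames / frames_per_source[source]
        for source, num_frames in episodes
    ]


# Toy data for illustration only: a few long teleop episodes, many short ego ones.
toy_episodes = [("teleop", 700)] * 2 + [("ego", 40)] * 30
weights = episode_weights(toy_episodes, {"teleop": 0.28, "ego": 0.72})

rng = random.Random(0)
source, num_frames = rng.choices(toy_episodes, weights=weights, k=1)[0]
frame_index = rng.randrange(num_frames)  # uniform frame within the episode
```

Weighting by episode count instead would let the many short ego episodes dominate sampling, regardless of how few frames they contribute.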

Checkpoint

The checkpoint in this repo is the Stage 3 final model (step 999) — the fully curriculum-trained ego-heavy model. Stored in Orbax format.

Usage

This checkpoint is compatible with the OpenPI training and inference framework.

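from openpi.policies import policy_config
from openpi.training import config as _config

# Assumption: "piper_stage3_ego" is the training config name registered in
# your OpenPI checkout (inferred from the checkpoint path below).
config = _config.get_config("piper_stage3_ego")

# Load the Stage 3 checkpoint (step 999)
checkpoint_dir = "checkpoints/piper_stage3_ego/piper_stage3_ego/999"
policy = policy_config.create_trained_policy(config, checkpoint_dir)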

Citation

If you use this work, please cite the pi0 paper that Pi0-FAST builds on:

@article{black2024pi0,
  title={pi0: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and others},
  year={2024}
}
