Pi0-FAST — Piper Robot Multi-Stage Curriculum Fine-Tune

Pi0-FAST fine-tuned on a Piper 6-DOF robot arm for pick-and-place manipulation, using a 3-stage curriculum that mixes two data sources: teleop (direct joint control) and ego (egocentric hand video).

Model Description

  • Base model: Pi0-FAST (PaliGemma Gemma-2B backbone + LoRA, with autoregressive FAST action tokenization)
  • Task: Pick-and-place with language intent conditioning
  • Action space: 7-DOF (6 joints + gripper), delta joint targets, horizon=10
  • Observation: Single RGB camera (480×640) + joint state [7] + language prompt
  • Training hardware: NVIDIA A100-SXM4-80GB
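
As an illustration of the action space above, the sketch below turns one predicted chunk of delta joint targets into absolute joint targets. It assumes every delta in the chunk is an offset from the joint state observed when the chunk was predicted, and that the gripper dimension is treated like the joints; the card does not state either convention, and the function name is hypothetical.

```python
from typing import List, Sequence

ACTION_DIM = 7  # 6 joints + gripper
HORIZON = 10    # actions per predicted chunk


def deltas_to_targets(
    joint_state: Sequence[float],
    delta_chunk: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Convert a chunk of delta actions into absolute joint targets.

    Assumption: each delta is relative to `joint_state` (the state at the
    start of the chunk), not to the previous step in the chunk.
    """
    if len(joint_state) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} state values, got {len(joint_state)}")
    if len(delta_chunk) != HORIZON:
        raise ValueError(f"expected {HORIZON} actions, got {len(delta_chunk)}")

    targets = []
    for delta in delta_chunk:
        if len(delta) != ACTION_DIM:
            raise ValueError(f"expected {ACTION_DIM} action values, got {len(delta)}")
        # Add the offset to the chunk-start state, dimension by dimension.
        targets.append([q + d for q, d in zip(joint_state, delta)])
    return targets
```

If the deltas were instead cumulative (each relative to the previous step), the loop would need to carry a running state forward rather than reuse `joint_state`.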

Training Strategy

3-stage curriculum fine-tuning:

| Stage | Dataset | Mixture (by frames) | LR | Steps |
|---|---|---|---|---|
| 1 | Teleop only | 100% Teleop | 5e-5 | 1000 |
| 2 | Mixed | 60% Teleop / 40% Ego | 2e-5 | 1500 |
| 3 | Ego-heavy | 28% Teleop / 72% Ego | 1e-5 | 1000 |

Total: 3500 steps, ~70 min wall-clock on A100.
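
The schedule above can be written down as data. The following is a minimal sketch, not the actual training configuration: the `Stage` class, the stage names, and `sample_source` are hypothetical, and the sampler simply draws the source of each training frame with the stage's frame-level mixture probability.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    teleop_frac: float  # share of sampled frames drawn from teleop data
    lr: float
    steps: int


# Values copied from the table above; names are illustrative only.
STAGES = [
    Stage("teleop_only", 1.00, 5e-5, 1000),
    Stage("mixed", 0.60, 2e-5, 1500),
    Stage("ego_heavy", 0.28, 1e-5, 1000),
]


def sample_source(stage: Stage, rng: random.Random) -> str:
    """Pick the data source of the next training frame for this stage."""
    return "teleop" if rng.random() < stage.teleop_frac else "ego"


total_steps = sum(stage.steps for stage in STAGES)
print(total_steps)
```

Sampling per frame rather than per episode keeps the realized mixture aligned with the frame-level ratios in the table.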

Dataset

| Source | Episodes | Frames | Notes |
|---|---|---|---|
| Teleop | 116 | ~80K | Direct joint teleoperation, 20 fps |
| Ego (QA pass) | 1,448 | ~54K | Egocentric hand video, 10 fps, HaMeR QA gated |
| Ego (all) | 1,800 | ~94K | Includes 352 QA-failed episodes used in Stage 3 |

Language intents: teleop/ego {approach, manipulate, transition} pick and place

Loss Curves

| Stage | Initial Loss | Final Loss |
|---|---|---|
| Base model | 11.64 | |
| Stage 1 | 11.64 | 3.17 |
| Stage 2 | 5.40 | 2.55 |
| Stage 3 | 3.45 | 2.00 |

82.8% total loss reduction from base model cold start.
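
The headline figure follows directly from the reported losses. The short sketch below reproduces it, together with the relative reduction within each stage; the helper name is illustrative.

```python
# (initial loss, final loss) per stage, as reported above.
STAGE_LOSSES = {
    "Stage 1": (11.64, 3.17),
    "Stage 2": (5.40, 2.55),
    "Stage 3": (3.45, 2.00),
}


def relative_reduction(initial: float, final: float) -> float:
    """Fraction of the initial loss removed by training."""
    return (initial - final) / initial


# Base-model loss (11.64) to the Stage 3 final loss (2.00).
total = relative_reduction(11.64, 2.00)
print(f"total: {total:.1%}")  # total: 82.8%

for name, (initial, final) in STAGE_LOSSES.items():
    print(f"{name}: {relative_reduction(initial, final):.1%}")
# Stage 1: 72.8%
# Stage 2: 52.8%
# Stage 3: 42.0%
```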

WandB runs:

Key Findings

  1. Frame count is the right unit for mixture design, not episode count. Teleop episodes average ~694 frames vs ~37 frames for ego, so episode-count ratios are misleading.
  2. Stage boundary loss jumps are healthy: both domain shifts (teleop→mixed, mixed→ego-heavy) caused initial loss spikes (3.17 → 5.40 and 2.55 → 3.45) that recovered within 100 steps, which suggests the model adapted without catastrophic forgetting.
  3. QA-failed ego data did not hurt Stage 3: including all 1,800 ego episodes (352 of them QA-failed) caused no training instability at the reduced Stage 3 learning rate (1e-5).
  4. Curriculum enables efficient final-stage learning: despite the lowest learning rate, Stage 3 cut its loss by about 42% in 1000 steps (3.45 → 2.00), a faster relative per-step improvement than Stage 2 (about 53% over 1500 steps), building on the Stage 1+2 foundation.
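
To make the first finding concrete, the sketch below converts between frame-level and episode-level mixtures using the average episode lengths quoted above (~694 and ~37 frames). The function names are illustrative, and the averages are treated as exact for the sake of the arithmetic.

```python
from typing import Dict

# Approximate average frames per episode, from finding 1.
AVG_FRAMES = {"teleop": 694, "ego": 37}


def frame_mix_to_episode_mix(
    frame_mix: Dict[str, float], avg_frames: Dict[str, float]
) -> Dict[str, float]:
    """Episode shares needed to realize a given frame-level mixture."""
    # Episodes needed per source are proportional to frames / episode length.
    episodes = {src: frame_mix[src] / avg_frames[src] for src in frame_mix}
    total = sum(episodes.values())
    return {src: n / total for src, n in episodes.items()}


def episode_mix_to_frame_mix(
    episode_mix: Dict[str, float], avg_frames: Dict[str, float]
) -> Dict[str, float]:
    """Frame shares that result from a given episode-level mixture."""
    frames = {src: episode_mix[src] * avg_frames[src] for src in episode_mix}
    total = sum(frames.values())
    return {src: n / total for src, n in frames.items()}


# Stage 2 targets 60% teleop / 40% ego by frames.
stage2_episodes = frame_mix_to_episode_mix({"teleop": 0.60, "ego": 0.40}, AVG_FRAMES)

# A naive 50/50 split by episodes, for comparison.
naive_frames = episode_mix_to_frame_mix({"teleop": 0.5, "ego": 0.5}, AVG_FRAMES)
```

Under these averages, the Stage 2 frame mixture corresponds to only about 7% teleop episodes, while an even split by episodes would yield roughly 95% teleop frames.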

Checkpoint

The checkpoint in this repo is the Stage 3 final model (step 999) — the fully curriculum-trained ego-heavy model. Stored in Orbax format.

Usage

This checkpoint is compatible with the OpenPI training and inference framework.

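from openpi.policies import policy_config
from openpi.training import config as _config

# Assumption: "piper_stage3_ego" is the training config name registered in your
# OpenPI checkout (inferred from the checkpoint path below).
config = _config.get_config("piper_stage3_ego")

# Load Stage 3 checkpoint
policy = policy_config.create_trained_policy(
    config, "checkpoints/piper_stage3_ego/piper_stage3_ego/999"
)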

Citation

If you use this work, please cite the pi0 paper, on which Pi0-FAST builds:

@article{black2024pi0,
  title={pi0: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and others},
  year={2024}
}