pi0.5 Build Block Tower - Baseline

Fine-tuned pi0.5 checkpoint for building a block tower, trained with imitation learning only (no advantage conditioning) on 200 human demonstrations.

Experiment

  • Objective: Train a block tower policy from pi0.5 base weights using human demos only (imitation learning baseline).
  • Weight init: weights/pi05_base/params (pi0.5 base weights).
  • Advantage conditioning: None (standard imitation learning).
  • Target steps: 100,000

Config

  • Config name: pi05_build_block_tower_baseline
  • Model: pi0.5 (pi05=True, action_horizon=50)
  • Batch size: 36
  • Learning rate: 5e-5 with cosine decay (10k warmup steps)
  • Optimizer: AdamW (gradient clip norm 1.0)
  • EMA decay: 0.999
  • Delta actions: enabled
  • State/action space: 7D joint-space
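As a rough illustration of the schedule and EMA settings listed above, here is a minimal pure-Python sketch. It is not the training code. The peak learning rate, warmup length, and EMA decay come from the config; the linear warmup shape, the 100,000-step decay span, and the final learning rate END_LR are assumptions made only for this example.

```python
import math

PEAK_LR = 5e-5          # from the config above
WARMUP_STEPS = 10_000   # from the config above
EMA_DECAY = 0.999       # from the config above
DECAY_STEPS = 100_000   # assumption: decay spans the 100,000 target steps
END_LR = 5e-6           # assumption: the final learning rate is not stated here


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay down to END_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to 1.0 after DECAY_STEPS.
    progress = min(1.0, (step - WARMUP_STEPS) / (DECAY_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (PEAK_LR - END_LR) * cosine


def ema_update(ema_value: float, new_value: float, decay: float = EMA_DECAY) -> float:
    """One exponential-moving-average step applied to a single scalar weight."""
    return decay * ema_value + (1.0 - decay) * new_value


if __name__ == "__main__":
    for step in (0, 5_000, 10_000, 55_000, 100_000):
        print(f"step {step:>7}: lr = {learning_rate(step):.3e}")
```

With a decay of 0.999, each EMA step moves the averaged weight only 0.1% of the way toward the newest value, so the averaged parameters lag the raw ones by roughly a thousand steps.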

Dataset

  • villekuosmanen/build_block_tower (200 episodes, LeRobot v2.1)

Checkpoint Hashes

To verify integrity, run tar cf - -C checkpoints/<step> params | sha256sum and compare the output with the SHA-256 value listed for that step.

Step     Loss     SHA-256
25,000   0.0111   e137fa5a690e21f275e9dcbbffda6428c0eabd0100270c407ac366c225674ea2
50,000   0.0077   51e01160ab7119dbbf6b7c4515900cc794eb11773c47ced6f853e0c4f976c652
55,000   0.0068   158ff29591982484b6594afaabec078bdf8b59e7b72a4c886d4a542233581d11
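The check can also be scripted. The sketch below is one possible way to do it, assuming a tar binary is on PATH and the checkpoints/<step>/params layout from this repo. The helper names (sha256_of_stream, checkpoint_digest, verify) are hypothetical and not part of any library. Because a tar archive embeds file metadata such as modification times, the digest may differ if that metadata changes during download or copying.

```python
import hashlib
import subprocess

# Published digests copied from the table above, keyed by training step.
EXPECTED = {
    25000: "e137fa5a690e21f275e9dcbbffda6428c0eabd0100270c407ac366c225674ea2",
    50000: "51e01160ab7119dbbf6b7c4515900cc794eb11773c47ced6f853e0c4f976c652",
    55000: "158ff29591982484b6594afaabec078bdf8b59e7b72a4c886d4a542233581d11",
}


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary file-like object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def checkpoint_digest(step: int, root: str = "checkpoints") -> str:
    """Mirror `tar cf - -C checkpoints/<step> params | sha256sum`."""
    proc = subprocess.Popen(
        ["tar", "cf", "-", "-C", f"{root}/{step}", "params"],
        stdout=subprocess.PIPE,
    )
    try:
        value = sha256_of_stream(proc.stdout)
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"tar exited with status {returncode}")
    return value


def verify(step: int, root: str = "checkpoints") -> bool:
    """Return True if the local checkpoint matches the published digest."""
    return checkpoint_digest(step, root) == EXPECTED[step]
```

Calling verify(55000) from the repository root would compare the local step-55,000 params against the published value.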

W&B

Repo Structure

assets/                      # Norm stats for inference
checkpoints/<step>/params/   # Model weights (params only)
README.md                    # This file
TRAINING_LOG.md              # Training log

Usage

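from openpi.training.config import get_config
from openpi.serving.policy_server import PolicyServer

# Load the training config for this experiment by name.
config = get_config("pi05_build_block_tower_baseline")

# Replace <step> with one of the saved checkpoint steps, e.g. 55000.
server = PolicyServer(config, checkpoint_path="checkpoints/<step>/params")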