
pi0.5 Packed Multi-Arm OpenPI Artifacts

This repo packages the full local artifact set for the TWIN handover packed-action-head study on pi0.5, including:

  • all finished checkpoints under openpi/checkpoints/
  • the modified openpi/ training and evaluation code
  • train/eval logs and structured metric tables
  • reproducibility manifests and environment snapshots

Two runs are included:

  1. an initial 2K baseline-vs-parallel comparison
  2. a longer 10K follow-up on the same packed setup

Experiment setup

  • Train repo: lsnu/twin_handover_256_train
  • Val repo: lsnu/twin_handover_256_val
  • Hardware: 4x H100 80GB
  • Precision: bfloat16
  • Semantic packed layout: [L8, 0x8, R8, 0x8]
  • Active action-loss dims: [0:8] and [16:24]
  • Masked padded dims: [8:16] and [24:32]
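The packed layout and loss mask above can be sketched in a few lines. This is an illustrative reconstruction, not the actual openpi code; the helper name `packed_action_loss_mask` and the slice constants are assumptions taken from the layout described here.

```python
import numpy as np

# Packed 32-dim action layout: [L8, 0x8, R8, 0x8]
# left arm 0:8, zero padding 8:16, right arm 16:24, zero padding 24:32.
ACTION_DIM = 32
ACTIVE_SLICES = (slice(0, 8), slice(16, 24))  # left arm, right arm

def packed_action_loss_mask(action_dim: int = ACTION_DIM) -> np.ndarray:
    """Boolean mask: True on active action dims, False on padded dims."""
    mask = np.zeros(action_dim, dtype=bool)
    for s in ACTIVE_SLICES:
        mask[s] = True
    return mask

mask = packed_action_loss_mask()
```

The mask is then broadcast over the last axis of the action tensor so padded dims never contribute to the loss.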

Headline results

Teacher-forced masked validation loss:

| Model | 2K @ final | 10K @ 1K | 10K @ 2K | 10K @ 5K | 10K @ 10K |
| --- | --- | --- | --- | --- | --- |
| Packed baseline | 0.035776 | 0.061130 | 0.041595 | 0.027324 | 0.022345 |
| Packed parallel | 0.035680 | 0.059715 | 0.039947 | 0.027340 | 0.022168 |
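"Masked" validation loss here means the error is averaged over the active action dims only. A minimal sketch of that computation, assuming a boolean per-dim mask as described in the experiment setup (not the exact openpi implementation):

```python
import numpy as np

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MSE restricted to active dims (mask True). Padded dims are excluded
    from both the error sum and the normalizer, so garbage values in the
    zero-padded slots cannot move the metric."""
    return float(((pred - target) ** 2)[..., mask].mean())

# Demo: large errors on padded dims [8:16] leave the masked loss at zero.
mask = np.zeros(32, dtype=bool)
mask[0:8] = mask[16:24] = True
pred = np.zeros((4, 50, 32))          # (batch, horizon, action_dim)
target = np.zeros((4, 50, 32))
target[..., 8:16] = 100.0             # error only on padded dims
loss = masked_mse(pred, target, mask)  # stays 0.0
```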

Sample-based eval on the fixed 10K final validation subset:

| Model | 4-step masked MAE | 10-step masked MAE | Train runtime | Peak VRAM |
| --- | --- | --- | --- | --- |
| Packed baseline | 0.029935 | 0.030294 | 2:13:40 | 35.23 GB |
| Packed parallel | 0.029277 | 0.030241 | 2:20:51 | 35.27 GB |
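The sample-based numbers average a masked MAE over the fixed validation subset, with the model rolled out at 4 or 10 sampling steps. A hedged sketch of that loop, where `sample_fn`, the batch keys, and `num_steps` plumbing are assumptions standing in for the actual eval script:

```python
import numpy as np

def masked_mae(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute error restricted to the active action dims."""
    return float(np.abs(pred - target)[..., mask].mean())

def eval_masked_mae(sample_fn, batches, mask: np.ndarray, num_steps: int) -> float:
    """Average masked MAE over a fixed validation subset.

    sample_fn(obs, num_steps=...) is a stand-in for the model's action
    sampler; num_steps corresponds to the 4-step / 10-step columns above.
    """
    per_batch = [
        masked_mae(sample_fn(b["obs"], num_steps=num_steps), b["actions"], mask)
        for b in batches
    ]
    return float(np.mean(per_batch))
```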

By 10K steps the long run still shows a very small edge for packed parallel on teacher-forced validation loss (0.022168 vs 0.022345), while the sample-based eval is essentially a tie.

Warm-start note

The packed parallel warm-start uses the slice/fuse mapping implemented in openpi/scripts/init_parallel_pi05_from_single_pytorch.py. However, the added step-0 numerical check shows that the warm-started model is not exactly identical to the source model end-to-end on a real batch:

  • input_projection_max_abs_diff = 0.00122881
  • masked_loss_abs_diff = 0.00398052
  • warmstart_equivalent = False

So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
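The equivalence check boils down to comparing an intermediate activation and the masked loss between the source and warm-started models on one real batch. A minimal sketch, where the function name, the report keys, and the tolerance are assumptions (the keys mirror the bullets above; the actual script may use different thresholds):

```python
import numpy as np

TOLERANCE = 1e-6  # assumed threshold for declaring the warm-start equivalent

def warmstart_report(single_proj: np.ndarray, parallel_proj: np.ndarray,
                     single_loss: float, parallel_loss: float) -> dict:
    """Compare the single-arm source model vs. the warm-started parallel
    model on the same batch: max abs diff of the input projection
    activations, plus the abs diff of the masked losses."""
    proj_diff = float(np.max(np.abs(single_proj - parallel_proj)))
    loss_diff = float(abs(single_loss - parallel_loss))
    return {
        "input_projection_max_abs_diff": proj_diff,
        "masked_loss_abs_diff": loss_diff,
        "warmstart_equivalent": proj_diff < TOLERANCE and loss_diff < TOLERANCE,
    }
```

With the diffs reported above (1.2e-3 and 4.0e-3), any reasonable tolerance flags the warm-start as non-equivalent, which is why the runs are framed as matched warm-starts rather than a bitwise step-0 control.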

Repo layout

  • openpi/
    • modified source and scripts used for training/eval
    • copied norm-stats assets for the packed configs
    • full 2K and 10K checkpoint trees
  • artifacts/twin_handover_packed_parallelization_20260309/
    • initial 2K study bundle
  • artifacts/twin_handover_packed_parallelization_10k_20260309/
    • 10K follow-up bundle with metrics, logs, repro manifests, and environment snapshot
  • artifacts/pi05_base_params/
    • staged base parameter snapshot used during JAX-to-PyTorch conversion

Key files

  • Full report: REPORT.md
  • 2K summary: artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json
  • 10K summary: artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json
  • 10K comparison table: artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv
  • 10K repro commands: artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh
  • 10K changed-file manifest: artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt
  • 10K environment snapshot: artifacts/twin_handover_packed_parallelization_10k_20260309/environment/

Main changed files

The logic for both the initial 2K study and the 10K follow-up lives primarily in:

  • openpi/src/openpi/transforms.py
  • openpi/src/openpi/training/config.py
  • openpi/src/openpi/training/data_loader.py
  • openpi/src/openpi/models/model.py
  • openpi/src/openpi/models/tokenizer.py
  • openpi/src/openpi/models_pytorch/pi0_pytorch.py
  • openpi/scripts/train_pytorch.py
  • openpi/scripts/eval_twin_val_loss_pytorch.py
  • openpi/scripts/init_parallel_pi05_from_single_pytorch.py
  • openpi/scripts/inspect_twin_packed_batch.py
  • openpi/scripts/check_parallel_warmstart_equivalence.py
  • openpi/scripts/run_twin_handover_packed_followup.sh
  • openpi/scripts/run_twin_handover_packed_10k.sh

The per-file rationale is recorded in:

  • artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt
  • artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt