# pi0.5 Packed Multi-Arm OpenPI Artifacts
This repo packages the full local artifact set for the TWIN handover packed-action-head study on pi0.5, including:

- all finished checkpoints under `openpi/checkpoints/`
- the modified `openpi/` training and evaluation code
- train/eval logs and structured metric tables
- reproducibility manifests and environment snapshots
Two runs are included:

- an initial `2K` baseline-vs-parallel comparison
- a longer `10K` follow-up on the same packed setup
## Experiment setup
- Train repo: `lsnu/twin_handover_256_train`
- Val repo: `lsnu/twin_handover_256_val`
- Hardware: 4x H100 80GB
- Precision: bfloat16
- Semantic packed layout: `[L8, 0x8, R8, 0x8]`
- Active action-loss dims: `[0:8]` and `[16:24]`
- Masked padded dims: `[8:16]` and `[24:32]`
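The layout above implies a fixed per-dimension action-loss mask. A minimal sketch of building it (the helper name is hypothetical, not the repo's exact code; PyTorch assumed):

```python
import torch

# Minimal sketch (not the repo's exact code) of the per-dimension
# action-loss mask implied by the packed layout [L8, 0x8, R8, 0x8]:
# dims [0:8] and [16:24] carry real left/right-arm actions, while the
# zero-padded dims [8:16] and [24:32] are excluded from the loss.
def packed_action_loss_mask(action_dim: int = 32) -> torch.Tensor:
    mask = torch.zeros(action_dim, dtype=torch.bool)
    mask[0:8] = True    # left-arm actions (L8)
    mask[16:24] = True  # right-arm actions (R8)
    return mask

mask = packed_action_loss_mask()
print(mask.sum().item())  # 16 active dims out of 32
```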
## Headline results
Teacher-forced masked validation loss:
| Model | 2K @ final | 10K @ 1K | 10K @ 2K | 10K @ 5K | 10K @ 10K |
|---|---|---|---|---|---|
| Packed baseline | 0.035776 | 0.061130 | 0.041595 | 0.027324 | 0.022345 |
| Packed parallel | 0.035680 | 0.059715 | 0.039947 | 0.027340 | 0.022168 |
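For clarity, the teacher-forced masked validation loss can be sketched as a squared error averaged only over the active packed dims (illustrative helper, not the repo's exact eval script):

```python
import torch

# Illustrative masked teacher-forced loss: mean squared error computed
# only over the active packed action dims ([0:8] and [16:24]).
def masked_val_loss(pred: torch.Tensor, target: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    err = (pred - target) ** 2    # [batch, horizon, action_dim]
    return err[..., mask].mean()  # average over active dims only

mask = torch.zeros(32, dtype=torch.bool)
mask[0:8] = mask[16:24] = True
pred = torch.zeros(4, 10, 32)
target = torch.zeros(4, 10, 32)
target[..., 0] = 1.0  # error only in one active dim
print(masked_val_loss(pred, target, mask).item())  # 0.0625
```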
Sample-based eval on the fixed 10K final validation subset:
| Model | 4-step masked MAE | 10-step masked MAE | Train runtime | Peak VRAM |
|---|---|---|---|---|
| Packed baseline | 0.029935 | 0.030294 | 2:13:40 | 35.23 GB |
| Packed parallel | 0.029277 | 0.030241 | 2:20:51 | 35.27 GB |
By 10K steps the long run still shows a very small parallel edge on teacher-forced validation loss, while the sample-based eval is essentially a tie.
## Warm-start note
The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical check shows it is not exactly identical end-to-end on a real batch:

- `input_projection_max_abs_diff = 0.00122881`
- `masked_loss_abs_diff = 0.00398052`
- `warmstart_equivalent = False`
So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
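The step-0 check reduces to a tolerance comparison on one batch. The sketch below is an assumption-laden stand-in for `check_parallel_warmstart_equivalence.py` (function name and signature are illustrative), with diffs of the reported magnitude plugged in as example values:

```python
import torch

# Hypothetical sketch of the step-0 warm-start check: compare outputs and
# masked losses of the single-arm source vs the warm-started parallel
# model on one batch, and report max-abs diffs against a tolerance.
def warmstart_report(single_out: torch.Tensor, parallel_out: torch.Tensor,
                     single_loss: float, parallel_loss: float,
                     tol: float = 1e-6) -> dict:
    proj_diff = (single_out - parallel_out).abs().max().item()
    loss_diff = abs(single_loss - parallel_loss)
    return {
        "input_projection_max_abs_diff": proj_diff,
        "masked_loss_abs_diff": loss_diff,
        "warmstart_equivalent": proj_diff < tol and loss_diff < tol,
    }

# Example with diffs of the magnitude reported above: not equivalent.
report = warmstart_report(torch.ones(4, 32), torch.ones(4, 32) + 1.2e-3,
                          0.500000, 0.503981)
print(report["warmstart_equivalent"])  # False
```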
## Repo layout
- `openpi/` - modified source and scripts used for training/eval
  - copied norm-stats assets for the packed configs
  - full `2K` and `10K` checkpoint trees
- `artifacts/twin_handover_packed_parallelization_20260309/` - initial `2K` study bundle
- `artifacts/twin_handover_packed_parallelization_10k_20260309/` - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
- `artifacts/pi05_base_params/` - staged base parameter snapshot used during JAX-to-PyTorch conversion
## Key files
- Full report: `REPORT.md`
- `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
- `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
- `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
- `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
- `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
## Main changed files
The `2K` and `10K` study logic lives primarily in:

- `openpi/src/openpi/transforms.py`
- `openpi/src/openpi/training/config.py`
- `openpi/src/openpi/training/data_loader.py`
- `openpi/src/openpi/models/model.py`
- `openpi/src/openpi/models/tokenizer.py`
- `openpi/src/openpi/models_pytorch/pi0_pytorch.py`
- `openpi/scripts/train_pytorch.py`
- `openpi/scripts/eval_twin_val_loss_pytorch.py`
- `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`
- `openpi/scripts/inspect_twin_packed_batch.py`
- `openpi/scripts/check_parallel_warmstart_equivalence.py`
- `openpi/scripts/run_twin_handover_packed_followup.sh`
- `openpi/scripts/run_twin_handover_packed_10k.sh`
The per-file rationale is recorded in:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`