# pi0.5 Packed Multi-Arm OpenPI Artifacts
This repo packages the full local artifact set for packed-action-head studies on pi0.5 across TWIN handover and TWIN dual-push, including:

- all finished checkpoints under `openpi/checkpoints/`
- the modified `openpi/` training and evaluation code
- train/eval logs and structured metric tables
- reproducibility manifests and environment snapshots
Three runs are included:

- an initial `2K` baseline-vs-parallel comparison
- a longer `10K` follow-up on the same packed setup
- a `5K` dual-push `128` screening study on the same packed path
This update also adds a split-action-expert bring-up bundle for the packed TWIN path, covering:

- exact single-to-split warm-start checkpoints for `split_independent` and `split_communicating`
- invariant checks for the new split architecture
- detached real-data smoke and 20-step training runs on `lsnu/twin_dual_push_128_train`
- the code changes that introduce the new split-expert action path
## Experiment setup
- Handover train/val: `lsnu/twin_handover_256_train`, `lsnu/twin_handover_256_val`
- Dual-push train/val: `lsnu/twin_dual_push_128_train`, `lsnu/twin_dual_push_128_val`
- Hardware: 4x H100 80GB
- Precision: `bfloat16`
- Semantic packed layout: `[L8, 0x8, R8, 0x8]`
- Active action-loss dims: `[0:8]` and `[16:24]`
- Masked padded dims: `[8:16]` and `[24:32]`
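The packed layout and its loss masking can be sketched as follows. This is an illustrative reconstruction, not the repo's actual code; the function names and the use of numpy here are assumptions.

```python
import numpy as np

# Illustrative sketch of the semantic packed layout [L8, 0x8, R8, 0x8]:
# 32 action dims total, left arm in [0:8], right arm in [16:24],
# zero padding in [8:16] and [24:32].
ACTION_DIM = 32
ACTIVE_SLICES = (slice(0, 8), slice(16, 24))

def action_loss_mask(action_dim: int = ACTION_DIM) -> np.ndarray:
    """Boolean mask that is True only on the active (loss-bearing) dims."""
    mask = np.zeros(action_dim, dtype=bool)
    for s in ACTIVE_SLICES:
        mask[s] = True
    return mask

def masked_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over active dims only; padded dims are ignored."""
    mask = action_loss_mask()
    diff = (pred - target)[..., mask]
    return float(np.mean(diff ** 2))
```

Errors injected into the padded dims `[8:16]` and `[24:32]` contribute nothing to this loss, which is what "masked" means in the validation tables below.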
## Headline results
Teacher-forced masked validation loss:
| Model | 2K @ final | 10K @ 1K | 10K @ 2K | 10K @ 5K | 10K @ 10K |
|---|---|---|---|---|---|
| Packed baseline | 0.035776 | 0.061130 | 0.041595 | 0.027324 | 0.022345 |
| Packed parallel | 0.035680 | 0.059715 | 0.039947 | 0.027340 | 0.022168 |
Sample-based eval on the fixed 10K final validation subset:
| Model | 4-step masked MAE | 10-step masked MAE | Train runtime | Peak VRAM |
|---|---|---|---|---|
| Packed baseline | 0.029935 | 0.030294 | 2:13:40 | 35.23 GB |
| Packed parallel | 0.029277 | 0.030241 | 2:20:51 | 35.27 GB |
The long run still shows a very small parallel edge on teacher-forced validation loss by 10K, while the sample-based eval is essentially a tie.
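For reference, the sample-based metric restricts the error to the same active packed dims. A minimal sketch follows; the `masked_mae` name and the tensor shapes are assumptions for illustration (the real eval samples actions from the model with a 4- or 10-step sampler before comparing).

```python
import numpy as np

# Active dims of the [L8, 0x8, R8, 0x8] packed layout: left [0:8], right [16:24].
ACTIVE = np.r_[0:8, 16:24]

def masked_mae(sampled: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over the active action dims only."""
    return float(np.mean(np.abs(sampled[..., ACTIVE] - target[..., ACTIVE])))
```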
Dual-push 128 screening results:
| Model | 1K val loss | 2K val loss | 5K val loss | 5K 4-step MAE | 5K 10-step MAE | Train runtime |
|---|---|---|---|---|---|---|
| Packed baseline | 0.095597 | 0.083194 | 0.055958 | 0.056830 | 0.058973 | 1:05:25 |
| Packed parallel | 0.093704 | 0.082729 | 0.055242 | 0.054630 | 0.056627 | 1:00:33 |
The dual-push screening run shows a small but consistent parallel edge at 1K, 2K, and 5K on both teacher-forced validation loss and fixed-subset sample MAE.
## Warm-start note
The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical checks show it is not exactly identical end-to-end on a real batch:

- handover `10K`: `input_projection_max_abs_diff = 0.00122881`, `masked_loss_abs_diff = 0.00398052`
- dual-push `5K`: `input_projection_max_abs_diff = 0.00099802`, `masked_loss_abs_diff = 0.08580410`
- both checks report `warmstart_equivalent = False`
So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
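The shape of those checks can be illustrated with a small stand-in for the logic in `openpi/scripts/check_parallel_warmstart_equivalence.py`; the function name, signature, and tolerance below are invented for this sketch, not the script's actual implementation.

```python
import numpy as np

def warmstart_report(single_proj: np.ndarray, parallel_proj: np.ndarray,
                     single_loss: float, parallel_loss: float,
                     atol: float = 1e-6) -> dict:
    """Compare step-0 activations and masked losses on the same real batch.

    A bitwise-identical warm start would drive both diffs to ~0; the repo's
    checks instead report small nonzero diffs and warmstart_equivalent=False.
    """
    proj_diff = float(np.max(np.abs(single_proj - parallel_proj)))
    loss_diff = abs(single_loss - parallel_loss)
    return {
        "input_projection_max_abs_diff": proj_diff,
        "masked_loss_abs_diff": loss_diff,
        "warmstart_equivalent": bool(proj_diff <= atol and loss_diff <= atol),
    }
```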
## Split-Expert Bring-Up (2026-03-10)
The repo now contains a true split-action-expert implementation in addition to the earlier packed head-only factorization. The new config flag is `action_expert_mode` with values:

- `shared`
- `head_only_parallel`
- `split_independent`
- `split_communicating`
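As a hedged illustration of how such a flag might gate the architecture, here is a small config sketch; the class and property names are invented for this example and are not the repo's actual config code.

```python
from dataclasses import dataclass

# The four modes named by the `action_expert_mode` flag.
MODES = ("shared", "head_only_parallel", "split_independent", "split_communicating")

@dataclass
class ActionExpertConfig:
    """Toy config object validating and interpreting the mode flag."""
    action_expert_mode: str = "shared"

    def __post_init__(self):
        if self.action_expert_mode not in MODES:
            raise ValueError(f"unknown action_expert_mode: {self.action_expert_mode}")

    @property
    def uses_split_experts(self) -> bool:
        # Both split modes keep separate left/right expert branches.
        return self.action_expert_mode.startswith("split_")

    @property
    def cross_arm_attention(self) -> bool:
        # Only the communicating variant attends across arms.
        return self.action_expert_mode == "split_communicating"
```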
Key bring-up results:
- the split warm-start copies the original single
gemma_expertinto exact left/right expert branches for both split modes split_independentpasses the branch-local invariants:- identical left/right inputs produce identical suffix outputs
- perturbing right-arm inputs leaves left-arm outputs unchanged, and vice versa
- both split modes pass detached real-data training on packed TWIN dual-push:
3-step real-data smoke run with checkpoint save20-step real-data training run with checkpoint save
- the communicating model emits nonzero cross-arm attention diagnostics and remains finite through the real-data
20-step run
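The two branch-local invariants can be illustrated with a toy per-arm linear "expert". This is a didactic sketch, not the checks in `openpi/scripts/check_split_expert_invariants.py`; all names and shapes here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))  # one warm-started weight, copied to each branch

def split_independent_forward(left_in: np.ndarray, right_in: np.ndarray):
    """Each branch sees only its own arm's inputs (no cross-arm mixing)."""
    return left_in @ W, right_in @ W

left = rng.standard_normal((4, 8))
right = rng.standard_normal((4, 8))

# Invariant 1: identical left/right inputs give identical suffix outputs.
out_l, out_r = split_independent_forward(left, left)
assert np.allclose(out_l, out_r)

# Invariant 2: perturbing right-arm inputs leaves left-arm outputs unchanged.
base_l, _ = split_independent_forward(left, right)
pert_l, _ = split_independent_forward(left, right + 1.0)
assert np.allclose(base_l, pert_l)
```

A communicating variant would intentionally break invariant 2, which is why the repo only asserts it for `split_independent`.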
New bring-up artifact bundle:

- `artifacts/twin_split_expert_bringup_20260310/`
  - split warm-start checkpoints
  - invariant-check outputs
  - reproducibility commands
  - summary README for the split-expert bring-up
## Repo layout
- `openpi/` - modified source and scripts used for training/eval
  - copied norm-stats assets for the packed configs
  - full `2K`, `10K`, and dual-push `5K` checkpoint trees
- `artifacts/twin_handover_packed_parallelization_20260309/` - initial `2K` study bundle
- `artifacts/twin_handover_packed_parallelization_10k_20260309/` - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/` - dual-push `128` screening bundle with metrics, logs, repro manifests, and environment snapshot
- `artifacts/twin_split_expert_bringup_20260310/` - split-expert warm-start checkpoints, sanity checks, and bring-up repro commands
- `artifacts/pi05_base_params/` - staged base parameter snapshot used during JAX-to-PyTorch conversion
## Key files
- Full report: `REPORT.md`
- `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
- `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
- dual-push `5K` summary: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
- dual-push `5K` teacher-forced table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
- dual-push `5K` sample eval table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
- dual-push `5K` environment snapshot: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
- split-expert bring-up summary: `artifacts/twin_split_expert_bringup_20260310/README.md`
- split-expert repro commands: `artifacts/twin_split_expert_bringup_20260310/repro/commands_bringup.sh`
- split-expert invariant check outputs: `artifacts/twin_split_expert_bringup_20260310/sanity_checks/`
- split-expert real-data logs: `openpi/run_logs/split_independent_real_smoke3_r2.log`, `openpi/run_logs/split_communicating_real_smoke3.log`, `openpi/run_logs/split_independent_real_train20.log`, `openpi/run_logs/split_communicating_real_train20.log`
- split-expert real-data checkpoints: `openpi/checkpoints/pi05_twin_dual_push_128_packed_split_expert_independent_pytorch_5k/`, `openpi/checkpoints/pi05_twin_dual_push_128_packed_split_expert_communicating_pytorch_5k/`
- `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
- `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
- `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
## Main changed files
Initial 2K + 10K study logic lives primarily in:

- `openpi/src/openpi/transforms.py`
- `openpi/src/openpi/training/config.py`
- `openpi/src/openpi/training/data_loader.py`
- `openpi/src/openpi/models/model.py`
- `openpi/src/openpi/models/tokenizer.py`
- `openpi/src/openpi/models_pytorch/pi0_pytorch.py`
- `openpi/scripts/train_pytorch.py`
- `openpi/scripts/eval_twin_val_loss_pytorch.py`
- `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`
- `openpi/scripts/inspect_twin_packed_batch.py`
- `openpi/scripts/check_parallel_warmstart_equivalence.py`
- `openpi/scripts/check_split_expert_invariants.py`
- `openpi/scripts/run_twin_handover_packed_followup.sh`
- `openpi/scripts/run_twin_handover_packed_10k.sh`
- `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`
The per-file rationale is recorded in:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`