# Split-Expert Bring-Up (`2026-03-10`)
This bundle captures the initial PyTorch bring-up for the new packed TWIN split-action-expert path on `pi0.5`.
Included here:
- exact split warm-start checkpoints created from the original single-head PyTorch base checkpoint
- invariant-check outputs for `split_independent` and `split_communicating`
- detached real-data smoke and `20`-step training logs on `lsnu/twin_dual_push_128_train`
- reproducibility commands used for the bring-up
## Warm-start summary
Both split modes inherit the same base expert weights and per-arm input/output projections from the single-head checkpoint.
- `split_independent`
  - `left_expert_max_abs_diff = 0.0`
  - `right_expert_max_abs_diff = 0.0`
  - `left_input_projection_max_abs_diff = 0.0`
  - `right_input_projection_max_abs_diff = 0.0`
  - `left_output_projection_max_abs_diff = 0.0`
  - `right_output_projection_max_abs_diff = 0.0`
- `split_communicating`
  - the same inherited diffs as above (all `0.0`)
  - the added cross-arm communication parameters are zero-initialized at warm start
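
The invariant behind these numbers can be sketched as follows. This is a minimal illustration, not the bundle's actual check script: the parameter names (`expert.w`, `left_expert.w`, `cross_arm.w`) and the `mapping` structure are hypothetical, and tensors are stood in for by flat lists of floats so the example runs with the standard library alone. With real PyTorch state dicts, the per-tensor diff would typically be computed as `(a - b).abs().max().item()`.

```python
from typing import Dict, Mapping, Sequence, Tuple


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat tensors."""
    if len(a) != len(b):
        raise ValueError(f"shape mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def warm_start_report(
    base: Mapping[str, Sequence[float]],
    split: Mapping[str, Sequence[float]],
    mapping: Mapping[str, Tuple[str, str]],
) -> Dict[str, float]:
    """Compare each split parameter with the base parameter it was copied from.

    `mapping` maps a report name to a (base_key, split_key) pair.
    """
    return {
        name: max_abs_diff(base[base_key], split[split_key])
        for name, (base_key, split_key) in mapping.items()
    }


def is_zero_initialized(values: Sequence[float]) -> bool:
    """True if every element is exactly zero."""
    return all(v == 0.0 for v in values)


# Hypothetical toy checkpoints; real ones hold many tensors per expert.
base_ckpt = {"expert.w": [0.1, -0.2, 0.3]}
split_ckpt = {
    "left_expert.w": [0.1, -0.2, 0.3],
    "right_expert.w": [0.1, -0.2, 0.3],
    "cross_arm.w": [0.0, 0.0],  # stands in for the communication parameters
}
report = warm_start_report(
    base_ckpt,
    split_ckpt,
    {
        "left_expert_max_abs_diff": ("expert.w", "left_expert.w"),
        "right_expert_max_abs_diff": ("expert.w", "right_expert.w"),
    },
)
cross_arm_is_zero = is_zero_initialized(split_ckpt["cross_arm.w"])
```

A diff of exactly `0.0` means the split tensor is a bit-for-bit copy of the base tensor, which is a stricter condition than agreement within a floating-point tolerance.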
## Real-data bring-up summary
Dataset used for real-data smoke and short training:
- `lsnu/twin_dual_push_128_train`
Successful detached runs:
- `split_independent_real_smoke3_r2`
  - `3` train steps on real packed TWIN data
  - checkpoint saved at step `3`
- `split_communicating_real_smoke3`
  - `3` train steps on real packed TWIN data
  - checkpoint saved at step `3`
- `split_independent_real_train20`
  - `20` train steps on real packed TWIN data
  - final logged train loss at step `20`: `0.6038`
  - checkpoint saved at step `20`
- `split_communicating_real_train20`
  - `20` train steps on real packed TWIN data
  - final logged train loss at step `20`: `0.5943`
  - checkpoint saved at step `20`
## Layout
- `bootstrap_checkpoints/`
  - exact split warm-start checkpoints
- `sanity_checks/`
  - invariant-check outputs
- `run_logs/`
  - detached real-data run logs
- `repro/commands_bringup.sh`
  - reproduction commands used during the bring-up
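
After downloading the bundle, a quick way to confirm that the layout above is complete is to check each listed path. The sketch below assumes only the four entries named in this section; the helper name `missing_entries` and the bundle root you pass in are illustrative, not part of the bundle.

```python
from pathlib import Path
from typing import List, Union

# Top-level entries listed in the Layout section.
EXPECTED_ENTRIES = [
    "bootstrap_checkpoints",
    "sanity_checks",
    "run_logs",
    "repro/commands_bringup.sh",
]


def missing_entries(bundle_root: Union[str, Path]) -> List[str]:
    """Return the expected entries that do not exist under bundle_root."""
    root = Path(bundle_root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

For example, `missing_entries(".")` run from the bundle root returns an empty list when every entry is present.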