Upload dual-push report docs
README.md
CHANGED
|
@@ -1,21 +1,22 @@
|
|
| 1 |
# pi0.5 Packed Multi-Arm OpenPI Artifacts
|
| 2 |
|
| 3 |
-
This repo packages the full local artifact set for
|
| 4 |
|
| 5 |
- all finished checkpoints under `openpi/checkpoints/`
|
| 6 |
- the modified `openpi/` training and evaluation code
|
| 7 |
- train/eval logs and structured metric tables
|
| 8 |
- reproducibility manifests and environment snapshots
|
| 9 |
|
| 10 |
-
|
| 11 |
|
| 12 |
1. an initial `2K` baseline-vs-parallel comparison
|
| 13 |
2. a longer `10K` follow-up on the same packed setup
|
| 14 |
|
| 15 |
## Experiment setup
|
| 16 |
|
| 17 |
-
-
|
| 18 |
-
-
|
| 19 |
- Hardware: `4x H100 80GB`
|
| 20 |
- Precision: `bfloat16`
|
| 21 |
- Semantic packed layout: `[L8, 0x8, R8, 0x8]`
|
|
@@ -40,13 +41,22 @@ Sample-based eval on the fixed `10K` final validation subset:
|
|
| 40 |
|
| 41 |
The long run still shows a very small parallel edge on teacher-forced validation loss at `10K`, while the sample-based eval is essentially a tie.
|
| 42 |
|
| 43 |
## Warm-start note
|
| 44 |
|
| 45 |
-
The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical
|
| 46 |
|
| 47 |
-
- `input_projection_max_abs_diff = 0.00122881`
|
| 48 |
-
- `masked_loss_abs_diff = 0.
|
| 49 |
-
- `warmstart_equivalent = False`
|
| 50 |
|
| 51 |
So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
|
| 52 |
|
|
@@ -55,11 +65,13 @@ So this repo should be read as a matched warm-start study, not as a bitwise-iden
|
|
| 55 |
- `openpi/`
|
| 56 |
- modified source and scripts used for training/eval
|
| 57 |
- copied norm-stats assets for the packed configs
|
| 58 |
-
- full `2K` and `
|
| 59 |
- `artifacts/twin_handover_packed_parallelization_20260309/`
|
| 60 |
- initial `2K` study bundle
|
| 61 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/`
|
| 62 |
- `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
|
| 63 |
- `artifacts/pi05_base_params/`
|
| 64 |
- staged base parameter snapshot used during JAX-to-PyTorch conversion
|
| 65 |
|
|
@@ -69,6 +81,10 @@ So this repo should be read as a matched warm-start study, not as a bitwise-iden
|
|
| 69 |
- `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
|
| 70 |
- `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
|
| 71 |
- `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
|
| 72 |
- `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
|
| 73 |
- `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 74 |
- `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
|
|
@@ -90,8 +106,10 @@ Initial `2K` + `10K` study logic lives primarily in:
|
|
| 90 |
- `openpi/scripts/check_parallel_warmstart_equivalence.py`
|
| 91 |
- `openpi/scripts/run_twin_handover_packed_followup.sh`
|
| 92 |
- `openpi/scripts/run_twin_handover_packed_10k.sh`
|
| 93 |
|
| 94 |
The per-file rationale is recorded in:
|
| 95 |
|
| 96 |
- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
|
| 97 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
|
|
| 1 |
# pi0.5 Packed Multi-Arm OpenPI Artifacts
|
| 2 |
|
| 3 |
+
This repo packages the full local artifact set for packed-action-head studies on `pi0.5` across TWIN handover and TWIN dual-push, including:
|
| 4 |
|
| 5 |
- all finished checkpoints under `openpi/checkpoints/`
|
| 6 |
- the modified `openpi/` training and evaluation code
|
| 7 |
- train/eval logs and structured metric tables
|
| 8 |
- reproducibility manifests and environment snapshots
|
| 9 |
|
| 10 |
+
Three runs are included:
|
| 11 |
|
| 12 |
1. an initial `2K` baseline-vs-parallel comparison
|
| 13 |
2. a longer `10K` follow-up on the same packed setup
|
| 14 |
+
3. a `5K` dual-push `128` screening study on the same packed path
|
| 15 |
|
| 16 |
## Experiment setup
|
| 17 |
|
| 18 |
+
- Handover train/val: `lsnu/twin_handover_256_train`, `lsnu/twin_handover_256_val`
|
| 19 |
+
- Dual-push train/val: `lsnu/twin_dual_push_128_train`, `lsnu/twin_dual_push_128_val`
|
| 20 |
- Hardware: `4x H100 80GB`
|
| 21 |
- Precision: `bfloat16`
|
| 22 |
- Semantic packed layout: `[L8, 0x8, R8, 0x8]`
|
|
|
|
| 41 |
|
| 42 |
The long run still shows a very small parallel edge on teacher-forced validation loss at `10K`, while the sample-based eval is essentially a tie.
|
| 43 |
|
| 44 |
+
Dual-push `128` screening results:
|
| 45 |
+
|
| 46 |
+
| Model | 1K val loss | 2K val loss | 5K val loss | 5K 4-step MAE | 5K 10-step MAE | Train runtime |
|
| 47 |
+
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
|
| 48 |
+
| Packed baseline | `0.095597` | `0.083194` | `0.055958` | `0.056830` | `0.058973` | `1:05:25` |
|
| 49 |
+
| Packed parallel | `0.093704` | `0.082729` | `0.055242` | `0.054630` | `0.056627` | `1:00:33` |
|
| 50 |
+
|
| 51 |
+
The dual-push screening run shows a small but consistent parallel edge at `1K`, `2K`, and `5K` on both teacher-forced validation loss and fixed-subset sample MAE.
|
| 52 |
+
|
| 53 |
## Warm-start note
|
| 54 |
|
| 55 |
+
The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical checks show it is not exactly identical end-to-end on a real batch:
|
| 56 |
|
| 57 |
+
- handover `10K`: `input_projection_max_abs_diff = 0.00122881`, `masked_loss_abs_diff = 0.00398052`
|
| 58 |
+
- dual-push `5K`: `input_projection_max_abs_diff = 0.00099802`, `masked_loss_abs_diff = 0.08580410`
|
| 59 |
+
- both checks report `warmstart_equivalent = False`
|
| 60 |
|
| 61 |
So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
|
| 62 |
|
|
|
|
| 65 |
- `openpi/`
|
| 66 |
- modified source and scripts used for training/eval
|
| 67 |
- copied norm-stats assets for the packed configs
|
| 68 |
+
- full `2K`, `10K`, and dual-push `5K` checkpoint trees
|
| 69 |
- `artifacts/twin_handover_packed_parallelization_20260309/`
|
| 70 |
- initial `2K` study bundle
|
| 71 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/`
|
| 72 |
- `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
|
| 73 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/`
|
| 74 |
+
- dual-push `128` screening bundle with metrics, logs, repro manifests, and environment snapshot
|
| 75 |
- `artifacts/pi05_base_params/`
|
| 76 |
- staged base parameter snapshot used during JAX-to-PyTorch conversion
|
| 77 |
|
|
|
|
| 81 |
- `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
|
| 82 |
- `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
|
| 83 |
- `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
|
| 84 |
+
- dual-push `5K` summary: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
|
| 85 |
+
- dual-push `5K` teacher-forced table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
|
| 86 |
+
- dual-push `5K` sample eval table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
|
| 87 |
+
- dual-push `5K` environment snapshot: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
|
| 88 |
- `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
|
| 89 |
- `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 90 |
- `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
|
|
|
|
| 106 |
- `openpi/scripts/check_parallel_warmstart_equivalence.py`
|
| 107 |
- `openpi/scripts/run_twin_handover_packed_followup.sh`
|
| 108 |
- `openpi/scripts/run_twin_handover_packed_10k.sh`
|
| 109 |
+
- `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`
|
| 110 |
|
| 111 |
The per-file rationale is recorded in:
|
| 112 |
|
| 113 |
- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
|
| 114 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 115 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
|
REPORT.md
CHANGED
|
@@ -1,13 +1,14 @@
|
|
| 1 |
-
# Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover
|
| 2 |
|
| 3 |
## Scope
|
| 4 |
|
| 5 |
-
This repo now contains
|
| 6 |
|
| 7 |
1. the initial `2K` baseline-vs-parallel comparison
|
| 8 |
2. the longer `10K` follow-up with richer diagnostics
|
| 9 |
|
| 10 |
-
|
| 11 |
|
| 12 |
- train repo `lsnu/twin_handover_256_train`
|
| 13 |
- val repo `lsnu/twin_handover_256_val`
|
|
@@ -19,6 +20,17 @@ Both runs used:
|
|
| 19 |
|
| 20 |
Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.
|
| 21 |
|
| 22 |
## Data packing and masking
|
| 23 |
|
| 24 |
The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
|
|
@@ -71,6 +83,28 @@ The exact `10K` changed-file manifest is:
|
|
| 71 |
|
| 72 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 73 |
|
| 74 |
## Commands and run flow
|
| 75 |
|
| 76 |
The exact `10K` rerun commands are stored in:
|
|
@@ -137,6 +171,19 @@ Interpretation:
|
|
| 137 |
- this weakens a strict “identical function at step 0” claim
|
| 138 |
- it does not invalidate the comparison as a matched warm-start study
|
| 139 |
|
| 140 |
## Results
|
| 141 |
|
| 142 |
### Initial `2K` study
|
|
@@ -237,6 +284,73 @@ Reference:
|
|
| 237 |
|
| 238 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`
|
| 239 |
|
| 240 |
## Artifact locations
|
| 241 |
|
| 242 |
### `2K` bundle
|
|
@@ -256,6 +370,18 @@ Reference:
|
|
| 256 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 257 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
|
| 258 |
|
| 259 |
## Bottom line
|
| 260 |
|
| 261 |
The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
|
|
@@ -265,4 +391,10 @@ The `10K` follow-up suggests the `2K` near-tie was not hiding a large later dive
|
|
| 265 |
- left/right imbalance does not materially change
|
| 266 |
- the main difference remains subtle rather than dramatic
|
| 267 |
|
| 268 |
-
|
|
|
| 1 |
+
# Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover and Dual Push
|
| 2 |
|
| 3 |
## Scope
|
| 4 |
|
| 5 |
+
This repo now contains three completed studies:
|
| 6 |
|
| 7 |
1. the initial `2K` baseline-vs-parallel comparison
|
| 8 |
2. the longer `10K` follow-up with richer diagnostics
|
| 9 |
+
3. a `5K` dual-push `128` screening run on the same packed path
|
| 10 |
|
| 11 |
+
The handover runs used:
|
| 12 |
|
| 13 |
- train repo `lsnu/twin_handover_256_train`
|
| 14 |
- val repo `lsnu/twin_handover_256_val`
|
|
|
|
| 20 |
|
| 21 |
Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.
|
| 22 |
|
| 23 |
+
The dual-push screening run used:
|
| 24 |
+
|
| 25 |
+
- train repo `lsnu/twin_dual_push_128_train`
|
| 26 |
+
- val repo `lsnu/twin_dual_push_128_val`
|
| 27 |
+
- `4x H100 80GB`
|
| 28 |
+
- `bfloat16`
|
| 29 |
+
- packed semantic layout `[L8, 0x8, R8, 0x8]`
|
| 30 |
+
- active action-loss dims `[0:8]` and `[16:24]`
|
| 31 |
+
- masked dims `[8:16]` and `[24:32]`
|
| 32 |
+
- recomputed norm stats for the dual-push `128` train split
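
The packed layout and loss mask listed above can be sketched in a few lines. This is a minimal illustration that assumes only the layout `[L8, 0x8, R8, 0x8]` and the index ranges given in this report; the helper names `pack_twin`, `unpack_twin`, and `active_mask` are hypothetical and are not taken from the `openpi/` code.

```python
# Hypothetical helpers: only the layout and index ranges come from the report.
ARM_DIM = 8       # 7 joints + 1 gripper per arm
PACKED_DIM = 32   # [L8, 0x8, R8, 0x8]


def pack_twin(vec16):
    """Map a 16-dim [L8, R8] vector into the 32-dim packed layout."""
    if len(vec16) != 2 * ARM_DIM:
        raise ValueError("expected a 16-dim [L8, R8] vector")
    left, right = list(vec16[:ARM_DIM]), list(vec16[ARM_DIM:])
    zeros = [0.0] * ARM_DIM
    return left + zeros + right + zeros


def unpack_twin(vec32):
    """Recover the 16-dim [L8, R8] vector from the packed layout."""
    return list(vec32[0:8]) + list(vec32[16:24])


def active_mask():
    """1.0 on active action-loss dims [0:8] and [16:24], 0.0 on masked dims."""
    return [1.0 if (i < 8 or 16 <= i < 24) else 0.0 for i in range(PACKED_DIM)]


packed = pack_twin([float(i) for i in range(1, 17)])
mask = active_mask()
```

Keeping the zero blocks in fixed positions means the mask is a constant, so the loss can ignore the padded dims without any per-sample bookkeeping.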
|
| 33 |
+
|
| 34 |
## Data packing and masking
|
| 35 |
|
| 36 |
The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
|
|
|
|
| 83 |
|
| 84 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 85 |
|
| 86 |
+
### Dual-push `5K` screening additions
|
| 87 |
+
|
| 88 |
+
The dual-push screening run added or updated:
|
| 89 |
+
|
| 90 |
+
- `openpi/src/openpi/training/config.py`
|
| 91 |
+
- added `pi05_twin_dual_push_128_packed_baseline_pytorch_5k`
|
| 92 |
+
- added `pi05_twin_dual_push_128_packed_parallel_pytorch_5k`
|
| 93 |
+
- `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`
|
| 94 |
+
- added a detached dual-push `5K` runner that chains baseline train -> eval sweep -> parallel train -> eval sweep
|
| 95 |
+
- `openpi/assets/pi05_twin_dual_push_128_packed_baseline_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
|
| 96 |
+
- computed dual-push `128` train norm stats for the packed baseline config
|
| 97 |
+
- `openpi/assets/pi05_twin_dual_push_128_packed_parallel_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
|
| 98 |
+
- computed dual-push `128` train norm stats for the packed parallel config
|
| 99 |
+
- `README.md`
|
| 100 |
+
- updated landing page to cover the dual-push screening study
|
| 101 |
+
- `REPORT.md`
|
| 102 |
+
- updated full report to include dual-push setup, results, and artifact locations
|
| 103 |
+
|
| 104 |
+
The exact dual-push changed-file manifest is:
|
| 105 |
+
|
| 106 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
|
| 107 |
+
|
| 108 |
## Commands and run flow
|
| 109 |
|
| 110 |
The exact `10K` rerun commands are stored in:
|
|
|
|
| 171 |
- this weakens a strict “identical function at step 0” claim
|
| 172 |
- it does not invalidate the comparison as a matched warm-start study
|
| 173 |
|
| 174 |
+
Dual-push `5K` warm-start check:
|
| 175 |
+
|
| 176 |
+
- `input_projection_max_abs_diff = 0.00099802`
|
| 177 |
+
- `input_projection_mean_abs_diff = 0.00010568`
|
| 178 |
+
- `baseline_masked_loss = 1.43506372`
|
| 179 |
+
- `parallel_masked_loss = 1.52086782`
|
| 180 |
+
- `masked_loss_abs_diff = 0.08580410`
|
| 181 |
+
- `warmstart_equivalent = False`
|
| 182 |
+
|
| 183 |
+
Reference:
|
| 184 |
+
|
| 185 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/sanity_checks/warmstart_dual_push_128_5k.log`
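
The reported `masked_loss_abs_diff` follows directly from the two step-0 losses. The sketch below reproduces that arithmetic and shows one plausible way an equivalence flag could be derived. The tolerance `atol` and the function `warmstart_report` are assumptions for illustration; the actual criterion lives in `openpi/scripts/check_parallel_warmstart_equivalence.py` and is not restated in this report.

```python
def warmstart_report(baseline_loss, parallel_loss, proj_max_abs_diff, atol=1e-6):
    """Summarize a step-0 comparison; `atol` is an assumed tolerance."""
    loss_diff = abs(parallel_loss - baseline_loss)
    return {
        "masked_loss_abs_diff": round(loss_diff, 8),
        # Equivalent only if both the projection and the loss agree within atol.
        "warmstart_equivalent": proj_max_abs_diff <= atol and loss_diff <= atol,
    }


# Values reported for the dual-push `5K` warm-start check.
report = warmstart_report(
    baseline_loss=1.43506372,
    parallel_loss=1.52086782,
    proj_max_abs_diff=0.00099802,
)
print(report)
```

Under any tight tolerance, both the projection difference and the loss difference are far too large to pass, which matches the reported `warmstart_equivalent = False`.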
|
| 186 |
+
|
| 187 |
## Results
|
| 188 |
|
| 189 |
### Initial `2K` study
|
|
|
|
| 284 |
|
| 285 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`
|
| 286 |
|
| 287 |
+
## Dual-push `128` screening results
|
| 288 |
+
|
| 289 |
+
### Teacher-forced validation
|
| 290 |
+
|
| 291 |
+
| Checkpoint | Baseline | Parallel | Delta (parallel - baseline) |
|
| 292 |
+
| --- | ---: | ---: | ---: |
|
| 293 |
+
| `1000` | `0.095597` | `0.093704` | `-0.001893` |
|
| 294 |
+
| `2000` | `0.083194` | `0.082729` | `-0.000465` |
|
| 295 |
+
| `5000` | `0.055958` | `0.055242` | `-0.000716` |
|
| 296 |
+
|
| 297 |
+
The screening signal is small but consistently favors the packed parallel model: its validation loss is lower (negative delta) at all three checkpoints.
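
The delta column can be recomputed from the baseline and parallel columns. This is a small sanity-check sketch using only the values in the table; the relative change is an added convenience and is not a number reported by the run.

```python
# (baseline, parallel) teacher-forced validation loss per checkpoint.
VAL_LOSS = {
    1000: (0.095597, 0.093704),
    2000: (0.083194, 0.082729),
    5000: (0.055958, 0.055242),
}


def deltas(rows):
    """Return parallel - baseline per checkpoint, rounded to table precision."""
    return {step: round(par - base, 6) for step, (base, par) in rows.items()}


def relative_change_pct(rows):
    """Return the delta as a percentage of the baseline loss."""
    return {step: 100.0 * (par - base) / base for step, (base, par) in rows.items()}


for step, delta in deltas(VAL_LOSS).items():
    print(step, f"{delta:+.6f}", f"{relative_change_pct(VAL_LOSS)[step]:+.2f}%")
```

A negative delta means the parallel model has the lower loss.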
|
| 298 |
+
|
| 299 |
+
### Dual-push arm breakdown
|
| 300 |
+
|
| 301 |
+
Teacher-forced validation at `5000`:
|
| 302 |
+
|
| 303 |
+
| Metric | Baseline | Parallel |
|
| 304 |
+
| --- | ---: | ---: |
|
| 305 |
+
| Mean val loss | `0.055958` | `0.055242` |
|
| 306 |
+
| Left arm loss | `0.017725` | `0.017044` |
|
| 307 |
+
| Right arm loss | `0.094191` | `0.093439` |
|
| 308 |
+
| Left joint loss | `0.017577` | `0.017052` |
|
| 309 |
+
| Left gripper loss | `0.018765` | `0.016992` |
|
| 310 |
+
| Right joint loss | `0.103576` | `0.102856` |
|
| 311 |
+
| Right gripper loss | `0.028502` | `0.027523` |
|
| 312 |
+
| Left/right imbalance | `0.080993` | `0.081011` |
|
| 313 |
+
|
| 314 |
+
Interpretation:
|
| 315 |
+
|
| 316 |
+
- the small parallel advantage is visible on both arms
|
| 317 |
+
- the right arm remains much harder than the left on this task
|
| 318 |
+
- left/right imbalance is essentially unchanged
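
The table's numbers are internally consistent with a simple aggregation: each arm loss matches a uniform mean over `7` joint dims plus `1` gripper dim, and the mean val loss matches the average of the two arm losses. The check below is inferred from the reported values and is not a confirmed description of the eval code. The reported left/right imbalance is not the plain difference of the two aggregate arm losses, so it is presumably computed differently; the report does not say how.

```python
def arm_loss(joint_loss, gripper_loss, n_joints=7):
    """Uniform mean over n_joints joint dims and one gripper dim (assumed)."""
    return (n_joints * joint_loss + gripper_loss) / (n_joints + 1)


# Reported teacher-forced values at checkpoint 5000.
REPORTED = {
    "baseline": {"left_joint": 0.017577, "left_gripper": 0.018765,
                 "right_joint": 0.103576, "right_gripper": 0.028502,
                 "left": 0.017725, "right": 0.094191, "mean": 0.055958},
    "parallel": {"left_joint": 0.017052, "left_gripper": 0.016992,
                 "right_joint": 0.102856, "right_gripper": 0.027523,
                 "left": 0.017044, "right": 0.093439, "mean": 0.055242},
}


def recompute(row):
    """Rebuild arm-level and mean losses from the joint/gripper components."""
    left = arm_loss(row["left_joint"], row["left_gripper"])
    right = arm_loss(row["right_joint"], row["right_gripper"])
    return {"left": left, "right": right, "mean": (left + right) / 2}


for model, row in REPORTED.items():
    rebuilt = recompute(row)
    for key, value in rebuilt.items():
        print(model, key, f"rebuilt={value:.6f}", f"reported={row[key]:.6f}")
```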
|
| 319 |
+
|
| 320 |
+
### Dual-push sample-based eval
|
| 321 |
+
|
| 322 |
+
Fixed-subset sample eval:
|
| 323 |
+
|
| 324 |
+
| Checkpoint | Model | 4-step masked MAE | 10-step masked MAE |
|
| 325 |
+
| --- | --- | ---: | ---: |
|
| 326 |
+
| `1000` | baseline | `0.103199` | `0.108652` |
|
| 327 |
+
| `1000` | parallel | `0.101439` | `0.106874` |
|
| 328 |
+
| `2000` | baseline | `0.069732` | `0.074413` |
|
| 329 |
+
| `2000` | parallel | `0.069053` | `0.073501` |
|
| 330 |
+
| `5000` | baseline | `0.056830` | `0.058973` |
|
| 331 |
+
| `5000` | parallel | `0.054630` | `0.056627` |
|
| 332 |
+
|
| 333 |
+
Interpretation:
|
| 334 |
+
|
| 335 |
+
- the parallel model is also slightly better on fixed-subset inference-style eval
|
| 336 |
+
- unlike on handover, the positive signal stays visible at `5K`
|
| 337 |
+
- the margin is still small enough that this remains a screening result, not a paper-final claim
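
For readers unfamiliar with the metric, a masked MAE over the packed layout can be sketched as below. It assumes that "masked" means the error is averaged only over the active dims `[0:8]` and `[16:24]`, mirroring the action-loss mask; the function `masked_mae` is hypothetical and the repo's eval script may differ in detail.

```python
# Active dims in the packed [L8, 0x8, R8, 0x8] layout.
ACTIVE_DIMS = list(range(0, 8)) + list(range(16, 24))


def masked_mae(pred, target, active=ACTIVE_DIMS):
    """Mean absolute error over active dims only.

    pred and target are sequences of timesteps, each a 32-dim sequence.
    Padded dims are ignored, so their values cannot affect the metric.
    """
    total, count = 0.0, 0
    for p_step, t_step in zip(pred, target):
        for i in active:
            total += abs(p_step[i] - t_step[i])
            count += 1
    if count == 0:
        raise ValueError("no active elements to average over")
    return total / count


# Toy example: error of 0.5 on active dims, large garbage on padded dims.
target = [[0.0] * 32 for _ in range(4)]
pred = [[0.5 if i in ACTIVE_DIMS else 9.0 for i in range(32)] for _ in range(4)]
print(masked_mae(pred, target))
```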
|
| 338 |
+
|
| 339 |
+
### Dual-push runtime and memory
|
| 340 |
+
|
| 341 |
+
| Stage | Duration |
|
| 342 |
+
| --- | ---: |
|
| 343 |
+
| Baseline train | `1:05:25` |
|
| 344 |
+
| Baseline eval sweep | `0:14:34` |
|
| 345 |
+
| Parallel train | `1:00:33` |
|
| 346 |
+
| Parallel eval sweep | `0:14:39` |
|
| 347 |
+
| Full dual-push pipeline | `2:35:11` |
|
| 348 |
+
|
| 349 |
+
Peak VRAM:
|
| 350 |
+
|
| 351 |
+
- baseline: `35.23GB`
|
| 352 |
+
- parallel: `35.27GB`
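
The stage durations can be cross-checked against the reported pipeline total with a short script. The helper names are illustrative; the durations are copied from the runtime table.

```python
def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert seconds back to an H:MM:SS string."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


STAGES = {
    "Baseline train": "1:05:25",
    "Baseline eval sweep": "0:14:34",
    "Parallel train": "1:00:33",
    "Parallel eval sweep": "0:14:39",
}

total = to_hms(sum(to_seconds(d) for d in STAGES.values()))
print(total)  # 2:35:11, matching the reported full-pipeline duration

# Training-time difference between the two models, in seconds.
train_gap = to_seconds(STAGES["Baseline train"]) - to_seconds(STAGES["Parallel train"])
print(train_gap)
```

The four stages sum exactly to the reported total, so the pipeline figure includes no untracked overhead at one-second resolution.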
|
| 353 |
+
|
| 354 |
## Artifact locations
|
| 355 |
|
| 356 |
### `2K` bundle
|
|
|
|
| 370 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
|
| 371 |
- `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
|
| 372 |
|
| 373 |
+
### Dual-push `5K` bundle
|
| 374 |
+
|
| 375 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
|
| 376 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/train_loss_table.csv`
|
| 377 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
|
| 378 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
|
| 379 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/runtime_table.csv`
|
| 380 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/commands_reproduce.sh`
|
| 381 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
|
| 382 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/checkpoint_locations.txt`
|
| 383 |
+
- `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
|
| 384 |
+
|
| 385 |
## Bottom line
|
| 386 |
|
| 387 |
The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
|
|
|
|
| 391 |
- left/right imbalance does not materially change
|
| 392 |
- the main difference remains subtle rather than dramatic
|
| 393 |
|
| 394 |
+
The dual-push screening run adds a second signal:
|
| 395 |
+
|
| 396 |
+
- the packed parallel model is slightly better at `1K`, `2K`, and `5K`
|
| 397 |
+
- the same small advantage appears on both teacher-forced and sample-based eval
|
| 398 |
+
- the effect is still modest, but it is cleaner and more consistent than on handover
|
| 399 |
+
|
| 400 |
+
So the current repo state supports a narrow next-step conclusion: packed parallelization remains subtle on handover, but dual-push is a better candidate task for the next seed/scale confirmation.
|