lsnu committed · Commit 422ae16 · verified · 1 parent: a67cf5c

Upload dual-push report docs

Files changed (2):
  1. README.md +27 -9
  2. REPORT.md +136 -4
README.md CHANGED
@@ -1,21 +1,22 @@
  # pi0.5 Packed Multi-Arm OpenPI Artifacts

- This repo packages the full local artifact set for the TWIN handover packed-action-head study on `pi0.5`, including:

  - all finished checkpoints under `openpi/checkpoints/`
  - the modified `openpi/` training and evaluation code
  - train/eval logs and structured metric tables
  - reproducibility manifests and environment snapshots

- Two runs are included:

  1. an initial `2K` baseline-vs-parallel comparison
  2. a longer `10K` follow-up on the same packed setup

  ## Experiment setup

- - Train repo: `lsnu/twin_handover_256_train`
- - Val repo: `lsnu/twin_handover_256_val`
  - Hardware: `4x H100 80GB`
  - Precision: `bfloat16`
  - Semantic packed layout: `[L8, 0x8, R8, 0x8]`
@@ -40,13 +41,22 @@ Sample-based eval on the fixed `10K` final validation subset:

  The long run still shows a very small parallel edge on teacher-forced validation loss by `10K`, while the sample-based eval is essentially a tie.

  ## Warm-start note

- The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical check shows it is not exactly identical end-to-end on a real batch:

- - `input_projection_max_abs_diff = 0.00122881`
- - `masked_loss_abs_diff = 0.00398052`
- - `warmstart_equivalent = False`

  So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
@@ -55,11 +65,13 @@ So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
  - `openpi/`
    - modified source and scripts used for training/eval
    - copied norm-stats assets for the packed configs
-   - full `2K` and `10K` checkpoint trees
  - `artifacts/twin_handover_packed_parallelization_20260309/`
    - initial `2K` study bundle
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/`
    - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
  - `artifacts/pi05_base_params/`
    - staged base parameter snapshot used during JAX-to-PyTorch conversion
@@ -69,6 +81,10 @@ So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
  - `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
  - `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
  - `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
  - `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
  - `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
@@ -90,8 +106,10 @@ Initial `2K` + `10K` study logic lives primarily in:
  - `openpi/scripts/check_parallel_warmstart_equivalence.py`
  - `openpi/scripts/run_twin_handover_packed_followup.sh`
  - `openpi/scripts/run_twin_handover_packed_10k.sh`

  The per-file rationale is recorded in:

  - `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  # pi0.5 Packed Multi-Arm OpenPI Artifacts

+ This repo packages the full local artifact set for packed-action-head studies on `pi0.5` across TWIN handover and TWIN dual-push, including:

  - all finished checkpoints under `openpi/checkpoints/`
  - the modified `openpi/` training and evaluation code
  - train/eval logs and structured metric tables
  - reproducibility manifests and environment snapshots

+ Three runs are included:

  1. an initial `2K` baseline-vs-parallel comparison
  2. a longer `10K` follow-up on the same packed setup
+ 3. a `5K` dual-push `128` screening study on the same packed path

  ## Experiment setup

+ - Handover train/val: `lsnu/twin_handover_256_train`, `lsnu/twin_handover_256_val`
+ - Dual-push train/val: `lsnu/twin_dual_push_128_train`, `lsnu/twin_dual_push_128_val`
  - Hardware: `4x H100 80GB`
  - Precision: `bfloat16`
  - Semantic packed layout: `[L8, 0x8, R8, 0x8]`
 
  The long run still shows a very small parallel edge on teacher-forced validation loss by `10K`, while the sample-based eval is essentially a tie.

+ Dual-push `128` screening results:
+
+ | Model | 1K val loss | 2K val loss | 5K val loss | 5K 4-step MAE | 5K 10-step MAE | Train runtime |
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: |
+ | Packed baseline | `0.095597` | `0.083194` | `0.055958` | `0.056830` | `0.058973` | `1:05:25` |
+ | Packed parallel | `0.093704` | `0.082729` | `0.055242` | `0.054630` | `0.056627` | `1:00:33` |
+
+ The dual-push screening run shows a small but consistent parallel edge at `1K`, `2K`, and `5K` on both teacher-forced validation loss and fixed-subset sample MAE.
+
  ## Warm-start note

+ The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical checks show it is not exactly identical end-to-end on a real batch:

+ - handover `10K`: `input_projection_max_abs_diff = 0.00122881`, `masked_loss_abs_diff = 0.00398052`
+ - dual-push `5K`: `input_projection_max_abs_diff = 0.00099802`, `masked_loss_abs_diff = 0.08580410`
+ - both checks report `warmstart_equivalent = False`

  So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.

  - `openpi/`
    - modified source and scripts used for training/eval
    - copied norm-stats assets for the packed configs
+   - full `2K`, `10K`, and dual-push `5K` checkpoint trees
  - `artifacts/twin_handover_packed_parallelization_20260309/`
    - initial `2K` study bundle
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/`
    - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/`
+   - dual-push `128` screening bundle with metrics, logs, repro manifests, and environment snapshot
  - `artifacts/pi05_base_params/`
    - staged base parameter snapshot used during JAX-to-PyTorch conversion

  - `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
  - `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
  - `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
+ - dual-push `5K` summary: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
+ - dual-push `5K` teacher-forced table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
+ - dual-push `5K` sample eval table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
+ - dual-push `5K` environment snapshot: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
  - `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
  - `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

  - `openpi/scripts/check_parallel_warmstart_equivalence.py`
  - `openpi/scripts/run_twin_handover_packed_followup.sh`
  - `openpi/scripts/run_twin_handover_packed_10k.sh`
+ - `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`

  The per-file rationale is recorded in:

  - `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
REPORT.md CHANGED
@@ -1,13 +1,14 @@
- # Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover

  ## Scope

- This repo now contains two completed studies on the same packed TWIN handover setup:

  1. the initial `2K` baseline-vs-parallel comparison
  2. the longer `10K` follow-up with richer diagnostics

- Both runs used:

  - train repo `lsnu/twin_handover_256_train`
  - val repo `lsnu/twin_handover_256_val`
@@ -19,6 +20,17 @@ Both runs used:

  Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.

  ## Data packing and masking

  The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
@@ -71,6 +83,28 @@ The exact `10K` changed-file manifest is:

  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`

  ## Commands and run flow

  The exact `10K` rerun commands are stored in:
@@ -137,6 +171,19 @@ Interpretation:
  - this weakens a strict “identical function at step 0” claim
  - it does not invalidate the comparison as a matched warm-start study

  ## Results

  ### Initial `2K` study
@@ -237,6 +284,73 @@ Reference:

  - `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`

  ## Artifact locations

  ### `2K` bundle
@@ -256,6 +370,18 @@ Reference:
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

  ## Bottom line

  The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
@@ -265,4 +391,10 @@ The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
  - left/right imbalance does not materially change
  - the main difference remains subtle rather than dramatic

- So the packed parallel head looks competitive and slightly favorable on the masked teacher-forced objective, but the current evidence does not show a large practical separation at inference time on this task.
+ # Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover and Dual Push

  ## Scope

+ This repo now contains three completed studies:

  1. the initial `2K` baseline-vs-parallel comparison
  2. the longer `10K` follow-up with richer diagnostics
+ 3. a `5K` dual-push `128` screening run on the same packed path

+ The handover runs used:

  - train repo `lsnu/twin_handover_256_train`
  - val repo `lsnu/twin_handover_256_val`

  Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.

+ The dual-push screening run used:
+
+ - train repo `lsnu/twin_dual_push_128_train`
+ - val repo `lsnu/twin_dual_push_128_val`
+ - `4x H100 80GB`
+ - `bfloat16`
+ - packed semantic layout `[L8, 0x8, R8, 0x8]`
+ - active action-loss dims `[0:8]` and `[16:24]`
+ - masked dims `[8:16]` and `[24:32]`
+ - recomputed norm stats for the dual-push `128` train split
+
  ## Data packing and masking

  The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
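As a concrete illustration of the packed layout and loss masking described here, a minimal numpy sketch follows. The function names (`pack_actions`, `masked_mae`) are hypothetical and for illustration only, not the repo's actual transform API:

```python
import numpy as np

def pack_actions(actions_16: np.ndarray) -> np.ndarray:
    """Pack a [..., 16] = [L8, R8] action vector into the semantic
    [L8, 0x8, R8, 0x8] 32-dim layout used by the packed configs."""
    packed = np.zeros(actions_16.shape[:-1] + (32,), dtype=actions_16.dtype)
    packed[..., 0:8] = actions_16[..., 0:8]      # left arm: 7 joints + gripper
    packed[..., 16:24] = actions_16[..., 8:16]   # right arm: 7 joints + gripper
    return packed                                # dims [8:16] and [24:32] stay zero

# Only the active dims [0:8] and [16:24] contribute to the loss.
LOSS_MASK = np.zeros(32, dtype=bool)
LOSS_MASK[0:8] = True
LOSS_MASK[16:24] = True

def masked_mae(pred: np.ndarray, target: np.ndarray) -> float:
    """MAE restricted to the active dims, in the spirit of the masked eval metrics."""
    return float(np.abs(pred - target)[..., LOSS_MASK].mean())
```

Keeping the zero-padded dims out of the loss is what lets the baseline and parallel heads share the same `32`-dim interface without training on zero targets.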
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`

+ ### Dual-push `5K` screening additions
+
+ The dual-push screening run added or updated:
+
+ - `openpi/src/openpi/training/config.py`
+   - added `pi05_twin_dual_push_128_packed_baseline_pytorch_5k`
+   - added `pi05_twin_dual_push_128_packed_parallel_pytorch_5k`
+ - `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`
+   - added detached dual-push `5K` baseline->eval sweep->parallel->eval sweep runner
+ - `openpi/assets/pi05_twin_dual_push_128_packed_baseline_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
+   - computed dual-push `128` train norm stats for the packed baseline config
+ - `openpi/assets/pi05_twin_dual_push_128_packed_parallel_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
+   - computed dual-push `128` train norm stats for the packed parallel config
+ - `README.md`
+   - updated landing page to cover the dual-push screening study
+ - `REPORT.md`
+   - updated full report to include dual-push setup, results, and artifact locations
+
+ The exact dual-push changed-file manifest is:
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
+
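For context on what "computed norm stats" means here, a minimal sketch of per-dimension mean/std statistics over the train split. The JSON shape and field names are assumptions for illustration and may not match openpi's actual `norm_stats.json` schema:

```python
import json
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict:
    """Per-dimension mean/std over an [N, D] stack of train actions,
    returned in a JSON-serializable dict (field names are illustrative)."""
    return {
        "actions": {
            "mean": actions.mean(axis=0).tolist(),
            "std": (actions.std(axis=0) + 1e-8).tolist(),  # guard against zero std
        }
    }

def normalize(actions: np.ndarray, stats: dict) -> np.ndarray:
    """Apply the stats the way a normalization transform would."""
    mean = np.asarray(stats["actions"]["mean"])
    std = np.asarray(stats["actions"]["std"])
    return (actions - mean) / std

# The stats would then be written next to the config assets, e.g.:
# with open("norm_stats.json", "w") as f:
#     json.dump(compute_norm_stats(train_actions), f)
```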
  ## Commands and run flow

  The exact `10K` rerun commands are stored in:
 
  - this weakens a strict “identical function at step 0” claim
  - it does not invalidate the comparison as a matched warm-start study

+ Dual-push `5K` warm-start check:
+
+ - `input_projection_max_abs_diff = 0.00099802`
+ - `input_projection_mean_abs_diff = 0.00010568`
+ - `baseline_masked_loss = 1.43506372`
+ - `parallel_masked_loss = 1.52086782`
+ - `masked_loss_abs_diff = 0.08580410`
+ - `warmstart_equivalent = False`
+
+ Reference:
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/sanity_checks/warmstart_dual_push_128_5k.log`
+
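The quantities in these check reports can be computed with a simple comparison helper. This is a hedged sketch of the idea, not the actual logic of `openpi/scripts/check_parallel_warmstart_equivalence.py`; the tolerance is an assumed value:

```python
import numpy as np

def warmstart_check(baseline_out: np.ndarray,
                    parallel_out: np.ndarray,
                    baseline_loss: float,
                    parallel_loss: float,
                    atol: float = 1e-6) -> dict:
    """Compare step-0 outputs and masked losses of the baseline and the
    warm-started parallel model on the same real batch."""
    diff = np.abs(baseline_out - parallel_out)
    report = {
        "input_projection_max_abs_diff": float(diff.max()),
        "input_projection_mean_abs_diff": float(diff.mean()),
        "masked_loss_abs_diff": abs(baseline_loss - parallel_loss),
    }
    # Equivalent only if both the projection outputs and the losses agree
    # within tolerance.
    report["warmstart_equivalent"] = (
        report["input_projection_max_abs_diff"] <= atol
        and report["masked_loss_abs_diff"] <= atol
    )
    return report
```

By this kind of criterion both runs land well above any tight tolerance, which is why the repo frames them as matched warm-starts rather than step-0-identical controls.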
  ## Results

  ### Initial `2K` study
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`

+ ## Dual-push `128` screening results
+
+ ### Teacher-forced validation
+
+ | Checkpoint | Baseline | Parallel | Delta (parallel - baseline) |
+ | --- | ---: | ---: | ---: |
+ | `1000` | `0.095597` | `0.093704` | `-0.001893` |
+ | `2000` | `0.083194` | `0.082729` | `-0.000465` |
+ | `5000` | `0.055958` | `0.055242` | `-0.000716` |
+
+ The screening signal is small but consistently in favor of the packed parallel model at all three checkpoints.
+
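The delta column is just the difference of the two loss columns; a quick check against the table's own values:

```python
# Teacher-forced val losses from the table above.
baseline = {1000: 0.095597, 2000: 0.083194, 5000: 0.055958}
parallel = {1000: 0.093704, 2000: 0.082729, 5000: 0.055242}

# Delta (parallel - baseline); negative means the parallel model is better.
deltas = {step: round(parallel[step] - baseline[step], 6) for step in baseline}
print(deltas)  # {1000: -0.001893, 2000: -0.000465, 5000: -0.000716}
```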
+ ### Dual-push arm breakdown
+
+ Teacher-forced validation at `5000`:
+
+ | Metric | Baseline | Parallel |
+ | --- | ---: | ---: |
+ | Mean val loss | `0.055958` | `0.055242` |
+ | Left arm loss | `0.017725` | `0.017044` |
+ | Right arm loss | `0.094191` | `0.093439` |
+ | Left joint loss | `0.017577` | `0.017052` |
+ | Left gripper loss | `0.018765` | `0.016992` |
+ | Right joint loss | `0.103576` | `0.102856` |
+ | Right gripper loss | `0.028502` | `0.027523` |
+ | Left/right imbalance | `0.080993` | `0.081011` |
+
+ Interpretation:
+
+ - the small parallel advantage is visible on both arms
+ - the right arm remains much harder than the left on this task
+ - left/right imbalance is essentially unchanged
+
+ ### Dual-push sample-based eval
+
+ Fixed-subset sample eval:
+
+ | Checkpoint | Model | 4-step masked MAE | 10-step masked MAE |
+ | --- | --- | ---: | ---: |
+ | `1000` | baseline | `0.103199` | `0.108652` |
+ | `1000` | parallel | `0.101439` | `0.106874` |
+ | `2000` | baseline | `0.069732` | `0.074413` |
+ | `2000` | parallel | `0.069053` | `0.073501` |
+ | `5000` | baseline | `0.056830` | `0.058973` |
+ | `5000` | parallel | `0.054630` | `0.056627` |
+
+ Interpretation:
+
+ - the parallel model is also slightly better on fixed-subset inference-style eval
+ - unlike handover, the positive signal stays visible at `5K`
+ - the margin is still small enough that this remains a screening result, not a paper-final claim
+
+ ### Dual-push runtime and memory
+
+ | Stage | Duration |
+ | --- | ---: |
+ | Baseline train | `1:05:25` |
+ | Baseline eval sweep | `0:14:34` |
+ | Parallel train | `1:00:33` |
+ | Parallel eval sweep | `0:14:39` |
+ | Full dual-push pipeline | `2:35:11` |
+
+ Peak VRAM:
+
+ - baseline: `35.23GB`
+ - parallel: `35.27GB`
+
  ## Artifact locations

  ### `2K` bundle
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

+ ### Dual-push `5K` bundle
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/train_loss_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/runtime_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/commands_reproduce.sh`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/checkpoint_locations.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
+
  ## Bottom line

  The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.

  - left/right imbalance does not materially change
  - the main difference remains subtle rather than dramatic

+ The dual-push screening run adds a second signal:
+
+ - the packed parallel model is slightly better at `1K`, `2K`, and `5K`
+ - the same small advantage appears on both teacher-forced and sample-based eval
+ - the effect is still modest, but it is cleaner and more consistent than on handover
+
+ So the current repo state supports a narrow next-step conclusion: packed parallelization remains subtle on handover, but dual-push is a better candidate task for the next seed/scale confirmation.