lsnu committed · Commit 422ae16 · verified · 1 parent: a67cf5c

Upload dual-push report docs

Files changed (2):
  1. README.md +27 -9
  2. REPORT.md +136 -4
README.md CHANGED
@@ -1,21 +1,22 @@
  # pi0.5 Packed Multi-Arm OpenPI Artifacts

- This repo packages the full local artifact set for the TWIN handover packed-action-head study on `pi0.5`, including:

  - all finished checkpoints under `openpi/checkpoints/`
  - the modified `openpi/` training and evaluation code
  - train/eval logs and structured metric tables
  - reproducibility manifests and environment snapshots

- Two runs are included:

  1. an initial `2K` baseline-vs-parallel comparison
  2. a longer `10K` follow-up on the same packed setup

  ## Experiment setup

- - Train repo: `lsnu/twin_handover_256_train`
- - Val repo: `lsnu/twin_handover_256_val`
  - Hardware: `4x H100 80GB`
  - Precision: `bfloat16`
  - Semantic packed layout: `[L8, 0x8, R8, 0x8]`
@@ -40,13 +41,22 @@ Sample-based eval on the fixed `10K` final validation subset:

  The long run still shows a very small parallel edge on teacher-forced validation loss by `10K`, while the sample-based eval is essentially a tie.

  ## Warm-start note

- The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical check shows it is not exactly identical end-to-end on a real batch:

- - `input_projection_max_abs_diff = 0.00122881`
- - `masked_loss_abs_diff = 0.00398052`
- - `warmstart_equivalent = False`

  So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
@@ -55,11 +65,13 @@ So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
  - `openpi/`
    - modified source and scripts used for training/eval
    - copied norm-stats assets for the packed configs
-   - full `2K` and `10K` checkpoint trees
  - `artifacts/twin_handover_packed_parallelization_20260309/`
    - initial `2K` study bundle
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/`
    - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
  - `artifacts/pi05_base_params/`
    - staged base parameter snapshot used during JAX-to-PyTorch conversion
@@ -69,6 +81,10 @@ So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.
  - `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
  - `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
  - `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
  - `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
  - `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`
@@ -90,8 +106,10 @@ Initial `2K` + `10K` study logic lives primarily in:
  - `openpi/scripts/check_parallel_warmstart_equivalence.py`
  - `openpi/scripts/run_twin_handover_packed_followup.sh`
  - `openpi/scripts/run_twin_handover_packed_10k.sh`

  The per-file rationale is recorded in:

  - `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  # pi0.5 Packed Multi-Arm OpenPI Artifacts

+ This repo packages the full local artifact set for packed-action-head studies on `pi0.5` across TWIN handover and TWIN dual-push, including:

  - all finished checkpoints under `openpi/checkpoints/`
  - the modified `openpi/` training and evaluation code
  - train/eval logs and structured metric tables
  - reproducibility manifests and environment snapshots

+ Three runs are included:

  1. an initial `2K` baseline-vs-parallel comparison
  2. a longer `10K` follow-up on the same packed setup
+ 3. a `5K` dual-push `128` screening study on the same packed path

  ## Experiment setup

+ - Handover train/val: `lsnu/twin_handover_256_train`, `lsnu/twin_handover_256_val`
+ - Dual-push train/val: `lsnu/twin_dual_push_128_train`, `lsnu/twin_dual_push_128_val`
  - Hardware: `4x H100 80GB`
  - Precision: `bfloat16`
  - Semantic packed layout: `[L8, 0x8, R8, 0x8]`
 
  The long run still shows a very small parallel edge on teacher-forced validation loss by `10K`, while the sample-based eval is essentially a tie.

+ Dual-push `128` screening results:
+
+ | Model | 1K val loss | 2K val loss | 5K val loss | 5K 4-step MAE | 5K 10-step MAE | Train runtime |
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: |
+ | Packed baseline | `0.095597` | `0.083194` | `0.055958` | `0.056830` | `0.058973` | `1:05:25` |
+ | Packed parallel | `0.093704` | `0.082729` | `0.055242` | `0.054630` | `0.056627` | `1:00:33` |
+
+ The dual-push screening run shows a small but consistent parallel edge at `1K`, `2K`, and `5K` on both teacher-forced validation loss and fixed-subset sample MAE.
+
  ## Warm-start note

+ The packed parallel warm-start uses the slice/fuse mapping implemented in `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`, but the added step-0 numerical checks show it is not exactly identical end-to-end on a real batch:

+ - handover `10K`: `input_projection_max_abs_diff = 0.00122881`, `masked_loss_abs_diff = 0.00398052`
+ - dual-push `5K`: `input_projection_max_abs_diff = 0.00099802`, `masked_loss_abs_diff = 0.08580410`
+ - both checks report `warmstart_equivalent = False`

  So this repo should be read as a matched warm-start study, not as a bitwise-identical step-0 control.

  - `openpi/`
    - modified source and scripts used for training/eval
    - copied norm-stats assets for the packed configs
+   - full `2K`, `10K`, and dual-push `5K` checkpoint trees
  - `artifacts/twin_handover_packed_parallelization_20260309/`
    - initial `2K` study bundle
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/`
    - `10K` follow-up bundle with metrics, logs, repro manifests, and environment snapshot
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/`
+   - dual-push `128` screening bundle with metrics, logs, repro manifests, and environment snapshot
  - `artifacts/pi05_base_params/`
    - staged base parameter snapshot used during JAX-to-PyTorch conversion

  - `2K` summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
  - `10K` summary: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/summary.json`
  - `10K` comparison table: `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/comparison_2k_vs_10k.csv`
+ - dual-push `5K` summary: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
+ - dual-push `5K` teacher-forced table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
+ - dual-push `5K` sample eval table: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
+ - dual-push `5K` environment snapshot: `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
  - `10K` repro commands: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/commands_reproduce.sh`
  - `10K` changed-file manifest: `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `10K` environment snapshot: `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

  - `openpi/scripts/check_parallel_warmstart_equivalence.py`
  - `openpi/scripts/run_twin_handover_packed_followup.sh`
  - `openpi/scripts/run_twin_handover_packed_10k.sh`
+ - `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`

  The per-file rationale is recorded in:

  - `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
REPORT.md CHANGED
@@ -1,13 +1,14 @@
- # Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover

  ## Scope

- This repo now contains two completed studies on the same packed TWIN handover setup:

  1. the initial `2K` baseline-vs-parallel comparison
  2. the longer `10K` follow-up with richer diagnostics

- Both runs used:

  - train repo `lsnu/twin_handover_256_train`
  - val repo `lsnu/twin_handover_256_val`
@@ -19,6 +20,17 @@ Both runs used:

  Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.

  ## Data packing and masking

  The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
@@ -71,6 +83,28 @@ The exact `10K` changed-file manifest is:

  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`

  ## Commands and run flow

  The exact `10K` rerun commands are stored in:
@@ -137,6 +171,19 @@ Interpretation:
  - this weakens a strict “identical function at step 0” claim
  - it does not invalidate the comparison as a matched warm-start study

  ## Results

  ### Initial `2K` study
@@ -237,6 +284,73 @@ Reference:

  - `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`

  ## Artifact locations

  ### `2K` bundle
@@ -256,6 +370,18 @@ Reference:
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

  ## Bottom line

  The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
@@ -265,4 +391,10 @@ The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.
  - left/right imbalance does not materially change
  - the main difference remains subtle rather than dramatic

- So the packed parallel head looks competitive and slightly favorable on the masked teacher-forced objective, but the current evidence does not show a large practical separation at inference time on this task.
+ # Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover and Dual Push

  ## Scope

+ This repo now contains three completed studies:

  1. the initial `2K` baseline-vs-parallel comparison
  2. the longer `10K` follow-up with richer diagnostics
+ 3. a `5K` dual-push `128` screening run on the same packed path

+ The handover runs used:

  - train repo `lsnu/twin_handover_256_train`
  - val repo `lsnu/twin_handover_256_val`

  Existing public `16`-dim norm stats were reused. No raw-data reconversion was done.

+ The dual-push screening run used:
+
+ - train repo `lsnu/twin_dual_push_128_train`
+ - val repo `lsnu/twin_dual_push_128_val`
+ - `4x H100 80GB`
+ - `bfloat16`
+ - packed semantic layout `[L8, 0x8, R8, 0x8]`
+ - active action-loss dims `[0:8]` and `[16:24]`
+ - masked dims `[8:16]` and `[24:32]`
+ - recomputed norm stats for the dual-push `128` train split
+
  ## Data packing and masking

  The TWIN converted state/action layout is `[L8, R8]`, where each arm is `7` joints plus gripper. The packed transform path added for these runs preserves the left/right semantics inside a `32`-dim model input:
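As a concrete illustration of the packed layout and loss masking described here, a minimal numpy sketch follows. The function names (`pack_actions`, `masked_mae`) are hypothetical and for illustration only, not the repo's actual transform API:

```python
import numpy as np

def pack_actions(actions_16: np.ndarray) -> np.ndarray:
    """Pack a [..., 16] = [L8, R8] action vector into the semantic
    [L8, 0x8, R8, 0x8] 32-dim layout used by the packed configs."""
    packed = np.zeros(actions_16.shape[:-1] + (32,), dtype=actions_16.dtype)
    packed[..., 0:8] = actions_16[..., 0:8]      # left arm: 7 joints + gripper
    packed[..., 16:24] = actions_16[..., 8:16]   # right arm: 7 joints + gripper
    return packed                                # dims [8:16] and [24:32] stay zero

# Only the active dims [0:8] and [16:24] contribute to the loss.
LOSS_MASK = np.zeros(32, dtype=bool)
LOSS_MASK[0:8] = True
LOSS_MASK[16:24] = True

def masked_mae(pred: np.ndarray, target: np.ndarray) -> float:
    """MAE restricted to the active dims, in the spirit of the masked eval metrics."""
    return float(np.abs(pred - target)[..., LOSS_MASK].mean())
```

Keeping the zero-padded dims out of the loss is what lets the baseline and parallel heads share the same `32`-dim interface without training on zero targets.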
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`

+ ### Dual-push `5K` screening additions
+
+ The dual-push screening run added or updated:
+
+ - `openpi/src/openpi/training/config.py`
+   - added `pi05_twin_dual_push_128_packed_baseline_pytorch_5k`
+   - added `pi05_twin_dual_push_128_packed_parallel_pytorch_5k`
+ - `openpi/scripts/run_twin_dual_push_128_packed_5k.sh`
+   - added detached dual-push `5K` baseline->eval sweep->parallel->eval sweep runner
+ - `openpi/assets/pi05_twin_dual_push_128_packed_baseline_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
+   - computed dual-push `128` train norm stats for the packed baseline config
+ - `openpi/assets/pi05_twin_dual_push_128_packed_parallel_pytorch_5k/lsnu/twin_dual_push_128_train/norm_stats.json`
+   - computed dual-push `128` train norm stats for the packed parallel config
+ - `README.md`
+   - updated landing page to cover the dual-push screening study
+ - `REPORT.md`
+   - updated full report to include dual-push setup, results, and artifact locations
+
+ The exact dual-push changed-file manifest is:
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
+
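For context on what "computed norm stats" means here, a minimal sketch of per-dimension mean/std statistics over the train split. The JSON shape and field names are assumptions for illustration and may not match openpi's actual `norm_stats.json` schema:

```python
import json
import numpy as np

def compute_norm_stats(actions: np.ndarray) -> dict:
    """Per-dimension mean/std over an [N, D] stack of train actions,
    returned in a JSON-serializable dict (field names are illustrative)."""
    return {
        "actions": {
            "mean": actions.mean(axis=0).tolist(),
            "std": (actions.std(axis=0) + 1e-8).tolist(),  # guard against zero std
        }
    }

def normalize(actions: np.ndarray, stats: dict) -> np.ndarray:
    """Apply the stats the way a normalization transform would."""
    mean = np.asarray(stats["actions"]["mean"])
    std = np.asarray(stats["actions"]["std"])
    return (actions - mean) / std

# The stats would then be written next to the config assets, e.g.:
# with open("norm_stats.json", "w") as f:
#     json.dump(compute_norm_stats(train_actions), f)
```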
  ## Commands and run flow

  The exact `10K` rerun commands are stored in:
 
  - this weakens a strict “identical function at step 0” claim
  - it does not invalidate the comparison as a matched warm-start study

+ Dual-push `5K` warm-start check:
+
+ - `input_projection_max_abs_diff = 0.00099802`
+ - `input_projection_mean_abs_diff = 0.00010568`
+ - `baseline_masked_loss = 1.43506372`
+ - `parallel_masked_loss = 1.52086782`
+ - `masked_loss_abs_diff = 0.08580410`
+ - `warmstart_equivalent = False`
+
+ Reference:
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/sanity_checks/warmstart_dual_push_128_5k.log`
+
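The quantities in these check reports can be computed with a simple comparison helper. This is a hedged sketch of the idea, not the actual logic of `openpi/scripts/check_parallel_warmstart_equivalence.py`; the tolerance is an assumed value:

```python
import numpy as np

def warmstart_check(baseline_out: np.ndarray,
                    parallel_out: np.ndarray,
                    baseline_loss: float,
                    parallel_loss: float,
                    atol: float = 1e-6) -> dict:
    """Compare step-0 outputs and masked losses of the baseline and the
    warm-started parallel model on the same real batch."""
    diff = np.abs(baseline_out - parallel_out)
    report = {
        "input_projection_max_abs_diff": float(diff.max()),
        "input_projection_mean_abs_diff": float(diff.mean()),
        "masked_loss_abs_diff": abs(baseline_loss - parallel_loss),
    }
    # Equivalent only if both the projection outputs and the losses agree
    # within tolerance.
    report["warmstart_equivalent"] = (
        report["input_projection_max_abs_diff"] <= atol
        and report["masked_loss_abs_diff"] <= atol
    )
    return report
```

By this kind of criterion both runs land well above any tight tolerance, which is why the repo frames them as matched warm-starts rather than step-0-identical controls.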
  ## Results

  ### Initial `2K` study
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/metrics/runtime_table.csv`

+ ## Dual-push `128` screening results
+
+ ### Teacher-forced validation
+
+ | Checkpoint | Baseline | Parallel | Delta (parallel - baseline) |
+ | --- | ---: | ---: | ---: |
+ | `1000` | `0.095597` | `0.093704` | `-0.001893` |
+ | `2000` | `0.083194` | `0.082729` | `-0.000465` |
+ | `5000` | `0.055958` | `0.055242` | `-0.000716` |
+
+ The screening signal is small but consistently in favor of the packed parallel model at all three checkpoints.
+
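The delta column is just the difference of the two loss columns; a quick check against the table's own values:

```python
# Teacher-forced val losses from the table above.
baseline = {1000: 0.095597, 2000: 0.083194, 5000: 0.055958}
parallel = {1000: 0.093704, 2000: 0.082729, 5000: 0.055242}

# Delta (parallel - baseline); negative means the parallel model is better.
deltas = {step: round(parallel[step] - baseline[step], 6) for step in baseline}
print(deltas)  # {1000: -0.001893, 2000: -0.000465, 5000: -0.000716}
```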
+ ### Dual-push arm breakdown
+
+ Teacher-forced validation at `5000`:
+
+ | Metric | Baseline | Parallel |
+ | --- | ---: | ---: |
+ | Mean val loss | `0.055958` | `0.055242` |
+ | Left arm loss | `0.017725` | `0.017044` |
+ | Right arm loss | `0.094191` | `0.093439` |
+ | Left joint loss | `0.017577` | `0.017052` |
+ | Left gripper loss | `0.018765` | `0.016992` |
+ | Right joint loss | `0.103576` | `0.102856` |
+ | Right gripper loss | `0.028502` | `0.027523` |
+ | Left/right imbalance | `0.080993` | `0.081011` |
+
+ Interpretation:
+
+ - the small parallel advantage is visible on both arms
+ - the right arm remains much harder than the left on this task
+ - left/right imbalance is essentially unchanged
+
+ ### Dual-push sample-based eval
+
+ Fixed-subset sample eval:
+
+ | Checkpoint | Model | 4-step masked MAE | 10-step masked MAE |
+ | --- | --- | ---: | ---: |
+ | `1000` | baseline | `0.103199` | `0.108652` |
+ | `1000` | parallel | `0.101439` | `0.106874` |
+ | `2000` | baseline | `0.069732` | `0.074413` |
+ | `2000` | parallel | `0.069053` | `0.073501` |
+ | `5000` | baseline | `0.056830` | `0.058973` |
+ | `5000` | parallel | `0.054630` | `0.056627` |
+
+ Interpretation:
+
+ - the parallel model is also slightly better on fixed-subset inference-style eval
+ - unlike handover, the positive signal stays visible at `5K`
+ - the margin is still small enough that this remains a screening result, not a paper-final claim
+
+ ### Dual-push runtime and memory
+
+ | Stage | Duration |
+ | --- | ---: |
+ | Baseline train | `1:05:25` |
+ | Baseline eval sweep | `0:14:34` |
+ | Parallel train | `1:00:33` |
+ | Parallel eval sweep | `0:14:39` |
+ | Full dual-push pipeline | `2:35:11` |
+
+ Peak VRAM:
+
+ - baseline: `35.23GB`
+ - parallel: `35.27GB`
+
  ## Artifact locations

  ### `2K` bundle
 
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/repro/changed_files.txt`
  - `artifacts/twin_handover_packed_parallelization_10k_20260309/environment/`

+ ### Dual-push `5K` bundle
+
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/summary.json`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/train_loss_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/teacher_forced_eval_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/sample_eval_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/metrics/runtime_table.csv`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/commands_reproduce.sh`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/changed_files.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/repro/checkpoint_locations.txt`
+ - `artifacts/twin_dual_push_128_packed_parallelization_5k_20260310/environment/`
+
  ## Bottom line

  The `10K` follow-up suggests the `2K` near-tie was not hiding a large later divergence.

  - left/right imbalance does not materially change
  - the main difference remains subtle rather than dramatic

+ The dual-push screening run adds a second signal:
+
+ - the packed parallel model is slightly better at `1K`, `2K`, and `5K`
+ - the same small advantage appears on both teacher-forced and sample-based eval
+ - the effect is still modest, but it is cleaner and more consistent than on handover
+
+ So the current repo state supports a narrow next-step conclusion: packed parallelization remains subtle on handover, but dual-push is a better candidate task for the next seed/scale confirmation.