lsnu committed
Commit 1759ca7 · verified · parent: aa91438

Add files using upload-large-folder tool

Files changed (38):
  1. README.md +58 -0
  2. REPORT.md +347 -0
  3. artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/config.json +14 -0
  4. artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/init_parallel_metadata.json +27 -0
  5. artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/config.json +7 -0
  6. artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt +10 -0
  7. artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt +3 -0
  8. artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt +5 -0
  9. artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt +242 -0
  10. artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt +11 -0
  11. artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json +1 -0
  12. artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt +7 -0
  13. artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt +49 -0
  14. artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt +9 -0
  15. artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json +318 -0
  16. artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv +11 -0
  17. artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv +5 -0
  18. artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt +15 -0
  19. artifacts/twin_handover_packed_parallelization_20260309/repro/checkpoint_locations.txt +6 -0
  20. artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh +22 -0
  21. artifacts/twin_handover_packed_parallelization_20260309/run_logs/detach_test.log +2 -0
  22. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log +0 -0
  23. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log +66 -0
  24. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log +114 -0
  25. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log +0 -0
  26. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log +64 -0
  27. artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log +114 -0
  28. artifacts/twin_handover_packed_parallelization_20260309/run_logs/importtime_train_pytorch.log +349 -0
  29. artifacts/twin_handover_packed_parallelization_20260309/run_logs/inspect_twin_packed_batch_handover_train.log +176 -0
  30. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20.log +241 -0
  31. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20b.log +0 -0
  32. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20d.log +34 -0
  33. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20e.log +34 -0
  34. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log +234 -0
  35. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log +141 -0
  36. artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log +141 -0
  37. artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log +37 -0
  38. artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log +176 -0
README.md ADDED
@@ -0,0 +1,58 @@

# pi0.5 Packed Multi-Arm OpenPI Artifacts

This repo packages a finished initial comparison between:

1. a packed single-head `pi0.5` baseline
2. a packed parallel-head `pi0.5` model with an exact packed warm-start from the single-head checkpoint

The study was run from the checked-out `openpi/` tree on `4x H100 80GB` with `bfloat16`, `2000` optimizer steps per model, verbose startup/debug logging, fixed validation passes, and no raw data reconversion.

## Dataset and packing

- Train repo: `lsnu/twin_handover_256_train`
- Val repo: `lsnu/twin_handover_256_val`
- Original TWIN layout: `[L8, R8]`
- Packed model layout used for both models: `[L8, 0x8, R8, 0x8]`
- Action-loss mask: active dims `[0:8]` and `[16:24]`; padded dims masked out
- Public `16`-dim norm stats were reused; they were not recomputed
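The packing and mask bullets above can be sketched in a few lines of NumPy (an illustrative sketch only; `pack_per_arm` is a hypothetical name, not the repo's `PackPerArmBlocks` transform, and the real pipeline operates on torch tensors):

```python
import numpy as np

def pack_per_arm(x16: np.ndarray) -> np.ndarray:
    """Pack a 16-dim [L8, R8] vector into the 32-dim [L8, 0x8, R8, 0x8] layout."""
    out = np.zeros(x16.shape[:-1] + (32,), dtype=x16.dtype)
    out[..., 0:8] = x16[..., 0:8]     # left arm -> dims 0:8
    out[..., 16:24] = x16[..., 8:16]  # right arm -> dims 16:24
    return out                        # dims 8:16 and 24:32 stay zero (padding)

# Matching action-loss mask: 1.0 on real arm dims, 0.0 on padding.
action_loss_mask = np.concatenate(
    [np.ones(8), np.zeros(8), np.ones(8), np.zeros(8)]
)
```

Applied to a `(batch, 16)` state batch, this yields `(batch, 32)` with exactly the zero-padding pattern the packed-batch sanity check verifies.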

## Headline results

| Model | Val @ 1000 | Val @ 2000 | Train runtime | Peak VRAM |
| --- | ---: | ---: | ---: | ---: |
| Packed baseline | `0.052885` | `0.035776` | `33:27` | `35.23 GB` |
| Packed parallel | `0.051214` | `0.035680` | `30:38` | `35.27 GB` |

The two models tracked closely. In this short run, the packed parallel head finished with a small edge on validation loss while staying within the same memory envelope.

## Repo contents

- `openpi/`
  - modified training/eval code
  - config and transform changes
  - copied norm-stats assets for the new packed configs
  - smoke and main-run checkpoints under `openpi/checkpoints/`
- `artifacts/twin_handover_packed_parallelization_20260309/`
  - `bootstrap_checkpoints/`: single-head PyTorch bootstrap and exact packed parallel warm-start
  - `metrics/`: JSON and CSV summaries
  - `run_logs/`: smoke, train, eval, and follow-up logs
  - `sanity_checks/`: packed-batch inspection output
  - `environment/`: system, GPU, package, HF-tooling, and workspace snapshots
  - `repro/`: changed-file list, checkpoint locations, and rerun commands
- `artifacts/pi05_base_params/`
  - staged base JAX parameter snapshot used for PyTorch conversion

## Key artifact paths

- Full report: `REPORT.md`
- Reproduction commands: `artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh`
- Metrics summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- Train loss table: `artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv`
- Val loss table: `artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv`
- Environment snapshot: `artifacts/twin_handover_packed_parallelization_20260309/environment/`

## Notes

- The packed parallel warm-start follows an exact slice/fuse mapping from the single-head checkpoint; the recorded projection diffs sit at bfloat16 rounding level (at most `1.2e-06`, per `init_parallel_metadata.json`).
- Weight loading on both main runs reported `missing=0` and `unexpected=0`.
- The packaged tree intentionally records reproducibility snapshots instead of uploading transient cache state.
REPORT.md ADDED
@@ -0,0 +1,347 @@

# Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover

## Objective

Run the minimum scientifically meaningful comparison between:

1. a packed single-head `pi0.5` baseline
2. a packed parallel-head `pi0.5` model

Both models were fine-tuned on the same converted public TWIN handover dataset with the same training schedule:

- train: `lsnu/twin_handover_256_train`
- val: `lsnu/twin_handover_256_val`
- hardware: `4x H100 80GB`
- precision: `bfloat16`
- global batch size: `16`
- optimizer steps per model: `2000`
- save interval: `250`
- log interval: `10`

## Data layout and packing

The TWIN converted state/action layout is `16` dims in `[L8, R8]`, where each arm is `7` joints plus a gripper. The generic `pi0.5` path right-pads to `32` dims, which does not preserve a semantic left/right split for a naive parallel-head setup.

To keep the experiment minimal and still semantically correct:

- existing public `16`-dim norm stats were reused
- semantic packing happened after normalization in the model transforms
- both models consumed the same packed `32`-dim layout:

```text
[L8, R8] -> [L8, 0x8, R8, 0x8]
```

- the action loss was masked so only the real arm dims contributed:

```text
active dims: [0:8] and [16:24]
masked dims: [8:16] and [24:32]
```
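That masked reduction can be sketched as follows (shown in NumPy for clarity; the helper name `masked_action_loss` and the tensor shapes are illustrative assumptions, not the exact code in `train_pytorch.py`, which operates on torch tensors):

```python
import numpy as np

# 1.0 on real arm dims, 0.0 on the packed padding, matching the split above.
ACTION_LOSS_MASK = np.concatenate(
    [np.ones(8), np.zeros(8), np.ones(8), np.zeros(8)]
)

def masked_action_loss(per_dim_loss: np.ndarray,
                       mask: np.ndarray = ACTION_LOSS_MASK) -> float:
    """Mean of an elementwise loss over active action dims only.

    per_dim_loss: (batch, horizon, 32) array of elementwise loss values.
    """
    batch, horizon, _ = per_dim_loss.shape
    return float((per_dim_loss * mask).sum() / (mask.sum() * batch * horizon))
```

Dividing by `mask.sum()` rather than by `32` keeps the loss scale comparable to an unpadded `16`-dim model.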

The packed-batch sanity check confirmed exact zero padding:

- `state_padded_zero_count: 16 / 16`
- `actions_padded_zero_count: 256 / 256`
- `state_padded_exact_zero: True`
- `actions_padded_exact_zero: True`

Reference log:

- `artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log`

## Code changes tied to files

The experiment-specific changes are summarized below.

- `openpi/src/openpi/transforms.py`
  - added `PackPerArmBlocks` and `UnpackPerArmBlocks` for semantic TWIN packed training
- `openpi/src/openpi/training/config.py`
  - added packed TWIN model-transform path
  - added `action_loss_mask`
  - added `pi05_twin_handover_256_packed_baseline_pytorch_2k`
  - added `pi05_twin_handover_256_packed_parallel_pytorch_2k`
- `openpi/src/openpi/training/data_loader.py`
  - added `set_epoch`
  - improved local dataset mirror handling and loader startup behavior
- `openpi/src/openpi/models/model.py`
  - made the `pi0_pytorch` import lazy
- `openpi/src/openpi/models/tokenizer.py`
  - made the `AutoProcessor` import lazy
- `openpi/src/openpi/models_pytorch/pi0_pytorch.py`
  - disabled unconditional `sample_actions` `torch.compile` by default
- `openpi/scripts/train_pytorch.py`
  - added startup prints
  - added masked action-loss reduction
  - added first-steps debug prints and periodic runtime/memory logging
  - hardened DDP/checkpoint startup
- `openpi/scripts/eval_twin_val_loss_pytorch.py`
  - added masked validation-loss evaluation with fixed-batch execution
- `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`
  - added exact packed parallel warm-start initialization
- `openpi/scripts/inspect_twin_packed_batch.py`
  - added packed-batch inspection and zero-padding verification
- `openpi/scripts/run_twin_handover_packed_followup.sh`
  - added detached follow-up automation for the remaining train/eval stages
- `openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json`
  - copied the existing handover train norm stats for the packed baseline config
- `openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json`
  - copied the existing handover train norm stats for the packed parallel config
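The lazy-import changes for `pi0_pytorch` and `AutoProcessor` follow the standard deferred-import pattern, which can be sketched generically (the helper below is illustrative, not the repo's code):

```python
import importlib

_module_cache: dict = {}

def lazy_import(module_name: str):
    """Import a module on first use and cache it, so heavy dependencies
    (e.g. torch or transformers) do not slow down unrelated startup paths."""
    if module_name not in _module_cache:
        _module_cache[module_name] = importlib.import_module(module_name)
    return _module_cache[module_name]
```

Deferring the import this way is what makes the `importtime` numbers in `importtime_train_pytorch.log` worth tracking.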

Reference file list:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`

## Commands run

The exact rerun command list is saved in:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh`

The executed flow was:

1. packed-batch inspection
2. base `pi0.5` JAX-to-PyTorch conversion
3. exact packed parallel warm-start initialization from the single-head PyTorch checkpoint
4. packed baseline training for `2000` steps
5. baseline val at step `1000`
6. baseline val at step `2000`
7. packed parallel training for `2000` steps
8. parallel val at step `1000`
9. parallel val at step `2000`

The parallel training and its validation passes were chained through a detached follow-up runner.

Reference logs:

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log`

## Startup sanity checks

### Norm stats

The copied norm-stats files were loaded successfully and reported:

- keys: `['actions', 'state']`
- `state_mean_len=16`
- `state_std_len=16`
- `actions_mean_len=16`
- `actions_std_len=16`

Reference:

- `artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt`

### Baseline startup summary

Rank-0 startup logging for the packed baseline recorded:

```text
Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k
Dataset repo_id: lsnu/twin_handover_256_train
Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16}
Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch
Model type: baseline
Packed transforms active: True
Batch size: local=4, global=16
Action-loss mask: (1.0 x8, 0.0 x8, 1.0 x8, 0.0 x8)
Weight loading missing key count: 0
Weight loading unexpected key count: 0
```

The first debug steps also showed:

- `observation.state shape=(4, 32)`
- `actions shape=(4, 16, 32)`
- `state_nonzero_counts_8d_blocks=[32, 0, 32, 0]`
- `action_nonzero_counts_8d_blocks=[512, 0, 512, 0]`
- masked padded dims stayed exactly zero in the batch
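The `*_nonzero_counts_8d_blocks` diagnostics above can be reproduced with a small helper (an illustrative sketch, not the repo's inspection code):

```python
import numpy as np

def nonzero_counts_8d_blocks(x: np.ndarray) -> list:
    """Count nonzero entries in each 8-dim block of the last (32-dim) axis."""
    flat = x.reshape(-1, x.shape[-1])
    return [int(np.count_nonzero(flat[:, i * 8:(i + 1) * 8])) for i in range(4)]
```

For a `(4, 32)` state batch whose real arm dims are all nonzero, this returns `[32, 0, 32, 0]`, matching the logged values.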

### Parallel startup summary

Rank-0 startup logging for the packed parallel run recorded:

```text
Resolved config name: pi05_twin_handover_256_packed_parallel_pytorch_2k
Dataset repo_id: lsnu/twin_handover_256_train
Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16}
Checkpoint source path: /workspace/checkpoints/pi05_base_parallel_packed_from_single
Model type: parallel
Packed transforms active: True
Batch size: local=4, global=16
Action-loss mask: (1.0 x8, 0.0 x8, 1.0 x8, 0.0 x8)
Weight loading missing key count: 0
Weight loading unexpected key count: 0
```

The first debug steps matched the expected packed layout:

- `observation.state shape=(4, 32)`
- `actions shape=(4, 16, 32)`
- `state_nonzero_counts_8d_blocks=[32, 0, 32, 0]`
- `action_nonzero_counts_8d_blocks=[512, 0, 512, 0]`

### Smoke tests

All required smoke tests passed before the main runs:

1. `debug_pi05_multiarm_pytorch_smoke`
2. packed-batch inspection on `lsnu/twin_handover_256_train`
3. packed baseline TWIN smoke on `4` GPUs for `20` steps
4. packed parallel TWIN smoke on `4` GPUs for `20` steps

Smoke logs are stored in:

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log`

## Warm-start note

The packed parallel warm-start was implemented as an exact slice/fuse mapping from the single-head PyTorch checkpoint:

- input side: split the single-head input projection by packed arm blocks
- fuse side: initialize `arm_token_fuse.weight` as `[I I]`
- output side: split the single-head output projection rows by packed arm blocks

The mapping is exact by construction; the recorded projection round-trip diffs sit at floating-point rounding level (`input_projection_max_abs_diff ≈ 1.19e-06`, `output_projection_max_abs_diff ≈ 9.54e-07` in `init_parallel_metadata.json`), and both the warm-start checkpoint creation and the main-run loading succeeded without missing or unexpected keys.
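The `[I I]` fuse initialization can be sketched at toy scale (a NumPy illustration of the idea only; `width`, `fuse`, and the variable names are assumptions, not the repo's module code):

```python
import numpy as np

width = 4  # toy token width for illustration; the real expert width differs

# arm_token_fuse.weight initialized as [I I]: shape (width, 2 * width).
eye = np.eye(width)
fuse_weight = np.concatenate([eye, eye], axis=1)
fuse_bias = np.zeros(width)

def fuse(left_token: np.ndarray, right_token: np.ndarray) -> np.ndarray:
    """With [I I] and zero bias, the fused token is the sum of the arm tokens."""
    return fuse_weight @ np.concatenate([left_token, right_token]) + fuse_bias
```

This is what makes the warm-start a pure re-parameterization of the single-head projections rather than a fresh random init.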
211
+ What was not done:
212
+
213
+ - no separate numerical equivalence test was run that compared step-0 forward outputs between the single-head and parallel-head models on the same batch
214
+
215
+ Bootstrap checkpoints:
216
+
217
+ - `/workspace/checkpoints/pi05_base_single_pytorch`
218
+ - `/workspace/checkpoints/pi05_base_parallel_packed_from_single`
219
+
220
+ Copies are also staged under:
221
+
222
+ - `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/`
223
+
224
+ ## Results
225
+
226
+ ### Training loss snapshots
227
+
228
+ | Model | Step 250 | Step 500 | Step 1000 | Step 1500 | Step 2000 |
229
+ | --- | ---: | ---: | ---: | ---: | ---: |
230
+ | Baseline loss | `0.1975` | `0.0606` | `0.0245` | `0.0155` | `0.0391` |
231
+ | Baseline smoothed | `0.1166` | `0.0554` | `0.0387` | `0.0331` | `0.0278` |
232
+ | Parallel loss | `0.1894` | `0.0633` | `0.0214` | `0.0155` | `0.0326` |
233
+ | Parallel smoothed | `0.1153` | `0.0565` | `0.0392` | `0.0331` | `0.0270` |
234
+
235
+ ### Validation loss
236
+
237
+ | Model | Checkpoint | Batches | Mean val loss | Std val loss |
238
+ | --- | ---: | ---: | ---: | ---: |
239
+ | Baseline | `1000` | `50` | `0.052885` | `0.032533` |
240
+ | Baseline | `2000` | `100` | `0.035776` | `0.027648` |
241
+ | Parallel | `1000` | `50` | `0.051214` | `0.028985` |
242
+ | Parallel | `2000` | `100` | `0.035680` | `0.026077` |
243
+
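The mean/std columns are per-batch statistics over the fixed validation batches; the reduction amounts to the sketch below (whether the reported std is the population or sample variant is an assumption here, not confirmed by the logs):

```python
import statistics

def summarize_val_losses(batch_losses):
    """Reduce a list of per-batch masked losses to the (mean, std) reported above."""
    return statistics.mean(batch_losses), statistics.pstdev(batch_losses)
```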

### Runtime and memory

| Item | Value |
| --- | --- |
| Pipeline wallclock from baseline launch to final val | `01:32:29` |
| Detached follow-up runner wallclock | `01:17:47` |
| Baseline train runtime | `33:27` |
| Parallel train runtime | `30:38` |
| Baseline val @ 1000 | `00:05:14` |
| Baseline val @ 2000 | `00:05:19` |
| Parallel val @ 1000 | `00:03:23` |
| Parallel val @ 2000 | `00:03:33` |
| Peak baseline VRAM | `35.23 GB` |
| Peak parallel VRAM | `35.27 GB` |

### Interpretation

For this short `2000`-step TWIN handover run, the packed baseline and packed parallel-head models behaved very similarly. The packed parallel-head model ended slightly lower on both validation checkpoints while staying in the same memory range and training cleanly under the same schedule.

This should be treated as an initial profiling run, not a final benchmark claim.

Reference metrics:

- `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- `artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv`
- `artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv`

## Checkpoints and logs

### Main-run checkpoints

- Baseline step `1000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000`
- Baseline step `2000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000`
- Parallel step `1000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000`
- Parallel step `2000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000`

The full checkpoint trees, including smoke checkpoints and intermediate saves every `250` steps, are under:

- `openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/`
- `openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/`

### Bootstrap checkpoints

- `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/`
- `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/`

### Logs

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log`

## Environment and provenance snapshot

Environment snapshots are stored in:

- `artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt`

OpenPI source provenance:

- the packaged `openpi/` tree does not contain a live `.git` directory
- the source clone snapshot is recorded in `openpi_source_snapshot.txt`
- source commit: `aa91438c0c130dcef4ccf378a56f4cf4cffc1310`

## Acceptance criteria status

1. Packed-batch inspection showed raw `16`-dim `[L8, R8]` and packed `32`-dim `[L8, 0x8, R8, 0x8]`: `PASS`
2. Both smoke tests passed on `4` GPUs with finite loss: `PASS`
3. Baseline run started from `/workspace/checkpoints/pi05_base_single_pytorch`: `PASS`
4. Parallel run started from `/workspace/checkpoints/pi05_base_parallel_packed_from_single`: `PASS`
5. Masked loss was active and padded dims were excluded: `PASS`
6. DDP ran without shape/key mismatches: `PASS`
7. Quick val was run at step `1000` for both models: `PASS`
8. Final val was run at step `2000` for both models: `PASS`
9. Both main runs finished under the `10`-hour cap: `PASS`
10. Final bundle includes code, checkpoints, logs, metrics, and environment snapshot: `PASS`

## Final inventory

The artifact bundle at repo root contains:

- all modified training/eval code under `openpi/`
- all baseline and parallel checkpoints under `openpi/checkpoints/`
- both bootstrap checkpoints under `artifacts/.../bootstrap_checkpoints/`
- all train/eval/smoke logs under `artifacts/.../run_logs/`
- metrics tables and summary JSON under `artifacts/.../metrics/`
- reproducibility files under `artifacts/.../repro/`
- environment and provenance snapshot under `artifacts/.../environment/`

This is a complete, rerunnable package for the initial TWIN handover packed action-head parallelization study.
artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/config.json ADDED
@@ -0,0 +1,14 @@
{
  "action_dim": 32,
  "action_expert_variant": "gemma_300m",
  "action_horizon": 16,
  "arm_action_dims": [
    16,
    16
  ],
  "discrete_state_input": true,
  "dtype": "bfloat16",
  "max_token_len": 200,
  "paligemma_variant": "gemma_2b",
  "pi05": true
}
artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/init_parallel_metadata.json ADDED
@@ -0,0 +1,27 @@
{
  "config_name": "pi05_twin_handover_256_packed_parallel_pytorch_2k",
  "input_projection_max_abs_diff": 1.1920928955078125e-06,
  "load_state_missing_keys": [
    "paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight",
    "action_in_proj_arms.0.weight",
    "action_in_proj_arms.0.bias",
    "action_in_proj_arms.1.weight",
    "action_in_proj_arms.1.bias",
    "arm_token_fuse.weight",
    "arm_token_fuse.bias",
    "action_out_proj_arms.0.weight",
    "action_out_proj_arms.0.bias",
    "action_out_proj_arms.1.weight",
    "action_out_proj_arms.1.bias"
  ],
  "load_state_unexpected_keys": [
    "action_in_proj.bias",
    "action_in_proj.weight",
    "action_out_proj.bias",
    "action_out_proj.weight"
  ],
  "output_path": "/workspace/checkpoints/pi05_base_parallel_packed_from_single",
  "output_projection_max_abs_diff": 9.5367431640625e-07,
  "single_ckpt": "/workspace/checkpoints/pi05_base_single_pytorch",
  "warm_start_exact": false
}
artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/config.json ADDED
@@ -0,0 +1,7 @@
{
  "action_dim": 32,
  "action_horizon": 16,
  "paligemma_variant": "gemma_2b",
  "action_expert_variant": "gemma_300m",
  "precision": "bfloat16"
}
artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt ADDED
@@ -0,0 +1,10 @@
timestamp_utc=2026-03-09T02:09:46Z
GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-352e04eb-3fa2-0b3b-c24f-5c9567d275af)
GPU 1: NVIDIA H100 80GB HBM3 (UUID: GPU-09e17180-0d03-02d6-53c8-863ebf34f1a0)
GPU 2: NVIDIA H100 80GB HBM3 (UUID: GPU-323a86ac-758a-6993-c4b8-7b0c6cf94b3f)
GPU 3: NVIDIA H100 80GB HBM3 (UUID: GPU-dfccd461-1fa0-0b62-00da-e9abb74fb025)

0, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
1, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
2, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
3, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt ADDED
@@ -0,0 +1,3 @@
timestamp_utc=2026-03-09T02:10:06Z
hf_version=1.6.0
auth_state=Not logged in
artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt ADDED
@@ -0,0 +1,5 @@
timestamp_utc=2026-03-09T02:11:23Z
packaged_openpi_has_git=no
source_clone_path=/workspace/openpi_partial_broken_1773005128
source_commit=aa91438c0c130dcef4ccf378a56f4cf4cffc1310
source_remote=https://huggingface.co/lsnu/pi05tests-openpi-multiarm
artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt ADDED
@@ -0,0 +1,242 @@
absl-py==2.3.0
aiohappyeyeballs==2.6.1
aiohttp==3.12.4
aiosignal==1.3.2
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
asttokens==3.0.0
attrs==25.3.0
augmax==0.4.1
av==14.4.0
beartype==0.19.0
beautifulsoup4==4.13.4
blinker==1.9.0
cachetools==5.5.2
certifi==2025.4.26
cffi==1.17.1
cfgv==3.4.0
charset-normalizer==3.4.2
chex==0.1.89
click==8.2.1
cloudpickle==3.1.1
cmake==4.0.2
comm==0.2.2
contourpy==1.3.2
crc32c==2.7.1
cycler==0.12.1
datasets==3.6.0
debugpy==1.8.14
decorator==5.2.1
deepdiff==8.5.0
diffusers==0.33.1
dill==0.3.8
distlib==0.3.9
dm-control==1.0.14
dm-env==1.6
dm-tree==0.1.9
docker-pycreds==0.4.0
docstring-parser==0.16
donfig==0.8.1.post1
draccus==0.10.0
einops==0.8.1
equinox==0.12.2
etils==1.12.2
evdev==1.9.2
executing==2.2.0
farama-notifications==0.0.4
filelock==3.18.0
flask==3.1.1
flatbuffers==25.2.10
flax==0.10.2
fonttools==4.58.1
frozenlist==1.6.0
fsspec==2025.3.0
gcsfs==2025.3.0
gdown==5.2.0
gitdb==4.0.12
gitpython==3.1.44
glfw==2.9.0
google-api-core==2.24.2
google-auth==2.40.2
google-auth-oauthlib==1.2.2
google-cloud-core==2.4.3
google-cloud-storage==3.1.0
google-crc32c==1.7.1
google-resumable-media==2.7.2
googleapis-common-protos==1.70.0
gym-aloha==0.1.1
gymnasium==0.29.1
h5py==3.13.0
hf-transfer==0.1.9
hf-xet==1.1.2
huggingface-hub==0.32.3
humanize==4.12.3
identify==2.6.12
idna==3.10
imageio==2.37.0
imageio-ffmpeg==0.6.0
importlib-metadata==8.7.0
importlib-resources==6.5.2
iniconfig==2.1.0
inquirerpy==0.3.4
ipykernel==6.29.5
ipython==9.2.0
ipython-pygments-lexers==1.1.1
ipywidgets==8.1.7
itsdangerous==2.2.0
jax==0.5.3
jax-cuda12-pjrt==0.5.3
jax-cuda12-plugin==0.5.3
jaxlib==0.5.3
jaxtyping==0.2.36
jedi==0.19.2
jinja2==3.1.6
jsonlines==4.0.0
jupyter-client==8.6.3
jupyter-core==5.8.1
jupyterlab-widgets==3.0.15
kiwisolver==1.4.8
labmaze==1.0.6
lerobot @ git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5
llvmlite==0.44.0
lxml==5.4.0
markdown-it-py==3.0.0
markupsafe==3.0.2
matplotlib==3.10.3
matplotlib-inline==0.1.7
mdurl==0.1.2
mergedeep==1.3.4
ml-collections==1.0.0
ml-dtypes==0.4.1
mpmath==1.3.0
msgpack==1.1.0
mujoco==2.3.7
multidict==6.4.4
multiprocess==0.70.16
mypy-extensions==1.1.0
nest-asyncio==1.6.0
networkx==3.5
nodeenv==1.9.1
numba==0.61.2
numcodecs==0.16.1
numpy==1.26.4
numpydantic==1.6.9
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvcc-cu12==12.9.41
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-ml-py==12.575.51
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
oauthlib==3.2.2
omegaconf==2.3.0
opencv-python==4.11.0.86
opencv-python-headless==4.11.0.86
-e file:///workspace/pi05tests-openpi-multiarm/openpi
-e file:///workspace/pi05tests-openpi-multiarm/openpi/packages/openpi-client
opt-einsum==3.4.0
optax==0.2.4
orbax-checkpoint==0.11.13
orderly-set==5.4.1
packaging==25.0
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pfzy==0.3.4
pillow==11.2.1
platformdirs==4.3.8
pluggy==1.6.0
polars==1.30.0
pre-commit==4.2.0
prompt-toolkit==3.0.51
propcache==0.3.1
proto-plus==1.26.1
protobuf==4.25.8
psutil==7.0.0
ptyprocess==0.7.0
pure-eval==0.2.3
pyarrow==20.0.0
pyasn1==0.6.1
pyasn1-modules==0.4.2
pycparser==2.22
pydantic==2.11.5
pydantic-core==2.33.2
pygments==2.19.1
pymunk==7.0.0
pynput==1.8.1
pynvml==12.0.0
pyopengl==3.1.9
pyparsing==3.2.3
pysocks==1.7.1
pytest==8.3.5
python-dateutil==2.9.0.post0
python-xlib==0.33
pytz==2025.2
pyyaml==6.0.2
pyyaml-include==1.4.1
pyzmq==26.4.0
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
rerun-sdk==0.23.1
rich==14.0.0
rsa==4.9.1
ruff==0.11.12
safetensors==0.5.3
scipy==1.15.3
sentencepiece==0.2.0
sentry-sdk==2.29.1
setproctitle==1.3.6
setuptools==80.9.0
shtab==1.7.2
simplejson==3.20.1
six==1.17.0
smmap==5.0.2
soupsieve==2.7
stack-data==0.6.3
svgwrite==1.4.3
sympy==1.14.0
tensorstore==0.1.74
termcolor==3.1.0
tokenizers==0.21.1
toml==0.10.2
toolz==1.0.0
torch==2.7.1
torchcodec==0.4.0
torchvision==0.22.1
tornado==6.5.1
tqdm==4.67.1
tqdm-loggable==0.2
traitlets==5.14.3
transformers==4.53.2
tree==0.2.4
treescope==0.1.9
triton==3.3.1
typeguard==4.4.2
typing-extensions==4.13.2
typing-inspect==0.9.0
typing-inspection==0.4.1
228
+ tyro==0.9.22
229
+ tzdata==2025.2
230
+ urllib3==2.4.0
231
+ virtualenv==20.31.2
232
+ wadler-lindig==0.1.6
233
+ wandb==0.19.11
234
+ wcwidth==0.2.13
235
+ websockets==15.0.1
236
+ werkzeug==3.1.3
237
+ widgetsnbextension==4.0.14
238
+ wrapt==1.14.1
239
+ xxhash==3.5.0
240
+ yarl==1.20.0
241
+ zarr==3.0.8
242
+ zipp==3.22.0
artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt ADDED
@@ -0,0 +1,11 @@
+ timestamp_utc=2026-03-09T02:09:46Z
+ Python 3.11.10
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
+ /usr/local/bin/uv
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv
+
+ torch=2.7.1+cu126
+ cuda=12.6
+ cudnn=90501
+ huggingface_hub=0.32.3
artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json ADDED
@@ -0,0 +1 @@
+ {}
artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt ADDED
@@ -0,0 +1,7 @@
+ timestamp_utc=2026-03-09T02:08:36Z
+ hostname=9e9e564d5d6e
+ uname=Linux 9e9e564d5d6e 6.8.0-90-generic #91-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 18 14:14:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
+ python=Python 3.11.10
+ uv=uv 0.10.9
+ torch=2.7.1+cu126
+ huggingface_hub=0.32.3
artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt ADDED
@@ -0,0 +1,49 @@
+ timestamp_utc=2026-03-09T02:10:07Z
+
+ Top-level /workspace contents:
+ /workspace/.codex
+ /workspace/.hf
+ /workspace/.local
+ /workspace/bin
+ /workspace/checkpoints
+ /workspace/codex-env.sh
+ /workspace/lerobot
+ /workspace/openpi
+ /workspace/openpi_partial_broken_1773005128
+ /workspace/pi05tests-openpi-multiarm
+ /workspace/run_logs
+
+ Top-level packaged repo contents:
+ /workspace/pi05tests-openpi-multiarm/.cache
+ /workspace/pi05tests-openpi-multiarm/.cache/huggingface
+ /workspace/pi05tests-openpi-multiarm/artifacts
+ /workspace/pi05tests-openpi-multiarm/artifacts/pi05_base_params
+ /workspace/pi05tests-openpi-multiarm/artifacts/twin_handover_packed_parallelization_20260309
+ /workspace/pi05tests-openpi-multiarm/openpi
+ /workspace/pi05tests-openpi-multiarm/openpi/.dockerignore
+ /workspace/pi05tests-openpi-multiarm/openpi/.github
+ /workspace/pi05tests-openpi-multiarm/openpi/.gitignore
+ /workspace/pi05tests-openpi-multiarm/openpi/.gitmodules
+ /workspace/pi05tests-openpi-multiarm/openpi/.pre-commit-config.yaml
+ /workspace/pi05tests-openpi-multiarm/openpi/.python-version
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv_partial_1773006322
+ /workspace/pi05tests-openpi-multiarm/openpi/.vscode
+ /workspace/pi05tests-openpi-multiarm/openpi/CONTRIBUTING.md
+ /workspace/pi05tests-openpi-multiarm/openpi/LICENSE
+ /workspace/pi05tests-openpi-multiarm/openpi/LICENSE_GEMMA.txt
+ /workspace/pi05tests-openpi-multiarm/openpi/README.md
+ /workspace/pi05tests-openpi-multiarm/openpi/assets
+ /workspace/pi05tests-openpi-multiarm/openpi/checkpoints
+ /workspace/pi05tests-openpi-multiarm/openpi/docs
+ /workspace/pi05tests-openpi-multiarm/openpi/examples
+ /workspace/pi05tests-openpi-multiarm/openpi/packages
+ /workspace/pi05tests-openpi-multiarm/openpi/pyproject.toml
+ /workspace/pi05tests-openpi-multiarm/openpi/scripts
+ /workspace/pi05tests-openpi-multiarm/openpi/src
+ /workspace/pi05tests-openpi-multiarm/openpi/uv.lock
+
+ Selected sizes:
+ 410G /workspace/pi05tests-openpi-multiarm
+ 2.9M /workspace/checkpoints/pi05_base_single_pytorch
+ 2.9M /workspace/checkpoints/pi05_base_parallel_packed_from_single
artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt ADDED
@@ -0,0 +1,9 @@
+ path=/workspace/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+ keys=[actions,state]
+ state_mean_len=16 state_std_len=16
+ action_mean_len=16 action_std_len=16
+ ---
+ path=/workspace/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+ keys=[actions,state]
+ state_mean_len=16 state_std_len=16
+ action_mean_len=16 action_std_len=16
artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json ADDED
@@ -0,0 +1,318 @@
+ {
+ "train": {
+ "baseline": {
+ "steps": {
+ "250": {
+ "loss": 0.1975,
+ "smoothed_loss": 0.1166,
+ "lr": "2.50e-05",
+ "grad_norm": 1.0523,
+ "max_cuda_memory": "35.23GB"
+ },
+ "500": {
+ "loss": 0.0606,
+ "smoothed_loss": 0.0554,
+ "lr": "2.35e-05",
+ "grad_norm": 1.021,
+ "max_cuda_memory": "35.23GB"
+ },
+ "1000": {
+ "loss": 0.0245,
+ "smoothed_loss": 0.0387,
+ "lr": "1.58e-05",
+ "grad_norm": 1.0163,
+ "max_cuda_memory": "35.23GB"
+ },
+ "1500": {
+ "loss": 0.0155,
+ "smoothed_loss": 0.0331,
+ "lr": "6.60e-06",
+ "grad_norm": 0.7702,
+ "max_cuda_memory": "35.23GB"
+ },
+ "2000": {
+ "loss": 0.0391,
+ "smoothed_loss": 0.0278,
+ "lr": "2.50e-06",
+ "grad_norm": 0.7445,
+ "max_cuda_memory": "35.23GB"
+ }
+ },
+ "startup": {
+ "config_name": "pi05_twin_handover_256_packed_baseline_pytorch_2k",
+ "dataset_repo_id": "lsnu/twin_handover_256_train",
+ "norm_stats_file": "/workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+ "checkpoint_source": "/workspace/checkpoints/pi05_base_single_pytorch",
+ "model_type": "baseline",
+ "packed_transforms": "True",
+ "world_size": "4",
+ "batch_size": "local=4, global=16",
+ "num_workers": "8",
+ "precision": "bfloat16",
+ "lr_schedule": "warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06",
+ "save_log_intervals": "save_interval=250, log_interval=10",
+ "action_loss_mask": "(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_missing_count": "0",
+ "weight_unexpected_count": "0",
+ "weight_missing_keys": "set()",
+ "weight_unexpected_keys": "[]"
+ },
+ "saves": {
+ "250": {
+ "timestamp": "00:25:28.986",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/250"
+ },
+ "500": {
+ "timestamp": "00:29:40.355",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/500"
+ },
+ "750": {
+ "timestamp": "00:35:01.426",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/750"
+ },
+ "1000": {
+ "timestamp": "00:39:27.037",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000"
+ },
+ "1250": {
+ "timestamp": "00:43:25.467",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1250"
+ },
+ "1500": {
+ "timestamp": "00:47:39.593",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1500"
+ },
+ "1750": {
+ "timestamp": "00:51:38.690",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1750"
+ },
+ "2000": {
+ "timestamp": "00:55:30.655",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000"
+ }
+ },
+ "runtime": "33:27"
+ },
+ "parallel": {
+ "steps": {
+ "250": {
+ "loss": 0.1894,
+ "smoothed_loss": 0.1153,
+ "lr": "2.50e-05",
+ "grad_norm": 1.0751,
+ "max_cuda_memory": "35.27GB"
+ },
+ "500": {
+ "loss": 0.0633,
+ "smoothed_loss": 0.0565,
+ "lr": "2.35e-05",
+ "grad_norm": 1.001,
+ "max_cuda_memory": "35.27GB"
+ },
+ "1000": {
+ "loss": 0.0214,
+ "smoothed_loss": 0.0392,
+ "lr": "1.58e-05",
+ "grad_norm": 0.9669,
+ "max_cuda_memory": "35.27GB"
+ },
+ "1500": {
+ "loss": 0.0155,
+ "smoothed_loss": 0.0331,
+ "lr": "6.60e-06",
+ "grad_norm": 0.7305,
+ "max_cuda_memory": "35.27GB"
+ },
+ "2000": {
+ "loss": 0.0326,
+ "smoothed_loss": 0.027,
+ "lr": "2.50e-06",
+ "grad_norm": 0.735,
+ "max_cuda_memory": "35.27GB"
+ }
+ },
+ "startup": {
+ "config_name": "pi05_twin_handover_256_packed_parallel_pytorch_2k",
+ "dataset_repo_id": "lsnu/twin_handover_256_train",
+ "norm_stats_file": "/workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+ "checkpoint_source": "/workspace/checkpoints/pi05_base_parallel_packed_from_single",
+ "model_type": "parallel",
+ "packed_transforms": "True",
+ "world_size": "4",
+ "batch_size": "local=4, global=16",
+ "num_workers": "8",
+ "precision": "bfloat16",
+ "lr_schedule": "warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06",
+ "save_log_intervals": "save_interval=250, log_interval=10",
+ "action_loss_mask": "(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_missing_count": "0",
+ "weight_unexpected_count": "0",
+ "weight_missing_keys": "set()",
+ "weight_unexpected_keys": "[]"
+ },
+ "saves": {
+ "250": {
+ "timestamp": "01:14:12.456",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/250"
+ },
+ "500": {
+ "timestamp": "01:18:40.916",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/500"
+ },
+ "750": {
+ "timestamp": "01:22:49.479",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/750"
+ },
+ "1000": {
+ "timestamp": "01:26:47.884",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000"
+ },
+ "1250": {
+ "timestamp": "01:30:56.356",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1250"
+ },
+ "1500": {
+ "timestamp": "01:34:31.362",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1500"
+ },
+ "1750": {
+ "timestamp": "01:38:21.550",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1750"
+ },
+ "2000": {
+ "timestamp": "01:42:18.699",
+ "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000"
+ }
+ },
+ "runtime": "30:38"
+ }
+ },
+ "val": {
+ "baseline_1000": {
+ "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000",
+ "repo_id_used": "lsnu/twin_handover_256_val",
+ "num_batches": 50,
+ "mean_val_loss": 0.052885,
+ "std_val_loss": 0.032533,
+ "timing": "mean=0.3108 std=0.1375 min=0.2230 max=1.1986",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_loading_missing_keys": "[]",
+ "weight_loading_unexpected_keys": "[]"
+ },
+ "baseline_2000": {
+ "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000",
+ "repo_id_used": "lsnu/twin_handover_256_val",
+ "num_batches": 100,
+ "mean_val_loss": 0.035776,
+ "std_val_loss": 0.027648,
+ "timing": "mean=0.2587 std=0.1111 min=0.2224 max=1.2881",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_loading_missing_keys": "[]",
+ "weight_loading_unexpected_keys": "[]"
+ },
+ "parallel_1000": {
+ "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000",
+ "repo_id_used": "lsnu/twin_handover_256_val",
+ "num_batches": 50,
+ "mean_val_loss": 0.051214,
+ "std_val_loss": 0.028985,
+ "timing": "mean=0.2468 std=0.0900 min=0.2211 max=0.8606",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_loading_missing_keys": "[]",
+ "weight_loading_unexpected_keys": "[]"
+ },
+ "parallel_2000": {
+ "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000",
+ "repo_id_used": "lsnu/twin_handover_256_val",
+ "num_batches": 100,
+ "mean_val_loss": 0.03568,
+ "std_val_loss": 0.026077,
+ "timing": "mean=0.2366 std=0.0593 min=0.2215 max=0.8235",
+ "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+ "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+ "weight_loading_missing_keys": "[]",
+ "weight_loading_unexpected_keys": "[]"
+ }
+ },
+ "wallclock": {
+ "followup_start_utc": "2026-03-09 00:31:32 UTC",
+ "followup_end_utc": "2026-03-09 01:49:19 UTC",
+ "pipeline_wallclock_from_baseline_start_to_final_val": "01:32:29",
+ "followup_runner_wallclock": "01:17:47",
+ "baseline_train_runtime": "33:27",
+ "parallel_train_runtime": "30:38",
+ "baseline_val_1000_runtime": "00:05:14",
+ "baseline_val_2000_runtime": "00:05:19",
+ "parallel_val_1000_runtime": "00:03:23",
+ "parallel_val_2000_runtime": "00:03:33"
+ },
+ "changed_files": [
+ {
+ "path": "openpi/src/openpi/transforms.py",
+ "description": "added PackPerArmBlocks and UnpackPerArmBlocks for semantic TWIN per-arm block packing"
+ },
+ {
+ "path": "openpi/src/openpi/training/config.py",
+ "description": "added packed TWIN model transforms, action_loss_mask, and 2K baseline/parallel configs"
+ },
+ {
+ "path": "openpi/src/openpi/training/data_loader.py",
+ "description": "added set_epoch and local dataset mirror handling / loader startup fixes"
+ },
+ {
+ "path": "openpi/src/openpi/models/model.py",
+ "description": "made pi0_pytorch import lazy"
+ },
+ {
+ "path": "openpi/src/openpi/models/tokenizer.py",
+ "description": "made AutoProcessor import lazy"
+ },
+ {
+ "path": "openpi/src/openpi/models_pytorch/pi0_pytorch.py",
+ "description": "disabled unconditional sample_actions torch.compile by default"
+ },
+ {
+ "path": "openpi/scripts/train_pytorch.py",
+ "description": "added startup logging, masked action loss, debug logging, and DDP/startup fixes"
+ },
+ {
+ "path": "openpi/scripts/eval_twin_val_loss_pytorch.py",
+ "description": "added masked val loss evaluation with configurable batches/workers and startup prints"
+ },
+ {
+ "path": "openpi/scripts/init_parallel_pi05_from_single_pytorch.py",
+ "description": "added exact packed parallel warm-start initialization from single-head checkpoint"
+ },
+ {
+ "path": "openpi/scripts/inspect_twin_packed_batch.py",
+ "description": "added packed batch inspection / zero-padding verification"
+ },
+ {
+ "path": "openpi/scripts/run_twin_handover_packed_followup.sh",
+ "description": "added detached follow-up automation for val passes and parallel launch"
+ },
+ {
+ "path": "openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+ "description": "copied handover train norm stats for packed baseline config"
+ },
+ {
+ "path": "openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+ "description": "copied handover train norm stats for packed parallel config"
+ },
+ {
+ "path": "README.md",
+ "description": "new repo-level experiment summary for the uploaded artifact bundle"
+ },
+ {
+ "path": "REPORT.md",
+ "description": "new detailed experiment report tying outcomes to code and artifacts"
+ }
+ ]
+ }
artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv ADDED
@@ -0,0 +1,11 @@
+ model,step,loss,smoothed_loss,lr,grad_norm,max_cuda_memory
+ baseline,250,0.1975,0.1166,2.50e-05,1.0523,35.23GB
+ baseline,500,0.0606,0.0554,2.35e-05,1.021,35.23GB
+ baseline,1000,0.0245,0.0387,1.58e-05,1.0163,35.23GB
+ baseline,1500,0.0155,0.0331,6.60e-06,0.7702,35.23GB
+ baseline,2000,0.0391,0.0278,2.50e-06,0.7445,35.23GB
+ parallel,250,0.1894,0.1153,2.50e-05,1.0751,35.27GB
+ parallel,500,0.0633,0.0565,2.35e-05,1.001,35.27GB
+ parallel,1000,0.0214,0.0392,1.58e-05,0.9669,35.27GB
+ parallel,1500,0.0155,0.0331,6.60e-06,0.7305,35.27GB
+ parallel,2000,0.0326,0.027,2.50e-06,0.735,35.27GB
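The baseline-vs-parallel comparison in this table reduces to per-step differences of the smoothed loss. A minimal sketch using a hard-coded subset of the rows above (an illustrative excerpt, not the full CSV):

```python
import csv
import io

# Excerpt of train_loss_table.csv (model, step, loss, smoothed_loss).
CSV_TEXT = """model,step,loss,smoothed_loss
baseline,1000,0.0245,0.0387
parallel,1000,0.0214,0.0392
baseline,2000,0.0391,0.0278
parallel,2000,0.0326,0.027
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
by_model: dict[str, dict[int, float]] = {}
for r in rows:
    by_model.setdefault(r["model"], {})[int(r["step"])] = float(r["smoothed_loss"])

# Smoothed-loss gap (baseline minus parallel) at each shared step.
deltas = {
    step: round(by_model["baseline"][step] - by_model["parallel"][step], 4)
    for step in sorted(by_model["baseline"])
}
print(deltas)  # → {1000: -0.0005, 2000: 0.0008}
```

The gaps are within noise at these steps, consistent with the two runs tracking each other closely.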
artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv ADDED
@@ -0,0 +1,5 @@
+ model,checkpoint_step,num_batches,mean_val_loss,std_val_loss,timing
+ baseline,1000,50,0.052885,0.032533,mean=0.3108 std=0.1375 min=0.2230 max=1.1986
+ baseline,2000,100,0.035776,0.027648,mean=0.2587 std=0.1111 min=0.2224 max=1.2881
+ parallel,1000,50,0.051214,0.028985,mean=0.2468 std=0.0900 min=0.2211 max=0.8606
+ parallel,2000,100,0.03568,0.026077,mean=0.2366 std=0.0593 min=0.2215 max=0.8235
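The `mean_val_loss` / `std_val_loss` columns are aggregates over per-batch eval losses like those in the run logs. A minimal sketch, assuming a population (not sample) standard deviation; the actual eval script may compute it differently:

```python
import math

def summarize_losses(losses: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation over per-batch eval losses,
    mirroring the mean_val_loss / std_val_loss columns above."""
    mean = sum(losses) / len(losses)
    var = sum((x - mean) ** 2 for x in losses) / len(losses)
    return mean, math.sqrt(var)

# Illustrative per-batch losses (not the actual 50-batch run).
mean, std = summarize_losses([0.031, 0.016, 0.019, 0.059, 0.039])
print(f"mean={mean:.6f} std={std:.6f}")
```

At step 2000 over 100 batches, both models land at essentially the same mean val loss (0.035776 vs 0.03568), with the parallel run slightly tighter.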
artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt ADDED
@@ -0,0 +1,15 @@
+ openpi/src/openpi/transforms.py added PackPerArmBlocks and UnpackPerArmBlocks for semantic TWIN per-arm block packing
+ openpi/src/openpi/training/config.py added packed TWIN model transforms, action_loss_mask, and 2K baseline/parallel configs
+ openpi/src/openpi/training/data_loader.py added set_epoch and local dataset mirror handling / loader startup fixes
+ openpi/src/openpi/models/model.py made pi0_pytorch import lazy
+ openpi/src/openpi/models/tokenizer.py made AutoProcessor import lazy
+ openpi/src/openpi/models_pytorch/pi0_pytorch.py disabled unconditional sample_actions torch.compile by default
+ openpi/scripts/train_pytorch.py added startup logging, masked action loss, debug logging, and DDP/startup fixes
+ openpi/scripts/eval_twin_val_loss_pytorch.py added masked val loss evaluation with configurable batches/workers and startup prints
+ openpi/scripts/init_parallel_pi05_from_single_pytorch.py added exact packed parallel warm-start initialization from single-head checkpoint
+ openpi/scripts/inspect_twin_packed_batch.py added packed batch inspection / zero-padding verification
+ openpi/scripts/run_twin_handover_packed_followup.sh added detached follow-up automation for val passes and parallel launch
+ openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json copied handover train norm stats for packed baseline config
+ openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json copied handover train norm stats for packed parallel config
+ README.md new repo-level experiment summary for the uploaded artifact bundle
+ REPORT.md new detailed experiment report tying outcomes to code and artifacts
artifacts/twin_handover_packed_parallelization_20260309/repro/checkpoint_locations.txt ADDED
@@ -0,0 +1,6 @@
+ /workspace/checkpoints/pi05_base_single_pytorch
+ /workspace/checkpoints/pi05_base_parallel_packed_from_single
+ /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000
+ /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000
+ /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000
+ /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000
artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh ADDED
@@ -0,0 +1,22 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ export HF_HOME=/workspace/.hf
+ export HF_HUB_CACHE=/workspace/.hf/hub
+ export HF_DATASETS_CACHE=/workspace/.hf/datasets
+ export HUGGINGFACE_HUB_CACHE=/workspace/.hf/hub
+ export XDG_CACHE_HOME=/workspace/.cache
+ export OPENPI_LEROBOT_HOME=/workspace/lerobot
+ export OPENPI_TORCH_COMPILE_SAMPLE_ACTIONS=0
+ export TOKENIZERS_PARALLELISM=false
+ export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+ cd /workspace/openpi
+ source .venv/bin/activate
+ python scripts/inspect_twin_packed_batch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --repo_id lsnu/twin_handover_256_train
+ python examples/convert_jax_model_to_pytorch.py --checkpoint_dir /workspace/pi05tests-openpi-multiarm/artifacts/pi05_base_params --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --output_path /workspace/checkpoints/pi05_base_single_pytorch --precision bfloat16
+ python scripts/init_parallel_pi05_from_single_pytorch.py --single_ckpt /workspace/checkpoints/pi05_base_single_pytorch --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --output_path /workspace/checkpoints/pi05_base_parallel_packed_from_single
+ torchrun --standalone --nproc_per_node=4 scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k --overwrite
+ python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 --repo_id lsnu/twin_handover_256_val --num_batches 50 --num_workers 0
+ python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 --repo_id lsnu/twin_handover_256_val --num_batches 100 --num_workers 0
+ torchrun --standalone --nproc_per_node=4 scripts/train_pytorch.py pi05_twin_handover_256_packed_parallel_pytorch_2k --exp_name handover_packed_parallel_2k --overwrite
+ python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 --repo_id lsnu/twin_handover_256_val --num_batches 50 --num_workers 0
+ python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 --repo_id lsnu/twin_handover_256_val --num_batches 100 --num_workers 0
artifacts/twin_handover_packed_parallelization_20260309/run_logs/detach_test.log ADDED
@@ -0,0 +1,2 @@
+ hi
+ bye
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log ADDED
The diff for this file is too large to render. See raw diff
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log ADDED
@@ -0,0 +1,66 @@
+ starting_eval config=pi05_twin_handover_256_packed_baseline_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 repo_id=lsnu/twin_handover_256_val
+ eval_loader batch_size=16 num_batches=50 num_workers=0
+
+
+ weight_loading missing=0 unexpected=0 device=cuda:0
+ eval_batch=1 loss=0.031020 batch_time_s=1.1986
+ eval_batch=2 loss=0.016421 batch_time_s=0.2400
+ eval_batch=3 loss=0.019009 batch_time_s=0.2371
+ eval_batch=4 loss=0.058900 batch_time_s=0.2230
+ eval_batch=5 loss=0.039465 batch_time_s=0.2257
+ eval_batch=6 loss=0.061871 batch_time_s=0.3408
+ eval_batch=7 loss=0.039355 batch_time_s=0.2552
+ eval_batch=8 loss=0.013108 batch_time_s=0.3001
+ eval_batch=9 loss=0.037281 batch_time_s=0.3122
+ eval_batch=10 loss=0.062332 batch_time_s=0.2296
+ eval_batch=11 loss=0.026757 batch_time_s=0.2320
+ eval_batch=12 loss=0.043025 batch_time_s=0.2359
+ eval_batch=13 loss=0.047591 batch_time_s=0.2317
+ eval_batch=14 loss=0.046923 batch_time_s=0.2352
+ eval_batch=15 loss=0.048440 batch_time_s=0.3084
+ eval_batch=16 loss=0.074316 batch_time_s=0.2294
+ eval_batch=17 loss=0.068891 batch_time_s=0.2512
+ eval_batch=18 loss=0.053325 batch_time_s=0.3206
+ eval_batch=19 loss=0.035644 batch_time_s=0.3163
+ eval_batch=20 loss=0.025946 batch_time_s=0.2289
+ eval_batch=21 loss=0.048144 batch_time_s=0.2838
+ eval_batch=22 loss=0.081570 batch_time_s=0.3150
+ eval_batch=23 loss=0.062998 batch_time_s=0.3382
+ eval_batch=24 loss=0.078956 batch_time_s=0.3765
+ eval_batch=25 loss=0.045697 batch_time_s=0.3072
+ eval_batch=26 loss=0.020523 batch_time_s=0.2988
+ eval_batch=27 loss=0.035404 batch_time_s=0.3281
+ eval_batch=28 loss=0.039222 batch_time_s=0.3669
+ eval_batch=29 loss=0.053275 batch_time_s=0.3338
+ eval_batch=30 loss=0.053682 batch_time_s=0.2773
+ eval_batch=31 loss=0.124611 batch_time_s=0.3229
+ eval_batch=32 loss=0.093004 batch_time_s=0.3327
+ eval_batch=33 loss=0.100326 batch_time_s=0.3062
+ eval_batch=34 loss=0.068221 batch_time_s=0.5203
+ eval_batch=35 loss=0.067222 batch_time_s=0.3190
+ eval_batch=36 loss=0.047065 batch_time_s=0.2393
+ eval_batch=37 loss=0.019016 batch_time_s=0.2778
+ eval_batch=38 loss=0.048523 batch_time_s=0.3234
+ eval_batch=39 loss=0.075579 batch_time_s=0.2905
+ eval_batch=40 loss=0.049607 batch_time_s=0.2612
+ eval_batch=41 loss=0.047019 batch_time_s=0.3323
+ eval_batch=42 loss=0.035811 batch_time_s=0.3344
+ eval_batch=43 loss=0.021360 batch_time_s=0.3128
+ eval_batch=44 loss=0.019255 batch_time_s=0.2885
+ eval_batch=45 loss=0.022715 batch_time_s=0.3116
+ eval_batch=46 loss=0.024246 batch_time_s=0.3442
+ eval_batch=47 loss=0.077525 batch_time_s=0.2601
+ eval_batch=48 loss=0.207067 batch_time_s=0.3068
+ eval_batch=49 loss=0.033557 batch_time_s=0.2332
+ eval_batch=50 loss=0.093434 batch_time_s=0.2469
+ config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
+ checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000
+ repo_id_used: lsnu/twin_handover_256_val
+ num_batches: 50
+ mean_val_loss: 0.052885
+ std_val_loss: 0.032533
+ per_batch_timing_seconds: mean=0.3108 std=0.1375 min=0.2230 max=1.1986
+ active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
+ masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
+ weight_loading_missing_keys: []
+ weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log ADDED
@@ -0,0 +1,114 @@
1
+ starting_eval config=pi05_twin_handover_256_packed_baseline_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 repo_id=lsnu/twin_handover_256_val
2
+ eval_loader batch_size=16 num_batches=100 num_workers=0
3
+ weight_loading missing=0 unexpected=0 device=cuda:0
4
+ eval_batch=1 loss=0.019216 batch_time_s=1.2881
5
+ eval_batch=2 loss=0.013719 batch_time_s=0.2542
6
+ eval_batch=3 loss=0.012779 batch_time_s=0.2498
7
+ eval_batch=4 loss=0.026855 batch_time_s=0.2422
8
+ eval_batch=5 loss=0.023092 batch_time_s=0.2363
9
+ eval_batch=6 loss=0.063545 batch_time_s=0.2285
10
+ eval_batch=7 loss=0.035285 batch_time_s=0.2961
11
+ eval_batch=8 loss=0.014463 batch_time_s=0.2318
12
+ eval_batch=9 loss=0.029309 batch_time_s=0.2403
13
+ eval_batch=10 loss=0.043977 batch_time_s=0.2449
14
+ eval_batch=11 loss=0.024810 batch_time_s=0.2426
15
+ eval_batch=12 loss=0.031340 batch_time_s=0.2310
16
+ eval_batch=13 loss=0.038825 batch_time_s=0.3180
17
+ eval_batch=14 loss=0.036152 batch_time_s=0.2432
18
+ eval_batch=15 loss=0.034914 batch_time_s=0.3352
19
+ eval_batch=16 loss=0.053971 batch_time_s=0.2680
20
+ eval_batch=17 loss=0.031400 batch_time_s=0.2827
21
+ eval_batch=18 loss=0.040505 batch_time_s=0.2913
22
+ eval_batch=19 loss=0.016300 batch_time_s=0.2329
23
+ eval_batch=20 loss=0.023962 batch_time_s=0.2303
24
+ eval_batch=21 loss=0.034431 batch_time_s=0.2705
25
+ eval_batch=22 loss=0.056853 batch_time_s=0.2979
26
+ eval_batch=23 loss=0.038143 batch_time_s=0.2601
27
+ eval_batch=24 loss=0.075043 batch_time_s=0.3020
28
+ eval_batch=25 loss=0.058564 batch_time_s=0.5796
29
+ eval_batch=26 loss=0.032481 batch_time_s=0.2340
30
+ eval_batch=27 loss=0.035333 batch_time_s=0.2701
31
+ eval_batch=28 loss=0.042256 batch_time_s=0.3069
32
+ eval_batch=29 loss=0.067687 batch_time_s=0.2336
33
+ eval_batch=30 loss=0.048997 batch_time_s=0.2917
34
+ eval_batch=31 loss=0.119097 batch_time_s=0.2272
35
+ eval_batch=32 loss=0.060042 batch_time_s=0.2282
36
+ eval_batch=33 loss=0.058640 batch_time_s=0.2405
37
+ eval_batch=34 loss=0.062960 batch_time_s=0.2298
38
+ eval_batch=35 loss=0.052300 batch_time_s=0.2224
39
+ eval_batch=36 loss=0.036295 batch_time_s=0.2275
40
+ eval_batch=37 loss=0.025163 batch_time_s=0.2301
41
+ eval_batch=38 loss=0.032151 batch_time_s=0.2865
42
+ eval_batch=39 loss=0.052523 batch_time_s=0.2395
43
+ eval_batch=40 loss=0.017417 batch_time_s=0.2338
44
+ eval_batch=41 loss=0.028829 batch_time_s=0.2308
45
+ eval_batch=42 loss=0.031216 batch_time_s=0.2330
46
+ eval_batch=43 loss=0.005192 batch_time_s=0.2345
47
+ eval_batch=44 loss=0.011528 batch_time_s=0.2308
48
+ eval_batch=45 loss=0.046379 batch_time_s=0.2311
49
+ eval_batch=46 loss=0.026113 batch_time_s=0.2280
50
+ eval_batch=47 loss=0.093653 batch_time_s=0.2313
51
+ eval_batch=48 loss=0.219696 batch_time_s=0.2301
52
+ eval_batch=49 loss=0.021639 batch_time_s=0.2477
53
+ eval_batch=50 loss=0.062274 batch_time_s=0.2299
54
+ eval_batch=51 loss=0.043294 batch_time_s=0.2282
55
+ eval_batch=52 loss=0.020800 batch_time_s=0.2402
56
+ eval_batch=53 loss=0.017962 batch_time_s=0.2315
57
+ eval_batch=54 loss=0.011119 batch_time_s=0.2258
58
+ eval_batch=55 loss=0.022601 batch_time_s=0.2330
59
+ eval_batch=56 loss=0.063293 batch_time_s=0.2378
60
+ eval_batch=57 loss=0.033958 batch_time_s=0.2375
61
+ eval_batch=58 loss=0.025469 batch_time_s=0.2294
62
+ eval_batch=59 loss=0.019972 batch_time_s=0.2376
63
+ eval_batch=60 loss=0.004765 batch_time_s=0.2354
64
+ eval_batch=61 loss=0.014635 batch_time_s=0.2449
65
+ eval_batch=62 loss=0.006239 batch_time_s=0.2288
66
+ eval_batch=63 loss=0.041332 batch_time_s=0.2520
67
+ eval_batch=64 loss=0.016763 batch_time_s=0.2517
68
+ eval_batch=65 loss=0.028758 batch_time_s=0.2447
69
+ eval_batch=66 loss=0.026301 batch_time_s=0.2312
70
+ eval_batch=67 loss=0.014657 batch_time_s=0.2353
71
+ eval_batch=68 loss=0.043065 batch_time_s=0.2276
72
+ eval_batch=69 loss=0.048954 batch_time_s=0.2282
73
+ eval_batch=70 loss=0.047917 batch_time_s=0.2359
74
+ eval_batch=71 loss=0.013441 batch_time_s=0.2318
75
+ eval_batch=72 loss=0.023035 batch_time_s=0.2453
76
+ eval_batch=73 loss=0.024245 batch_time_s=0.2530
77
+ eval_batch=74 loss=0.021810 batch_time_s=0.2387
78
+ eval_batch=75 loss=0.016290 batch_time_s=0.2281
79
+ eval_batch=76 loss=0.019809 batch_time_s=0.2320
80
+ eval_batch=77 loss=0.016700 batch_time_s=0.2462
81
+ eval_batch=78 loss=0.049874 batch_time_s=0.2369
82
+ eval_batch=79 loss=0.065255 batch_time_s=0.2548
83
+ eval_batch=80 loss=0.077142 batch_time_s=0.2906
84
+ eval_batch=81 loss=0.059736 batch_time_s=0.3057
85
+ eval_batch=82 loss=0.011131 batch_time_s=0.2359
86
+ eval_batch=83 loss=0.016865 batch_time_s=0.2454
87
+ eval_batch=84 loss=0.007890 batch_time_s=0.2386
88
+ eval_batch=85 loss=0.044606 batch_time_s=0.2352
89
+ eval_batch=86 loss=0.014035 batch_time_s=0.2365
90
+ eval_batch=87 loss=0.020954 batch_time_s=0.2419
91
+ eval_batch=88 loss=0.042758 batch_time_s=0.2262
92
+ eval_batch=89 loss=0.019468 batch_time_s=0.2352
93
+ eval_batch=90 loss=0.004773 batch_time_s=0.2292
94
+ eval_batch=91 loss=0.005070 batch_time_s=0.2296
95
+ eval_batch=92 loss=0.007161 batch_time_s=0.2291
96
+ eval_batch=93 loss=0.026996 batch_time_s=0.2361
97
+ eval_batch=94 loss=0.011121 batch_time_s=0.2456
98
+ eval_batch=95 loss=0.041840 batch_time_s=0.2409
99
+ eval_batch=96 loss=0.054416 batch_time_s=0.2333
100
+ eval_batch=97 loss=0.024979 batch_time_s=0.2276
101
+ eval_batch=98 loss=0.062096 batch_time_s=0.2403
102
+ eval_batch=99 loss=0.032598 batch_time_s=0.2326
103
+ eval_batch=100 loss=0.022353 batch_time_s=0.2274
104
+ config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
105
+ checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000
106
+ repo_id_used: lsnu/twin_handover_256_val
107
+ num_batches: 100
108
+ mean_val_loss: 0.035776
109
+ std_val_loss: 0.027648
110
+ per_batch_timing_seconds: mean=0.2587 std=0.1111 min=0.2224 max=1.2881
111
+ active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
112
+ masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
113
+ weight_loading_missing_keys: []
114
+ weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log ADDED
The diff for this file is too large to render. See raw diff
 
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log ADDED
@@ -0,0 +1,64 @@
+ starting_eval config=pi05_twin_handover_256_packed_parallel_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 repo_id=lsnu/twin_handover_256_val
+ eval_loader batch_size=16 num_batches=50 num_workers=0
+ weight_loading missing=0 unexpected=0 device=cuda:0
+ eval_batch=1 loss=0.039282 batch_time_s=0.8606
+ eval_batch=2 loss=0.059935 batch_time_s=0.2233
+ eval_batch=3 loss=0.029645 batch_time_s=0.2237
+ eval_batch=4 loss=0.030436 batch_time_s=0.2312
+ eval_batch=5 loss=0.029398 batch_time_s=0.2255
+ eval_batch=6 loss=0.046098 batch_time_s=0.2291
+ eval_batch=7 loss=0.031397 batch_time_s=0.2243
+ eval_batch=8 loss=0.013987 batch_time_s=0.2256
+ eval_batch=9 loss=0.046950 batch_time_s=0.3194
+ eval_batch=10 loss=0.055185 batch_time_s=0.2211
+ eval_batch=11 loss=0.045538 batch_time_s=0.2270
+ eval_batch=12 loss=0.034314 batch_time_s=0.2221
+ eval_batch=13 loss=0.053436 batch_time_s=0.2306
+ eval_batch=14 loss=0.048917 batch_time_s=0.2322
+ eval_batch=15 loss=0.059734 batch_time_s=0.2346
+ eval_batch=16 loss=0.072608 batch_time_s=0.2275
+ eval_batch=17 loss=0.071442 batch_time_s=0.2257
+ eval_batch=18 loss=0.056916 batch_time_s=0.2247
+ eval_batch=19 loss=0.025555 batch_time_s=0.2238
+ eval_batch=20 loss=0.031001 batch_time_s=0.2557
+ eval_batch=21 loss=0.054189 batch_time_s=0.2259
+ eval_batch=22 loss=0.046724 batch_time_s=0.2544
+ eval_batch=23 loss=0.048790 batch_time_s=0.2389
+ eval_batch=24 loss=0.073533 batch_time_s=0.2283
+ eval_batch=25 loss=0.060645 batch_time_s=0.2387
+ eval_batch=26 loss=0.020740 batch_time_s=0.2323
+ eval_batch=27 loss=0.027174 batch_time_s=0.2226
+ eval_batch=28 loss=0.030402 batch_time_s=0.2211
+ eval_batch=29 loss=0.037136 batch_time_s=0.2303
+ eval_batch=30 loss=0.057298 batch_time_s=0.2221
+ eval_batch=31 loss=0.133256 batch_time_s=0.2228
+ eval_batch=32 loss=0.081425 batch_time_s=0.2285
+ eval_batch=33 loss=0.101147 batch_time_s=0.2291
+ eval_batch=34 loss=0.084155 batch_time_s=0.2763
+ eval_batch=35 loss=0.050369 batch_time_s=0.2300
+ eval_batch=36 loss=0.037849 batch_time_s=0.2228
+ eval_batch=37 loss=0.016911 batch_time_s=0.2211
+ eval_batch=38 loss=0.035706 batch_time_s=0.2215
+ eval_batch=39 loss=0.074094 batch_time_s=0.2247
+ eval_batch=40 loss=0.031583 batch_time_s=0.2256
+ eval_batch=41 loss=0.063281 batch_time_s=0.2345
+ eval_batch=42 loss=0.034781 batch_time_s=0.2247
+ eval_batch=43 loss=0.021991 batch_time_s=0.3036
+ eval_batch=44 loss=0.006788 batch_time_s=0.2310
+ eval_batch=45 loss=0.029891 batch_time_s=0.2888
+ eval_batch=46 loss=0.024711 batch_time_s=0.2320
+ eval_batch=47 loss=0.139781 batch_time_s=0.2281
+ eval_batch=48 loss=0.129609 batch_time_s=0.2421
+ eval_batch=49 loss=0.039653 batch_time_s=0.2222
+ eval_batch=50 loss=0.085291 batch_time_s=0.2304
+ config_name: pi05_twin_handover_256_packed_parallel_pytorch_2k
+ checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000
+ repo_id_used: lsnu/twin_handover_256_val
+ num_batches: 50
+ mean_val_loss: 0.051214
+ std_val_loss: 0.028985
+ per_batch_timing_seconds: mean=0.2468 std=0.0900 min=0.2211 max=0.8606
+ active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
+ masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
+ weight_loading_missing_keys: []
+ weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log ADDED
@@ -0,0 +1,114 @@
+ starting_eval config=pi05_twin_handover_256_packed_parallel_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 repo_id=lsnu/twin_handover_256_val
+ eval_loader batch_size=16 num_batches=100 num_workers=0
+ weight_loading missing=0 unexpected=0 device=cuda:0
+ eval_batch=1 loss=0.019788 batch_time_s=0.8235
+ eval_batch=2 loss=0.010034 batch_time_s=0.2312
+ eval_batch=3 loss=0.006535 batch_time_s=0.2283
+ eval_batch=4 loss=0.019442 batch_time_s=0.2249
+ eval_batch=5 loss=0.023646 batch_time_s=0.2275
+ eval_batch=6 loss=0.045010 batch_time_s=0.2273
+ eval_batch=7 loss=0.021796 batch_time_s=0.2327
+ eval_batch=8 loss=0.019273 batch_time_s=0.2319
+ eval_batch=9 loss=0.021624 batch_time_s=0.2248
+ eval_batch=10 loss=0.035467 batch_time_s=0.2359
+ eval_batch=11 loss=0.034351 batch_time_s=0.2552
+ eval_batch=12 loss=0.027341 batch_time_s=0.2308
+ eval_batch=13 loss=0.047439 batch_time_s=0.2257
+ eval_batch=14 loss=0.037939 batch_time_s=0.2329
+ eval_batch=15 loss=0.043057 batch_time_s=0.2215
+ eval_batch=16 loss=0.038503 batch_time_s=0.2317
+ eval_batch=17 loss=0.043592 batch_time_s=0.2290
+ eval_batch=18 loss=0.037270 batch_time_s=0.2265
+ eval_batch=19 loss=0.020304 batch_time_s=0.2329
+ eval_batch=20 loss=0.030268 batch_time_s=0.2234
+ eval_batch=21 loss=0.041346 batch_time_s=0.2263
+ eval_batch=22 loss=0.028159 batch_time_s=0.2268
+ eval_batch=23 loss=0.065991 batch_time_s=0.2251
+ eval_batch=24 loss=0.064603 batch_time_s=0.2268
+ eval_batch=25 loss=0.068628 batch_time_s=0.2282
+ eval_batch=26 loss=0.023403 batch_time_s=0.2302
+ eval_batch=27 loss=0.031110 batch_time_s=0.2274
+ eval_batch=28 loss=0.022352 batch_time_s=0.2289
+ eval_batch=29 loss=0.046446 batch_time_s=0.2292
+ eval_batch=30 loss=0.043246 batch_time_s=0.2321
+ eval_batch=31 loss=0.101922 batch_time_s=0.2274
+ eval_batch=32 loss=0.072581 batch_time_s=0.2300
+ eval_batch=33 loss=0.056358 batch_time_s=0.2252
+ eval_batch=34 loss=0.065017 batch_time_s=0.2306
+ eval_batch=35 loss=0.048672 batch_time_s=0.2388
+ eval_batch=36 loss=0.022249 batch_time_s=0.2322
+ eval_batch=37 loss=0.014201 batch_time_s=0.2266
+ eval_batch=38 loss=0.039009 batch_time_s=0.2261
+ eval_batch=39 loss=0.033967 batch_time_s=0.2303
+ eval_batch=40 loss=0.021915 batch_time_s=0.2462
+ eval_batch=41 loss=0.024328 batch_time_s=0.2613
+ eval_batch=42 loss=0.050496 batch_time_s=0.2354
+ eval_batch=43 loss=0.010375 batch_time_s=0.2300
+ eval_batch=44 loss=0.016967 batch_time_s=0.2276
+ eval_batch=45 loss=0.026333 batch_time_s=0.2552
+ eval_batch=46 loss=0.019980 batch_time_s=0.2267
+ eval_batch=47 loss=0.089578 batch_time_s=0.2327
+ eval_batch=48 loss=0.209416 batch_time_s=0.2445
+ eval_batch=49 loss=0.011339 batch_time_s=0.2359
+ eval_batch=50 loss=0.066028 batch_time_s=0.2251
+ eval_batch=51 loss=0.035093 batch_time_s=0.2288
+ eval_batch=52 loss=0.020534 batch_time_s=0.2276
+ eval_batch=53 loss=0.006331 batch_time_s=0.2313
+ eval_batch=54 loss=0.012782 batch_time_s=0.2247
+ eval_batch=55 loss=0.022509 batch_time_s=0.2299
+ eval_batch=56 loss=0.047079 batch_time_s=0.2317
+ eval_batch=57 loss=0.023989 batch_time_s=0.2302
+ eval_batch=58 loss=0.019615 batch_time_s=0.2322
+ eval_batch=59 loss=0.026347 batch_time_s=0.2346
+ eval_batch=60 loss=0.004678 batch_time_s=0.2323
+ eval_batch=61 loss=0.007068 batch_time_s=0.2324
+ eval_batch=62 loss=0.013162 batch_time_s=0.2336
+ eval_batch=63 loss=0.047115 batch_time_s=0.2236
+ eval_batch=64 loss=0.017077 batch_time_s=0.2299
+ eval_batch=65 loss=0.047049 batch_time_s=0.2288
+ eval_batch=66 loss=0.035518 batch_time_s=0.2257
+ eval_batch=67 loss=0.016819 batch_time_s=0.2306
+ eval_batch=68 loss=0.051586 batch_time_s=0.2215
+ eval_batch=69 loss=0.043497 batch_time_s=0.2312
+ eval_batch=70 loss=0.072536 batch_time_s=0.2301
+ eval_batch=71 loss=0.018621 batch_time_s=0.2365
+ eval_batch=72 loss=0.043862 batch_time_s=0.2305
+ eval_batch=73 loss=0.034882 batch_time_s=0.2314
+ eval_batch=74 loss=0.028771 batch_time_s=0.2286
+ eval_batch=75 loss=0.012547 batch_time_s=0.2269
+ eval_batch=76 loss=0.023966 batch_time_s=0.2317
+ eval_batch=77 loss=0.023444 batch_time_s=0.2290
+ eval_batch=78 loss=0.048585 batch_time_s=0.2343
+ eval_batch=79 loss=0.065904 batch_time_s=0.2264
+ eval_batch=80 loss=0.072660 batch_time_s=0.2255
+ eval_batch=81 loss=0.038694 batch_time_s=0.2281
+ eval_batch=82 loss=0.013027 batch_time_s=0.2302
+ eval_batch=83 loss=0.022540 batch_time_s=0.2336
+ eval_batch=84 loss=0.010291 batch_time_s=0.2216
+ eval_batch=85 loss=0.054119 batch_time_s=0.2286
+ eval_batch=86 loss=0.021808 batch_time_s=0.2305
+ eval_batch=87 loss=0.018521 batch_time_s=0.2330
+ eval_batch=88 loss=0.042638 batch_time_s=0.2329
+ eval_batch=89 loss=0.023391 batch_time_s=0.2352
+ eval_batch=90 loss=0.004995 batch_time_s=0.2289
+ eval_batch=91 loss=0.006358 batch_time_s=0.2311
+ eval_batch=92 loss=0.024077 batch_time_s=0.2306
+ eval_batch=93 loss=0.039791 batch_time_s=0.2334
+ eval_batch=94 loss=0.046554 batch_time_s=0.2327
+ eval_batch=95 loss=0.038985 batch_time_s=0.2279
+ eval_batch=96 loss=0.034484 batch_time_s=0.2243
+ eval_batch=97 loss=0.037144 batch_time_s=0.2285
+ eval_batch=98 loss=0.069108 batch_time_s=0.2318
+ eval_batch=99 loss=0.035033 batch_time_s=0.2335
+ eval_batch=100 loss=0.024118 batch_time_s=0.2258
+ config_name: pi05_twin_handover_256_packed_parallel_pytorch_2k
+ checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000
+ repo_id_used: lsnu/twin_handover_256_val
+ num_batches: 100
+ mean_val_loss: 0.035680
+ std_val_loss: 0.026077
+ per_batch_timing_seconds: mean=0.2366 std=0.0593 min=0.2215 max=0.8235
+ active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
+ masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
+ weight_loading_missing_keys: []
+ weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/importtime_train_pytorch.log ADDED
@@ -0,0 +1,349 @@
+ import time: self [us] | cumulative | imported package
+ import time: 459 | 459 | _io
+ import time: 100 | 100 | marshal
+ import time: 1005 | 1005 | posix
+ import time: 2124 | 3687 | _frozen_importlib_external
+ import time: 521 | 521 | time
+ import time: 542 | 1062 | zipimport
+ import time: 126 | 126 | _codecs
+ import time: 1309 | 1435 | codecs
+ import time: 1112 | 1112 | encodings.aliases
+ import time: 2172 | 4718 | encodings
+ import time: 579 | 579 | encodings.utf_8
+ import time: 307 | 307 | _signal
+ import time: 100 | 100 | _abc
+ import time: 561 | 660 | abc
+ import time: 825 | 1484 | io
+ import time: 115 | 115 | _stat
+ import time: 578 | 693 | stat
+ import time: 2208 | 2208 | _collections_abc
+ import time: 104 | 104 | genericpath
+ import time: 550 | 653 | posixpath
+ import time: 1911 | 5463 | os
+ import time: 177 | 177 | _sitebuiltins
+ import time: 27305 | 27305 | _virtualenv
+ import time: 29406 | 29406 | _distutils_hack
+ import time: 427 | 427 | sitecustomize
+ import time: 150941 | 213717 | site
+ import time: 44311 | 44311 | scripts
+ import time: 2030 | 2030 | types
+ import time: 257 | 257 | _operator
+ import time: 2479 | 2735 | operator
+ import time: 339 | 339 | itertools
+ import time: 1371 | 1371 | keyword
+ import time: 1450 | 1450 | reprlib
+ import time: 191 | 191 | _collections
+ import time: 5447 | 8797 | collections
+ import time: 173 | 173 | _functools
+ import time: 2662 | 11631 | functools
+ import time: 8546 | 24941 | enum
+ import time: 219 | 219 | _sre
+ import time: 791 | 791 | re._constants
+ import time: 1361 | 2151 | re._parser
+ import time: 365 | 365 | re._casefix
+ import time: 2178 | 4912 | re._compiler
+ import time: 1533 | 1533 | copyreg
+ import time: 3330 | 34714 | re
+ import time: 1611 | 1611 | _weakrefset
+ import time: 3100 | 4710 | weakref
+ import time: 9576 | 9576 | org
+ import time: 255 | 9830 | org.python
+ import time: 224 | 10053 | org.python.core
+ import time: 2268 | 17030 | copy
+ import time: 3210 | 3210 | _ast
+ import time: 4811 | 4811 | contextlib
+ import time: 7411 | 15431 | ast
+ import time: 164 | 164 | _opcode
+ import time: 4583 | 4747 | opcode
+ import time: 5950 | 10696 | dis
+ import time: 612 | 612 | collections.abc
+ import time: 2739 | 2739 | warnings
+ import time: 2281 | 5019 | importlib
+ import time: 368 | 5387 | importlib.machinery
+ import time: 3055 | 3055 | token
+ import time: 6195 | 9250 | tokenize
+ import time: 2342 | 11591 | linecache
+ import time: 8114 | 51829 | inspect
+ import time: 4396 | 107967 | dataclasses
+ import time: 186 | 186 | gc
+ import time: 4839 | 4839 | textwrap
+ import time: 3120 | 7958 | traceback
+ import time: 130 | 130 | _string
+ import time: 3121 | 3250 | string
+ import time: 4201 | 4201 | threading
+ import time: 127 | 127 | atexit
+ import time: 7952 | 23486 | logging
+ import time: 6543 | 6543 | platform
+ import time: 2295 | 2295 | fnmatch
+ import time: 287 | 287 | errno
+ import time: 336 | 336 | zlib
+ import time: 4029 | 4029 | _compression
+ import time: 2862 | 2862 | _bz2
+ import time: 4040 | 10930 | bz2
+ import time: 4869 | 4869 | _lzma
+ import time: 4931 | 9800 | lzma
+ import time: 6201 | 29847 | shutil
+ import time: 4466 | 4466 | __future__
+ import time: 222 | 222 | math
+ import time: 341 | 341 | _datetime
+ import time: 9607 | 10169 | datetime
+ import time: 8169 | 8169 | _winapi
+ import time: 11261 | 11261 | nt
+ import time: 9784 | 9784 | nt
+ import time: 8028 | 8028 | nt
+ import time: 10836 | 10836 | nt
+ import time: 8338 | 8338 | nt
+ import time: 3115 | 59529 | ntpath
+ import time: 3503 | 3503 | urllib
+ import time: 7606 | 7606 | ipaddress
+ import time: 3508 | 14616 | urllib.parse
+ import time: 7032 | 81177 | pathlib
+ import time: 279 | 279 | _locale
+ import time: 7805 | 8083 | locale
+ import time: 5367 | 5367 | signal
+ import time: 213 | 213 | fcntl
+ import time: 9064 | 9064 | msvcrt
+ import time: 169 | 169 | _posixsubprocess
+ import time: 236 | 236 | select
+ import time: 6301 | 6301 | selectors
+ import time: 11615 | 41045 | subprocess
+ import time: 42021 | 178875 | jax.version
+ import time: 56236 | 56236 | jax._src
+ import time: 8786 | 8786 | _typing
+ import time: 13264 | 22049 | typing
+ import time: 58851 | 58851 | jaxlib.version
+ import time: 95721 | 154572 | jaxlib
+ import time: 21162 | 21162 | jaxlib.cpu_feature_guard
+ import time: 17965 | 17965 | jaxlib.utils
+ import time: 357 | 357 | _struct
+ import time: 1673 | 2029 | struct
+ import time: 12406 | 14434 | gzip
+ import time: 72050 | 72050 | numpy._utils._convertions
+ import time: 90873 | 162922 | numpy._utils
+ import time: 58590 | 221512 | numpy._globals
+ import time: 53224 | 53224 | numpy.exceptions
+ import time: 52905 | 52905 | numpy.version
+ import time: 609 | 609 | numpy._distributor_init_local
+ import time: 55586 | 56194 | numpy._distributor_init
+ import time: 35135 | 35135 | numpy._utils._inspect
+ import time: 12046 | 12046 | numpy.core._exceptions
+ import time: 8271 | 8271 | numpy.dtypes
+ import time: 202114 | 222430 | numpy.core._multiarray_umath
+ import time: 37402 | 294966 | numpy.core.overrides
+ import time: 53534 | 348500 | numpy.core.multiarray
+ import time: 5565 | 5565 | numpy.core.umath
+ import time: 1567 | 1567 | numbers
+ import time: 8550 | 8550 | numpy.core._string_helpers
+ import time: 8226 | 8226 | pickle5
+ import time: 1179 | 1179 | _compat_pickle
+ import time: 512 | 512 | _pickle
+ import time: 1864 | 1864 | org
+ import time: 329 | 2192 | org.python
+ import time: 776 | 2968 | org.python.core
+ import time: 4888 | 9545 | pickle
+ import time: 10531 | 28301 | numpy.compat.py3k
+ import time: 25123 | 53423 | numpy.compat
+ import time: 7502 | 7502 | numpy.core._dtype
+ import time: 8090 | 69014 | numpy.core._type_aliases
+ import time: 6044 | 85174 | numpy.core.numerictypes
+ import time: 646 | 646 | _contextvars
+ import time: 805 | 1451 | contextvars
+ import time: 7510 | 8961 | numpy.core._ufunc_config
+ import time: 21816 | 30776 | numpy.core._methods
+ import time: 10769 | 41545 | numpy.core.fromnumeric
+ import time: 7787 | 49331 | numpy.core.shape_base
+ import time: 6325 | 6325 | numpy.core.arrayprint
+ import time: 4369 | 4369 | numpy.core._asarray
+ import time: 10436 | 70460 | numpy.core.numeric
+ import time: 5458 | 5458 | numpy.core.defchararray
+ import time: 6653 | 6653 | numpy.core.records
+ import time: 2659 | 2659 | numpy.core.memmap
+ import time: 3430 | 3430 | numpy.core.function_base
+ import time: 3739 | 3739 | numpy.core._machar
+ import time: 4821 | 4821 | numpy.core.getlimits
+ import time: 5141 | 5141 | numpy.core.einsumfunc
+ import time: 2892 | 2892 | numpy.core._multiarray_tests
+ import time: 7349 | 10241 | numpy.core._add_newdocs
+ import time: 10209 | 10209 | numpy.core._add_newdocs_scalars
+ import time: 4958 | 4958 | numpy.core._dtype_ctypes
+ import time: 1331 | 1331 | _ctypes
+ import time: 1038 | 1038 | ctypes._endian
+ import time: 3302 | 5670 | ctypes
+ import time: 8903 | 14573 | numpy.core._internal
+ import time: 7543 | 7543 | numpy._pytesttester
+ import time: 81885 | 671000 | numpy.core
+ import time: 153 | 671152 | numpy.core._multiarray_umath
+ import time: 56994 | 728146 | numpy.__config__
+ import time: 7653 | 7653 | numpy.lib.mixins
+ import time: 9676 | 9676 | numpy.lib.ufunclike
+ import time: 7766 | 17441 | numpy.lib.type_check
+ import time: 10010 | 27450 | numpy.lib.scimath
+ import time: 22351 | 22351 | numpy.lib.stride_tricks
+ import time: 11303 | 33654 | numpy.lib.twodim_base
+ import time: 8761 | 8761 | numpy.linalg._umath_linalg
+ import time: 16569 | 16569 | numpy._typing._nested_sequence
+ import time: 13982 | 13982 | numpy._typing._nbit
+ import time: 20263 | 20263 | numpy._typing._char_codes
+ import time: 11700 | 11700 | numpy._typing._scalars
+ import time: 8982 | 8982 | numpy._typing._shape
+ import time: 24532 | 24532 | numpy._typing._dtype_like
+ import time: 44660 | 44660 | numpy._typing._array_like
+ import time: 29866 | 170550 | numpy._typing
+ import time: 17677 | 230640 | numpy.linalg.linalg
+ import time: 237805 | 468444 | numpy.linalg
+ import time: 9029 | 477473 | numpy.matrixlib.defmatrix
+ import time: 10944 | 488417 | numpy.matrixlib
+ import time: 8745 | 8745 | numpy.lib.histograms
+ import time: 27873 | 36617 | numpy.lib.function_base
+ import time: 17216 | 542249 | numpy.lib.index_tricks
+ import time: 16518 | 16518 | numpy.lib.nanfunctions
+ import time: 14925 | 14925 | numpy.lib.shape_base
+ import time: 8883 | 8883 | numpy.lib.polynomial
+ import time: 13341 | 13341 | numpy.lib.utils
+ import time: 13347 | 13347 | numpy.lib.arraysetops
+ import time: 18662 | 18662 | numpy.lib.format
+ import time: 9834 | 9834 | numpy.lib._datasource
+ import time: 10465 | 10465 | numpy.lib._iotools
+ import time: 26974 | 65935 | numpy.lib.npyio
+ import time: 14808 | 14808 | numpy.lib.arrayterator
+ import time: 28751 | 28751 | numpy.lib.arraypad
+ import time: 31641 | 31641 | numpy.lib._version
+ import time: 16718 | 802213 | numpy.lib
+ import time: 13764 | 13764 | numpy.fft._pocketfft_internal
+ import time: 47189 | 60952 | numpy.fft._pocketfft
+ import time: 34176 | 34176 | numpy.fft.helper
+ import time: 57859 | 152987 | numpy.fft
+ import time: 32723 | 32723 | numpy.polynomial.polyutils
+ import time: 20810 | 20810 | numpy.polynomial._polybase
+ import time: 47703 | 101235 | numpy.polynomial.polynomial
+ import time: 22597 | 22597 | numpy.polynomial.chebyshev
+ import time: 15190 | 15190 | numpy.polynomial.legendre
+ import time: 12249 | 12249 | numpy.polynomial.hermite
+ import time: 15883 | 15883 | numpy.polynomial.hermite_e
+ import time: 20997 | 20997 | numpy.polynomial.laguerre
+ import time: 57756 | 245905 | numpy.polynomial
+ import time: 11659 | 11659 | backports_abc
+ import time: 8899 | 20558 | numpy.random._common
+ import time: 609 | 609 | binascii
+ import time: 1895 | 2503 | base64
+ import time: 6404 | 6404 | _hashlib
+ import time: 184 | 184 | _blake2
+ import time: 1554 | 1737 | hashlib
+ import time: 2119 | 10260 | hmac
+ import time: 96 | 96 | _bisect
+ import time: 1252 | 1347 | bisect
+ import time: 164 | 164 | _random
+ import time: 176 | 176 | _sha512
+ import time: 2855 | 4541 | random
+ import time: 1966 | 19268 | secrets
+ import time: 8364 | 48189 | numpy.random.bit_generator
+ import time: 5773 | 5773 | numpy.random._bounded_integers
+ import time: 6014 | 6014 | numpy.random._mt19937
+ import time: 9760 | 69734 | numpy.random.mtrand
+ import time: 7331 | 7331 | numpy.random._philox
+ import time: 5862 | 5862 | numpy.random._pcg64
+ import time: 5462 | 5462 | numpy.random._sfc64
+ import time: 8031 | 8031 | numpy.random._generator
+ import time: 22729 | 119147 | numpy.random._pickle
+ import time: 23124 | 142271 | numpy.random
+ import time: 20592 | 20592 | numpy.ctypeslib
+ import time: 40900 | 40900 | numpy.ma.core
+ import time: 26643 | 26643 | numpy.ma.extras
+ import time: 31513 | 99055 | numpy.ma
+ import time: 75854 | 2650852 | numpy
+ import time: 22335 | 2673187 | numpy._core
+ import time: 29059 | 2702245 | numpy._core._multiarray_umath
+ import time: 22101 | 2724346 | ml_dtypes._ml_dtypes_ext
+ import time: 53315 | 2777661 | ml_dtypes._finfo
+ import time: 15604 | 15604 | ml_dtypes._iinfo
+ import time: 93641 | 2886905 | ml_dtypes
+ import time: 62057 | 62057 | jaxlib.xla_extension
+ import time: 46707 | 3010102 | jaxlib.xla_client
+ import time: 31965 | 31965 | jaxlib.cpu
+ import time: 45309 | 45309 | jaxlib.cpu._lapack
+ import time: 28009 | 105282 | jaxlib.lapack
+ import time: 378 | 378 | jaxlib.cuda
+ import time: 468 | 846 | jaxlib.cuda._versions
+ import time: 40192 | 40192 | jax_cuda12_plugin
+ import time: 39973 | 80164 | jax_cuda12_plugin._versions
+ import time: 5772 | 5772 | jaxlib.plugin_support
+ import time: 1036132 | 1041903 | jaxlib.gpu_solver
+ import time: 7872 | 7872 | jaxlib.mlir
+ import time: 167478 | 167478 | jaxlib.mlir._mlir_libs._mlir
+ import time: 22966 | 190443 | jaxlib.mlir._mlir_libs
+ import time: 599 | 191041 | jaxlib.mlir._mlir_libs._mlir
+ import time: 311 | 191352 | jaxlib.mlir._mlir_libs._mlir.ir
+ import time: 15499 | 214723 | jaxlib.mlir.ir
+ import time: 4822 | 4822 | jaxlib.mlir.dialects
+ import time: 9680 | 9680 | jaxlib.mlir.dialects._ods_common
+ import time: 20956 | 30636 | jaxlib.mlir.dialects._stablehlo_ops_gen
+ import time: 6744 | 6744 | jaxlib.mlir._mlir_libs._stablehlo
+ import time: 15548 | 57749 | jaxlib.mlir.dialects.stablehlo
+ import time: 8550 | 66299 | jaxlib.hlo_helpers
+ import time: 20744 | 301764 | jaxlib.gpu_sparse
+ import time: 26892 | 26892 | jaxlib.gpu_prng
+ import time: 14945 | 14945 | jaxlib.gpu_linalg
+ import time: 8409 | 8409 | jaxlib.gpu_common_utils
+ import time: 19412 | 27821 | jaxlib.gpu_rnn
+ import time: 18330 | 18330 | jaxlib.gpu_triton
+ import time: 7304 | 7304 | jaxlib.mosaic
+ import time: 17191 | 24495 | jaxlib.mosaic.python
+ import time: 8072 | 8072 | jaxlib.mosaic.dialect
+ import time: 12479 | 20550 | jaxlib.mosaic.dialect.gpu
+ import time: 39792 | 60342 | jaxlib.mosaic.dialect.gpu._mosaic_gpu_gen_ops
+ import time: 33098 | 33098 | jaxlib.mosaic.dialect.gpu._mosaic_gpu_gen_enums
+ import time: 17182 | 17182 | jaxlib.mlir._mlir_libs._mosaic_gpu_ext
+ import time: 44715 | 179830 | jaxlib.mosaic.python.mosaic_gpu
+ import time: 38772 | 38772 | jaxlib.mosaic.python._tpu_gen
+ import time: 19006 | 19006 | jaxlib.mlir._mlir_libs._tpu_ext
+ import time: 35539 | 93316 | jaxlib.mosaic.python.tpu
+ import time: 52437 | 52437 | nvidia
+ import time: 33733 | 33733 | nvidia.cuda_nvcc
+ import time: 94038 | 5275093 | jax._src.lib
+ import time: 43893 | 43893 | jax._src.logging_config
+ import time: 58138 | 5399172 | jax._src.config
+ import time: 5142 | 5142 | glob
+ import time: 32810 | 37951 | jax._src.hardware_utils
+ import time: 89410 | 5582767 | jax._src.cloud_tpu_init
+ import time: 1474 | 1474 | libtpu
+ import time: 16083 | 16083 | jax._src.basearray
+ import time: 10237 | 26320 | jax._src.typing
+ import time: 14374 | 14374 | jax._src.util
+ import time: 40141 | 40141 | jax._src.traceback_util
+ import time: 19514 | 100348 | jax._src.dtypes
+ import time: 44949 | 44949 | jax._src.effects
+ import time: 45640 | 45640 | jax._src.compute_on
+ import time: 827 | 827 | _json
+ import time: 1678 | 2504 | json.scanner
+ import time: 1613 | 4117 | json.decoder
+ import time: 1159 | 1159 | json.encoder
+ import time: 1849 | 7124 | json
+ import time: 867 | 867 | importlib._abc
+ import time: 675 | 1541 | importlib.util
+ import time: 1466 | 3007 | pkgutil
+ import time: 74826 | 74826 | jax._src.clusters.cluster
+ import time: 44029 | 44029 | jax._src.clusters.ompi_cluster
+ import time: 45845 | 45845 | jax._src.clusters.slurm_cluster
+ import time: 664 | 664 | _socket
+ import time: 283 | 283 | array
329
+ import time: 6869 | 7815 | socket
330
+ import time: 46749 | 54564 | jax._src.clusters.mpi4py_cluster
331
+ import time: 52800 | 52800 | jax._src.clusters.cloud_tpu_cluster
332
+ import time: 45723 | 45723 | jax._src.clusters.k8s_cluster
333
+ import time: 98001 | 415785 | jax._src.clusters
334
+ import time: 53385 | 469170 | jax._src.distributed
335
+ import time: 83403 | 83403 | jax_plugins
336
+ import time: 60154 | 622856 | jax._src.xla_bridge
337
+ import time: 54548 | 677403 | jax._src.mesh
338
+ import time: 72161 | 72161 | jax._src.partition_spec
339
+ import time: 85965 | 85965 | jax._src.errors
340
+ import time: 208 | 208 | _heapq
341
+ import time: 9637 | 9845 | heapq
342
+ import time: 9570 | 19414 | difflib
343
+ import time: 60664 | 80078 | jax._src.tree_util
344
+ import time: 58352 | 138430 | jax._src.linear_util
345
+ import time: 49579 | 49579 | sysconfig
346
+ import time: 10829 | 10829 | _sysconfigdata__x86_64-linux-gnu
347
+ import time: 61018 | 121425 | jax._src.source_info_util
348
+ import time: 9007 | 9007 | colorama
349
+ import time: 55120 | 64126 | jax._src.pretty_printer
artifacts/twin_handover_packed_parallelization_20260309/run_logs/inspect_twin_packed_batch_handover_train.log ADDED
@@ -0,0 +1,176 @@
+ config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
+ repo_id: lsnu/twin_handover_256_train
+ sample_index: 0
+ norm_stats_path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+ norm_stats_keys: ['actions', 'state']
+ norm_stats_lengths: state_mean=16 state_std=16 action_mean=16 action_std=16
+ block_boundaries: [0:8] [8:16] [16:24] [24:32]
+ raw_state_16d_shape: (16,)
+ raw_state_16d:
+ [ 7.1883e-07 1.7515e-01 -5.6890e-06 -8.7299e-01 -6.3130e-06 1.2216e+00
+ 7.8540e-01 1.0000e+00 1.1957e-06 1.7514e-01 -9.2062e-07 -8.7312e-01
+ 1.6098e-05 1.2216e+00 7.8539e-01 1.0000e+00]
+ raw_actions_16d_shape: (16, 16)
+ raw_actions_16d:
+ [[ 2.3842e-05 -8.2493e-04 -5.7220e-05 3.9577e-04 2.8610e-05 7.8201e-04
+ -1.2398e-04 1.0000e+00 9.5367e-05 4.0293e-03 9.5367e-06 7.2479e-04
+ 1.8120e-04 -1.4305e-05 -2.2411e-04 1.0000e+00]
+ [ 5.0068e-04 -1.5645e-02 2.6083e-03 -5.5575e-02 1.8883e-03 2.5430e-02
+ -1.9326e-02 1.0000e+00 2.7800e-02 2.4877e-02 -2.7924e-02 -2.7843e-02
+ -1.6832e-02 1.0629e-02 3.8543e-02 1.0000e+00]
+ [ 1.7738e-03 -7.6041e-02 8.9645e-03 -1.7257e-01 6.0558e-03 8.7943e-02
+ -6.4831e-02 1.0000e+00 9.2287e-02 5.8761e-02 -9.3136e-02 -7.6413e-02
+ -5.3630e-02 4.2353e-02 1.2606e-01 1.0000e+00]
+ [ 3.2425e-03 -1.3747e-01 1.5845e-02 -3.1527e-01 1.0653e-02 1.6477e-01
+ -1.1840e-01 1.0000e+00 1.7036e-01 1.0629e-01 -1.7153e-01 -1.4015e-01
+ -9.7461e-02 7.8468e-02 2.3009e-01 1.0000e+00]
+ [ 5.5885e-03 -2.1545e-01 2.4767e-02 -4.6663e-01 1.6103e-02 2.4452e-01
+ -1.7446e-01 1.0000e+00 2.5305e-01 1.5107e-01 -2.5392e-01 -2.1260e-01
+ -1.4490e-01 1.1766e-01 3.4122e-01 1.0000e+00]
+ [ 6.1035e-03 -2.8390e-01 3.3288e-02 -6.1909e-01 2.1739e-02 3.2683e-01
+ -2.3199e-01 1.0000e+00 3.3677e-01 1.9970e-01 -3.3804e-01 -2.8173e-01
+ -1.9161e-01 1.5831e-01 4.5282e-01 1.0000e+00]
+ [ 9.3937e-03 -3.1736e-01 3.8815e-02 -7.2264e-01 2.9097e-02 3.8407e-01
+ -2.9788e-01 1.0000e+00 3.9431e-01 2.3764e-01 -3.9650e-01 -3.2045e-01
+ -2.2884e-01 1.8487e-01 5.3961e-01 1.0000e+00]
+ [ 1.1177e-02 -3.3051e-01 4.2367e-02 -7.4072e-01 3.5295e-02 4.0234e-01
+ -3.4810e-01 1.0000e+00 4.1353e-01 2.4687e-01 -4.1600e-01 -3.4033e-01
+ -2.4390e-01 1.9067e-01 5.7513e-01 1.0000e+00]
+ [ 1.2674e-02 -3.1841e-01 4.3559e-02 -7.5366e-01 3.7665e-02 4.1035e-01
+ -3.7488e-01 1.0000e+00 4.2095e-01 2.5672e-01 -4.2238e-01 -3.4335e-01
+ -2.4950e-01 1.9567e-01 5.8634e-01 1.0000e+00]
+ [ 1.5645e-02 -3.0324e-01 4.3592e-02 -7.4167e-01 4.2624e-02 4.1367e-01
+ -4.1199e-01 1.0000e+00 4.2353e-01 2.6254e-01 -4.2444e-01 -3.4899e-01
+ -2.5064e-01 1.9762e-01 5.8977e-01 1.0000e+00]
+ [ 1.6398e-02 -2.9560e-01 4.2553e-02 -7.3503e-01 4.5595e-02 4.1383e-01
+ -4.3354e-01 1.0000e+00 4.2382e-01 2.5776e-01 -4.2612e-01 -3.5491e-01
+ -2.5177e-01 1.9462e-01 5.9134e-01 1.0000e+00]
+ [ 2.0757e-02 -2.9058e-01 4.2739e-02 -7.3133e-01 4.6840e-02 4.1339e-01
+ -4.5310e-01 1.0000e+00 4.2468e-01 2.5057e-01 -4.2498e-01 -3.4835e-01
+ -2.5149e-01 2.0029e-01 5.9138e-01 1.0000e+00]
+ [ 2.3303e-02 -2.7753e-01 4.1437e-02 -7.2254e-01 4.8075e-02 4.1380e-01
+ -4.7155e-01 1.0000e+00 4.2468e-01 2.5254e-01 -4.2522e-01 -3.4195e-01
+ -2.5130e-01 1.9623e-01 5.9127e-01 1.0000e+00]
+ [ 2.7924e-02 -2.5505e-01 4.0684e-02 -7.0069e-01 5.3768e-02 4.1076e-01
+ -5.1048e-01 1.0000e+00 4.2446e-01 2.5574e-01 -4.2656e-01 -3.5101e-01
+ -2.5181e-01 1.9645e-01 5.9101e-01 1.0000e+00]
+ [ 3.2401e-02 -2.4053e-01 4.1451e-02 -6.8364e-01 5.6882e-02 4.1132e-01
+ -5.4158e-01 1.0000e+00 4.2435e-01 2.5109e-01 -4.2632e-01 -3.5082e-01
+ -2.5095e-01 1.9805e-01 5.9107e-01 1.0000e+00]
+ [ 3.4809e-02 -2.2431e-01 4.0565e-02 -6.7288e-01 5.6076e-02 4.0839e-01
+ -5.6400e-01 1.0000e+00 4.2504e-01 2.5486e-01 -4.2588e-01 -3.4874e-01
+ -2.5139e-01 1.9783e-01 5.9183e-01 1.0000e+00]]
+ normalized_state_16d_shape: (16,)
+ normalized_state_16d:
+ [-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.0951
+ 0.0668 0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004]
+ normalized_actions_16d_shape: (16, 16)
+ normalized_actions_16d:
+ [[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.1108
+ 0.0624 0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004]
+ [-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.1686
+ 0.107 0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004]
+ [-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.303
+ 0.1794 -0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004]
+ [-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.4658
+ 0.2811 -0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004]
+ [-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.6382
+ 0.3768 -0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004]
+ [-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.8128
+ 0.4808 -0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004]
+ [-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.9328
+ 0.5619 -0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004]
+ [-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.9729
+ 0.5816 -0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004]
+ [-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.9883
+ 0.6027 -0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004]
+ [-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.9937
+ 0.6151 -0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004]
+ [-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.9943
+ 0.6049 -0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004]
+ [-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.9961
+ 0.5895 -0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004]
+ [-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.9961
+ 0.5937 -0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004]
+ [-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.9956
+ 0.6006 -0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004]
+ [-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.9954
+ 0.5907 -0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004]
+ [-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.9969
+ 0.5987 -0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004]]
+ packed_state_32d_shape: (32,)
+ packed_state_32d:
+ [-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.0951 0.0668
+ 0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ packed_actions_32d_shape: (16, 32)
+ packed_actions_32d:
+ [[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.1108 0.0624
+ 0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.1686 0.107
+ 0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.303 0.1794
+ -0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.4658 0.2811
+ -0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.6382 0.3768
+ -0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.8128 0.4808
+ -0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9328 0.5619
+ -0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9729 0.5816
+ -0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9883 0.6027
+ -0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9937 0.6151
+ -0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9943 0.6049
+ -0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9961 0.5895
+ -0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9961 0.5937
+ -0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9956 0.6006
+ -0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9954 0.5907
+ -0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]
+ [-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.
+ 0. 0. 0. 0. 0. 0. 0. 0.9969 0.5987
+ -0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004 0. 0. 0.
+ 0. 0. 0. 0. 0. ]]
+ state_padded_zero_count: 16 / 16
+ actions_padded_zero_count: 256 / 256
+ state_padded_exact_zero: True
+ actions_padded_exact_zero: True
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20.log ADDED
@@ -0,0 +1,241 @@
+ W0308 22:58:43.681000 16356 torch/distributed/run.py:766]
+ W0308 22:58:43.681000 16356 torch/distributed/run.py:766] *****************************************
+ W0308 22:58:43.681000 16356 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+ W0308 22:58:43.681000 16356 torch/distributed/run.py:766] *****************************************
+ 23:00:43.715 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16451:train_pytorch.py:451)
+ 23:00:43.718 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16451:train_pytorch.py:458)
+ 23:00:43.762 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16451:train_pytorch.py:474)
+ 23:00:43.844 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16451:config.py:234)
+ 23:00:43.846 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x702ed02c29d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16451:data_loader.py:282)
+ 23:00:43.849 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16451:data_loader.py:148)
+ 23:00:43.958 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16454:train_pytorch.py:451)
+ 23:00:43.959 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16454:train_pytorch.py:458)
+ 23:00:43.959 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16454:train_pytorch.py:474)
+ 23:00:44.046 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16454:config.py:234)
+ 23:00:44.048 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x79acff7466d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16454:data_loader.py:282)
+ 23:00:44.049 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16454:data_loader.py:148)
+ 23:00:45.456 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16452:train_pytorch.py:451)
+ 23:00:45.458 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16452:train_pytorch.py:458)
+ 23:00:45.458 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16452:train_pytorch.py:474)
+ 23:00:45.548 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16452:config.py:234)
+ 23:00:45.549 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7736f700ba90>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16452:data_loader.py:282)
+ 23:00:45.551 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16452:data_loader.py:148)
+ 23:00:45.562 [I] local_batch_size: 4 (16451:data_loader.py:363)
+ 23:00:45.861 [I] local_batch_size: 4 (16454:data_loader.py:363)
+ 23:00:47.007 [I] local_batch_size: 4 (16452:data_loader.py:363)
+ 23:00:47.287 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16453:train_pytorch.py:451)
+ 23:00:47.290 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16453:train_pytorch.py:458)
+ 23:00:47.291 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16453:train_pytorch.py:474)
+ INFO:2026-03-08 23:00:47,419:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
+ 23:00:47.419 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16454:xla_bridge.py:925)
+ INFO:2026-03-08 23:00:47,435:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 23:00:47.435 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16454:xla_bridge.py:925)
+ 23:00:47.437 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16453:config.py:234)
+ 23:00:47.440 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x728778855290>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16453:data_loader.py:282)
+ 23:00:47.459 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16453:data_loader.py:148)
+ INFO:2026-03-08 23:00:47,514:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
+ 23:00:47.514 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16451:xla_bridge.py:925)
+ INFO:2026-03-08 23:00:47,530:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 23:00:47.530 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16451:xla_bridge.py:925)
+ INFO:2026-03-08 23:00:48,755:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
+ 23:00:48.755 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16452:xla_bridge.py:925)
+ INFO:2026-03-08 23:00:48,768:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 23:00:48.768 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16452:xla_bridge.py:925)
+ 23:00:49.029 [I] local_batch_size: 4 (16453:data_loader.py:363)
+ INFO:2026-03-08 23:00:49,834:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
+ 23:00:49.834 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16453:xla_bridge.py:925)
+ INFO:2026-03-08 23:00:49,836:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
+ 23:00:49.836 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16453:xla_bridge.py:925)
+ 23:01:43.138 [I] Enabled gradient checkpointing for PI0Pytorch model (16451:pi0_pytorch.py:148)
+ 23:01:43.139 [I] Enabled gradient checkpointing for memory optimization (16451:train_pytorch.py:535)
+ 23:01:43.139 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (16451:train_pytorch.py:422)
+ 23:01:43.801 [I] Enabled gradient checkpointing for PI0Pytorch model (16454:pi0_pytorch.py:148)
+ 23:01:43.802 [I] Enabled gradient checkpointing for memory optimization (16454:train_pytorch.py:535)
+ 23:01:44.623 [I] Enabled gradient checkpointing for PI0Pytorch model (16452:pi0_pytorch.py:148)
+ 23:01:44.623 [I] Enabled gradient checkpointing for memory optimization (16452:train_pytorch.py:535)
+ 23:01:45.354 [I] Enabled gradient checkpointing for PI0Pytorch model (16453:pi0_pytorch.py:148)
+ 23:01:45.354 [I] Enabled gradient checkpointing for memory optimization (16453:train_pytorch.py:535)
+ 23:01:46.643 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:564)
+ 23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16454:train_pytorch.py:564)
156
+ 23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16453:train_pytorch.py:564)
157
+ 23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16452:train_pytorch.py:564)
158
+ 23:01:48.714 [I] Weight loading missing key count: 0 (16451:train_pytorch.py:572)
159
+ 23:01:48.714 [I] Weight loading missing keys: set() (16451:train_pytorch.py:573)
160
+ 23:01:48.715 [I] Weight loading unexpected key count: 0 (16451:train_pytorch.py:574)
161
+ 23:01:48.715 [I] Weight loading unexpected keys: [] (16451:train_pytorch.py:575)
162
+ 23:01:48.715 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:576)
163
+ 23:01:48.722 [I] Running on: 9e9e564d5d6e | world_size=4 (16451:train_pytorch.py:616)
164
+ 23:01:48.722 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (16451:train_pytorch.py:617)
165
+ 23:01:48.723 [I] Memory optimizations: gradient_checkpointing=True (16451:train_pytorch.py:620)
166
+ 23:01:48.724 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (16451:train_pytorch.py:621)
167
+ 23:01:48.724 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (16451:train_pytorch.py:624)
168
+ 23:01:48.724 [I] EMA is not supported for PyTorch training (16451:train_pytorch.py:627)
169
+ 23:01:48.725 [I] Training precision: bfloat16 (16451:train_pytorch.py:628)
170
+ 23:01:48.733 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (16451:train_pytorch.py:234)
171
+ 23:01:48.733 [I] Dataset repo_id: lsnu/twin_handover_256_train (16451:train_pytorch.py:235)
172
+ 23:01:48.733 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (16451:train_pytorch.py:236)
173
+ 23:01:48.734 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (16451:train_pytorch.py:237)
174
+ 23:01:48.734 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:238)
175
+ 23:01:48.734 [I] Model type: baseline (16451:train_pytorch.py:239)
176
+ 23:01:48.734 [I] Packed transforms active: True (16451:train_pytorch.py:240)
177
+ 23:01:48.734 [I] World size: 4 (16451:train_pytorch.py:241)
178
+ 23:01:48.735 [I] Batch size: local=4, global=16 (16451:train_pytorch.py:242)
179
+ 23:01:48.735 [I] num_workers: 8 (16451:train_pytorch.py:243)
180
+ 23:01:48.735 [I] Precision: bfloat16 (16451:train_pytorch.py:244)
181
+ 23:01:48.735 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (16451:train_pytorch.py:245)
182
+ 23:01:48.736 [I] Save/log intervals: save_interval=250, log_interval=10 (16451:train_pytorch.py:252)
183
+ 23:01:48.736 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (16451:train_pytorch.py:253)
184
+ 23:01:48.736 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (16451:train_pytorch.py:254)
185
+ 23:01:48.736 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (16451:train_pytorch.py:255)
186
+
187
+ 23:01:48.822 [I] Weight loading missing key count: 0 (16453:train_pytorch.py:572)
188
+ 23:01:48.822 [I] Weight loading missing keys: set() (16454:train_pytorch.py:573)
189
+ 23:01:48.823 [I] Weight loading missing keys: set() (16453:train_pytorch.py:573)
190
+ 23:01:48.823 [I] Weight loading unexpected key count: 0 (16454:train_pytorch.py:574)
191
+ 23:01:48.823 [I] Weight loading missing key count: 0 (16452:train_pytorch.py:572)
192
+ 23:01:48.823 [I] Weight loading unexpected key count: 0 (16453:train_pytorch.py:574)
193
+ 23:01:48.823 [I] Weight loading unexpected keys: [] (16454:train_pytorch.py:575)
194
+ 23:01:48.823 [I] Weight loading missing keys: set() (16452:train_pytorch.py:573)
195
+ 23:01:48.824 [I] Weight loading unexpected keys: [] (16453:train_pytorch.py:575)
196
+ 23:01:48.824 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16454:train_pytorch.py:576)
197
+ 23:01:48.824 [I] Weight loading unexpected key count: 0 (16452:train_pytorch.py:574)
198
+ 23:01:48.824 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16453:train_pytorch.py:576)
199
+ 23:01:48.825 [I] Weight loading unexpected keys: [] (16452:train_pytorch.py:575)
200
+ 23:01:48.825 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16452:train_pytorch.py:576)
201
+ W0308 23:06:44.622000 16356 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
202
+ W0308 23:06:44.645000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16451 closing signal SIGTERM
203
+ W0308 23:06:44.659000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16452 closing signal SIGTERM
204
+ W0308 23:06:44.679000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16453 closing signal SIGTERM
205
+ W0308 23:06:44.728000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16454 closing signal SIGTERM
206
+ /usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
207
+ warnings.warn('resource_tracker: There appear to be %d '
208
+ /usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
209
+ warnings.warn('resource_tracker: There appear to be %d '
210
+ /usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
211
+ warnings.warn('resource_tracker: There appear to be %d '
212
+ /usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
213
+ warnings.warn('resource_tracker: There appear to be %d '
214
+ Traceback (most recent call last):
215
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
216
+ sys.exit(main())
217
+ ^^^^^^
218
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
219
+ return f(*args, **kwargs)
220
+ ^^^^^^^^^^^^^^^^^^
221
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
222
+ run(args)
223
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
224
+ elastic_launch(
225
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
226
+ return launch_agent(self._config, self._entrypoint, list(args))
227
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
228
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
229
+ result = agent.run()
230
+ ^^^^^^^^^^^
231
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
232
+ result = f(*args, **kwargs)
233
+ ^^^^^^^^^^^^^^^^^^
234
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
235
+ result = self._invoke_run(role)
236
+ ^^^^^^^^^^^^^^^^^^^^^^
237
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
238
+ time.sleep(monitor_interval)
239
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
240
+ raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
241
+ torch.distributed.elastic.multiprocessing.api.SignalException: Process 16356 got signal: 15
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20b.log ADDED
File without changes
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20d.log ADDED
@@ -0,0 +1,34 @@
1
+ W0308 23:09:45.070000 19958 torch/distributed/run.py:766]
2
+ W0308 23:09:45.070000 19958 torch/distributed/run.py:766] *****************************************
3
+ W0308 23:09:45.070000 19958 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
4
+ W0308 23:09:45.070000 19958 torch/distributed/run.py:766] *****************************************
5
+ W0308 23:12:25.090000 19958 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
6
+ W0308 23:12:25.147000 19958 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 20051 closing signal SIGTERM
7
+ Traceback (most recent call last):
8
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
9
+ sys.exit(main())
10
+ ^^^^^^
11
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
12
+ return f(*args, **kwargs)
13
+ ^^^^^^^^^^^^^^^^^^
14
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
15
+ run(args)
16
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
17
+ elastic_launch(
18
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
19
+ return launch_agent(self._config, self._entrypoint, list(args))
20
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
22
+ result = agent.run()
23
+ ^^^^^^^^^^^
24
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
25
+ result = f(*args, **kwargs)
26
+ ^^^^^^^^^^^^^^^^^^
27
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
28
+ result = self._invoke_run(role)
29
+ ^^^^^^^^^^^^^^^^^^^^^^
30
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
31
+ time.sleep(monitor_interval)
32
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
33
+ raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
34
+ torch.distributed.elastic.multiprocessing.api.SignalException: Process 19958 got signal: 15
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20e.log ADDED
@@ -0,0 +1,34 @@
1
+ W0308 23:13:16.278000 20146 torch/distributed/run.py:766]
2
+ W0308 23:13:16.278000 20146 torch/distributed/run.py:766] *****************************************
3
+ W0308 23:13:16.278000 20146 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
4
+ W0308 23:13:16.278000 20146 torch/distributed/run.py:766] *****************************************
5
+ W0308 23:15:58.203000 20146 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
6
+ W0308 23:15:58.263000 20146 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 20244 closing signal SIGTERM
7
+ Traceback (most recent call last):
8
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
9
+ sys.exit(main())
10
+ ^^^^^^
11
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
12
+ return f(*args, **kwargs)
13
+ ^^^^^^^^^^^^^^^^^^
14
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
15
+ run(args)
16
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
17
+ elastic_launch(
18
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
19
+ return launch_agent(self._config, self._entrypoint, list(args))
20
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
21
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
22
+ result = agent.run()
23
+ ^^^^^^^^^^^
24
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
25
+ result = f(*args, **kwargs)
26
+ ^^^^^^^^^^^^^^^^^^
27
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
28
+ result = self._invoke_run(role)
29
+ ^^^^^^^^^^^^^^^^^^^^^^
30
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
31
+ time.sleep(monitor_interval)
32
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
33
+ raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
34
+ torch.distributed.elastic.multiprocessing.api.SignalException: Process 20146 got signal: 15
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log ADDED
@@ -0,0 +1,234 @@
1
+ W0308 23:45:59.171000 25558 torch/distributed/run.py:766]
2
+ W0308 23:45:59.171000 25558 torch/distributed/run.py:766] *****************************************
3
+ W0308 23:45:59.171000 25558 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
4
+ W0308 23:45:59.171000 25558 torch/distributed/run.py:766] *****************************************
5
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
6
+ warnings.warn( # warn only once
7
+ [rank1]:[W308 23:48:06.218806836 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
8
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
9
+ warnings.warn( # warn only once
10
+ [rank3]:[W308 23:48:09.583585113 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
11
+ 23:48:18.157 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20k (25643:train_pytorch.py:478)
12
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
13
+ warnings.warn( # warn only once
14
+ [rank0]:[W308 23:48:18.631390841 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
15
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
16
+ warnings.warn( # warn only once
17
+ [rank2]:[W308 23:48:20.490054230 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
18
+ 23:48:21.532 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (25643:train_pytorch.py:497)
19
+ 23:48:21.656 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (25643:config.py:234)
20
+ 23:48:21.658 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
21
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
22
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
23
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
24
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
25
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
26
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
27
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
28
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
29
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
30
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
31
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
32
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
33
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
34
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
35
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
36
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
37
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
38
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
39
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
40
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
41
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
42
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
43
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
44
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7ded44f10710>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (25643:data_loader.py:283)
45
+ 23:48:21.665 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (25643:data_loader.py:149)
46
+ 23:48:27.988 [I] local_batch_size: 4 (25643:data_loader.py:364)
47
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
48
+ warnings.warn( # warn only once
49
+ 23:50:52.339 [I] Enabled gradient checkpointing for PI0Pytorch model (25643:pi0_pytorch.py:150)
50
+ 23:50:52.344 [I] Enabled gradient checkpointing for memory optimization (25643:train_pytorch.py:569)
51
+ 23:50:52.345 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (25643:train_pytorch.py:438)
52
+ 23:51:03.555 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:598)
53
+ 23:51:05.643 [I] Weight loading missing key count: 0 (25643:train_pytorch.py:606)
54
+ 23:51:05.643 [I] Weight loading missing keys: set() (25643:train_pytorch.py:607)
55
+ 23:51:05.643 [I] Weight loading unexpected key count: 0 (25643:train_pytorch.py:608)
56
+ 23:51:05.644 [I] Weight loading unexpected keys: [] (25643:train_pytorch.py:609)
57
+ 23:51:05.644 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:610)
58
+ 23:51:05.647 [I] Running on: 9e9e564d5d6e | world_size=4 (25643:train_pytorch.py:650)
59
+ 23:51:05.648 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (25643:train_pytorch.py:651)
60
+ 23:51:05.648 [I] Memory optimizations: gradient_checkpointing=True (25643:train_pytorch.py:654)
61
+ 23:51:05.648 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (25643:train_pytorch.py:655)
62
+ 23:51:05.649 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (25643:train_pytorch.py:658)
63
+ 23:51:05.649 [I] EMA is not supported for PyTorch training (25643:train_pytorch.py:661)
64
+ 23:51:05.650 [I] Training precision: bfloat16 (25643:train_pytorch.py:662)
65
+ 23:51:05.671 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (25643:train_pytorch.py:249)
66
+ 23:51:05.671 [I] Dataset repo_id: lsnu/twin_handover_256_train (25643:train_pytorch.py:250)
67
+ 23:51:05.672 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (25643:train_pytorch.py:251)
68
+ 23:51:05.672 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (25643:train_pytorch.py:252)
69
+ 23:51:05.673 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:253)
70
+ 23:51:05.673 [I] Model type: baseline (25643:train_pytorch.py:254)
71
+ 23:51:05.674 [I] Packed transforms active: True (25643:train_pytorch.py:255)
72
+ 23:51:05.674 [I] World size: 4 (25643:train_pytorch.py:256)
73
+ 23:51:05.674 [I] Batch size: local=4, global=16 (25643:train_pytorch.py:257)
74
+ 23:51:05.674 [I] num_workers: 8 (25643:train_pytorch.py:258)
75
+ 23:51:05.675 [I] Precision: bfloat16 (25643:train_pytorch.py:259)
76
+ 23:51:05.675 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (25643:train_pytorch.py:260)
77
+ 23:51:05.676 [I] Save/log intervals: save_interval=250, log_interval=10 (25643:train_pytorch.py:267)
78
+ 23:51:05.676 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (25643:train_pytorch.py:268)
79
+ 23:51:05.676 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (25643:train_pytorch.py:269)
80
+ 23:51:05.677 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (25643:train_pytorch.py:270)
81
+
82
+ self.pid = os.fork()
83
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
84
+ self.pid = os.fork()
85
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
86
+ self.pid = os.fork()
87
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
88
+ self.pid = os.fork()
89
+ 23:51:12.079 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (25643:train_pytorch.py:762)
90
+ 23:51:12.080 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (25643:train_pytorch.py:766)
91
+ 23:51:12.080 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (25643:train_pytorch.py:769)
92
+ 23:51:12.080 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (25643:train_pytorch.py:770)
93
+ 23:51:12.080 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (25643:train_pytorch.py:773)
94
+ 23:51:12.092 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (25643:train_pytorch.py:776)
95
+ 23:51:12.221 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (25643:train_pytorch.py:780)
96
+ 23:51:12.222 [I] debug_step=1 lr=1.24e-07 grad_norm=6.6952 data_time=2.5702s step_time=3.8197s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (25643:train_pytorch.py:785)
+ 
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+ [rank3]: main()
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+ [rank3]: train_loop(config)
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+ [rank3]: losses = model(observation, actions)
+ [rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+ [rank3]: return self._call_impl(*args, **kwargs)
+ [rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+ [rank3]: return forward_call(*args, **kwargs)
+ [rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+ [rank3]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+ [rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank3]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+ [rank3]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+ [rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank3]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+ [rank3]: making sure all `forward` function outputs participate in calculating loss.
+ [rank3]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+ [rank3]: Parameter indices which did not receive grad for rank 3: 596 597 598 599 601 602 803
+ [rank3]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+ [rank1]: Traceback (most recent call last):
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+ [rank1]: main()
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+ [rank1]: train_loop(config)
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+ [rank1]: losses = model(observation, actions)
+ [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+ [rank1]: return self._call_impl(*args, **kwargs)
+ [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+ [rank1]: return forward_call(*args, **kwargs)
+ [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+ [rank1]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+ [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank1]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+ [rank1]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+ [rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank1]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+ [rank1]: making sure all `forward` function outputs participate in calculating loss.
+ [rank1]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+ [rank1]: Parameter indices which did not receive grad for rank 1: 596 597 598 599 601 602 803
+ [rank1]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+ [rank2]: Traceback (most recent call last):
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+ [rank2]: main()
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+ [rank2]: train_loop(config)
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+ [rank2]: losses = model(observation, actions)
+ [rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+ [rank2]: return self._call_impl(*args, **kwargs)
+ [rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+ [rank2]: return forward_call(*args, **kwargs)
+ [rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+ [rank2]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+ [rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank2]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+ [rank2]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+ [rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank2]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+ [rank2]: making sure all `forward` function outputs participate in calculating loss.
+ [rank2]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+ [rank2]: Parameter indices which did not receive grad for rank 2: 596 597 598 599 601 602 803
+ [rank2]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+ [rank0]: Traceback (most recent call last):
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+ [rank0]: main()
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+ [rank0]: train_loop(config)
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+ [rank0]: losses = model(observation, actions)
+ [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+ [rank0]: return self._call_impl(*args, **kwargs)
+ [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+ [rank0]: return forward_call(*args, **kwargs)
+ [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+ [rank0]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+ [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+ [rank0]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+ [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ [rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+ [rank0]: making sure all `forward` function outputs participate in calculating loss.
+ [rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+ [rank0]: Parameter indices which did not receive grad for rank 0: 596 597 598 599 601 602 803
+ [rank0]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+ 
+ [rank0]:[W308 23:51:13.598698202 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
+ W0308 23:51:15.249000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25644 closing signal SIGTERM
+ W0308 23:51:15.305000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25645 closing signal SIGTERM
+ W0308 23:51:15.328000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25646 closing signal SIGTERM
+ E0308 23:51:16.314000 25558 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 25643) of binary: /workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
+ Traceback (most recent call last):
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
+ sys.exit(main())
+ ^^^^^^
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+ return f(*args, **kwargs)
+ ^^^^^^^^^^^^^^^^^^
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
+ run(args)
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
+ elastic_launch(
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
+ return launch_agent(self._config, self._entrypoint, list(args))
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
+ raise ChildFailedError(
+ torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
+ ============================================================
+ scripts/train_pytorch.py FAILED
+ ------------------------------------------------------------
+ Failures:
+ <NO_OTHER_FAILURES>
+ ------------------------------------------------------------
+ Root Cause (first observed failure):
+ [0]:
+ time : 2026-03-08_23:51:15
+ host : 9e9e564d5d6e
+ rank : 0 (local_rank: 0)
+ exitcode : 1 (pid: 25643)
+ error_file: <N/A>
+ traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
+ ============================================================
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log ADDED
@@ -0,0 +1,141 @@
+ W0308 23:57:51.073000 28870 torch/distributed/run.py:766]
+ W0308 23:57:51.073000 28870 torch/distributed/run.py:766] *****************************************
+ W0308 23:57:51.073000 28870 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+ W0308 23:57:51.073000 28870 torch/distributed/run.py:766] *****************************************
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank1]:[W309 00:00:38.424269437 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank2]:[W309 00:00:39.886552746 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank3]:[W309 00:00:48.235773018 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ 00:00:50.394 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20l (28954:train_pytorch.py:478)
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank0]:[W309 00:00:50.868996725 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ 00:00:52.168 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (28954:train_pytorch.py:497)
+ 00:00:52.345 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (28954:config.py:234)
+ 00:00:52.350 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7fceff4c7710>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (28954:data_loader.py:283)
+ 00:00:52.360 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (28954:data_loader.py:149)
+ 00:00:59.307 [I] local_batch_size: 4 (28954:data_loader.py:364)
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ 00:02:31.673 [I] Enabled gradient checkpointing for PI0Pytorch model (28954:pi0_pytorch.py:150)
+ 00:02:31.680 [I] Enabled gradient checkpointing for memory optimization (28954:train_pytorch.py:569)
+ 00:02:31.681 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (28954:train_pytorch.py:438)
+ 00:02:46.133 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:598)
+ 00:02:48.254 [I] Weight loading missing key count: 0 (28954:train_pytorch.py:606)
+ 00:02:48.254 [I] Weight loading missing keys: set() (28954:train_pytorch.py:607)
+ 00:02:48.255 [I] Weight loading unexpected key count: 0 (28954:train_pytorch.py:608)
+ 00:02:48.255 [I] Weight loading unexpected keys: [] (28954:train_pytorch.py:609)
+ 00:02:48.255 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:610)
+ 00:02:48.259 [I] Running on: 9e9e564d5d6e | world_size=4 (28954:train_pytorch.py:650)
+ 00:02:48.259 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (28954:train_pytorch.py:651)
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ 00:02:48.260 [I] Memory optimizations: gradient_checkpointing=True (28954:train_pytorch.py:654)
+ 00:02:48.261 [I] DDP settings: find_unused_parameters=False, gradient_as_bucket_view=True, static_graph=True (28954:train_pytorch.py:655)
+ 00:02:48.261 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (28954:train_pytorch.py:656)
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ 00:02:48.261 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (28954:train_pytorch.py:659)
+ 00:02:48.262 [I] EMA is not supported for PyTorch training (28954:train_pytorch.py:662)
+ 00:02:48.262 [I] Training precision: bfloat16 (28954:train_pytorch.py:663)
+ 00:02:48.266 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (28954:train_pytorch.py:249)
+ 00:02:48.266 [I] Dataset repo_id: lsnu/twin_handover_256_train (28954:train_pytorch.py:250)
+ 00:02:48.266 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (28954:train_pytorch.py:251)
+ 00:02:48.266 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (28954:train_pytorch.py:252)
+ 00:02:48.266 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:253)
+ 00:02:48.267 [I] Model type: baseline (28954:train_pytorch.py:254)
+ 00:02:48.267 [I] Packed transforms active: True (28954:train_pytorch.py:255)
+ 00:02:48.267 [I] World size: 4 (28954:train_pytorch.py:256)
+ 00:02:48.267 [I] Batch size: local=4, global=16 (28954:train_pytorch.py:257)
+ 00:02:48.267 [I] num_workers: 8 (28954:train_pytorch.py:258)
+ 00:02:48.267 [I] Precision: bfloat16 (28954:train_pytorch.py:259)
+ 00:02:48.268 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (28954:train_pytorch.py:260)
+ 00:02:48.268 [I] Save/log intervals: save_interval=250, log_interval=10 (28954:train_pytorch.py:267)
+ 00:02:48.268 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (28954:train_pytorch.py:268)
+ 00:02:48.268 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (28954:train_pytorch.py:269)
+ 00:02:48.268 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (28954:train_pytorch.py:270)
+ 
+ self.pid = os.fork()
+ 00:02:51.626 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (28954:train_pytorch.py:763)
+ 00:02:51.627 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
+ 00:02:51.627 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (28954:train_pytorch.py:770)
+ 00:02:51.627 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (28954:train_pytorch.py:771)
+ 00:02:51.627 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (28954:train_pytorch.py:774)
+ 00:02:51.628 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
+ 00:02:51.645 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
+ 00:02:51.645 [I] debug_step=1 lr=1.24e-07 grad_norm=15.9656 data_time=1.1114s step_time=2.2178s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)
+ 
+ 00:02:52.155 [I] debug_step=2 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
+ 00:02:52.156 [I] debug_step=2 prompt_token_lengths=[79, 76, 69, 69] (28954:train_pytorch.py:770)
+ 00:02:52.157 [I] debug_step=2 state_stats min=-1.0000 max=1.0004 mean=0.0430 std=0.4223 (28954:train_pytorch.py:771)
+ 00:02:52.157 [I] debug_step=2 action_stats min=-1.0000 max=1.0071 mean=0.0532 std=0.4394 (28954:train_pytorch.py:774)
+ 00:02:52.158 [I] debug_step=2 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
+ 00:02:52.159 [I] debug_step=2 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
+ 00:02:52.159 [I] debug_step=2 lr=2.49e-07 grad_norm=7.5785 data_time=0.0858s step_time=0.4435s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)
+ 
+ 00:02:52.947 [I] debug_step=3 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
+ 00:02:52.948 [I] debug_step=3 prompt_token_lengths=[74, 68, 72, 73] (28954:train_pytorch.py:770)
+ 00:02:52.949 [I] debug_step=3 state_stats min=-1.1677 max=1.0004 mean=0.0099 std=0.5093 (28954:train_pytorch.py:771)
+ 00:02:52.949 [I] debug_step=3 action_stats min=-1.1487 max=1.1439 mean=0.0173 std=0.4079 (28954:train_pytorch.py:774)
+ 00:02:52.950 [I] debug_step=3 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
+ 00:02:52.951 [I] debug_step=3 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
+ 00:02:52.951 [I] debug_step=3 lr=3.73e-07 grad_norm=10.5944 data_time=0.1892s step_time=0.6031s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)
+ 
+ 00:02:53.749 [I] debug_step=4 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
+ 00:02:53.750 [I] debug_step=4 prompt_token_lengths=[75, 73, 76, 71] (28954:train_pytorch.py:770)
+ 00:02:53.750 [I] debug_step=4 state_stats min=-1.0000 max=1.0708 mean=0.0711 std=0.4551 (28954:train_pytorch.py:771)
+ 00:02:53.750 [I] debug_step=4 action_stats min=-1.0000 max=1.4460 mean=0.0674 std=0.4311 (28954:train_pytorch.py:774)
+ 00:02:53.751 [I] debug_step=4 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
+ 00:02:53.752 [I] debug_step=4 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
+ 00:02:53.752 [I] debug_step=4 lr=4.98e-07 grad_norm=13.1086 data_time=0.1977s step_time=0.6039s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)
+ 
+ 00:02:54.234 [I] debug_step=5 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
+ 00:02:54.234 [I] debug_step=5 prompt_token_lengths=[73, 75, 70, 73] (28954:train_pytorch.py:770)
+ 00:02:54.234 [I] debug_step=5 state_stats min=-1.0000 max=1.0004 mean=0.0188 std=0.4734 (28954:train_pytorch.py:771)
+ 00:02:54.235 [I] debug_step=5 action_stats min=-1.0000 max=1.0647 mean=0.0147 std=0.3985 (28954:train_pytorch.py:774)
+ 00:02:54.235 [I] debug_step=5 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
+ 00:02:54.235 [I] debug_step=5 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
+ 00:02:54.236 [I] debug_step=5 lr=6.22e-07 grad_norm=21.4053 data_time=0.0611s step_time=0.4238s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)
+ 
+ 
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ 00:04:31.529 [I] Saved checkpoint at step 20 -> /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20l/20 (28954:train_pytorch.py:323)
+ 
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log ADDED
@@ -0,0 +1,141 @@
+ W0309 00:05:58.586000 31870 torch/distributed/run.py:766]
+ W0309 00:05:58.586000 31870 torch/distributed/run.py:766] *****************************************
+ W0309 00:05:58.586000 31870 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+ W0309 00:05:58.586000 31870 torch/distributed/run.py:766] *****************************************
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank3]:[W309 00:07:35.438460211 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank2]:[W309 00:07:38.377129614 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ 00:07:39.654 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/smoke_handover_packed_parallel_20a (31952:train_pytorch.py:478)
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank0]:[W309 00:07:39.073712842 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ [rank1]:[W309 00:07:43.016127248 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
+ 00:07:45.272 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (31952:train_pytorch.py:497)
+ 00:07:45.376 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train (31952:config.py:234)
+ 00:07:45.378 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+ 1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+ -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+ 0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+ 0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+ 0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+ 0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+ 0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+ -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+ 0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+ 2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+ 1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+ 0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+ -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+ -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+ 0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+ 0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+ 0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+ 0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+ -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+ -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+ 0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+ 0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+ 0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+ 0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x70ac18e479d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (31952:data_loader.py:283)
+ 00:07:45.381 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (31952:data_loader.py:149)
+ 00:07:51.404 [I] local_batch_size: 4 (31952:data_loader.py:364)
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ 00:09:48.120 [I] Enabled gradient checkpointing for PI0Pytorch model (31952:pi0_pytorch.py:150)
+ 00:09:48.121 [I] Enabled gradient checkpointing for memory optimization (31952:train_pytorch.py:569)
+ 00:09:48.122 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.48GB, reserved: 7.48GB, free: 0.00GB, peak_allocated: 7.48GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (31952:train_pytorch.py:438)
+ 00:10:05.891 [I] Loading weights from: /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:598)
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ 00:12:47.760 [I] Weight loading missing key count: 0 (31952:train_pytorch.py:606)
+ 00:12:47.761 [I] Weight loading missing keys: set() (31952:train_pytorch.py:607)
+ 00:12:47.761 [I] Weight loading unexpected key count: 0 (31952:train_pytorch.py:608)
+ 00:12:47.761 [I] Weight loading unexpected keys: [] (31952:train_pytorch.py:609)
+ 00:12:47.762 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:610)
+ 00:12:47.766 [I] Running on: 9e9e564d5d6e | world_size=4 (31952:train_pytorch.py:650)
+ 00:12:47.766 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (31952:train_pytorch.py:651)
+ 00:12:47.766 [I] Memory optimizations: gradient_checkpointing=True (31952:train_pytorch.py:654)
+ 00:12:47.766 [I] DDP settings: find_unused_parameters=False, gradient_as_bucket_view=True, static_graph=True (31952:train_pytorch.py:655)
+ 00:12:47.767 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (31952:train_pytorch.py:656)
+ 00:12:47.767 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (31952:train_pytorch.py:659)
+ 00:12:47.767 [I] EMA is not supported for PyTorch training (31952:train_pytorch.py:662)
+ 00:12:47.767 [I] Training precision: bfloat16 (31952:train_pytorch.py:663)
+ 00:12:47.771 [I] Resolved config name: pi05_twin_handover_256_packed_parallel_pytorch_2k (31952:train_pytorch.py:249)
+ 00:12:47.771 [I] Dataset repo_id: lsnu/twin_handover_256_train (31952:train_pytorch.py:250)
+ 00:12:47.771 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (31952:train_pytorch.py:251)
+ 00:12:47.771 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (31952:train_pytorch.py:252)
+ 00:12:47.771 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:253)
+ 00:12:47.771 [I] Model type: parallel (31952:train_pytorch.py:254)
+ 00:12:47.771 [I] Packed transforms active: True (31952:train_pytorch.py:255)
+ 00:12:47.772 [I] World size: 4 (31952:train_pytorch.py:256)
+ 00:12:47.772 [I] Batch size: local=4, global=16 (31952:train_pytorch.py:257)
+ 00:12:47.772 [I] num_workers: 8 (31952:train_pytorch.py:258)
+ 00:12:47.772 [I] Precision: bfloat16 (31952:train_pytorch.py:259)
+ 00:12:47.772 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (31952:train_pytorch.py:260)
+ 00:12:47.772 [I] Save/log intervals: save_interval=250, log_interval=10 (31952:train_pytorch.py:267)
+ 00:12:47.772 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (31952:train_pytorch.py:268)
+ 00:12:47.772 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (31952:train_pytorch.py:269)
+ 00:12:47.772 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (31952:train_pytorch.py:270)
+ 
+ self.pid = os.fork()
+ /usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+ self.pid = os.fork()
+ 00:12:51.535 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (31952:train_pytorch.py:763)
+ 00:12:51.536 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
+ 00:12:51.536 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (31952:train_pytorch.py:770)
+ 00:12:51.536 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (31952:train_pytorch.py:771)
+ 00:12:51.536 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (31952:train_pytorch.py:774)
+ 00:12:51.537 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
+ 00:12:51.560 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
+ 00:12:51.560 [I] debug_step=1 lr=1.24e-07 grad_norm=16.1250 data_time=1.1500s step_time=2.5752s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)
+ 
+ 00:12:52.214 [I] debug_step=2 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
+ 00:12:52.214 [I] debug_step=2 prompt_token_lengths=[79, 76, 69, 69] (31952:train_pytorch.py:770)
+ 00:12:52.214 [I] debug_step=2 state_stats min=-1.0000 max=1.0004 mean=0.0430 std=0.4223 (31952:train_pytorch.py:771)
+ 00:12:52.215 [I] debug_step=2 action_stats min=-1.0000 max=1.0071 mean=0.0532 std=0.4394 (31952:train_pytorch.py:774)
+ 00:12:52.215 [I] debug_step=2 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
+ 00:12:52.216 [I] debug_step=2 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
+ 00:12:52.216 [I] debug_step=2 lr=2.49e-07 grad_norm=7.6422 data_time=0.1756s step_time=0.5095s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)
+ 
+ 00:12:52.866 [I] debug_step=3 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
+ 00:12:52.867 [I] debug_step=3 prompt_token_lengths=[74, 68, 72, 73] (31952:train_pytorch.py:770)
+ 00:12:52.868 [I] debug_step=3 state_stats min=-1.1677 max=1.0004 mean=0.0099 std=0.5093 (31952:train_pytorch.py:771)
+ 00:12:52.868 [I] debug_step=3 action_stats min=-1.1487 max=1.1439 mean=0.0173 std=0.4079 (31952:train_pytorch.py:774)
+ 00:12:52.870 [I] debug_step=3 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
+ 00:12:52.871 [I] debug_step=3 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
+ 00:12:52.871 [I] debug_step=3 lr=3.73e-07 grad_norm=10.7104 data_time=0.1504s step_time=0.5022s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)
+ 
+ 00:12:53.506 [I] debug_step=4 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
+ 00:12:53.507 [I] debug_step=4 prompt_token_lengths=[75, 73, 76, 71] (31952:train_pytorch.py:770)
+ 00:12:53.507 [I] debug_step=4 state_stats min=-1.0000 max=1.0708 mean=0.0711 std=0.4551 (31952:train_pytorch.py:771)
+ 00:12:53.507 [I] debug_step=4 action_stats min=-1.0000 max=1.4460 mean=0.0674 std=0.4311 (31952:train_pytorch.py:774)
+ 00:12:53.508 [I] debug_step=4 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
+ 00:12:53.509 [I] debug_step=4 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
+ 00:12:53.509 [I] debug_step=4 lr=4.98e-07 grad_norm=13.2371 data_time=0.1376s step_time=0.5020s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)
+ 
+ 00:12:54.201 [I] debug_step=5 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
+ 00:12:54.202 [I] debug_step=5 prompt_token_lengths=[73, 75, 70, 73] (31952:train_pytorch.py:770)
+ 00:12:54.203 [I] debug_step=5 state_stats min=-1.0000 max=1.0004 mean=0.0188 std=0.4734 (31952:train_pytorch.py:771)
+ 00:12:54.203 [I] debug_step=5 action_stats min=-1.0000 max=1.0647 mean=0.0147 std=0.3985 (31952:train_pytorch.py:774)
+ 00:12:54.203 [I] debug_step=5 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
+ 00:12:54.204 [I] debug_step=5 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
+ 00:12:54.204 [I] debug_step=5 lr=6.22e-07 grad_norm=21.7693 data_time=0.1479s step_time=0.5475s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)
+ 
+ 
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
+ 00:14:36.586 [I] Saved checkpoint at step 20 -> /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/smoke_handover_packed_parallel_20a/20 (31952:train_pytorch.py:323)
+ 
+ /workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+ warnings.warn( # warn only once
artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log ADDED
@@ -0,0 +1,37 @@
+ [2026-03-09 00:31:32 UTC] follow-up runner started
+ [2026-03-09 00:31:32 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:32:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:33:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:34:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:35:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:36:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:37:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:38:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:39:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:40:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:41:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:42:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:43:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:44:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:45:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:46:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:47:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:48:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:49:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:50:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:51:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:52:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:53:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:54:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:55:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
+ [2026-03-09 00:56:33 UTC] eval start config=pi05_twin_handover_256_packed_baseline_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 batches=50
+ [2026-03-09 01:01:47 UTC] eval done log=/workspace/run_logs/handover_packed_baseline_2k_val_1000.log
+ [2026-03-09 01:01:47 UTC] eval start config=pi05_twin_handover_256_packed_baseline_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 batches=100
+ [2026-03-09 01:07:06 UTC] eval done log=/workspace/run_logs/handover_packed_baseline_2k_val_2000.log
+ [2026-03-09 01:07:06 UTC] launching parallel run
+ [2026-03-09 01:42:23 UTC] parallel run finished
+ [2026-03-09 01:42:23 UTC] eval start config=pi05_twin_handover_256_packed_parallel_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 batches=50
+ [2026-03-09 01:45:46 UTC] eval done log=/workspace/run_logs/handover_packed_parallel_2k_val_1000.log
+ [2026-03-09 01:45:46 UTC] eval start config=pi05_twin_handover_256_packed_parallel_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 batches=100
+ [2026-03-09 01:49:19 UTC] eval done log=/workspace/run_logs/handover_packed_parallel_2k_val_2000.log
+ [2026-03-09 01:49:19 UTC] follow-up runner finished
artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log ADDED
@@ -0,0 +1,176 @@
+ config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
+ repo_id: lsnu/twin_handover_256_train
+ sample_index: 0
+ norm_stats_path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+ norm_stats_keys: ['actions', 'state']
+ norm_stats_lengths: state_mean=16 state_std=16 action_mean=16 action_std=16
+ block_boundaries: [0:8] [8:16] [16:24] [24:32]
+ raw_state_16d_shape: (16,)
+ raw_state_16d:
+ [ 7.1883e-07 1.7515e-01 -5.6890e-06 -8.7299e-01 -6.3130e-06 1.2216e+00
+ 7.8540e-01 1.0000e+00 1.1957e-06 1.7514e-01 -9.2062e-07 -8.7312e-01
+ 1.6098e-05 1.2216e+00 7.8539e-01 1.0000e+00]
+ raw_actions_16d_shape: (16, 16)
+ raw_actions_16d:
+ [[ 2.3842e-05 -8.2493e-04 -5.7220e-05 3.9577e-04 2.8610e-05 7.8201e-04
+ -1.2398e-04 1.0000e+00 9.5367e-05 4.0293e-03 9.5367e-06 7.2479e-04
+ 1.8120e-04 -1.4305e-05 -2.2411e-04 1.0000e+00]
+ [ 5.0068e-04 -1.5645e-02 2.6083e-03 -5.5575e-02 1.8883e-03 2.5430e-02
+ -1.9326e-02 1.0000e+00 2.7800e-02 2.4877e-02 -2.7924e-02 -2.7843e-02
+ -1.6832e-02 1.0629e-02 3.8543e-02 1.0000e+00]
+ [ 1.7738e-03 -7.6041e-02 8.9645e-03 -1.7257e-01 6.0558e-03 8.7943e-02
+ -6.4831e-02 1.0000e+00 9.2287e-02 5.8761e-02 -9.3136e-02 -7.6413e-02
+ -5.3630e-02 4.2353e-02 1.2606e-01 1.0000e+00]
+ [ 3.2425e-03 -1.3747e-01 1.5845e-02 -3.1527e-01 1.0653e-02 1.6477e-01
+ -1.1840e-01 1.0000e+00 1.7036e-01 1.0629e-01 -1.7153e-01 -1.4015e-01
+ -9.7461e-02 7.8468e-02 2.3009e-01 1.0000e+00]
+ [ 5.5885e-03 -2.1545e-01 2.4767e-02 -4.6663e-01 1.6103e-02 2.4452e-01
+ -1.7446e-01 1.0000e+00 2.5305e-01 1.5107e-01 -2.5392e-01 -2.1260e-01
+ -1.4490e-01 1.1766e-01 3.4122e-01 1.0000e+00]
+ [ 6.1035e-03 -2.8390e-01 3.3288e-02 -6.1909e-01 2.1739e-02 3.2683e-01
+ -2.3199e-01 1.0000e+00 3.3677e-01 1.9970e-01 -3.3804e-01 -2.8173e-01
+ -1.9161e-01 1.5831e-01 4.5282e-01 1.0000e+00]
+ [ 9.3937e-03 -3.1736e-01 3.8815e-02 -7.2264e-01 2.9097e-02 3.8407e-01
+ -2.9788e-01 1.0000e+00 3.9431e-01 2.3764e-01 -3.9650e-01 -3.2045e-01
+ -2.2884e-01 1.8487e-01 5.3961e-01 1.0000e+00]
+ [ 1.1177e-02 -3.3051e-01 4.2367e-02 -7.4072e-01 3.5295e-02 4.0234e-01
+ -3.4810e-01 1.0000e+00 4.1353e-01 2.4687e-01 -4.1600e-01 -3.4033e-01
+ -2.4390e-01 1.9067e-01 5.7513e-01 1.0000e+00]
+ [ 1.2674e-02 -3.1841e-01 4.3559e-02 -7.5366e-01 3.7665e-02 4.1035e-01
+ -3.7488e-01 1.0000e+00 4.2095e-01 2.5672e-01 -4.2238e-01 -3.4335e-01
+ -2.4950e-01 1.9567e-01 5.8634e-01 1.0000e+00]
+ [ 1.5645e-02 -3.0324e-01 4.3592e-02 -7.4167e-01 4.2624e-02 4.1367e-01
+ -4.1199e-01 1.0000e+00 4.2353e-01 2.6254e-01 -4.2444e-01 -3.4899e-01
+ -2.5064e-01 1.9762e-01 5.8977e-01 1.0000e+00]
+ [ 1.6398e-02 -2.9560e-01 4.2553e-02 -7.3503e-01 4.5595e-02 4.1383e-01
+ -4.3354e-01 1.0000e+00 4.2382e-01 2.5776e-01 -4.2612e-01 -3.5491e-01
+ -2.5177e-01 1.9462e-01 5.9134e-01 1.0000e+00]
+ [ 2.0757e-02 -2.9058e-01 4.2739e-02 -7.3133e-01 4.6840e-02 4.1339e-01
+ -4.5310e-01 1.0000e+00 4.2468e-01 2.5057e-01 -4.2498e-01 -3.4835e-01
+ -2.5149e-01 2.0029e-01 5.9138e-01 1.0000e+00]
+ [ 2.3303e-02 -2.7753e-01 4.1437e-02 -7.2254e-01 4.8075e-02 4.1380e-01
+ -4.7155e-01 1.0000e+00 4.2468e-01 2.5254e-01 -4.2522e-01 -3.4195e-01
+ -2.5130e-01 1.9623e-01 5.9127e-01 1.0000e+00]
+ [ 2.7924e-02 -2.5505e-01 4.0684e-02 -7.0069e-01 5.3768e-02 4.1076e-01
+ -5.1048e-01 1.0000e+00 4.2446e-01 2.5574e-01 -4.2656e-01 -3.5101e-01
+ -2.5181e-01 1.9645e-01 5.9101e-01 1.0000e+00]
+ [ 3.2401e-02 -2.4053e-01 4.1451e-02 -6.8364e-01 5.6882e-02 4.1132e-01
+ -5.4158e-01 1.0000e+00 4.2435e-01 2.5109e-01 -4.2632e-01 -3.5082e-01
+ -2.5095e-01 1.9805e-01 5.9107e-01 1.0000e+00]
+ [ 3.4809e-02 -2.2431e-01 4.0565e-02 -6.7288e-01 5.6076e-02 4.0839e-01
+ -5.6400e-01 1.0000e+00 4.2504e-01 2.5486e-01 -4.2588e-01 -3.4874e-01
+ -2.5139e-01 1.9783e-01 5.9183e-01 1.0000e+00]]
+ normalized_state_16d_shape: (16,)
+ normalized_state_16d:
+ [-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.0951
+ 0.0668 0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004]
+ normalized_actions_16d_shape: (16, 16)
+ normalized_actions_16d:
+ [[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.1108
+ 0.0624 0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004]
+ [-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.1686
+ 0.107 0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004]
+ [-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.303
+ 0.1794 -0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004]
+ [-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.4658
+ 0.2811 -0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004]
+ [-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.6382
+ 0.3768 -0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004]
+ [-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.8128
+ 0.4808 -0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004]
+ [-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.9328
+ 0.5619 -0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004]
+ [-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.9729
+ 0.5816 -0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004]
+ [-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.9883
+ 0.6027 -0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004]
+ [-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.9937
+ 0.6151 -0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004]
+ [-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.9943
+ 0.6049 -0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004]
+ [-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.9961
+ 0.5895 -0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004]
+ [-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.9961
+ 0.5937 -0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004]
+ [-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.9956
+ 0.6006 -0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004]
+ [-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.9954
+ 0.5907 -0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004]
+ [-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.9969
+ 0.5987 -0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004]]
101
+ packed_state_32d_shape: (32,)
102
+ packed_state_32d:
103
+ [-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.
104
+ 0. 0. 0. 0. 0. 0. 0. 0.0951 0.0668
105
+ 0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004 0. 0. 0.
106
+ 0. 0. 0. 0. 0. ]
107
+ packed_actions_32d_shape: (16, 32)
108
+ packed_actions_32d:
109
+ [[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.
110
+ 0. 0. 0. 0. 0. 0. 0. 0.1108 0.0624
111
+ 0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004 0. 0. 0.
112
+ 0. 0. 0. 0. 0. ]
113
+ [-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.
114
+ 0. 0. 0. 0. 0. 0. 0. 0.1686 0.107
115
+ 0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004 0. 0. 0.
116
+ 0. 0. 0. 0. 0. ]
117
+ [-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.
118
+ 0. 0. 0. 0. 0. 0. 0. 0.303 0.1794
119
+ -0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004 0. 0. 0.
120
+ 0. 0. 0. 0. 0. ]
121
+ [-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.
122
+ 0. 0. 0. 0. 0. 0. 0. 0.4658 0.2811
123
+ -0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004 0. 0. 0.
124
+ 0. 0. 0. 0. 0. ]
125
+ [-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.
126
+ 0. 0. 0. 0. 0. 0. 0. 0.6382 0.3768
127
+ -0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004 0. 0. 0.
128
+ 0. 0. 0. 0. 0. ]
129
+ [-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.
130
+ 0. 0. 0. 0. 0. 0. 0. 0.8128 0.4808
131
+ -0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004 0. 0. 0.
132
+ 0. 0. 0. 0. 0. ]
133
+ [-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.
134
+ 0. 0. 0. 0. 0. 0. 0. 0.9328 0.5619
135
+ -0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004 0. 0. 0.
136
+ 0. 0. 0. 0. 0. ]
137
+ [-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.
138
+ 0. 0. 0. 0. 0. 0. 0. 0.9729 0.5816
139
+ -0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004 0. 0. 0.
140
+ 0. 0. 0. 0. 0. ]
141
+ [-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.
142
+ 0. 0. 0. 0. 0. 0. 0. 0.9883 0.6027
143
+ -0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004 0. 0. 0.
144
+ 0. 0. 0. 0. 0. ]
145
+ [-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.
146
+ 0. 0. 0. 0. 0. 0. 0. 0.9937 0.6151
147
+ -0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004 0. 0. 0.
148
+ 0. 0. 0. 0. 0. ]
149
+ [-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.
150
+ 0. 0. 0. 0. 0. 0. 0. 0.9943 0.6049
151
+ -0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004 0. 0. 0.
152
+ 0. 0. 0. 0. 0. ]
153
+ [-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.
154
+ 0. 0. 0. 0. 0. 0. 0. 0.9961 0.5895
155
+ -0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004 0. 0. 0.
156
+ 0. 0. 0. 0. 0. ]
157
+ [-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.
158
+ 0. 0. 0. 0. 0. 0. 0. 0.9961 0.5937
159
+ -0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004 0. 0. 0.
160
+ 0. 0. 0. 0. 0. ]
161
+ [-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.
162
+ 0. 0. 0. 0. 0. 0. 0. 0.9956 0.6006
163
+ -0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004 0. 0. 0.
164
+ 0. 0. 0. 0. 0. ]
165
+ [-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.
166
+ 0. 0. 0. 0. 0. 0. 0. 0.9954 0.5907
167
+ -0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004 0. 0. 0.
168
+ 0. 0. 0. 0. 0. ]
169
+ [-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.
170
+ 0. 0. 0. 0. 0. 0. 0. 0.9969 0.5987
171
+ -0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004 0. 0. 0.
172
+ 0. 0. 0. 0. 0. ]]
173
+ state_padded_zero_count: 16 / 16
174
+ actions_padded_zero_count: 256 / 256
175
+ state_padded_exact_zero: True
176
+ actions_padded_exact_zero: True
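The layout verified by the dump above can be reproduced with a minimal numpy sketch. The structure is inferred from the printed arrays, not taken from the repo's code: each arm contributes 8 dims (placed at offsets 0 and 16 of a 32-d vector), and the remaining 16 dims per vector are exact zeros. `pack_to_32d` is a hypothetical helper name.

```python
import numpy as np

def pack_to_32d(vec16: np.ndarray) -> np.ndarray:
    """Pack a 16-d two-arm vector into the 32-d padded layout seen above:
    arm 1 -> dims 0:8, arm 2 -> dims 16:24, dims 8:16 and 24:32 zero."""
    out = np.zeros(vec16.shape[:-1] + (32,), dtype=vec16.dtype)
    out[..., 0:8] = vec16[..., 0:8]     # arm 1 slot
    out[..., 16:24] = vec16[..., 8:16]  # arm 2 slot
    return out

# Boolean mask over the padded dims, mirroring the *_padded_zero_count checks.
pad = np.zeros(32, dtype=bool)
pad[8:16] = pad[24:32] = True

actions16 = np.random.default_rng(0).standard_normal((16, 16)).astype(np.float32)
packed = pack_to_32d(actions16)
print(packed.shape)                                    # (16, 32)
print(int((packed[:, pad] == 0).sum()), "/", 16 * 16)  # 256 / 256
print(bool((packed[:, pad] == 0).all()))               # True
```

A mask-based check like this distinguishes exact zeros from merely small values, which is what the `*_padded_exact_zero` flags in the dump assert.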