Add files using upload-large-folder tool
- README.md +58 -0
- REPORT.md +347 -0
- artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/config.json +14 -0
- artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/init_parallel_metadata.json +27 -0
- artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/config.json +7 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt +10 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt +3 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt +5 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt +242 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt +11 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json +1 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt +7 -0
- artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt +49 -0
- artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt +9 -0
- artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json +318 -0
- artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv +11 -0
- artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv +5 -0
- artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt +15 -0
- artifacts/twin_handover_packed_parallelization_20260309/repro/checkpoint_locations.txt +6 -0
- artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh +22 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/detach_test.log +2 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log +0 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log +66 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log +114 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log +0 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log +64 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log +114 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/importtime_train_pytorch.log +349 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/inspect_twin_packed_batch_handover_train.log +176 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20.log +241 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20b.log +0 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20d.log +34 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20e.log +34 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log +234 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log +141 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log +141 -0
- artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log +37 -0
- artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log +176 -0
README.md
ADDED
# pi0.5 Packed Multi-Arm OpenPI Artifacts

This repo packages a finished initial comparison between:

1. a packed single-head `pi0.5` baseline
2. a packed parallel-head `pi0.5` model with an exact packed warm-start from the single-head checkpoint

The study was run from the checked-out `openpi/` tree on `4x H100 80GB` with `bfloat16`, `2000` optimizer steps per model, verbose startup/debug logging, fixed validation passes, and no raw data reconversion.

## Dataset and packing

- Train repo: `lsnu/twin_handover_256_train`
- Val repo: `lsnu/twin_handover_256_val`
- Original TWIN layout: `[L8, R8]`
- Packed model layout used for both models: `[L8, 0x8, R8, 0x8]`
- Action-loss mask: active dims `[0:8]` and `[16:24]`, padded dims masked out
- Public `16`-dim norm stats were reused; they were not recomputed

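The packed layout and loss mask above can be sketched in a few lines (an illustrative NumPy sketch; `pack_per_arm` and `ACTION_LOSS_MASK` are hypothetical names, not the repo's actual `PackPerArmBlocks` transform):

```python
import numpy as np

def pack_per_arm(x16: np.ndarray) -> np.ndarray:
    """Pack a 16-dim [L8, R8] vector into the 32-dim [L8, 0x8, R8, 0x8] layout."""
    out = np.zeros(x16.shape[:-1] + (32,), dtype=x16.dtype)
    out[..., 0:8] = x16[..., 0:8]      # left-arm block
    out[..., 16:24] = x16[..., 8:16]   # right-arm block
    return out

# Loss mask: only the real arm dims contribute to the action loss.
ACTION_LOSS_MASK = np.zeros(32)
ACTION_LOSS_MASK[0:8] = 1.0
ACTION_LOSS_MASK[16:24] = 1.0

state16 = np.arange(1.0, 17.0)         # dummy [L8, R8] state
packed = pack_per_arm(state16)
# Padded blocks [8:16] and [24:32] stay exactly zero.
assert packed[8:16].sum() == 0 and packed[24:32].sum() == 0
```
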
## Headline results

| Model | Val @ 1000 | Val @ 2000 | Train runtime | Peak VRAM |
| --- | ---: | ---: | ---: | ---: |
| Packed baseline | `0.052885` | `0.035776` | `33:27` | `35.23 GB` |
| Packed parallel | `0.051214` | `0.035680` | `30:38` | `35.27 GB` |

The two models tracked closely. In this short run, the packed parallel head finished with a small edge on validation loss while staying within the same memory envelope.

## Repo contents

- `openpi/`
  - modified training/eval code
  - config and transform changes
  - copied norm-stats assets for the new packed configs
  - smoke and main-run checkpoints under `openpi/checkpoints/`
- `artifacts/twin_handover_packed_parallelization_20260309/`
  - `bootstrap_checkpoints/`: single-head PyTorch bootstrap and exact packed parallel warm-start
  - `metrics/`: JSON and CSV summaries
  - `run_logs/`: smoke, train, eval, and follow-up logs
  - `sanity_checks/`: packed-batch inspection output
  - `environment/`: system, GPU, package, HF-tooling, and workspace snapshots
  - `repro/`: changed-file list, checkpoint locations, and rerun commands
- `artifacts/pi05_base_params/`
  - staged base JAX parameter snapshot used for PyTorch conversion

## Key artifact paths

- Full report: `REPORT.md`
- Reproduction commands: `artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh`
- Metrics summary: `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- Train loss table: `artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv`
- Val loss table: `artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv`
- Environment snapshot: `artifacts/twin_handover_packed_parallelization_20260309/environment/`

## Notes

- The packed parallel warm-start follows the implemented slice/fuse mapping exactly by construction; the recorded projection differences are at bfloat16 rounding level (max abs diff ~1e-6; see `init_parallel_metadata.json`).
- Weight loading on both main runs reported `missing=0` and `unexpected=0`.
- The packaged tree intentionally records reproducibility snapshots instead of uploading transient cache state.

REPORT.md
ADDED
# Report: pi0.5 Packed Action-Head Parallelization on TWIN Handover

## Objective

Run the minimum scientifically meaningful comparison between:

1. a packed single-head `pi0.5` baseline
2. a packed parallel-head `pi0.5` model

Both models were fine-tuned on the same converted public TWIN handover dataset with the same training schedule:

- train: `lsnu/twin_handover_256_train`
- val: `lsnu/twin_handover_256_val`
- hardware: `4x H100 80GB`
- precision: `bfloat16`
- global batch size: `16`
- optimizer steps per model: `2000`
- save interval: `250`
- log interval: `10`

## Data layout and packing

The TWIN converted state/action layout is `16` dims in `[L8, R8]`, where each arm is `7` joints plus a gripper. The generic `pi0.5` path right-pads to `32` dims, which does not preserve a semantic left/right split for a naive parallel-head setup.

To keep the experiment minimal and still semantically correct:

- existing public `16`-dim norm stats were reused
- semantic packing happened after normalization in model transforms
- both models consumed the same packed `32`-dim layout:

```text
[L8, R8] -> [L8, 0x8, R8, 0x8]
```

- the action loss was masked so only the real arm dims contributed:

```text
active dims: [0:8] and [16:24]
masked dims: [8:16] and [24:32]
```

The packed-batch sanity check confirmed exact zero padding:

- `state_padded_zero_count: 16 / 16`
- `actions_padded_zero_count: 256 / 256`
- `state_padded_exact_zero: True`
- `actions_padded_exact_zero: True`

Reference log:

- `artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log`

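The per-block zero check behind these counts can be sketched as follows (illustrative only; the shapes follow the logged local batch, and the function name is hypothetical, not the actual `inspect_twin_packed_batch.py` code):

```python
import numpy as np

def nonzero_counts_8d_blocks(x: np.ndarray) -> list:
    """Count nonzero entries per 8-dim block of the last (32-dim) axis."""
    flat = x.reshape(-1, 32)
    return [int(np.count_nonzero(flat[:, i:i + 8])) for i in range(0, 32, 8)]

rng = np.random.default_rng(0)
state = np.zeros((4, 32))                     # local batch of packed states
state[:, 0:8] = rng.normal(size=(4, 8))       # left-arm block
state[:, 16:24] = rng.normal(size=(4, 8))     # right-arm block

# Padded blocks [8:16] and [24:32] must be exactly zero,
# mirroring the logged state_nonzero_counts_8d_blocks=[32, 0, 32, 0].
counts = nonzero_counts_8d_blocks(state)
assert counts[1] == 0 and counts[3] == 0
```
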
## Code changes tied to files

The experiment-specific changes are summarized below.

- `openpi/src/openpi/transforms.py`
  - added `PackPerArmBlocks` and `UnpackPerArmBlocks` for semantic TWIN packed training
- `openpi/src/openpi/training/config.py`
  - added the packed TWIN model-transform path
  - added `action_loss_mask`
  - added `pi05_twin_handover_256_packed_baseline_pytorch_2k`
  - added `pi05_twin_handover_256_packed_parallel_pytorch_2k`
- `openpi/src/openpi/training/data_loader.py`
  - added `set_epoch`
  - improved local dataset mirror handling and loader startup behavior
- `openpi/src/openpi/models/model.py`
  - made the `pi0_pytorch` import lazy
- `openpi/src/openpi/models/tokenizer.py`
  - made the `AutoProcessor` import lazy
- `openpi/src/openpi/models_pytorch/pi0_pytorch.py`
  - disabled the unconditional `sample_actions` `torch.compile` by default
- `openpi/scripts/train_pytorch.py`
  - added startup prints
  - added masked action-loss reduction
  - added first-steps debug prints and periodic runtime/memory logging
  - hardened DDP/checkpoint startup
- `openpi/scripts/eval_twin_val_loss_pytorch.py`
  - added masked validation-loss evaluation with fixed-batch execution
- `openpi/scripts/init_parallel_pi05_from_single_pytorch.py`
  - added exact packed parallel warm-start initialization
- `openpi/scripts/inspect_twin_packed_batch.py`
  - added packed-batch inspection and zero-padding verification
- `openpi/scripts/run_twin_handover_packed_followup.sh`
  - added detached follow-up automation for the remaining train/eval stages
- `openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json`
  - copied the existing handover train norm stats for the packed baseline config
- `openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json`
  - copied the existing handover train norm stats for the packed parallel config

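The masked action-loss reduction listed above for `train_pytorch.py` can be sketched like this (a NumPy stand-in under an assumed MSE-style reduction; the actual PyTorch implementation may reduce differently):

```python
import numpy as np

# (1.0 x8, 0.0 x8, 1.0 x8, 0.0 x8), as logged at startup.
mask = np.concatenate([np.ones(8), np.zeros(8), np.ones(8), np.zeros(8)])

def masked_action_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over active dims only; padded dims contribute nothing."""
    sq = (pred - target) ** 2 * mask                     # broadcast over (B, H, 32)
    denom = mask.sum() * pred.shape[0] * pred.shape[1]   # active dims x batch x horizon
    return float(sq.sum() / denom)

pred = np.zeros((4, 16, 32))
target = np.zeros((4, 16, 32))
target[..., 8:16] = 100.0   # large "error" only in padded dims...
assert masked_action_loss(pred, target) == 0.0   # ...is invisible to the loss
```
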
Reference file list:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt`

## Commands run

The exact rerun command list is saved in:

- `artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh`

The executed flow was:

1. packed-batch inspection
2. base `pi0.5` JAX-to-PyTorch conversion
3. exact packed parallel warm-start initialization from the single-head PyTorch checkpoint
4. packed baseline training for `2000` steps
5. baseline val at `1000`
6. baseline val at `2000`
7. packed parallel training for `2000` steps
8. parallel val at `1000`
9. parallel val at `2000`

The parallel training and its validation passes were chained through a detached follow-up runner.

Reference logs:

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log`

## Startup sanity checks

### Norm stats

The copied norm-stats files were loaded successfully and reported:

- keys: `['actions', 'state']`
- `state_mean_len=16`
- `state_std_len=16`
- `actions_mean_len=16`
- `actions_std_len=16`

Reference:

- `artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt`

### Baseline startup summary

Rank-0 startup logging for the packed baseline recorded:

```text
Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k
Dataset repo_id: lsnu/twin_handover_256_train
Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16}
Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch
Model type: baseline
Packed transforms active: True
Batch size: local=4, global=16
Action-loss mask: (1.0 x8, 0.0 x8, 1.0 x8, 0.0 x8)
Weight loading missing key count: 0
Weight loading unexpected key count: 0
```

The first debug steps also showed:

- `observation.state shape=(4, 32)`
- `actions shape=(4, 16, 32)`
- `state_nonzero_counts_8d_blocks=[32, 0, 32, 0]`
- `action_nonzero_counts_8d_blocks=[512, 0, 512, 0]`
- masked padded dims stayed exactly zero in the batch

### Parallel startup summary

Rank-0 startup logging for the packed parallel run recorded:

```text
Resolved config name: pi05_twin_handover_256_packed_parallel_pytorch_2k
Dataset repo_id: lsnu/twin_handover_256_train
Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16}
Checkpoint source path: /workspace/checkpoints/pi05_base_parallel_packed_from_single
Model type: parallel
Packed transforms active: True
Batch size: local=4, global=16
Action-loss mask: (1.0 x8, 0.0 x8, 1.0 x8, 0.0 x8)
Weight loading missing key count: 0
Weight loading unexpected key count: 0
```

The first debug steps matched the expected packed layout:

- `observation.state shape=(4, 32)`
- `actions shape=(4, 16, 32)`
- `state_nonzero_counts_8d_blocks=[32, 0, 32, 0]`
- `action_nonzero_counts_8d_blocks=[512, 0, 512, 0]`

### Smoke tests

All required smoke tests passed before the main runs:

1. `debug_pi05_multiarm_pytorch_smoke`
2. packed-batch inspection on `lsnu/twin_handover_256_train`
3. packed baseline TWIN smoke on `4` GPUs for `20` steps
4. packed parallel TWIN smoke on `4` GPUs for `20` steps

Smoke logs are stored in:

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log`

## Warm-start note

The packed parallel warm-start was implemented as an exact slice/fuse mapping from the single-head PyTorch checkpoint:

- input side: split the single-head input projection by packed arm blocks
- fuse side: initialize `arm_token_fuse.weight` as `[I I]`
- output side: split the single-head output projection rows by packed arm blocks

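The slice/fuse mapping can be illustrated with plain linear algebra (a hypothetical NumPy realization of the mapping described above; `H`, the bias split, and all variable names are assumptions, not the actual `init_parallel_pi05_from_single_pytorch.py` code):

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 64, 32                    # hidden width (hypothetical) and packed action dim
W = rng.normal(size=(H, D))      # single-head input projection weight
b = rng.normal(size=H)           # single-head input projection bias
x = rng.normal(size=D)           # one packed action vector [L8, 0x8, R8, 0x8]

# Input side: split columns by packed arm blocks (dims [0:16] and [16:32]).
W0, W1 = W[:, :16], W[:, 16:]
b0, b1 = b, np.zeros(H)          # one way to split the bias exactly

# Fuse side: arm_token_fuse.weight = [I I] simply sums the two arm tokens.
t0 = W0 @ x[:16] + b0
t1 = W1 @ x[16:] + b1
fused = np.hstack([np.eye(H), np.eye(H)]) @ np.concatenate([t0, t1])

# Agreement with the single-head projection, up to float rounding.
assert np.allclose(fused, W @ x + b)
```

The output side is symmetric: the single-head output projection rows for dims `[0:16]` and `[16:32]` become the two per-arm output heads.
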
The mapping is exact by construction in exact arithmetic; the recorded round-trip differences are at bfloat16 rounding level (`input_projection_max_abs_diff` of about `1.2e-06`, `output_projection_max_abs_diff` of about `9.5e-07`), and `init_parallel_metadata.json` accordingly records `warm_start_exact: false`. The warm-start creation step reported only the expected key differences (the new parallel-head modules as missing, the retired single-head projections as unexpected; both lists are in the metadata), and both main runs then loaded the resulting checkpoints with `missing=0` and `unexpected=0`.

What was not done:

- no separate numerical equivalence test was run comparing step-0 forward outputs between the single-head and parallel-head models on the same batch

Bootstrap checkpoints:

- `/workspace/checkpoints/pi05_base_single_pytorch`
- `/workspace/checkpoints/pi05_base_parallel_packed_from_single`

Copies are also staged under:

- `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/`

## Results

### Training loss snapshots

| Model | Step 250 | Step 500 | Step 1000 | Step 1500 | Step 2000 |
| --- | ---: | ---: | ---: | ---: | ---: |
| Baseline loss | `0.1975` | `0.0606` | `0.0245` | `0.0155` | `0.0391` |
| Baseline smoothed | `0.1166` | `0.0554` | `0.0387` | `0.0331` | `0.0278` |
| Parallel loss | `0.1894` | `0.0633` | `0.0214` | `0.0155` | `0.0326` |
| Parallel smoothed | `0.1153` | `0.0565` | `0.0392` | `0.0331` | `0.0270` |

### Validation loss

| Model | Checkpoint | Batches | Mean val loss | Std val loss |
| --- | ---: | ---: | ---: | ---: |
| Baseline | `1000` | `50` | `0.052885` | `0.032533` |
| Baseline | `2000` | `100` | `0.035776` | `0.027648` |
| Parallel | `1000` | `50` | `0.051214` | `0.028985` |
| Parallel | `2000` | `100` | `0.035680` | `0.026077` |

### Runtime and memory

| Item | Value |
| --- | --- |
| Pipeline wallclock from baseline launch to final val | `01:32:29` |
| Detached follow-up runner wallclock | `01:17:47` |
| Baseline train runtime | `33:27` |
| Parallel train runtime | `30:38` |
| Baseline val @ 1000 | `00:05:14` |
| Baseline val @ 2000 | `00:05:19` |
| Parallel val @ 1000 | `00:03:23` |
| Parallel val @ 2000 | `00:03:33` |
| Peak baseline VRAM | `35.23 GB` |
| Peak parallel VRAM | `35.27 GB` |

### Interpretation

For this short `2000`-step TWIN handover run, the packed baseline and packed parallel-head models behaved very similarly. The packed parallel-head model ended slightly lower on both validation checkpoints while staying in the same memory range and training cleanly under the same schedule.

This should be treated as an initial profiling run, not a final benchmark claim.

Reference metrics:

- `artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json`
- `artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv`
- `artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv`

## Checkpoints and logs

### Main-run checkpoints

- Baseline step `1000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000`
- Baseline step `2000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000`
- Parallel step `1000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000`
- Parallel step `2000`:
  - `/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000`

The full checkpoint trees, including smoke checkpoints and intermediate saves every `250` steps, are under:

- `openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/`
- `openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/`

### Bootstrap checkpoints

- `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/`
- `artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/`

### Logs

- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log`
- `artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log`

## Environment and provenance snapshot

Environment snapshots are stored in:

- `artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt`
- `artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt`

OpenPI source provenance:

- the packaged `openpi/` tree does not contain a live `.git` directory
- the source clone snapshot is recorded in `openpi_source_snapshot.txt`
- source commit: `aa91438c0c130dcef4ccf378a56f4cf4cffc1310`

## Acceptance criteria status

1. Packed-batch inspection showed raw `16`-dim `[L8, R8]` and packed `32`-dim `[L8, 0x8, R8, 0x8]`: `PASS`
2. Both smoke tests passed on `4` GPUs with finite loss: `PASS`
3. Baseline run started from `/workspace/checkpoints/pi05_base_single_pytorch`: `PASS`
4. Parallel run started from `/workspace/checkpoints/pi05_base_parallel_packed_from_single`: `PASS`
5. Masked loss was active and padded dims were excluded: `PASS`
6. DDP ran without shape/key mismatches: `PASS`
7. Quick val was run at step `1000` for both models: `PASS`
8. Final val was run at step `2000` for both models: `PASS`
9. Both main runs finished under the `10`-hour cap: `PASS`
10. Final bundle includes code, checkpoints, logs, metrics, and environment snapshot: `PASS`

## Final inventory

The artifact bundle at repo root contains:

- all modified training/eval code under `openpi/`
- all baseline and parallel checkpoints under `openpi/checkpoints/`
- both bootstrap checkpoints under `artifacts/.../bootstrap_checkpoints/`
- all train/eval/smoke logs under `artifacts/.../run_logs/`
- metrics tables and summary JSON under `artifacts/.../metrics/`
- reproducibility files under `artifacts/.../repro/`
- environment and provenance snapshot under `artifacts/.../environment/`

This is a complete rerunnable package for the initial TWIN handover packed action-head parallelization study.

artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/config.json
ADDED
```json
{
  "action_dim": 32,
  "action_expert_variant": "gemma_300m",
  "action_horizon": 16,
  "arm_action_dims": [
    16,
    16
  ],
  "discrete_state_input": true,
  "dtype": "bfloat16",
  "max_token_len": 200,
  "paligemma_variant": "gemma_2b",
  "pi05": true
}
```
artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_parallel_packed_from_single/init_parallel_metadata.json
ADDED
```json
{
  "config_name": "pi05_twin_handover_256_packed_parallel_pytorch_2k",
  "input_projection_max_abs_diff": 1.1920928955078125e-06,
  "load_state_missing_keys": [
    "paligemma_with_expert.paligemma.model.language_model.embed_tokens.weight",
    "action_in_proj_arms.0.weight",
    "action_in_proj_arms.0.bias",
    "action_in_proj_arms.1.weight",
    "action_in_proj_arms.1.bias",
    "arm_token_fuse.weight",
    "arm_token_fuse.bias",
    "action_out_proj_arms.0.weight",
    "action_out_proj_arms.0.bias",
    "action_out_proj_arms.1.weight",
    "action_out_proj_arms.1.bias"
  ],
  "load_state_unexpected_keys": [
    "action_in_proj.bias",
    "action_in_proj.weight",
    "action_out_proj.bias",
    "action_out_proj.weight"
  ],
  "output_path": "/workspace/checkpoints/pi05_base_parallel_packed_from_single",
  "output_projection_max_abs_diff": 9.5367431640625e-07,
  "single_ckpt": "/workspace/checkpoints/pi05_base_single_pytorch",
  "warm_start_exact": false
}
```
artifacts/twin_handover_packed_parallelization_20260309/bootstrap_checkpoints/pi05_base_single_pytorch/config.json
ADDED
```json
{
  "action_dim": 32,
  "action_horizon": 16,
  "paligemma_variant": "gemma_2b",
  "action_expert_variant": "gemma_300m",
  "precision": "bfloat16"
}
```
artifacts/twin_handover_packed_parallelization_20260309/environment/gpu_info.txt
ADDED
```text
timestamp_utc=2026-03-09T02:09:46Z
GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-352e04eb-3fa2-0b3b-c24f-5c9567d275af)
GPU 1: NVIDIA H100 80GB HBM3 (UUID: GPU-09e17180-0d03-02d6-53c8-863ebf34f1a0)
GPU 2: NVIDIA H100 80GB HBM3 (UUID: GPU-323a86ac-758a-6993-c4b8-7b0c6cf94b3f)
GPU 3: NVIDIA H100 80GB HBM3 (UUID: GPU-dfccd461-1fa0-0b62-00da-e9abb74fb025)

0, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
1, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
2, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
3, NVIDIA H100 80GB HBM3, 81559 MiB, 580.126.09
```
artifacts/twin_handover_packed_parallelization_20260309/environment/hf_env.txt
ADDED
```text
timestamp_utc=2026-03-09T02:10:06Z
hf_version=1.6.0
auth_state=Not logged in
```
artifacts/twin_handover_packed_parallelization_20260309/environment/openpi_source_snapshot.txt
ADDED
```text
timestamp_utc=2026-03-09T02:11:23Z
packaged_openpi_has_git=no
source_clone_path=/workspace/openpi_partial_broken_1773005128
source_commit=aa91438c0c130dcef4ccf378a56f4cf4cffc1310
source_remote=https://huggingface.co/lsnu/pi05tests-openpi-multiarm
```
artifacts/twin_handover_packed_parallelization_20260309/environment/pip_freeze.txt ADDED
@@ -0,0 +1,242 @@
+absl-py==2.3.0
+aiohappyeyeballs==2.6.1
+aiohttp==3.12.4
+aiosignal==1.3.2
+annotated-types==0.7.0
+antlr4-python3-runtime==4.9.3
+asttokens==3.0.0
+attrs==25.3.0
+augmax==0.4.1
+av==14.4.0
+beartype==0.19.0
+beautifulsoup4==4.13.4
+blinker==1.9.0
+cachetools==5.5.2
+certifi==2025.4.26
+cffi==1.17.1
+cfgv==3.4.0
+charset-normalizer==3.4.2
+chex==0.1.89
+click==8.2.1
+cloudpickle==3.1.1
+cmake==4.0.2
+comm==0.2.2
+contourpy==1.3.2
+crc32c==2.7.1
+cycler==0.12.1
+datasets==3.6.0
+debugpy==1.8.14
+decorator==5.2.1
+deepdiff==8.5.0
+diffusers==0.33.1
+dill==0.3.8
+distlib==0.3.9
+dm-control==1.0.14
+dm-env==1.6
+dm-tree==0.1.9
+docker-pycreds==0.4.0
+docstring-parser==0.16
+donfig==0.8.1.post1
+draccus==0.10.0
+einops==0.8.1
+equinox==0.12.2
+etils==1.12.2
+evdev==1.9.2
+executing==2.2.0
+farama-notifications==0.0.4
+filelock==3.18.0
+flask==3.1.1
+flatbuffers==25.2.10
+flax==0.10.2
+fonttools==4.58.1
+frozenlist==1.6.0
+fsspec==2025.3.0
+gcsfs==2025.3.0
+gdown==5.2.0
+gitdb==4.0.12
+gitpython==3.1.44
+glfw==2.9.0
+google-api-core==2.24.2
+google-auth==2.40.2
+google-auth-oauthlib==1.2.2
+google-cloud-core==2.4.3
+google-cloud-storage==3.1.0
+google-crc32c==1.7.1
+google-resumable-media==2.7.2
+googleapis-common-protos==1.70.0
+gym-aloha==0.1.1
+gymnasium==0.29.1
+h5py==3.13.0
+hf-transfer==0.1.9
+hf-xet==1.1.2
+huggingface-hub==0.32.3
+humanize==4.12.3
+identify==2.6.12
+idna==3.10
+imageio==2.37.0
+imageio-ffmpeg==0.6.0
+importlib-metadata==8.7.0
+importlib-resources==6.5.2
+iniconfig==2.1.0
+inquirerpy==0.3.4
+ipykernel==6.29.5
+ipython==9.2.0
+ipython-pygments-lexers==1.1.1
+ipywidgets==8.1.7
+itsdangerous==2.2.0
+jax==0.5.3
+jax-cuda12-pjrt==0.5.3
+jax-cuda12-plugin==0.5.3
+jaxlib==0.5.3
+jaxtyping==0.2.36
+jedi==0.19.2
+jinja2==3.1.6
+jsonlines==4.0.0
+jupyter-client==8.6.3
+jupyter-core==5.8.1
+jupyterlab-widgets==3.0.15
+kiwisolver==1.4.8
+labmaze==1.0.6
+lerobot @ git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5
+llvmlite==0.44.0
+lxml==5.4.0
+markdown-it-py==3.0.0
+markupsafe==3.0.2
+matplotlib==3.10.3
+matplotlib-inline==0.1.7
+mdurl==0.1.2
+mergedeep==1.3.4
+ml-collections==1.0.0
+ml-dtypes==0.4.1
+mpmath==1.3.0
+msgpack==1.1.0
+mujoco==2.3.7
+multidict==6.4.4
+multiprocess==0.70.16
+mypy-extensions==1.1.0
+nest-asyncio==1.6.0
+networkx==3.5
+nodeenv==1.9.1
+numba==0.61.2
+numcodecs==0.16.1
+numpy==1.26.4
+numpydantic==1.6.9
+nvidia-cublas-cu12==12.6.4.1
+nvidia-cuda-cupti-cu12==12.6.80
+nvidia-cuda-nvcc-cu12==12.9.41
+nvidia-cuda-nvrtc-cu12==12.6.77
+nvidia-cuda-runtime-cu12==12.6.77
+nvidia-cudnn-cu12==9.5.1.17
+nvidia-cufft-cu12==11.3.0.4
+nvidia-cufile-cu12==1.11.1.6
+nvidia-curand-cu12==10.3.7.77
+nvidia-cusolver-cu12==11.7.1.2
+nvidia-cusparse-cu12==12.5.4.2
+nvidia-cusparselt-cu12==0.6.3
+nvidia-ml-py==12.575.51
+nvidia-nccl-cu12==2.26.2
+nvidia-nvjitlink-cu12==12.6.85
+nvidia-nvtx-cu12==12.6.77
+oauthlib==3.2.2
+omegaconf==2.3.0
+opencv-python==4.11.0.86
+opencv-python-headless==4.11.0.86
+-e file:///workspace/pi05tests-openpi-multiarm/openpi
+-e file:///workspace/pi05tests-openpi-multiarm/openpi/packages/openpi-client
+opt-einsum==3.4.0
+optax==0.2.4
+orbax-checkpoint==0.11.13
+orderly-set==5.4.1
+packaging==25.0
+pandas==2.2.3
+parso==0.8.4
+pexpect==4.9.0
+pfzy==0.3.4
+pillow==11.2.1
+platformdirs==4.3.8
+pluggy==1.6.0
+polars==1.30.0
+pre-commit==4.2.0
+prompt-toolkit==3.0.51
+propcache==0.3.1
+proto-plus==1.26.1
+protobuf==4.25.8
+psutil==7.0.0
+ptyprocess==0.7.0
+pure-eval==0.2.3
+pyarrow==20.0.0
+pyasn1==0.6.1
+pyasn1-modules==0.4.2
+pycparser==2.22
+pydantic==2.11.5
+pydantic-core==2.33.2
+pygments==2.19.1
+pymunk==7.0.0
+pynput==1.8.1
+pynvml==12.0.0
+pyopengl==3.1.9
+pyparsing==3.2.3
+pysocks==1.7.1
+pytest==8.3.5
+python-dateutil==2.9.0.post0
+python-xlib==0.33
+pytz==2025.2
+pyyaml==6.0.2
+pyyaml-include==1.4.1
+pyzmq==26.4.0
+regex==2024.11.6
+requests==2.32.3
+requests-oauthlib==2.0.0
+rerun-sdk==0.23.1
+rich==14.0.0
+rsa==4.9.1
+ruff==0.11.12
+safetensors==0.5.3
+scipy==1.15.3
+sentencepiece==0.2.0
+sentry-sdk==2.29.1
+setproctitle==1.3.6
+setuptools==80.9.0
+shtab==1.7.2
+simplejson==3.20.1
+six==1.17.0
+smmap==5.0.2
+soupsieve==2.7
+stack-data==0.6.3
+svgwrite==1.4.3
+sympy==1.14.0
+tensorstore==0.1.74
+termcolor==3.1.0
+tokenizers==0.21.1
+toml==0.10.2
+toolz==1.0.0
+torch==2.7.1
+torchcodec==0.4.0
+torchvision==0.22.1
+tornado==6.5.1
+tqdm==4.67.1
+tqdm-loggable==0.2
+traitlets==5.14.3
+transformers==4.53.2
+tree==0.2.4
+treescope==0.1.9
+triton==3.3.1
+typeguard==4.4.2
+typing-extensions==4.13.2
+typing-inspect==0.9.0
+typing-inspection==0.4.1
+tyro==0.9.22
+tzdata==2025.2
+urllib3==2.4.0
+virtualenv==20.31.2
+wadler-lindig==0.1.6
+wandb==0.19.11
+wcwidth==0.2.13
+websockets==15.0.1
+werkzeug==3.1.3
+widgetsnbextension==4.0.14
+wrapt==1.14.1
+xxhash==3.5.0
+yarl==1.20.0
+zarr==3.0.8
+zipp==3.22.0
artifacts/twin_handover_packed_parallelization_20260309/environment/python_env.txt ADDED
@@ -0,0 +1,11 @@
+timestamp_utc=2026-03-09T02:09:46Z
+Python 3.11.10
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
+/usr/local/bin/uv
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
+/workspace/pi05tests-openpi-multiarm/openpi/.venv
+
+torch=2.7.1+cu126
+cuda=12.6
+cudnn=90501
+huggingface_hub=0.32.3
artifacts/twin_handover_packed_parallelization_20260309/environment/selected_env_vars.json ADDED
@@ -0,0 +1 @@
+{}
artifacts/twin_handover_packed_parallelization_20260309/environment/system_info.txt ADDED
@@ -0,0 +1,7 @@
+timestamp_utc=2026-03-09T02:08:36Z
+hostname=9e9e564d5d6e
+uname=Linux 9e9e564d5d6e 6.8.0-90-generic #91-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 18 14:14:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
+python=Python 3.11.10
+uv=uv 0.10.9
+torch=2.7.1+cu126
+huggingface_hub=0.32.3
artifacts/twin_handover_packed_parallelization_20260309/environment/workspace_snapshot.txt ADDED
@@ -0,0 +1,49 @@
+timestamp_utc=2026-03-09T02:10:07Z
+
+Top-level /workspace contents:
+/workspace/.codex
+/workspace/.hf
+/workspace/.local
+/workspace/bin
+/workspace/checkpoints
+/workspace/codex-env.sh
+/workspace/lerobot
+/workspace/openpi
+/workspace/openpi_partial_broken_1773005128
+/workspace/pi05tests-openpi-multiarm
+/workspace/run_logs
+
+Top-level packaged repo contents:
+/workspace/pi05tests-openpi-multiarm/.cache
+/workspace/pi05tests-openpi-multiarm/.cache/huggingface
+/workspace/pi05tests-openpi-multiarm/artifacts
+/workspace/pi05tests-openpi-multiarm/artifacts/pi05_base_params
+/workspace/pi05tests-openpi-multiarm/artifacts/twin_handover_packed_parallelization_20260309
+/workspace/pi05tests-openpi-multiarm/openpi
+/workspace/pi05tests-openpi-multiarm/openpi/.dockerignore
+/workspace/pi05tests-openpi-multiarm/openpi/.github
+/workspace/pi05tests-openpi-multiarm/openpi/.gitignore
+/workspace/pi05tests-openpi-multiarm/openpi/.gitmodules
+/workspace/pi05tests-openpi-multiarm/openpi/.pre-commit-config.yaml
+/workspace/pi05tests-openpi-multiarm/openpi/.python-version
+/workspace/pi05tests-openpi-multiarm/openpi/.venv
+/workspace/pi05tests-openpi-multiarm/openpi/.venv_partial_1773006322
+/workspace/pi05tests-openpi-multiarm/openpi/.vscode
+/workspace/pi05tests-openpi-multiarm/openpi/CONTRIBUTING.md
+/workspace/pi05tests-openpi-multiarm/openpi/LICENSE
+/workspace/pi05tests-openpi-multiarm/openpi/LICENSE_GEMMA.txt
+/workspace/pi05tests-openpi-multiarm/openpi/README.md
+/workspace/pi05tests-openpi-multiarm/openpi/assets
+/workspace/pi05tests-openpi-multiarm/openpi/checkpoints
+/workspace/pi05tests-openpi-multiarm/openpi/docs
+/workspace/pi05tests-openpi-multiarm/openpi/examples
+/workspace/pi05tests-openpi-multiarm/openpi/packages
+/workspace/pi05tests-openpi-multiarm/openpi/pyproject.toml
+/workspace/pi05tests-openpi-multiarm/openpi/scripts
+/workspace/pi05tests-openpi-multiarm/openpi/src
+/workspace/pi05tests-openpi-multiarm/openpi/uv.lock
+
+Selected sizes:
+410G /workspace/pi05tests-openpi-multiarm
+2.9M /workspace/checkpoints/pi05_base_single_pytorch
+2.9M /workspace/checkpoints/pi05_base_parallel_packed_from_single
artifacts/twin_handover_packed_parallelization_20260309/metrics/norm_stats_verification.txt ADDED
@@ -0,0 +1,9 @@
+path=/workspace/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+keys=[actions,state]
+state_mean_len=16 state_std_len=16
+action_mean_len=16 action_std_len=16
+---
+path=/workspace/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
+keys=[actions,state]
+state_mean_len=16 state_std_len=16
+action_mean_len=16 action_std_len=16
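The verification file above records, for each config's norm_stats.json, the stat keys and the 16-dim mean/std lengths. A minimal sketch of that check is shown below; the payload shape is assumed from the verification output, not taken from the real file on disk.

```python
# Minimal norm_stats-shaped payload; the real files live under
# openpi/assets/<config>/lsnu/twin_handover_256_train/norm_stats.json
# and their exact schema is assumed here for illustration.
norm_stats = {
    "state": {"mean": [0.0] * 16, "std": [1.0] * 16},
    "actions": {"mean": [0.0] * 16, "std": [1.0] * 16},
}

def verify(stats: dict, expected_dim: int = 16) -> list[str]:
    """Check mean/std lengths for every stat entry, then return sorted keys."""
    for key, entry in stats.items():
        assert len(entry["mean"]) == expected_dim, f"{key}: bad mean length"
        assert len(entry["std"]) == expected_dim, f"{key}: bad std length"
    return sorted(stats)

print(f"keys=[{','.join(verify(norm_stats))}]")  # keys=[actions,state]
```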
artifacts/twin_handover_packed_parallelization_20260309/metrics/summary.json ADDED
@@ -0,0 +1,318 @@
+{
+  "train": {
+    "baseline": {
+      "steps": {
+        "250": {
+          "loss": 0.1975,
+          "smoothed_loss": 0.1166,
+          "lr": "2.50e-05",
+          "grad_norm": 1.0523,
+          "max_cuda_memory": "35.23GB"
+        },
+        "500": {
+          "loss": 0.0606,
+          "smoothed_loss": 0.0554,
+          "lr": "2.35e-05",
+          "grad_norm": 1.021,
+          "max_cuda_memory": "35.23GB"
+        },
+        "1000": {
+          "loss": 0.0245,
+          "smoothed_loss": 0.0387,
+          "lr": "1.58e-05",
+          "grad_norm": 1.0163,
+          "max_cuda_memory": "35.23GB"
+        },
+        "1500": {
+          "loss": 0.0155,
+          "smoothed_loss": 0.0331,
+          "lr": "6.60e-06",
+          "grad_norm": 0.7702,
+          "max_cuda_memory": "35.23GB"
+        },
+        "2000": {
+          "loss": 0.0391,
+          "smoothed_loss": 0.0278,
+          "lr": "2.50e-06",
+          "grad_norm": 0.7445,
+          "max_cuda_memory": "35.23GB"
+        }
+      },
+      "startup": {
+        "config_name": "pi05_twin_handover_256_packed_baseline_pytorch_2k",
+        "dataset_repo_id": "lsnu/twin_handover_256_train",
+        "norm_stats_file": "/workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+        "checkpoint_source": "/workspace/checkpoints/pi05_base_single_pytorch",
+        "model_type": "baseline",
+        "packed_transforms": "True",
+        "world_size": "4",
+        "batch_size": "local=4, global=16",
+        "num_workers": "8",
+        "precision": "bfloat16",
+        "lr_schedule": "warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06",
+        "save_log_intervals": "save_interval=250, log_interval=10",
+        "action_loss_mask": "(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)",
+        "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+        "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+        "weight_missing_count": "0",
+        "weight_unexpected_count": "0",
+        "weight_missing_keys": "set()",
+        "weight_unexpected_keys": "[]"
+      },
+      "saves": {
+        "250": {
+          "timestamp": "00:25:28.986",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/250"
+        },
+        "500": {
+          "timestamp": "00:29:40.355",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/500"
+        },
+        "750": {
+          "timestamp": "00:35:01.426",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/750"
+        },
+        "1000": {
+          "timestamp": "00:39:27.037",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000"
+        },
+        "1250": {
+          "timestamp": "00:43:25.467",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1250"
+        },
+        "1500": {
+          "timestamp": "00:47:39.593",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1500"
+        },
+        "1750": {
+          "timestamp": "00:51:38.690",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1750"
+        },
+        "2000": {
+          "timestamp": "00:55:30.655",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000"
+        }
+      },
+      "runtime": "33:27"
+    },
+    "parallel": {
+      "steps": {
+        "250": {
+          "loss": 0.1894,
+          "smoothed_loss": 0.1153,
+          "lr": "2.50e-05",
+          "grad_norm": 1.0751,
+          "max_cuda_memory": "35.27GB"
+        },
+        "500": {
+          "loss": 0.0633,
+          "smoothed_loss": 0.0565,
+          "lr": "2.35e-05",
+          "grad_norm": 1.001,
+          "max_cuda_memory": "35.27GB"
+        },
+        "1000": {
+          "loss": 0.0214,
+          "smoothed_loss": 0.0392,
+          "lr": "1.58e-05",
+          "grad_norm": 0.9669,
+          "max_cuda_memory": "35.27GB"
+        },
+        "1500": {
+          "loss": 0.0155,
+          "smoothed_loss": 0.0331,
+          "lr": "6.60e-06",
+          "grad_norm": 0.7305,
+          "max_cuda_memory": "35.27GB"
+        },
+        "2000": {
+          "loss": 0.0326,
+          "smoothed_loss": 0.027,
+          "lr": "2.50e-06",
+          "grad_norm": 0.735,
+          "max_cuda_memory": "35.27GB"
+        }
+      },
+      "startup": {
+        "config_name": "pi05_twin_handover_256_packed_parallel_pytorch_2k",
+        "dataset_repo_id": "lsnu/twin_handover_256_train",
+        "norm_stats_file": "/workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+        "checkpoint_source": "/workspace/checkpoints/pi05_base_parallel_packed_from_single",
+        "model_type": "parallel",
+        "packed_transforms": "True",
+        "world_size": "4",
+        "batch_size": "local=4, global=16",
+        "num_workers": "8",
+        "precision": "bfloat16",
+        "lr_schedule": "warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06",
+        "save_log_intervals": "save_interval=250, log_interval=10",
+        "action_loss_mask": "(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)",
+        "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+        "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+        "weight_missing_count": "0",
+        "weight_unexpected_count": "0",
+        "weight_missing_keys": "set()",
+        "weight_unexpected_keys": "[]"
+      },
+      "saves": {
+        "250": {
+          "timestamp": "01:14:12.456",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/250"
+        },
+        "500": {
+          "timestamp": "01:18:40.916",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/500"
+        },
+        "750": {
+          "timestamp": "01:22:49.479",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/750"
+        },
+        "1000": {
+          "timestamp": "01:26:47.884",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000"
+        },
+        "1250": {
+          "timestamp": "01:30:56.356",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1250"
+        },
+        "1500": {
+          "timestamp": "01:34:31.362",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1500"
+        },
+        "1750": {
+          "timestamp": "01:38:21.550",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1750"
+        },
+        "2000": {
+          "timestamp": "01:42:18.699",
+          "path": "/workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000"
+        }
+      },
+      "runtime": "30:38"
+    }
+  },
+  "val": {
+    "baseline_1000": {
+      "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000",
+      "repo_id_used": "lsnu/twin_handover_256_val",
+      "num_batches": 50,
+      "mean_val_loss": 0.052885,
+      "std_val_loss": 0.032533,
+      "timing": "mean=0.3108 std=0.1375 min=0.2230 max=1.1986",
+      "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+      "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+      "weight_loading_missing_keys": "[]",
+      "weight_loading_unexpected_keys": "[]"
+    },
+    "baseline_2000": {
+      "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000",
+      "repo_id_used": "lsnu/twin_handover_256_val",
+      "num_batches": 100,
+      "mean_val_loss": 0.035776,
+      "std_val_loss": 0.027648,
+      "timing": "mean=0.2587 std=0.1111 min=0.2224 max=1.2881",
+      "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+      "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+      "weight_loading_missing_keys": "[]",
+      "weight_loading_unexpected_keys": "[]"
+    },
+    "parallel_1000": {
+      "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000",
+      "repo_id_used": "lsnu/twin_handover_256_val",
+      "num_batches": 50,
+      "mean_val_loss": 0.051214,
+      "std_val_loss": 0.028985,
+      "timing": "mean=0.2468 std=0.0900 min=0.2211 max=0.8606",
+      "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+      "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+      "weight_loading_missing_keys": "[]",
+      "weight_loading_unexpected_keys": "[]"
+    },
+    "parallel_2000": {
+      "checkpoint_path": "/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000",
+      "repo_id_used": "lsnu/twin_handover_256_val",
+      "num_batches": 100,
+      "mean_val_loss": 0.03568,
+      "std_val_loss": 0.026077,
+      "timing": "mean=0.2366 std=0.0593 min=0.2215 max=0.8235",
+      "active_mask_dims": "[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]",
+      "masked_dims": "[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]",
+      "weight_loading_missing_keys": "[]",
+      "weight_loading_unexpected_keys": "[]"
+    }
+  },
+  "wallclock": {
+    "followup_start_utc": "2026-03-09 00:31:32 UTC",
+    "followup_end_utc": "2026-03-09 01:49:19 UTC",
+    "pipeline_wallclock_from_baseline_start_to_final_val": "01:32:29",
+    "followup_runner_wallclock": "01:17:47",
+    "baseline_train_runtime": "33:27",
+    "parallel_train_runtime": "30:38",
+    "baseline_val_1000_runtime": "00:05:14",
+    "baseline_val_2000_runtime": "00:05:19",
+    "parallel_val_1000_runtime": "00:03:23",
+    "parallel_val_2000_runtime": "00:03:33"
+  },
+  "changed_files": [
+    {
+      "path": "openpi/src/openpi/transforms.py",
+      "description": "added PackPerArmBlocks and UnpackPerArmBlocks for semantic TWIN per-arm block packing"
+    },
+    {
+      "path": "openpi/src/openpi/training/config.py",
+      "description": "added packed TWIN model transforms, action_loss_mask, and 2K baseline/parallel configs"
+    },
+    {
+      "path": "openpi/src/openpi/training/data_loader.py",
+      "description": "added set_epoch and local dataset mirror handling / loader startup fixes"
+    },
+    {
+      "path": "openpi/src/openpi/models/model.py",
+      "description": "made pi0_pytorch import lazy"
+    },
+    {
+      "path": "openpi/src/openpi/models/tokenizer.py",
+      "description": "made AutoProcessor import lazy"
+    },
+    {
+      "path": "openpi/src/openpi/models_pytorch/pi0_pytorch.py",
+      "description": "disabled unconditional sample_actions torch.compile by default"
+    },
+    {
+      "path": "openpi/scripts/train_pytorch.py",
+      "description": "added startup logging, masked action loss, debug logging, and DDP/startup fixes"
+    },
+    {
+      "path": "openpi/scripts/eval_twin_val_loss_pytorch.py",
+      "description": "added masked val loss evaluation with configurable batches/workers and startup prints"
+    },
+    {
+      "path": "openpi/scripts/init_parallel_pi05_from_single_pytorch.py",
+      "description": "added exact packed parallel warm-start initialization from single-head checkpoint"
+    },
+    {
+      "path": "openpi/scripts/inspect_twin_packed_batch.py",
+      "description": "added packed batch inspection / zero-padding verification"
+    },
+    {
+      "path": "openpi/scripts/run_twin_handover_packed_followup.sh",
+      "description": "added detached follow-up automation for val passes and parallel launch"
+    },
+    {
+      "path": "openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+      "description": "copied handover train norm stats for packed baseline config"
+    },
+    {
+      "path": "openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json",
+      "description": "copied handover train norm stats for packed parallel config"
+    },
+    {
+      "path": "README.md",
+      "description": "new repo-level experiment summary for the uploaded artifact bundle"
+    },
+    {
+      "path": "REPORT.md",
+      "description": "new detailed experiment report tying outcomes to code and artifacts"
+    }
+  ]
+}
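Both startup blocks in summary.json record the same 32-dim `action_loss_mask`, with active dims [0-7, 16-23] and masked dims [8-15, 24-31] (the packed per-arm layout puts each arm's real action dims in an 8-wide block followed by 8 dims of zero padding). A minimal sketch rebuilding that mask from the active-dim list:

```python
# Rebuild the 32-dim TWIN action loss mask recorded in summary.json.
# Active dims 0-7 and 16-23 carry the two per-arm action blocks;
# dims 8-15 and 24-31 are padding and get zero loss weight.
ACTION_DIM = 32
active_dims = set(range(0, 8)) | set(range(16, 24))

mask = tuple(1.0 if d in active_dims else 0.0 for d in range(ACTION_DIM))
masked_dims = [d for d, m in enumerate(mask) if m == 0.0]

assert mask.count(1.0) == 16
assert masked_dims == list(range(8, 16)) + list(range(24, 32))
```

Multiplying the per-dim loss by this mask (and normalizing by the number of active dims) keeps the padding dims from diluting the training signal.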
artifacts/twin_handover_packed_parallelization_20260309/metrics/train_loss_table.csv
ADDED
@@ -0,0 +1,11 @@
model,step,loss,smoothed_loss,lr,grad_norm,max_cuda_memory
baseline,250,0.1975,0.1166,2.50e-05,1.0523,35.23GB
baseline,500,0.0606,0.0554,2.35e-05,1.021,35.23GB
baseline,1000,0.0245,0.0387,1.58e-05,1.0163,35.23GB
baseline,1500,0.0155,0.0331,6.60e-06,0.7702,35.23GB
baseline,2000,0.0391,0.0278,2.50e-06,0.7445,35.23GB
parallel,250,0.1894,0.1153,2.50e-05,1.0751,35.27GB
parallel,500,0.0633,0.0565,2.35e-05,1.001,35.27GB
parallel,1000,0.0214,0.0392,1.58e-05,0.9669,35.27GB
parallel,1500,0.0155,0.0331,6.60e-06,0.7305,35.27GB
parallel,2000,0.0326,0.027,2.50e-06,0.735,35.27GB
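The smoothed_loss column in the table above can be reproduced with a running average over the raw per-step losses. The exact smoothing used by scripts/train_pytorch.py is not shown in this dump, so the sketch below assumes a simple exponential moving average; treat the function name and the alpha value as illustrative, not the script's implementation.

```python
def ema(losses, alpha=0.99):
    """Exponential moving average over a sequence of per-step losses.

    NOTE: illustrative only -- the training script's actual smoothing
    (window size / decay) is not visible in this artifact bundle.
    """
    smoothed = []
    avg = None
    for loss in losses:
        # First step seeds the average; later steps blend old and new.
        avg = loss if avg is None else alpha * avg + (1 - alpha) * loss
        smoothed.append(avg)
    return smoothed
```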
artifacts/twin_handover_packed_parallelization_20260309/metrics/val_loss_table.csv
ADDED
@@ -0,0 +1,5 @@
model,checkpoint_step,num_batches,mean_val_loss,std_val_loss,timing
baseline,1000,50,0.052885,0.032533,mean=0.3108 std=0.1375 min=0.2230 max=1.1986
baseline,2000,100,0.035776,0.027648,mean=0.2587 std=0.1111 min=0.2224 max=1.2881
parallel,1000,50,0.051214,0.028985,mean=0.2468 std=0.0900 min=0.2211 max=0.8606
parallel,2000,100,0.03568,0.026077,mean=0.2366 std=0.0593 min=0.2215 max=0.8235
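The mean_val_loss/std_val_loss columns above aggregate the per-batch losses printed in the run_logs val logs. A minimal sketch of that aggregation, assuming each entry is the already-averaged masked loss for one eval batch and that the reported std is the population standard deviation over batches (the eval script's exact choice is an assumption here):

```python
import statistics

def summarize_val_losses(batch_losses):
    """Collapse per-batch eval losses into (mean, std) as in val_loss_table.csv.

    Assumption: std is the population standard deviation over batches;
    the actual eval_twin_val_loss_pytorch.py convention may differ.
    """
    mean = statistics.fmean(batch_losses)
    std = statistics.pstdev(batch_losses)
    return mean, std
```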
artifacts/twin_handover_packed_parallelization_20260309/repro/changed_files.txt
ADDED
@@ -0,0 +1,15 @@
openpi/src/openpi/transforms.py added PackPerArmBlocks and UnpackPerArmBlocks for semantic TWIN per-arm block packing
openpi/src/openpi/training/config.py added packed TWIN model transforms, action_loss_mask, and 2K baseline/parallel configs
openpi/src/openpi/training/data_loader.py added set_epoch and local dataset mirror handling / loader startup fixes
openpi/src/openpi/models/model.py made pi0_pytorch import lazy
openpi/src/openpi/models/tokenizer.py made AutoProcessor import lazy
openpi/src/openpi/models_pytorch/pi0_pytorch.py disabled unconditional sample_actions torch.compile by default
openpi/scripts/train_pytorch.py added startup logging, masked action loss, debug logging, and DDP/startup fixes
openpi/scripts/eval_twin_val_loss_pytorch.py added masked val loss evaluation with configurable batches/workers and startup prints
openpi/scripts/init_parallel_pi05_from_single_pytorch.py added exact packed parallel warm-start initialization from single-head checkpoint
openpi/scripts/inspect_twin_packed_batch.py added packed batch inspection / zero-padding verification
openpi/scripts/run_twin_handover_packed_followup.sh added detached follow-up automation for val passes and parallel launch
openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json copied handover train norm stats for packed baseline config
openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json copied handover train norm stats for packed parallel config
README.md new repo-level experiment summary for the uploaded artifact bundle
REPORT.md new detailed experiment report tying outcomes to code and artifacts
artifacts/twin_handover_packed_parallelization_20260309/repro/checkpoint_locations.txt
ADDED
@@ -0,0 +1,6 @@
/workspace/checkpoints/pi05_base_single_pytorch
/workspace/checkpoints/pi05_base_parallel_packed_from_single
/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000
/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000
/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000
/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000
artifacts/twin_handover_packed_parallelization_20260309/repro/commands_reproduce.sh
ADDED
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
set -euo pipefail
export HF_HOME=/workspace/.hf
export HF_HUB_CACHE=/workspace/.hf/hub
export HF_DATASETS_CACHE=/workspace/.hf/datasets
export HUGGINGFACE_HUB_CACHE=/workspace/.hf/hub
export XDG_CACHE_HOME=/workspace/.cache
export OPENPI_LEROBOT_HOME=/workspace/lerobot
export OPENPI_TORCH_COMPILE_SAMPLE_ACTIONS=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
cd /workspace/openpi
source .venv/bin/activate
python scripts/inspect_twin_packed_batch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --repo_id lsnu/twin_handover_256_train
python examples/convert_jax_model_to_pytorch.py --checkpoint_dir /workspace/pi05tests-openpi-multiarm/artifacts/pi05_base_params --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --output_path /workspace/checkpoints/pi05_base_single_pytorch --precision bfloat16
python scripts/init_parallel_pi05_from_single_pytorch.py --single_ckpt /workspace/checkpoints/pi05_base_single_pytorch --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --output_path /workspace/checkpoints/pi05_base_parallel_packed_from_single
torchrun --standalone --nproc_per_node=4 scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k --overwrite
python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 --repo_id lsnu/twin_handover_256_val --num_batches 50 --num_workers 0
python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_baseline_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 --repo_id lsnu/twin_handover_256_val --num_batches 100 --num_workers 0
torchrun --standalone --nproc_per_node=4 scripts/train_pytorch.py pi05_twin_handover_256_packed_parallel_pytorch_2k --exp_name handover_packed_parallel_2k --overwrite
python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 --repo_id lsnu/twin_handover_256_val --num_batches 50 --num_workers 0
python scripts/eval_twin_val_loss_pytorch.py --config_name pi05_twin_handover_256_packed_parallel_pytorch_2k --checkpoint_dir /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 --repo_id lsnu/twin_handover_256_val --num_batches 100 --num_workers 0
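Each eval invocation above ends its log with a block of "key: value" summary lines (see the val logs under run_logs/). A hedged sketch of extracting that summary, e.g. to build val_loss_table.csv rows; the function name, key list, and return shape are illustrative, not part of the actual tooling:

```python
import re

# Keys taken from the summary blocks visible in the run_logs val logs.
SUMMARY_KEYS = ("config_name", "checkpoint_path", "repo_id_used", "num_batches",
                "mean_val_loss", "std_val_loss", "per_batch_timing_seconds")

def parse_eval_summary(log_text):
    """Pull the trailing 'key: value' summary lines out of one eval log.

    Hypothetical helper -- it only assumes the 'key: value' line format
    shown in the logs; values are returned as raw strings.
    """
    summary = {}
    for key in SUMMARY_KEYS:
        match = re.search(rf"^{key}: (.+)$", log_text, flags=re.MULTILINE)
        if match:
            summary[key] = match.group(1).strip()
    return summary
```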
artifacts/twin_handover_packed_parallelization_20260309/run_logs/detach_test.log
ADDED
@@ -0,0 +1,2 @@
hi
bye
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k.log
ADDED
The diff for this file is too large to render. See raw diff.
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_1000.log
ADDED
@@ -0,0 +1,66 @@
starting_eval config=pi05_twin_handover_256_packed_baseline_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 repo_id=lsnu/twin_handover_256_val
eval_loader batch_size=16 num_batches=50 num_workers=0


weight_loading missing=0 unexpected=0 device=cuda:0
eval_batch=1 loss=0.031020 batch_time_s=1.1986
eval_batch=2 loss=0.016421 batch_time_s=0.2400
eval_batch=3 loss=0.019009 batch_time_s=0.2371
eval_batch=4 loss=0.058900 batch_time_s=0.2230
eval_batch=5 loss=0.039465 batch_time_s=0.2257
eval_batch=6 loss=0.061871 batch_time_s=0.3408
eval_batch=7 loss=0.039355 batch_time_s=0.2552
eval_batch=8 loss=0.013108 batch_time_s=0.3001
eval_batch=9 loss=0.037281 batch_time_s=0.3122
eval_batch=10 loss=0.062332 batch_time_s=0.2296
eval_batch=11 loss=0.026757 batch_time_s=0.2320
eval_batch=12 loss=0.043025 batch_time_s=0.2359
eval_batch=13 loss=0.047591 batch_time_s=0.2317
eval_batch=14 loss=0.046923 batch_time_s=0.2352
eval_batch=15 loss=0.048440 batch_time_s=0.3084
eval_batch=16 loss=0.074316 batch_time_s=0.2294
eval_batch=17 loss=0.068891 batch_time_s=0.2512
eval_batch=18 loss=0.053325 batch_time_s=0.3206
eval_batch=19 loss=0.035644 batch_time_s=0.3163
eval_batch=20 loss=0.025946 batch_time_s=0.2289
eval_batch=21 loss=0.048144 batch_time_s=0.2838
eval_batch=22 loss=0.081570 batch_time_s=0.3150
eval_batch=23 loss=0.062998 batch_time_s=0.3382
eval_batch=24 loss=0.078956 batch_time_s=0.3765
eval_batch=25 loss=0.045697 batch_time_s=0.3072
eval_batch=26 loss=0.020523 batch_time_s=0.2988
eval_batch=27 loss=0.035404 batch_time_s=0.3281
eval_batch=28 loss=0.039222 batch_time_s=0.3669
eval_batch=29 loss=0.053275 batch_time_s=0.3338
eval_batch=30 loss=0.053682 batch_time_s=0.2773
eval_batch=31 loss=0.124611 batch_time_s=0.3229
eval_batch=32 loss=0.093004 batch_time_s=0.3327
eval_batch=33 loss=0.100326 batch_time_s=0.3062
eval_batch=34 loss=0.068221 batch_time_s=0.5203
eval_batch=35 loss=0.067222 batch_time_s=0.3190
eval_batch=36 loss=0.047065 batch_time_s=0.2393
eval_batch=37 loss=0.019016 batch_time_s=0.2778
eval_batch=38 loss=0.048523 batch_time_s=0.3234
eval_batch=39 loss=0.075579 batch_time_s=0.2905
eval_batch=40 loss=0.049607 batch_time_s=0.2612
eval_batch=41 loss=0.047019 batch_time_s=0.3323
eval_batch=42 loss=0.035811 batch_time_s=0.3344
eval_batch=43 loss=0.021360 batch_time_s=0.3128
eval_batch=44 loss=0.019255 batch_time_s=0.2885
eval_batch=45 loss=0.022715 batch_time_s=0.3116
eval_batch=46 loss=0.024246 batch_time_s=0.3442
eval_batch=47 loss=0.077525 batch_time_s=0.2601
eval_batch=48 loss=0.207067 batch_time_s=0.3068
eval_batch=49 loss=0.033557 batch_time_s=0.2332
eval_batch=50 loss=0.093434 batch_time_s=0.2469
config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000
repo_id_used: lsnu/twin_handover_256_val
num_batches: 50
mean_val_loss: 0.052885
std_val_loss: 0.032533
per_batch_timing_seconds: mean=0.3108 std=0.1375 min=0.2230 max=1.1986
active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
weight_loading_missing_keys: []
weight_loading_unexpected_keys: []
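The active_mask_dims/masked_dims lines above imply an action-loss mask over a 32-dim packed action vector in which the first 8 dims of each 16-dim arm slot are active (dims 0-7 and 16-23) and the rest are zero-padded. A small sketch of building that mask; the helper name and parameters are illustrative, only the block layout is taken from the log:

```python
import numpy as np

def packed_action_loss_mask(action_dim=32, block=8, stride=16):
    """Boolean mask with the first `block` dims of each `stride`-wide
    arm slot active, matching active_mask_dims in the eval logs.

    Illustrative helper -- not the actual config.py implementation.
    """
    mask = np.zeros(action_dim, dtype=bool)
    for start in range(0, action_dim, stride):
        mask[start:start + block] = True  # active per-arm block
    return mask

mask = packed_action_loss_mask()
active = np.flatnonzero(mask).tolist()
# active reproduces the active_mask_dims list: dims 0-7 and 16-23
```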
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_baseline_2k_val_2000.log
ADDED
@@ -0,0 +1,114 @@
starting_eval config=pi05_twin_handover_256_packed_baseline_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 repo_id=lsnu/twin_handover_256_val
eval_loader batch_size=16 num_batches=100 num_workers=0
weight_loading missing=0 unexpected=0 device=cuda:0
eval_batch=1 loss=0.019216 batch_time_s=1.2881
eval_batch=2 loss=0.013719 batch_time_s=0.2542
eval_batch=3 loss=0.012779 batch_time_s=0.2498
eval_batch=4 loss=0.026855 batch_time_s=0.2422
eval_batch=5 loss=0.023092 batch_time_s=0.2363
eval_batch=6 loss=0.063545 batch_time_s=0.2285
eval_batch=7 loss=0.035285 batch_time_s=0.2961
eval_batch=8 loss=0.014463 batch_time_s=0.2318
eval_batch=9 loss=0.029309 batch_time_s=0.2403
eval_batch=10 loss=0.043977 batch_time_s=0.2449
eval_batch=11 loss=0.024810 batch_time_s=0.2426
eval_batch=12 loss=0.031340 batch_time_s=0.2310
eval_batch=13 loss=0.038825 batch_time_s=0.3180
eval_batch=14 loss=0.036152 batch_time_s=0.2432
eval_batch=15 loss=0.034914 batch_time_s=0.3352
eval_batch=16 loss=0.053971 batch_time_s=0.2680
eval_batch=17 loss=0.031400 batch_time_s=0.2827
eval_batch=18 loss=0.040505 batch_time_s=0.2913
eval_batch=19 loss=0.016300 batch_time_s=0.2329
eval_batch=20 loss=0.023962 batch_time_s=0.2303
eval_batch=21 loss=0.034431 batch_time_s=0.2705
eval_batch=22 loss=0.056853 batch_time_s=0.2979
eval_batch=23 loss=0.038143 batch_time_s=0.2601
eval_batch=24 loss=0.075043 batch_time_s=0.3020
eval_batch=25 loss=0.058564 batch_time_s=0.5796
eval_batch=26 loss=0.032481 batch_time_s=0.2340
eval_batch=27 loss=0.035333 batch_time_s=0.2701
eval_batch=28 loss=0.042256 batch_time_s=0.3069
eval_batch=29 loss=0.067687 batch_time_s=0.2336
eval_batch=30 loss=0.048997 batch_time_s=0.2917
eval_batch=31 loss=0.119097 batch_time_s=0.2272
eval_batch=32 loss=0.060042 batch_time_s=0.2282
eval_batch=33 loss=0.058640 batch_time_s=0.2405
eval_batch=34 loss=0.062960 batch_time_s=0.2298
eval_batch=35 loss=0.052300 batch_time_s=0.2224
eval_batch=36 loss=0.036295 batch_time_s=0.2275
eval_batch=37 loss=0.025163 batch_time_s=0.2301
eval_batch=38 loss=0.032151 batch_time_s=0.2865
eval_batch=39 loss=0.052523 batch_time_s=0.2395
eval_batch=40 loss=0.017417 batch_time_s=0.2338
eval_batch=41 loss=0.028829 batch_time_s=0.2308
eval_batch=42 loss=0.031216 batch_time_s=0.2330
eval_batch=43 loss=0.005192 batch_time_s=0.2345
eval_batch=44 loss=0.011528 batch_time_s=0.2308
eval_batch=45 loss=0.046379 batch_time_s=0.2311
eval_batch=46 loss=0.026113 batch_time_s=0.2280
eval_batch=47 loss=0.093653 batch_time_s=0.2313
eval_batch=48 loss=0.219696 batch_time_s=0.2301
eval_batch=49 loss=0.021639 batch_time_s=0.2477
eval_batch=50 loss=0.062274 batch_time_s=0.2299
eval_batch=51 loss=0.043294 batch_time_s=0.2282
eval_batch=52 loss=0.020800 batch_time_s=0.2402
eval_batch=53 loss=0.017962 batch_time_s=0.2315
eval_batch=54 loss=0.011119 batch_time_s=0.2258
eval_batch=55 loss=0.022601 batch_time_s=0.2330
eval_batch=56 loss=0.063293 batch_time_s=0.2378
eval_batch=57 loss=0.033958 batch_time_s=0.2375
eval_batch=58 loss=0.025469 batch_time_s=0.2294
eval_batch=59 loss=0.019972 batch_time_s=0.2376
eval_batch=60 loss=0.004765 batch_time_s=0.2354
eval_batch=61 loss=0.014635 batch_time_s=0.2449
eval_batch=62 loss=0.006239 batch_time_s=0.2288
eval_batch=63 loss=0.041332 batch_time_s=0.2520
eval_batch=64 loss=0.016763 batch_time_s=0.2517
eval_batch=65 loss=0.028758 batch_time_s=0.2447
eval_batch=66 loss=0.026301 batch_time_s=0.2312
eval_batch=67 loss=0.014657 batch_time_s=0.2353
eval_batch=68 loss=0.043065 batch_time_s=0.2276
eval_batch=69 loss=0.048954 batch_time_s=0.2282
eval_batch=70 loss=0.047917 batch_time_s=0.2359
eval_batch=71 loss=0.013441 batch_time_s=0.2318
eval_batch=72 loss=0.023035 batch_time_s=0.2453
eval_batch=73 loss=0.024245 batch_time_s=0.2530
eval_batch=74 loss=0.021810 batch_time_s=0.2387
eval_batch=75 loss=0.016290 batch_time_s=0.2281
eval_batch=76 loss=0.019809 batch_time_s=0.2320
eval_batch=77 loss=0.016700 batch_time_s=0.2462
eval_batch=78 loss=0.049874 batch_time_s=0.2369
eval_batch=79 loss=0.065255 batch_time_s=0.2548
eval_batch=80 loss=0.077142 batch_time_s=0.2906
eval_batch=81 loss=0.059736 batch_time_s=0.3057
eval_batch=82 loss=0.011131 batch_time_s=0.2359
eval_batch=83 loss=0.016865 batch_time_s=0.2454
eval_batch=84 loss=0.007890 batch_time_s=0.2386
eval_batch=85 loss=0.044606 batch_time_s=0.2352
eval_batch=86 loss=0.014035 batch_time_s=0.2365
eval_batch=87 loss=0.020954 batch_time_s=0.2419
eval_batch=88 loss=0.042758 batch_time_s=0.2262
eval_batch=89 loss=0.019468 batch_time_s=0.2352
eval_batch=90 loss=0.004773 batch_time_s=0.2292
eval_batch=91 loss=0.005070 batch_time_s=0.2296
eval_batch=92 loss=0.007161 batch_time_s=0.2291
eval_batch=93 loss=0.026996 batch_time_s=0.2361
eval_batch=94 loss=0.011121 batch_time_s=0.2456
eval_batch=95 loss=0.041840 batch_time_s=0.2409
eval_batch=96 loss=0.054416 batch_time_s=0.2333
eval_batch=97 loss=0.024979 batch_time_s=0.2276
eval_batch=98 loss=0.062096 batch_time_s=0.2403
eval_batch=99 loss=0.032598 batch_time_s=0.2326
eval_batch=100 loss=0.022353 batch_time_s=0.2274
config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000
repo_id_used: lsnu/twin_handover_256_val
num_batches: 100
mean_val_loss: 0.035776
std_val_loss: 0.027648
per_batch_timing_seconds: mean=0.2587 std=0.1111 min=0.2224 max=1.2881
active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
weight_loading_missing_keys: []
weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k.log
ADDED
The diff for this file is too large to render. See raw diff.
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_1000.log
ADDED
@@ -0,0 +1,64 @@
starting_eval config=pi05_twin_handover_256_packed_parallel_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 repo_id=lsnu/twin_handover_256_val
eval_loader batch_size=16 num_batches=50 num_workers=0
weight_loading missing=0 unexpected=0 device=cuda:0
eval_batch=1 loss=0.039282 batch_time_s=0.8606
eval_batch=2 loss=0.059935 batch_time_s=0.2233
eval_batch=3 loss=0.029645 batch_time_s=0.2237
eval_batch=4 loss=0.030436 batch_time_s=0.2312
eval_batch=5 loss=0.029398 batch_time_s=0.2255
eval_batch=6 loss=0.046098 batch_time_s=0.2291
eval_batch=7 loss=0.031397 batch_time_s=0.2243
eval_batch=8 loss=0.013987 batch_time_s=0.2256
eval_batch=9 loss=0.046950 batch_time_s=0.3194
eval_batch=10 loss=0.055185 batch_time_s=0.2211
eval_batch=11 loss=0.045538 batch_time_s=0.2270
eval_batch=12 loss=0.034314 batch_time_s=0.2221
eval_batch=13 loss=0.053436 batch_time_s=0.2306
eval_batch=14 loss=0.048917 batch_time_s=0.2322
eval_batch=15 loss=0.059734 batch_time_s=0.2346
eval_batch=16 loss=0.072608 batch_time_s=0.2275
eval_batch=17 loss=0.071442 batch_time_s=0.2257
eval_batch=18 loss=0.056916 batch_time_s=0.2247
eval_batch=19 loss=0.025555 batch_time_s=0.2238
eval_batch=20 loss=0.031001 batch_time_s=0.2557
eval_batch=21 loss=0.054189 batch_time_s=0.2259
eval_batch=22 loss=0.046724 batch_time_s=0.2544
eval_batch=23 loss=0.048790 batch_time_s=0.2389
eval_batch=24 loss=0.073533 batch_time_s=0.2283
eval_batch=25 loss=0.060645 batch_time_s=0.2387
eval_batch=26 loss=0.020740 batch_time_s=0.2323
eval_batch=27 loss=0.027174 batch_time_s=0.2226
eval_batch=28 loss=0.030402 batch_time_s=0.2211
eval_batch=29 loss=0.037136 batch_time_s=0.2303
eval_batch=30 loss=0.057298 batch_time_s=0.2221
eval_batch=31 loss=0.133256 batch_time_s=0.2228
eval_batch=32 loss=0.081425 batch_time_s=0.2285
eval_batch=33 loss=0.101147 batch_time_s=0.2291
eval_batch=34 loss=0.084155 batch_time_s=0.2763
eval_batch=35 loss=0.050369 batch_time_s=0.2300
eval_batch=36 loss=0.037849 batch_time_s=0.2228
eval_batch=37 loss=0.016911 batch_time_s=0.2211
eval_batch=38 loss=0.035706 batch_time_s=0.2215
eval_batch=39 loss=0.074094 batch_time_s=0.2247
eval_batch=40 loss=0.031583 batch_time_s=0.2256
eval_batch=41 loss=0.063281 batch_time_s=0.2345
eval_batch=42 loss=0.034781 batch_time_s=0.2247
eval_batch=43 loss=0.021991 batch_time_s=0.3036
eval_batch=44 loss=0.006788 batch_time_s=0.2310
eval_batch=45 loss=0.029891 batch_time_s=0.2888
eval_batch=46 loss=0.024711 batch_time_s=0.2320
eval_batch=47 loss=0.139781 batch_time_s=0.2281
eval_batch=48 loss=0.129609 batch_time_s=0.2421
eval_batch=49 loss=0.039653 batch_time_s=0.2222
eval_batch=50 loss=0.085291 batch_time_s=0.2304
config_name: pi05_twin_handover_256_packed_parallel_pytorch_2k
checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000
repo_id_used: lsnu/twin_handover_256_val
num_batches: 50
mean_val_loss: 0.051214
std_val_loss: 0.028985
per_batch_timing_seconds: mean=0.2468 std=0.0900 min=0.2211 max=0.8606
active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
weight_loading_missing_keys: []
weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/handover_packed_parallel_2k_val_2000.log
ADDED
@@ -0,0 +1,114 @@
starting_eval config=pi05_twin_handover_256_packed_parallel_pytorch_2k checkpoint=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 repo_id=lsnu/twin_handover_256_val
eval_loader batch_size=16 num_batches=100 num_workers=0
weight_loading missing=0 unexpected=0 device=cuda:0
eval_batch=1 loss=0.019788 batch_time_s=0.8235
eval_batch=2 loss=0.010034 batch_time_s=0.2312
eval_batch=3 loss=0.006535 batch_time_s=0.2283
eval_batch=4 loss=0.019442 batch_time_s=0.2249
eval_batch=5 loss=0.023646 batch_time_s=0.2275
eval_batch=6 loss=0.045010 batch_time_s=0.2273
eval_batch=7 loss=0.021796 batch_time_s=0.2327
eval_batch=8 loss=0.019273 batch_time_s=0.2319
eval_batch=9 loss=0.021624 batch_time_s=0.2248
eval_batch=10 loss=0.035467 batch_time_s=0.2359
eval_batch=11 loss=0.034351 batch_time_s=0.2552
eval_batch=12 loss=0.027341 batch_time_s=0.2308
eval_batch=13 loss=0.047439 batch_time_s=0.2257
eval_batch=14 loss=0.037939 batch_time_s=0.2329
eval_batch=15 loss=0.043057 batch_time_s=0.2215
eval_batch=16 loss=0.038503 batch_time_s=0.2317
eval_batch=17 loss=0.043592 batch_time_s=0.2290
eval_batch=18 loss=0.037270 batch_time_s=0.2265
eval_batch=19 loss=0.020304 batch_time_s=0.2329
eval_batch=20 loss=0.030268 batch_time_s=0.2234
eval_batch=21 loss=0.041346 batch_time_s=0.2263
eval_batch=22 loss=0.028159 batch_time_s=0.2268
eval_batch=23 loss=0.065991 batch_time_s=0.2251
eval_batch=24 loss=0.064603 batch_time_s=0.2268
eval_batch=25 loss=0.068628 batch_time_s=0.2282
eval_batch=26 loss=0.023403 batch_time_s=0.2302
eval_batch=27 loss=0.031110 batch_time_s=0.2274
eval_batch=28 loss=0.022352 batch_time_s=0.2289
eval_batch=29 loss=0.046446 batch_time_s=0.2292
eval_batch=30 loss=0.043246 batch_time_s=0.2321
eval_batch=31 loss=0.101922 batch_time_s=0.2274
eval_batch=32 loss=0.072581 batch_time_s=0.2300
eval_batch=33 loss=0.056358 batch_time_s=0.2252
eval_batch=34 loss=0.065017 batch_time_s=0.2306
eval_batch=35 loss=0.048672 batch_time_s=0.2388
eval_batch=36 loss=0.022249 batch_time_s=0.2322
eval_batch=37 loss=0.014201 batch_time_s=0.2266
eval_batch=38 loss=0.039009 batch_time_s=0.2261
eval_batch=39 loss=0.033967 batch_time_s=0.2303
eval_batch=40 loss=0.021915 batch_time_s=0.2462
eval_batch=41 loss=0.024328 batch_time_s=0.2613
eval_batch=42 loss=0.050496 batch_time_s=0.2354
eval_batch=43 loss=0.010375 batch_time_s=0.2300
eval_batch=44 loss=0.016967 batch_time_s=0.2276
eval_batch=45 loss=0.026333 batch_time_s=0.2552
eval_batch=46 loss=0.019980 batch_time_s=0.2267
eval_batch=47 loss=0.089578 batch_time_s=0.2327
eval_batch=48 loss=0.209416 batch_time_s=0.2445
eval_batch=49 loss=0.011339 batch_time_s=0.2359
eval_batch=50 loss=0.066028 batch_time_s=0.2251
eval_batch=51 loss=0.035093 batch_time_s=0.2288
eval_batch=52 loss=0.020534 batch_time_s=0.2276
eval_batch=53 loss=0.006331 batch_time_s=0.2313
eval_batch=54 loss=0.012782 batch_time_s=0.2247
eval_batch=55 loss=0.022509 batch_time_s=0.2299
eval_batch=56 loss=0.047079 batch_time_s=0.2317
eval_batch=57 loss=0.023989 batch_time_s=0.2302
eval_batch=58 loss=0.019615 batch_time_s=0.2322
eval_batch=59 loss=0.026347 batch_time_s=0.2346
eval_batch=60 loss=0.004678 batch_time_s=0.2323
eval_batch=61 loss=0.007068 batch_time_s=0.2324
eval_batch=62 loss=0.013162 batch_time_s=0.2336
eval_batch=63 loss=0.047115 batch_time_s=0.2236
eval_batch=64 loss=0.017077 batch_time_s=0.2299
eval_batch=65 loss=0.047049 batch_time_s=0.2288
eval_batch=66 loss=0.035518 batch_time_s=0.2257
eval_batch=67 loss=0.016819 batch_time_s=0.2306
eval_batch=68 loss=0.051586 batch_time_s=0.2215
eval_batch=69 loss=0.043497 batch_time_s=0.2312
eval_batch=70 loss=0.072536 batch_time_s=0.2301
eval_batch=71 loss=0.018621 batch_time_s=0.2365
eval_batch=72 loss=0.043862 batch_time_s=0.2305
eval_batch=73 loss=0.034882 batch_time_s=0.2314
eval_batch=74 loss=0.028771 batch_time_s=0.2286
eval_batch=75 loss=0.012547 batch_time_s=0.2269
eval_batch=76 loss=0.023966 batch_time_s=0.2317
eval_batch=77 loss=0.023444 batch_time_s=0.2290
eval_batch=78 loss=0.048585 batch_time_s=0.2343
eval_batch=79 loss=0.065904 batch_time_s=0.2264
eval_batch=80 loss=0.072660 batch_time_s=0.2255
eval_batch=81 loss=0.038694 batch_time_s=0.2281
eval_batch=82 loss=0.013027 batch_time_s=0.2302
eval_batch=83 loss=0.022540 batch_time_s=0.2336
eval_batch=84 loss=0.010291 batch_time_s=0.2216
eval_batch=85 loss=0.054119 batch_time_s=0.2286
eval_batch=86 loss=0.021808 batch_time_s=0.2305
eval_batch=87 loss=0.018521 batch_time_s=0.2330
eval_batch=88 loss=0.042638 batch_time_s=0.2329
eval_batch=89 loss=0.023391 batch_time_s=0.2352
eval_batch=90 loss=0.004995 batch_time_s=0.2289
eval_batch=91 loss=0.006358 batch_time_s=0.2311
eval_batch=92 loss=0.024077 batch_time_s=0.2306
eval_batch=93 loss=0.039791 batch_time_s=0.2334
eval_batch=94 loss=0.046554 batch_time_s=0.2327
eval_batch=95 loss=0.038985 batch_time_s=0.2279
eval_batch=96 loss=0.034484 batch_time_s=0.2243
eval_batch=97 loss=0.037144 batch_time_s=0.2285
eval_batch=98 loss=0.069108 batch_time_s=0.2318
eval_batch=99 loss=0.035033 batch_time_s=0.2335
eval_batch=100 loss=0.024118 batch_time_s=0.2258
config_name: pi05_twin_handover_256_packed_parallel_pytorch_2k
checkpoint_path: /workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000
repo_id_used: lsnu/twin_handover_256_val
num_batches: 100
mean_val_loss: 0.035680
std_val_loss: 0.026077
per_batch_timing_seconds: mean=0.2366 std=0.0593 min=0.2215 max=0.8235
active_mask_dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23]
masked_dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31]
weight_loading_missing_keys: []
weight_loading_unexpected_keys: []
artifacts/twin_handover_packed_parallelization_20260309/run_logs/importtime_train_pytorch.log
ADDED
@@ -0,0 +1,349 @@
import time: self [us] | cumulative | imported package
import time: 459 | 459 | _io
import time: 100 | 100 | marshal
import time: 1005 | 1005 | posix
import time: 2124 | 3687 | _frozen_importlib_external
import time: 521 | 521 | time
import time: 542 | 1062 | zipimport
import time: 126 | 126 | _codecs
import time: 1309 | 1435 | codecs
import time: 1112 | 1112 | encodings.aliases
import time: 2172 | 4718 | encodings
import time: 579 | 579 | encodings.utf_8
import time: 307 | 307 | _signal
import time: 100 | 100 | _abc
import time: 561 | 660 | abc
import time: 825 | 1484 | io
import time: 115 | 115 | _stat
import time: 578 | 693 | stat
import time: 2208 | 2208 | _collections_abc
import time: 104 | 104 | genericpath
import time: 550 | 653 | posixpath
import time: 1911 | 5463 | os
import time: 177 | 177 | _sitebuiltins
import time: 27305 | 27305 | _virtualenv
import time: 29406 | 29406 | _distutils_hack
import time: 427 | 427 | sitecustomize
import time: 150941 | 213717 | site
import time: 44311 | 44311 | scripts
import time: 2030 | 2030 | types
import time: 257 | 257 | _operator
import time: 2479 | 2735 | operator
import time: 339 | 339 | itertools
import time: 1371 | 1371 | keyword
import time: 1450 | 1450 | reprlib
import time: 191 | 191 | _collections
import time: 5447 | 8797 | collections
import time: 173 | 173 | _functools
import time: 2662 | 11631 | functools
import time: 8546 | 24941 | enum
import time: 219 | 219 | _sre
import time: 791 | 791 | re._constants
import time: 1361 | 2151 | re._parser
import time: 365 | 365 | re._casefix
import time: 2178 | 4912 | re._compiler
import time: 1533 | 1533 | copyreg
import time: 3330 | 34714 | re
import time: 1611 | 1611 | _weakrefset
import time: 3100 | 4710 | weakref
import time: 9576 | 9576 | org
import time: 255 | 9830 | org.python
import time: 224 | 10053 | org.python.core
import time: 2268 | 17030 | copy
import time: 3210 | 3210 | _ast
import time: 4811 | 4811 | contextlib
import time: 7411 | 15431 | ast
import time: 164 | 164 | _opcode
import time: 4583 | 4747 | opcode
import time: 5950 | 10696 | dis
import time: 612 | 612 | collections.abc
import time: 2739 | 2739 | warnings
import time: 2281 | 5019 | importlib
import time: 368 | 5387 | importlib.machinery
import time: 3055 | 3055 | token
import time: 6195 | 9250 | tokenize
import time: 2342 | 11591 | linecache
import time: 8114 | 51829 | inspect
import time: 4396 | 107967 | dataclasses
import time: 186 | 186 | gc
import time: 4839 | 4839 | textwrap
import time: 3120 | 7958 | traceback
import time: 130 | 130 | _string
import time: 3121 | 3250 | string
import time: 4201 | 4201 | threading
import time: 127 | 127 | atexit
import time: 7952 | 23486 | logging
import time: 6543 | 6543 | platform
import time: 2295 | 2295 | fnmatch
import time: 287 | 287 | errno
import time: 336 | 336 | zlib
import time: 4029 | 4029 | _compression
import time: 2862 | 2862 | _bz2
import time: 4040 | 10930 | bz2
import time: 4869 | 4869 | _lzma
import time: 4931 | 9800 | lzma
import time: 6201 | 29847 | shutil
import time: 4466 | 4466 | __future__
import time: 222 | 222 | math
import time: 341 | 341 | _datetime
import time: 9607 | 10169 | datetime
import time: 8169 | 8169 | _winapi
import time: 11261 | 11261 | nt
import time: 9784 | 9784 | nt
import time: 8028 | 8028 | nt
import time: 10836 | 10836 | nt
import time: 8338 | 8338 | nt
import time: 3115 | 59529 | ntpath
import time: 3503 | 3503 | urllib
import time: 7606 | 7606 | ipaddress
import time: 3508 | 14616 | urllib.parse
import time: 7032 | 81177 | pathlib
import time: 279 | 279 | _locale
import time: 7805 | 8083 | locale
import time: 5367 | 5367 | signal
import time: 213 | 213 | fcntl
import time: 9064 | 9064 | msvcrt
import time: 169 | 169 | _posixsubprocess
import time: 236 | 236 | select
import time: 6301 | 6301 | selectors
import time: 11615 | 41045 | subprocess
import time: 42021 | 178875 | jax.version
import time: 56236 | 56236 | jax._src
import time: 8786 | 8786 | _typing
import time: 13264 | 22049 | typing
import time: 58851 | 58851 | jaxlib.version
import time: 95721 | 154572 | jaxlib
import time: 21162 | 21162 | jaxlib.cpu_feature_guard
import time: 17965 | 17965 | jaxlib.utils
import time: 357 | 357 | _struct
import time: 1673 | 2029 | struct
import time: 12406 | 14434 | gzip
import time: 72050 | 72050 | numpy._utils._convertions
import time: 90873 | 162922 | numpy._utils
import time: 58590 | 221512 | numpy._globals
import time: 53224 | 53224 | numpy.exceptions
import time: 52905 | 52905 | numpy.version
import time: 609 | 609 | numpy._distributor_init_local
import time: 55586 | 56194 | numpy._distributor_init
import time: 35135 | 35135 | numpy._utils._inspect
import time: 12046 | 12046 | numpy.core._exceptions
import time: 8271 | 8271 | numpy.dtypes
import time: 202114 | 222430 | numpy.core._multiarray_umath
import time: 37402 | 294966 | numpy.core.overrides
import time: 53534 | 348500 | numpy.core.multiarray
import time: 5565 | 5565 | numpy.core.umath
import time: 1567 | 1567 | numbers
import time: 8550 | 8550 | numpy.core._string_helpers
import time: 8226 | 8226 | pickle5
import time: 1179 | 1179 | _compat_pickle
import time: 512 | 512 | _pickle
import time: 1864 | 1864 | org
import time: 329 | 2192 | org.python
import time: 776 | 2968 | org.python.core
import time: 4888 | 9545 | pickle
import time: 10531 | 28301 | numpy.compat.py3k
import time: 25123 | 53423 | numpy.compat
import time: 7502 | 7502 | numpy.core._dtype
import time: 8090 | 69014 | numpy.core._type_aliases
import time: 6044 | 85174 | numpy.core.numerictypes
import time: 646 | 646 | _contextvars
import time: 805 | 1451 | contextvars
import time: 7510 | 8961 | numpy.core._ufunc_config
import time: 21816 | 30776 | numpy.core._methods
import time: 10769 | 41545 | numpy.core.fromnumeric
import time: 7787 | 49331 | numpy.core.shape_base
import time: 6325 | 6325 | numpy.core.arrayprint
import time: 4369 | 4369 | numpy.core._asarray
import time: 10436 | 70460 | numpy.core.numeric
import time: 5458 | 5458 | numpy.core.defchararray
import time: 6653 | 6653 | numpy.core.records
import time: 2659 | 2659 | numpy.core.memmap
import time: 3430 | 3430 | numpy.core.function_base
import time: 3739 | 3739 | numpy.core._machar
import time: 4821 | 4821 | numpy.core.getlimits
import time: 5141 | 5141 | numpy.core.einsumfunc
import time: 2892 | 2892 | numpy.core._multiarray_tests
import time: 7349 | 10241 | numpy.core._add_newdocs
import time: 10209 | 10209 | numpy.core._add_newdocs_scalars
import time: 4958 | 4958 | numpy.core._dtype_ctypes
import time: 1331 | 1331 | _ctypes
import time: 1038 | 1038 | ctypes._endian
import time: 3302 | 5670 | ctypes
import time: 8903 | 14573 | numpy.core._internal
import time: 7543 | 7543 | numpy._pytesttester
import time: 81885 | 671000 | numpy.core
import time: 153 | 671152 | numpy.core._multiarray_umath
import time: 56994 | 728146 | numpy.__config__
import time: 7653 | 7653 | numpy.lib.mixins
import time: 9676 | 9676 | numpy.lib.ufunclike
import time: 7766 | 17441 | numpy.lib.type_check
import time: 10010 | 27450 | numpy.lib.scimath
import time: 22351 | 22351 | numpy.lib.stride_tricks
import time: 11303 | 33654 | numpy.lib.twodim_base
import time: 8761 | 8761 | numpy.linalg._umath_linalg
import time: 16569 | 16569 | numpy._typing._nested_sequence
import time: 13982 | 13982 | numpy._typing._nbit
import time: 20263 | 20263 | numpy._typing._char_codes
import time: 11700 | 11700 | numpy._typing._scalars
import time: 8982 | 8982 | numpy._typing._shape
import time: 24532 | 24532 | numpy._typing._dtype_like
import time: 44660 | 44660 | numpy._typing._array_like
import time: 29866 | 170550 | numpy._typing
import time: 17677 | 230640 | numpy.linalg.linalg
import time: 237805 | 468444 | numpy.linalg
import time: 9029 | 477473 | numpy.matrixlib.defmatrix
import time: 10944 | 488417 | numpy.matrixlib
import time: 8745 | 8745 | numpy.lib.histograms
import time: 27873 | 36617 | numpy.lib.function_base
import time: 17216 | 542249 | numpy.lib.index_tricks
import time: 16518 | 16518 | numpy.lib.nanfunctions
import time: 14925 | 14925 | numpy.lib.shape_base
import time: 8883 | 8883 | numpy.lib.polynomial
import time: 13341 | 13341 | numpy.lib.utils
import time: 13347 | 13347 | numpy.lib.arraysetops
import time: 18662 | 18662 | numpy.lib.format
import time: 9834 | 9834 | numpy.lib._datasource
import time: 10465 | 10465 | numpy.lib._iotools
import time: 26974 | 65935 | numpy.lib.npyio
import time: 14808 | 14808 | numpy.lib.arrayterator
import time: 28751 | 28751 | numpy.lib.arraypad
import time: 31641 | 31641 | numpy.lib._version
import time: 16718 | 802213 | numpy.lib
import time: 13764 | 13764 | numpy.fft._pocketfft_internal
import time: 47189 | 60952 | numpy.fft._pocketfft
import time: 34176 | 34176 | numpy.fft.helper
import time: 57859 | 152987 | numpy.fft
import time: 32723 | 32723 | numpy.polynomial.polyutils
import time: 20810 | 20810 | numpy.polynomial._polybase
import time: 47703 | 101235 | numpy.polynomial.polynomial
import time: 22597 | 22597 | numpy.polynomial.chebyshev
import time: 15190 | 15190 | numpy.polynomial.legendre
import time: 12249 | 12249 | numpy.polynomial.hermite
import time: 15883 | 15883 | numpy.polynomial.hermite_e
import time: 20997 | 20997 | numpy.polynomial.laguerre
import time: 57756 | 245905 | numpy.polynomial
import time: 11659 | 11659 | backports_abc
import time: 8899 | 20558 | numpy.random._common
import time: 609 | 609 | binascii
import time: 1895 | 2503 | base64
import time: 6404 | 6404 | _hashlib
import time: 184 | 184 | _blake2
import time: 1554 | 1737 | hashlib
import time: 2119 | 10260 | hmac
import time: 96 | 96 | _bisect
import time: 1252 | 1347 | bisect
import time: 164 | 164 | _random
import time: 176 | 176 | _sha512
import time: 2855 | 4541 | random
import time: 1966 | 19268 | secrets
import time: 8364 | 48189 | numpy.random.bit_generator
import time: 5773 | 5773 | numpy.random._bounded_integers
import time: 6014 | 6014 | numpy.random._mt19937
import time: 9760 | 69734 | numpy.random.mtrand
import time: 7331 | 7331 | numpy.random._philox
import time: 5862 | 5862 | numpy.random._pcg64
import time: 5462 | 5462 | numpy.random._sfc64
import time: 8031 | 8031 | numpy.random._generator
import time: 22729 | 119147 | numpy.random._pickle
import time: 23124 | 142271 | numpy.random
import time: 20592 | 20592 | numpy.ctypeslib
import time: 40900 | 40900 | numpy.ma.core
import time: 26643 | 26643 | numpy.ma.extras
import time: 31513 | 99055 | numpy.ma
import time: 75854 | 2650852 | numpy
import time: 22335 | 2673187 | numpy._core
import time: 29059 | 2702245 | numpy._core._multiarray_umath
import time: 22101 | 2724346 | ml_dtypes._ml_dtypes_ext
import time: 53315 | 2777661 | ml_dtypes._finfo
import time: 15604 | 15604 | ml_dtypes._iinfo
import time: 93641 | 2886905 | ml_dtypes
import time: 62057 | 62057 | jaxlib.xla_extension
import time: 46707 | 3010102 | jaxlib.xla_client
import time: 31965 | 31965 | jaxlib.cpu
import time: 45309 | 45309 | jaxlib.cpu._lapack
import time: 28009 | 105282 | jaxlib.lapack
import time: 378 | 378 | jaxlib.cuda
import time: 468 | 846 | jaxlib.cuda._versions
import time: 40192 | 40192 | jax_cuda12_plugin
import time: 39973 | 80164 | jax_cuda12_plugin._versions
import time: 5772 | 5772 | jaxlib.plugin_support
import time: 1036132 | 1041903 | jaxlib.gpu_solver
import time: 7872 | 7872 | jaxlib.mlir
import time: 167478 | 167478 | jaxlib.mlir._mlir_libs._mlir
import time: 22966 | 190443 | jaxlib.mlir._mlir_libs
import time: 599 | 191041 | jaxlib.mlir._mlir_libs._mlir
import time: 311 | 191352 | jaxlib.mlir._mlir_libs._mlir.ir
import time: 15499 | 214723 | jaxlib.mlir.ir
import time: 4822 | 4822 | jaxlib.mlir.dialects
import time: 9680 | 9680 | jaxlib.mlir.dialects._ods_common
import time: 20956 | 30636 | jaxlib.mlir.dialects._stablehlo_ops_gen
import time: 6744 | 6744 | jaxlib.mlir._mlir_libs._stablehlo
import time: 15548 | 57749 | jaxlib.mlir.dialects.stablehlo
import time: 8550 | 66299 | jaxlib.hlo_helpers
import time: 20744 | 301764 | jaxlib.gpu_sparse
import time: 26892 | 26892 | jaxlib.gpu_prng
import time: 14945 | 14945 | jaxlib.gpu_linalg
import time: 8409 | 8409 | jaxlib.gpu_common_utils
import time: 19412 | 27821 | jaxlib.gpu_rnn
import time: 18330 | 18330 | jaxlib.gpu_triton
import time: 7304 | 7304 | jaxlib.mosaic
import time: 17191 | 24495 | jaxlib.mosaic.python
import time: 8072 | 8072 | jaxlib.mosaic.dialect
import time: 12479 | 20550 | jaxlib.mosaic.dialect.gpu
import time: 39792 | 60342 | jaxlib.mosaic.dialect.gpu._mosaic_gpu_gen_ops
import time: 33098 | 33098 | jaxlib.mosaic.dialect.gpu._mosaic_gpu_gen_enums
import time: 17182 | 17182 | jaxlib.mlir._mlir_libs._mosaic_gpu_ext
import time: 44715 | 179830 | jaxlib.mosaic.python.mosaic_gpu
import time: 38772 | 38772 | jaxlib.mosaic.python._tpu_gen
import time: 19006 | 19006 | jaxlib.mlir._mlir_libs._tpu_ext
import time: 35539 | 93316 | jaxlib.mosaic.python.tpu
import time: 52437 | 52437 | nvidia
import time: 33733 | 33733 | nvidia.cuda_nvcc
import time: 94038 | 5275093 | jax._src.lib
import time: 43893 | 43893 | jax._src.logging_config
import time: 58138 | 5399172 | jax._src.config
import time: 5142 | 5142 | glob
import time: 32810 | 37951 | jax._src.hardware_utils
import time: 89410 | 5582767 | jax._src.cloud_tpu_init
import time: 1474 | 1474 | libtpu
import time: 16083 | 16083 | jax._src.basearray
import time: 10237 | 26320 | jax._src.typing
import time: 14374 | 14374 | jax._src.util
import time: 40141 | 40141 | jax._src.traceback_util
import time: 19514 | 100348 | jax._src.dtypes
import time: 44949 | 44949 | jax._src.effects
import time: 45640 | 45640 | jax._src.compute_on
import time: 827 | 827 | _json
import time: 1678 | 2504 | json.scanner
import time: 1613 | 4117 | json.decoder
import time: 1159 | 1159 | json.encoder
import time: 1849 | 7124 | json
import time: 867 | 867 | importlib._abc
import time: 675 | 1541 | importlib.util
import time: 1466 | 3007 | pkgutil
import time: 74826 | 74826 | jax._src.clusters.cluster
import time: 44029 | 44029 | jax._src.clusters.ompi_cluster
import time: 45845 | 45845 | jax._src.clusters.slurm_cluster
import time: 664 | 664 | _socket
import time: 283 | 283 | array
import time: 6869 | 7815 | socket
import time: 46749 | 54564 | jax._src.clusters.mpi4py_cluster
import time: 52800 | 52800 | jax._src.clusters.cloud_tpu_cluster
import time: 45723 | 45723 | jax._src.clusters.k8s_cluster
import time: 98001 | 415785 | jax._src.clusters
import time: 53385 | 469170 | jax._src.distributed
import time: 83403 | 83403 | jax_plugins
import time: 60154 | 622856 | jax._src.xla_bridge
import time: 54548 | 677403 | jax._src.mesh
import time: 72161 | 72161 | jax._src.partition_spec
import time: 85965 | 85965 | jax._src.errors
import time: 208 | 208 | _heapq
import time: 9637 | 9845 | heapq
import time: 9570 | 19414 | difflib
import time: 60664 | 80078 | jax._src.tree_util
import time: 58352 | 138430 | jax._src.linear_util
import time: 49579 | 49579 | sysconfig
import time: 10829 | 10829 | _sysconfigdata__x86_64-linux-gnu
import time: 61018 | 121425 | jax._src.source_info_util
import time: 9007 | 9007 | colorama
import time: 55120 | 64126 | jax._src.pretty_printer
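The log above is the standard output of CPython's `-X importtime` option, where each line reports the time spent importing a module itself (`self`) and including its dependencies (`cumulative`), in microseconds. A minimal sketch (not part of this repo's tooling; the function name is hypothetical) for finding the slowest imports in such a log:

```python
import re

def parse_import_times(lines):
    """Parse `python -X importtime` output lines into (self_us, cumulative_us, module)
    tuples, sorted by self time descending. The header line is skipped because its
    fields are not numeric."""
    rows = []
    for line in lines:
        m = re.match(r"import time:\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\S+)", line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), m.group(3)))
    return sorted(rows, key=lambda r: r[0], reverse=True)

# A few lines copied from the log above:
log = [
    "import time: self [us] | cumulative | imported package",
    "import time: 459 | 459 | _io",
    "import time: 1036132 | 1041903 | jaxlib.gpu_solver",
    "import time: 202114 | 222430 | numpy.core._multiarray_umath",
]
top = parse_import_times(log)
print(top[0])  # slowest self-time import in this sample
```

On the sample above, `jaxlib.gpu_solver` dominates self time, which matches the full log, where it is the single largest contributor to `jax._src.lib` startup.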
artifacts/twin_handover_packed_parallelization_20260309/run_logs/inspect_twin_packed_batch_handover_train.log
ADDED
@@ -0,0 +1,176 @@
config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
repo_id: lsnu/twin_handover_256_train
sample_index: 0
norm_stats_path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
norm_stats_keys: ['actions', 'state']
norm_stats_lengths: state_mean=16 state_std=16 action_mean=16 action_std=16
block_boundaries: [0:8] [8:16] [16:24] [24:32]
raw_state_16d_shape: (16,)
raw_state_16d:
[ 7.1883e-07  1.7515e-01 -5.6890e-06 -8.7299e-01 -6.3130e-06  1.2216e+00
  7.8540e-01  1.0000e+00  1.1957e-06  1.7514e-01 -9.2062e-07 -8.7312e-01
  1.6098e-05  1.2216e+00  7.8539e-01  1.0000e+00]
raw_actions_16d_shape: (16, 16)
raw_actions_16d:
[[ 2.3842e-05 -8.2493e-04 -5.7220e-05  3.9577e-04  2.8610e-05  7.8201e-04
  -1.2398e-04  1.0000e+00  9.5367e-05  4.0293e-03  9.5367e-06  7.2479e-04
   1.8120e-04 -1.4305e-05 -2.2411e-04  1.0000e+00]
 [ 5.0068e-04 -1.5645e-02  2.6083e-03 -5.5575e-02  1.8883e-03  2.5430e-02
  -1.9326e-02  1.0000e+00  2.7800e-02  2.4877e-02 -2.7924e-02 -2.7843e-02
  -1.6832e-02  1.0629e-02  3.8543e-02  1.0000e+00]
 [ 1.7738e-03 -7.6041e-02  8.9645e-03 -1.7257e-01  6.0558e-03  8.7943e-02
  -6.4831e-02  1.0000e+00  9.2287e-02  5.8761e-02 -9.3136e-02 -7.6413e-02
  -5.3630e-02  4.2353e-02  1.2606e-01  1.0000e+00]
 [ 3.2425e-03 -1.3747e-01  1.5845e-02 -3.1527e-01  1.0653e-02  1.6477e-01
  -1.1840e-01  1.0000e+00  1.7036e-01  1.0629e-01 -1.7153e-01 -1.4015e-01
  -9.7461e-02  7.8468e-02  2.3009e-01  1.0000e+00]
 [ 5.5885e-03 -2.1545e-01  2.4767e-02 -4.6663e-01  1.6103e-02  2.4452e-01
  -1.7446e-01  1.0000e+00  2.5305e-01  1.5107e-01 -2.5392e-01 -2.1260e-01
  -1.4490e-01  1.1766e-01  3.4122e-01  1.0000e+00]
 [ 6.1035e-03 -2.8390e-01  3.3288e-02 -6.1909e-01  2.1739e-02  3.2683e-01
  -2.3199e-01  1.0000e+00  3.3677e-01  1.9970e-01 -3.3804e-01 -2.8173e-01
  -1.9161e-01  1.5831e-01  4.5282e-01  1.0000e+00]
 [ 9.3937e-03 -3.1736e-01  3.8815e-02 -7.2264e-01  2.9097e-02  3.8407e-01
  -2.9788e-01  1.0000e+00  3.9431e-01  2.3764e-01 -3.9650e-01 -3.2045e-01
  -2.2884e-01  1.8487e-01  5.3961e-01  1.0000e+00]
 [ 1.1177e-02 -3.3051e-01  4.2367e-02 -7.4072e-01  3.5295e-02  4.0234e-01
  -3.4810e-01  1.0000e+00  4.1353e-01  2.4687e-01 -4.1600e-01 -3.4033e-01
  -2.4390e-01  1.9067e-01  5.7513e-01  1.0000e+00]
 [ 1.2674e-02 -3.1841e-01  4.3559e-02 -7.5366e-01  3.7665e-02  4.1035e-01
  -3.7488e-01  1.0000e+00  4.2095e-01  2.5672e-01 -4.2238e-01 -3.4335e-01
  -2.4950e-01  1.9567e-01  5.8634e-01  1.0000e+00]
 [ 1.5645e-02 -3.0324e-01  4.3592e-02 -7.4167e-01  4.2624e-02  4.1367e-01
  -4.1199e-01  1.0000e+00  4.2353e-01  2.6254e-01 -4.2444e-01 -3.4899e-01
  -2.5064e-01  1.9762e-01  5.8977e-01  1.0000e+00]
 [ 1.6398e-02 -2.9560e-01  4.2553e-02 -7.3503e-01  4.5595e-02  4.1383e-01
  -4.3354e-01  1.0000e+00  4.2382e-01  2.5776e-01 -4.2612e-01 -3.5491e-01
  -2.5177e-01  1.9462e-01  5.9134e-01  1.0000e+00]
 [ 2.0757e-02 -2.9058e-01  4.2739e-02 -7.3133e-01  4.6840e-02  4.1339e-01
  -4.5310e-01  1.0000e+00  4.2468e-01  2.5057e-01 -4.2498e-01 -3.4835e-01
  -2.5149e-01  2.0029e-01  5.9138e-01  1.0000e+00]
 [ 2.3303e-02 -2.7753e-01  4.1437e-02 -7.2254e-01  4.8075e-02  4.1380e-01
  -4.7155e-01  1.0000e+00  4.2468e-01  2.5254e-01 -4.2522e-01 -3.4195e-01
  -2.5130e-01  1.9623e-01  5.9127e-01  1.0000e+00]
 [ 2.7924e-02 -2.5505e-01  4.0684e-02 -7.0069e-01  5.3768e-02  4.1076e-01
  -5.1048e-01  1.0000e+00  4.2446e-01  2.5574e-01 -4.2656e-01 -3.5101e-01
  -2.5181e-01  1.9645e-01  5.9101e-01  1.0000e+00]
 [ 3.2401e-02 -2.4053e-01  4.1451e-02 -6.8364e-01  5.6882e-02  4.1132e-01
  -5.4158e-01  1.0000e+00  4.2435e-01  2.5109e-01 -4.2632e-01 -3.5082e-01
  -2.5095e-01  1.9805e-01  5.9107e-01  1.0000e+00]
 [ 3.4809e-02 -2.2431e-01  4.0565e-02 -6.7288e-01  5.6076e-02  4.0839e-01
  -5.6400e-01  1.0000e+00  4.2504e-01  2.5486e-01 -4.2588e-01 -3.4874e-01
  -2.5139e-01  1.9783e-01  5.9183e-01  1.0000e+00]]
normalized_state_16d_shape: (16,)
normalized_state_16d:
[-0.174   0.1055 -0.0061  1.0124  0.086  -0.4741  0.2016  1.0004  0.0951
  0.0668  0.0549  1.0086 -0.053  -0.3299 -1.0068  1.0004]
normalized_actions_16d_shape: (16, 16)
normalized_actions_16d:
[[-0.2378  0.0147  0.1124  0.1989  0.1562  0.1251  0.0182  1.0004  0.1108
   0.0624  0.0823  0.9208  0.055  -0.5935 -0.7448  1.0004]
 [-0.2367 -0.0063  0.1178  0.1174  0.1593  0.1567 -0.0046  1.0004  0.1686
   0.107   0.02    0.7676  0.0127 -0.5697 -0.6371  1.0004]
 [-0.2338 -0.092   0.1305 -0.0529  0.1664  0.2368 -0.0585  1.0004  0.303
   0.1794 -0.1254  0.5072 -0.0788 -0.499  -0.3941  1.0004]
 [-0.2306 -0.1792  0.1444 -0.2606  0.1742  0.3352 -0.1219  1.0004  0.4658
   0.2811 -0.3003  0.1655 -0.1877 -0.4185 -0.1052  1.0004]
 [-0.2253 -0.2898  0.1623 -0.4809  0.1834  0.4374 -0.1883  1.0004  0.6382
   0.3768 -0.484  -0.223  -0.3056 -0.3311  0.2034  1.0004]
 [-0.2242 -0.3869  0.1795 -0.7028  0.193   0.5429 -0.2564  1.0004  0.8128
   0.4808 -0.6717 -0.5936 -0.4217 -0.2404  0.5133  1.0004]
 [-0.2168 -0.4344  0.1906 -0.8535  0.2055  0.6163 -0.3344  1.0004  0.9328
   0.5619 -0.8021 -0.8012 -0.5143 -0.1812  0.7543  1.0004]
 [-0.2129 -0.4531  0.1977 -0.8798  0.216   0.6397 -0.3939  1.0004  0.9729
   0.5816 -0.8455 -0.9078 -0.5517 -0.1682  0.8529  1.0004]
 [-0.2095 -0.4359  0.2001 -0.8986  0.2201  0.6499 -0.4256  1.0004  0.9883
   0.6027 -0.8598 -0.924  -0.5656 -0.1571  0.8841  1.0004]
 [-0.2029 -0.4144  0.2002 -0.8812  0.2285  0.6542 -0.4695  1.0004  0.9937
   0.6151 -0.8644 -0.9542 -0.5684 -0.1527  0.8936  1.0004]
 [-0.2012 -0.4035  0.1981 -0.8715  0.2335  0.6544 -0.495   1.0004  0.9943
   0.6049 -0.8681 -0.986  -0.5713 -0.1594  0.8979  1.0004]
 [-0.1915 -0.3964  0.1985 -0.8661  0.2356  0.6538 -0.5182  1.0004  0.9961
   0.5895 -0.8656 -0.9508 -0.5705 -0.1468  0.8981  1.0004]
 [-0.1858 -0.3779  0.1959 -0.8533  0.2377  0.6544 -0.54    1.0004  0.9961
   0.5937 -0.8661 -0.9165 -0.5701 -0.1558  0.8978  1.0004]
 [-0.1755 -0.346   0.1944 -0.8215  0.2474  0.6505 -0.5861  1.0004  0.9956
   0.6006 -0.8691 -0.9651 -0.5713 -0.1554  0.897   1.0004]
 [-0.1655 -0.3254  0.1959 -0.7967  0.2527  0.6512 -0.623   1.0004  0.9954
   0.5907 -0.8686 -0.9641 -0.5692 -0.1518  0.8972  1.0004]
 [-0.1601 -0.3024  0.1941 -0.7811  0.2513  0.6474 -0.6495  1.0004  0.9969
   0.5987 -0.8676 -0.9529 -0.5703 -0.1523  0.8993  1.0004]]
packed_state_32d_shape: (32,)
packed_state_32d:
[-0.174   0.1055 -0.0061  1.0124  0.086  -0.4741  0.2016  1.0004  0.
  0.      0.      0.      0.      0.      0.      0.      0.0951  0.0668
  0.0549  1.0086 -0.053  -0.3299 -1.0068  1.0004  0.      0.      0.
  0.      0.      0.      0.      0.    ]
packed_actions_32d_shape: (16, 32)
packed_actions_32d:
[[-0.2378  0.0147  0.1124  0.1989  0.1562  0.1251  0.0182  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.1108  0.0624
   0.0823  0.9208  0.055  -0.5935 -0.7448  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2367 -0.0063  0.1178  0.1174  0.1593  0.1567 -0.0046  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.1686  0.107
   0.02    0.7676  0.0127 -0.5697 -0.6371  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2338 -0.092   0.1305 -0.0529  0.1664  0.2368 -0.0585  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.303   0.1794
  -0.1254  0.5072 -0.0788 -0.499  -0.3941  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2306 -0.1792  0.1444 -0.2606  0.1742  0.3352 -0.1219  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.4658  0.2811
  -0.3003  0.1655 -0.1877 -0.4185 -0.1052  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2253 -0.2898  0.1623 -0.4809  0.1834  0.4374 -0.1883  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.6382  0.3768
  -0.484  -0.223  -0.3056 -0.3311  0.2034  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2242 -0.3869  0.1795 -0.7028  0.193   0.5429 -0.2564  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.8128  0.4808
  -0.6717 -0.5936 -0.4217 -0.2404  0.5133  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2168 -0.4344  0.1906 -0.8535  0.2055  0.6163 -0.3344  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9328  0.5619
  -0.8021 -0.8012 -0.5143 -0.1812  0.7543  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2129 -0.4531  0.1977 -0.8798  0.216   0.6397 -0.3939  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9729  0.5816
  -0.8455 -0.9078 -0.5517 -0.1682  0.8529  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2095 -0.4359  0.2001 -0.8986  0.2201  0.6499 -0.4256  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9883  0.6027
  -0.8598 -0.924  -0.5656 -0.1571  0.8841  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2029 -0.4144  0.2002 -0.8812  0.2285  0.6542 -0.4695  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9937  0.6151
  -0.8644 -0.9542 -0.5684 -0.1527  0.8936  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.2012 -0.4035  0.1981 -0.8715  0.2335  0.6544 -0.495   1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9943  0.6049
  -0.8681 -0.986  -0.5713 -0.1594  0.8979  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.1915 -0.3964  0.1985 -0.8661  0.2356  0.6538 -0.5182  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9961  0.5895
  -0.8656 -0.9508 -0.5705 -0.1468  0.8981  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.1858 -0.3779  0.1959 -0.8533  0.2377  0.6544 -0.54    1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9961  0.5937
  -0.8661 -0.9165 -0.5701 -0.1558  0.8978  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.1755 -0.346   0.1944 -0.8215  0.2474  0.6505 -0.5861  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9956  0.6006
  -0.8691 -0.9651 -0.5713 -0.1554  0.897   1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.1655 -0.3254  0.1959 -0.7967  0.2527  0.6512 -0.623   1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9954  0.5907
  -0.8686 -0.9641 -0.5692 -0.1518  0.8972  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]
 [-0.1601 -0.3024  0.1941 -0.7811  0.2513  0.6474 -0.6495  1.0004  0.
   0.      0.      0.      0.      0.      0.      0.      0.9969  0.5987
  -0.8676 -0.9529 -0.5703 -0.1523  0.8993  1.0004  0.      0.      0.
   0.      0.      0.      0.      0.    ]]
state_padded_zero_count: 16 / 16
actions_padded_zero_count: 256 / 256
state_padded_exact_zero: True
actions_padded_exact_zero: True
|
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20.log
ADDED
|
@@ -0,0 +1,241 @@
| 1 |
+
W0308 22:58:43.681000 16356 torch/distributed/run.py:766]
|
| 2 |
+
W0308 22:58:43.681000 16356 torch/distributed/run.py:766] *****************************************
|
| 3 |
+
W0308 22:58:43.681000 16356 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
|
| 4 |
+
W0308 22:58:43.681000 16356 torch/distributed/run.py:766] *****************************************
|
| 5 |
+
23:00:43.715 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16451:train_pytorch.py:451)
|
| 6 |
+
23:00:43.718 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16451:train_pytorch.py:458)
|
| 7 |
+
23:00:43.762 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16451:train_pytorch.py:474)
|
| 8 |
+
23:00:43.844 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16451:config.py:234)
|
| 9 |
+
23:00:43.846 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
|
| 10 |
+
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
|
| 11 |
+
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
|
| 12 |
+
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
|
| 13 |
+
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
|
| 14 |
+
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
|
| 15 |
+
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
|
| 16 |
+
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
|
| 17 |
+
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
|
| 18 |
+
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
|
| 19 |
+
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
|
| 20 |
+
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
|
| 21 |
+
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
|
| 22 |
+
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
|
| 23 |
+
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
|
| 24 |
+
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
|
| 25 |
+
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
|
| 26 |
+
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
|
| 27 |
+
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
|
| 28 |
+
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
|
| 29 |
+
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
|
| 30 |
+
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
|
| 31 |
+
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
|
| 32 |
+
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
|
| 33 |
+
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x702ed02c29d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16451:data_loader.py:282)
|
| 34 |
+
23:00:43.849 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16451:data_loader.py:148)
|
| 35 |
+
23:00:43.958 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16454:train_pytorch.py:451)
|
| 36 |
+
23:00:43.959 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16454:train_pytorch.py:458)
|
| 37 |
+
23:00:43.959 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16454:train_pytorch.py:474)
|
| 38 |
+
23:00:44.046 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16454:config.py:234)
|
| 39 |
+
23:00:44.048 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
|
| 40 |
+
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
|
| 41 |
+
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
|
| 42 |
+
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
|
| 43 |
+
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
|
| 44 |
+
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
|
| 45 |
+
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
|
| 46 |
+
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
|
| 47 |
+
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
|
| 48 |
+
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
|
| 49 |
+
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
|
| 50 |
+
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
|
| 51 |
+
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
|
| 52 |
+
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
|
| 53 |
+
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
|
| 54 |
+
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
|
| 55 |
+
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
|
| 56 |
+
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
|
| 57 |
+
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
|
| 58 |
+
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
|
| 59 |
+
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
|
| 60 |
+
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
|
| 61 |
+
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
|
| 62 |
+
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
|
| 63 |
+
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x79acff7466d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16454:data_loader.py:282)
|
| 64 |
+
23:00:44.049 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16454:data_loader.py:148)
|
| 65 |
+
23:00:45.456 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16452:train_pytorch.py:451)
|
| 66 |
+
23:00:45.458 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16452:train_pytorch.py:458)
|
| 67 |
+
23:00:45.458 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16452:train_pytorch.py:474)
|
| 68 |
+
23:00:45.548 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16452:config.py:234)
|
| 69 |
+
23:00:45.549 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
|
| 70 |
+
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
|
| 71 |
+
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
|
| 72 |
+
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
|
| 73 |
+
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
|
| 74 |
+
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
|
| 75 |
+
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
|
| 76 |
+
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
|
| 77 |
+
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
|
| 78 |
+
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
|
| 79 |
+
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
|
| 80 |
+
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
|
| 81 |
+
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
|
| 82 |
+
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
|
| 83 |
+
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
|
| 84 |
+
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
|
| 85 |
+
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
|
| 86 |
+
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
|
| 87 |
+
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
|
| 88 |
+
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
|
| 89 |
+
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
|
| 90 |
+
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
|
| 91 |
+
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
|
| 92 |
+
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
|
| 93 |
+
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7736f700ba90>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16452:data_loader.py:282)
|
| 94 |
+
23:00:45.551 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16452:data_loader.py:148)
|
| 95 |
+
23:00:45.562 [I] local_batch_size: 4 (16451:data_loader.py:363)
|
| 96 |
+
23:00:45.861 [I] local_batch_size: 4 (16454:data_loader.py:363)
|
| 97 |
+
23:00:47.007 [I] local_batch_size: 4 (16452:data_loader.py:363)
|
| 98 |
+
23:00:47.287 [I] Overwriting checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16453:train_pytorch.py:451)
|
| 99 |
+
23:00:47.290 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20 (16453:train_pytorch.py:458)
|
| 100 |
+
23:00:47.291 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (16453:train_pytorch.py:474)
|
| 101 |
+
INFO:2026-03-08 23:00:47,419:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
|
| 102 |
+
23:00:47.419 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16454:xla_bridge.py:925)
|
| 103 |
+
INFO:2026-03-08 23:00:47,435:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
|
| 104 |
+
23:00:47.435 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16454:xla_bridge.py:925)
|
| 105 |
+
23:00:47.437 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (16453:config.py:234)
|
| 106 |
+
23:00:47.440 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
|
| 107 |
+
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
|
| 108 |
+
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
|
| 109 |
+
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
|
| 110 |
+
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
|
| 111 |
+
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
|
| 112 |
+
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
|
| 113 |
+
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
|
| 114 |
+
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
|
| 115 |
+
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
|
| 116 |
+
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
|
| 117 |
+
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
|
| 118 |
+
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
|
| 119 |
+
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
|
| 120 |
+
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
|
| 121 |
+
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
|
| 122 |
+
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
|
| 123 |
+
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
|
| 124 |
+
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
|
| 125 |
+
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
|
| 126 |
+
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
|
| 127 |
+
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
|
| 128 |
+
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
|
| 129 |
+
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
|
| 130 |
+
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x728778855290>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (16453:data_loader.py:282)
|
| 131 |
+
23:00:47.459 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (16453:data_loader.py:148)
|
| 132 |
+
INFO:2026-03-08 23:00:47,514:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
|
| 133 |
+
23:00:47.514 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16451:xla_bridge.py:925)
|
| 134 |
+
INFO:2026-03-08 23:00:47,530:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
|
| 135 |
+
23:00:47.530 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16451:xla_bridge.py:925)
|
| 136 |
+
INFO:2026-03-08 23:00:48,755:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
|
| 137 |
+
23:00:48.755 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16452:xla_bridge.py:925)
|
| 138 |
+
INFO:2026-03-08 23:00:48,768:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
|
| 139 |
+
23:00:48.768 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16452:xla_bridge.py:925)
|
| 140 |
+
23:00:49.029 [I] local_batch_size: 4 (16453:data_loader.py:363)
|
| 141 |
+
INFO:2026-03-08 23:00:49,834:jax._src.xla_bridge:925: Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
|
| 142 |
+
23:00:49.834 [I] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig' (16453:xla_bridge.py:925)
|
| 143 |
+
INFO:2026-03-08 23:00:49,836:jax._src.xla_bridge:925: Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
|
| 144 |
+
23:00:49.836 [I] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory (16453:xla_bridge.py:925)
|
| 145 |
+
23:01:43.138 [I] Enabled gradient checkpointing for PI0Pytorch model (16451:pi0_pytorch.py:148)
|
| 146 |
+
23:01:43.139 [I] Enabled gradient checkpointing for memory optimization (16451:train_pytorch.py:535)
|
| 147 |
+
23:01:43.139 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (16451:train_pytorch.py:422)
|
| 148 |
+
23:01:43.801 [I] Enabled gradient checkpointing for PI0Pytorch model (16454:pi0_pytorch.py:148)
|
| 149 |
+
23:01:43.802 [I] Enabled gradient checkpointing for memory optimization (16454:train_pytorch.py:535)
|
| 150 |
+
23:01:44.623 [I] Enabled gradient checkpointing for PI0Pytorch model (16452:pi0_pytorch.py:148)
|
| 151 |
+
23:01:44.623 [I] Enabled gradient checkpointing for memory optimization (16452:train_pytorch.py:535)
|
| 152 |
+
23:01:45.354 [I] Enabled gradient checkpointing for PI0Pytorch model (16453:pi0_pytorch.py:148)
|
| 153 |
+
23:01:45.354 [I] Enabled gradient checkpointing for memory optimization (16453:train_pytorch.py:535)
|
| 154 |
+
23:01:46.643 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:564)
|
| 155 |
+
23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16454:train_pytorch.py:564)
|
| 156 |
+
23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16453:train_pytorch.py:564)
|
| 157 |
+
23:01:46.648 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (16452:train_pytorch.py:564)
|
| 158 |
+
23:01:48.714 [I] Weight loading missing key count: 0 (16451:train_pytorch.py:572)
|
| 159 |
+
23:01:48.714 [I] Weight loading missing keys: set() (16451:train_pytorch.py:573)
|
| 160 |
+
23:01:48.715 [I] Weight loading unexpected key count: 0 (16451:train_pytorch.py:574)
|
| 161 |
+
23:01:48.715 [I] Weight loading unexpected keys: [] (16451:train_pytorch.py:575)
|
| 162 |
+
23:01:48.715 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:576)
|
| 163 |
+
23:01:48.722 [I] Running on: 9e9e564d5d6e | world_size=4 (16451:train_pytorch.py:616)
|
| 164 |
+
23:01:48.722 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (16451:train_pytorch.py:617)
|
| 165 |
+
23:01:48.723 [I] Memory optimizations: gradient_checkpointing=True (16451:train_pytorch.py:620)
|
| 166 |
+
23:01:48.724 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (16451:train_pytorch.py:621)
|
| 167 |
+
23:01:48.724 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (16451:train_pytorch.py:624)
|
| 168 |
+
23:01:48.724 [I] EMA is not supported for PyTorch training (16451:train_pytorch.py:627)
|
| 169 |
+
23:01:48.725 [I] Training precision: bfloat16 (16451:train_pytorch.py:628)
|
| 170 |
+
23:01:48.733 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (16451:train_pytorch.py:234)
|
| 171 |
+
23:01:48.733 [I] Dataset repo_id: lsnu/twin_handover_256_train (16451:train_pytorch.py:235)
|
| 172 |
+
23:01:48.733 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (16451:train_pytorch.py:236)
|
| 173 |
+
23:01:48.734 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (16451:train_pytorch.py:237)
|
| 174 |
+
23:01:48.734 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (16451:train_pytorch.py:238)
|
| 175 |
+
23:01:48.734 [I] Model type: baseline (16451:train_pytorch.py:239)
|
| 176 |
+
23:01:48.734 [I] Packed transforms active: True (16451:train_pytorch.py:240)
|
| 177 |
+
23:01:48.734 [I] World size: 4 (16451:train_pytorch.py:241)
|
| 178 |
+
23:01:48.735 [I] Batch size: local=4, global=16 (16451:train_pytorch.py:242)
|
| 179 |
+
23:01:48.735 [I] num_workers: 8 (16451:train_pytorch.py:243)
|
| 180 |
+
23:01:48.735 [I] Precision: bfloat16 (16451:train_pytorch.py:244)
|
| 181 |
+
23:01:48.735 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (16451:train_pytorch.py:245)
|
| 182 |
+
23:01:48.736 [I] Save/log intervals: save_interval=250, log_interval=10 (16451:train_pytorch.py:252)
|
| 183 |
+
23:01:48.736 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (16451:train_pytorch.py:253)
|
| 184 |
+
23:01:48.736 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (16451:train_pytorch.py:254)
|
| 185 |
+
23:01:48.736 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (16451:train_pytorch.py:255)
|
| 186 |
+
|
| 187 |
+
23:01:48.822 [I] Weight loading missing key count: 0 (16453:train_pytorch.py:572)
|
| 188 |
+
23:01:48.822 [I] Weight loading missing keys: set() (16454:train_pytorch.py:573)
|
| 189 |
+
23:01:48.823 [I] Weight loading missing keys: set() (16453:train_pytorch.py:573)
|
| 190 |
+
23:01:48.823 [I] Weight loading unexpected key count: 0 (16454:train_pytorch.py:574)
|
| 191 |
+
23:01:48.823 [I] Weight loading missing key count: 0 (16452:train_pytorch.py:572)
|
| 192 |
+
23:01:48.823 [I] Weight loading unexpected key count: 0 (16453:train_pytorch.py:574)
|
| 193 |
+
23:01:48.823 [I] Weight loading unexpected keys: [] (16454:train_pytorch.py:575)
|
| 194 |
+
23:01:48.823 [I] Weight loading missing keys: set() (16452:train_pytorch.py:573)
|
| 195 |
+
23:01:48.824 [I] Weight loading unexpected keys: [] (16453:train_pytorch.py:575)
|
| 196 |
+
23:01:48.824 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16454:train_pytorch.py:576)
|
| 197 |
+
23:01:48.824 [I] Weight loading unexpected key count: 0 (16452:train_pytorch.py:574)
|
| 198 |
+
23:01:48.824 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16453:train_pytorch.py:576)
|
| 199 |
+
23:01:48.825 [I] Weight loading unexpected keys: [] (16452:train_pytorch.py:575)
|
| 200 |
+
23:01:48.825 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (16452:train_pytorch.py:576)
|
| 201 |
+
W0308 23:06:44.622000 16356 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
|
| 202 |
+
W0308 23:06:44.645000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16451 closing signal SIGTERM
|
| 203 |
+
W0308 23:06:44.659000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16452 closing signal SIGTERM
|
| 204 |
+
W0308 23:06:44.679000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16453 closing signal SIGTERM
|
| 205 |
+
+W0308 23:06:44.728000 16356 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 16454 closing signal SIGTERM
+/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+/usr/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 14 leaked semaphore objects to clean up at shutdown
+  warnings.warn('resource_tracker: There appear to be %d '
+Traceback (most recent call last):
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
+    sys.exit(main())
+             ^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
+    run(args)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
+    elastic_launch(
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
+    result = agent.run()
+             ^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
+    result = f(*args, **kwargs)
+             ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
+    result = self._invoke_run(role)
+             ^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
+    time.sleep(monitor_interval)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
+    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
+torch.distributed.elastic.multiprocessing.api.SignalException: Process 16356 got signal: 15
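The smoke-run logs in this directory all end the same way: torchrun's elastic agent installs a handler that converts the incoming SIGTERM (signal 15) into a `SignalException` inside `_terminate_process_handler`, so the agent unwinds normally and can send closing signals to its workers. A minimal sketch of that signal-to-exception pattern (illustrative only, not the torch source):

```python
import os
import signal


class SignalException(Exception):
    """Raised in the main thread when the process receives a termination signal."""

    def __init__(self, msg: str, sigval: int):
        super().__init__(msg)
        self.sigval = sigval


def _terminate_process_handler(signum, frame):
    # Turn the asynchronous signal into an ordinary exception so the main
    # loop unwinds normally and cleanup (terminating workers, flushing logs)
    # gets a chance to run before the process exits.
    raise SignalException(f"Process {os.getpid()} got signal: {signum}", sigval=signum)


signal.signal(signal.SIGTERM, _terminate_process_handler)
```

With this handler in place, a `try/finally` around the main loop is enough to guarantee cleanup on SIGTERM, which is why the tracebacks above surface from `time.sleep(monitor_interval)` rather than killing the agent outright.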
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20b.log
ADDED
File without changes
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20d.log
ADDED
@@ -0,0 +1,34 @@
+W0308 23:09:45.070000 19958 torch/distributed/run.py:766]
+W0308 23:09:45.070000 19958 torch/distributed/run.py:766] *****************************************
+W0308 23:09:45.070000 19958 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0308 23:09:45.070000 19958 torch/distributed/run.py:766] *****************************************
+W0308 23:12:25.090000 19958 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
+W0308 23:12:25.147000 19958 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 20051 closing signal SIGTERM
+Traceback (most recent call last):
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
+    sys.exit(main())
+             ^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
+    run(args)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
+    elastic_launch(
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
+    result = agent.run()
+             ^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
+    result = f(*args, **kwargs)
+             ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
+    result = self._invoke_run(role)
+             ^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
+    time.sleep(monitor_interval)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
+    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
+torch.distributed.elastic.multiprocessing.api.SignalException: Process 19958 got signal: 15
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20e.log
ADDED
@@ -0,0 +1,34 @@
+W0308 23:13:16.278000 20146 torch/distributed/run.py:766]
+W0308 23:13:16.278000 20146 torch/distributed/run.py:766] *****************************************
+W0308 23:13:16.278000 20146 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0308 23:13:16.278000 20146 torch/distributed/run.py:766] *****************************************
+W0308 23:15:58.203000 20146 torch/distributed/elastic/agent/server/api.py:719] Received 15 death signal, shutting down workers
+W0308 23:15:58.263000 20146 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 20244 closing signal SIGTERM
+Traceback (most recent call last):
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
+    sys.exit(main())
+             ^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
+    return f(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
+    run(args)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
+    elastic_launch(
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
+    return launch_agent(self._config, self._entrypoint, list(args))
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
+    result = agent.run()
+             ^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
+    result = f(*args, **kwargs)
+             ^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 711, in run
+    result = self._invoke_run(role)
+             ^^^^^^^^^^^^^^^^^^^^^^
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/agent/server/api.py", line 870, in _invoke_run
+    time.sleep(monitor_interval)
+  File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
+    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
+torch.distributed.elastic.multiprocessing.api.SignalException: Process 20146 got signal: 15
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20k.log
ADDED
@@ -0,0 +1,234 @@
+W0308 23:45:59.171000 25558 torch/distributed/run.py:766]
+W0308 23:45:59.171000 25558 torch/distributed/run.py:766] *****************************************
+W0308 23:45:59.171000 25558 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0308 23:45:59.171000 25558 torch/distributed/run.py:766] *****************************************
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+  warnings.warn(  # warn only once
+[rank1]:[W308 23:48:06.218806836 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+  warnings.warn(  # warn only once
+[rank3]:[W308 23:48:09.583585113 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
+23:48:18.157 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20k (25643:train_pytorch.py:478)
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+  warnings.warn(  # warn only once
+[rank0]:[W308 23:48:18.631390841 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+  warnings.warn(  # warn only once
+[rank2]:[W308 23:48:20.490054230 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
+23:48:21.532 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (25643:train_pytorch.py:497)
+23:48:21.656 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (25643:config.py:234)
+23:48:21.658 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
+        1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
+        -1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
+        0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
+        0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
+        0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
+        0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
+        0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
+        -1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
+        0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
+        2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
+        1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
+        0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
+        -0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
+        -0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
+        0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
+        0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
+        0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
+        0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
+        -0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
+        -0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
+        0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
+        0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
+        0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
+        0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7ded44f10710>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (25643:data_loader.py:283)
+23:48:21.665 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (25643:data_loader.py:149)
+23:48:27.988 [I] local_batch_size: 4 (25643:data_loader.py:364)
+/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
+  warnings.warn(  # warn only once
+23:50:52.339 [I] Enabled gradient checkpointing for PI0Pytorch model (25643:pi0_pytorch.py:150)
+23:50:52.344 [I] Enabled gradient checkpointing for memory optimization (25643:train_pytorch.py:569)
+23:50:52.345 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (25643:train_pytorch.py:438)
+23:51:03.555 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:598)
+23:51:05.643 [I] Weight loading missing key count: 0 (25643:train_pytorch.py:606)
+23:51:05.643 [I] Weight loading missing keys: set() (25643:train_pytorch.py:607)
+23:51:05.643 [I] Weight loading unexpected key count: 0 (25643:train_pytorch.py:608)
+23:51:05.644 [I] Weight loading unexpected keys: [] (25643:train_pytorch.py:609)
+23:51:05.644 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:610)
+23:51:05.647 [I] Running on: 9e9e564d5d6e | world_size=4 (25643:train_pytorch.py:650)
+23:51:05.648 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (25643:train_pytorch.py:651)
+23:51:05.648 [I] Memory optimizations: gradient_checkpointing=True (25643:train_pytorch.py:654)
+23:51:05.648 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (25643:train_pytorch.py:655)
+23:51:05.649 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (25643:train_pytorch.py:658)
+23:51:05.649 [I] EMA is not supported for PyTorch training (25643:train_pytorch.py:661)
+23:51:05.650 [I] Training precision: bfloat16 (25643:train_pytorch.py:662)
+23:51:05.671 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (25643:train_pytorch.py:249)
+23:51:05.671 [I] Dataset repo_id: lsnu/twin_handover_256_train (25643:train_pytorch.py:250)
+23:51:05.672 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (25643:train_pytorch.py:251)
+23:51:05.672 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (25643:train_pytorch.py:252)
+23:51:05.673 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (25643:train_pytorch.py:253)
+23:51:05.673 [I] Model type: baseline (25643:train_pytorch.py:254)
+23:51:05.674 [I] Packed transforms active: True (25643:train_pytorch.py:255)
+23:51:05.674 [I] World size: 4 (25643:train_pytorch.py:256)
+23:51:05.674 [I] Batch size: local=4, global=16 (25643:train_pytorch.py:257)
+23:51:05.674 [I] num_workers: 8 (25643:train_pytorch.py:258)
+23:51:05.675 [I] Precision: bfloat16 (25643:train_pytorch.py:259)
+23:51:05.675 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (25643:train_pytorch.py:260)
+23:51:05.676 [I] Save/log intervals: save_interval=250, log_interval=10 (25643:train_pytorch.py:267)
+23:51:05.676 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (25643:train_pytorch.py:268)
+23:51:05.676 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (25643:train_pytorch.py:269)
+23:51:05.677 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (25643:train_pytorch.py:270)
+
+    self.pid = os.fork()
+/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+    self.pid = os.fork()
+/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+    self.pid = os.fork()
+/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
+    self.pid = os.fork()
+23:51:12.079 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (25643:train_pytorch.py:762)
+23:51:12.080 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (25643:train_pytorch.py:766)
+23:51:12.080 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (25643:train_pytorch.py:769)
+23:51:12.080 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (25643:train_pytorch.py:770)
+23:51:12.080 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (25643:train_pytorch.py:773)
+23:51:12.092 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (25643:train_pytorch.py:776)
+23:51:12.221 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (25643:train_pytorch.py:780)
+23:51:12.222 [I] debug_step=1 lr=1.24e-07 grad_norm=6.6952 data_time=2.5702s step_time=3.8197s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (25643:train_pytorch.py:785)
+
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+[rank3]:     main()
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+[rank3]:     train_loop(config)
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+[rank3]:     losses = model(observation, actions)
+[rank3]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+[rank3]:     return self._call_impl(*args, **kwargs)
+[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+[rank3]:     return forward_call(*args, **kwargs)
+[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+[rank3]:     inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+[rank3]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank3]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+[rank3]:     if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+[rank3]:                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank3]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+[rank3]: making sure all `forward` function outputs participate in calculating loss.
+[rank3]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+[rank3]: Parameter indices which did not receive grad for rank 3: 596 597 598 599 601 602 803
+[rank3]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+[rank1]: Traceback (most recent call last):
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+[rank1]:     main()
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+[rank1]:     train_loop(config)
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+[rank1]:     losses = model(observation, actions)
+[rank1]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+[rank1]:     return self._call_impl(*args, **kwargs)
+[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+[rank1]:     return forward_call(*args, **kwargs)
+[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+[rank1]:     inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+[rank1]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank1]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+[rank1]:     if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+[rank1]:                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank1]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+[rank1]: making sure all `forward` function outputs participate in calculating loss.
+[rank1]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+[rank1]: Parameter indices which did not receive grad for rank 1: 596 597 598 599 601 602 803
+[rank1]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
+[rank2]: Traceback (most recent call last):
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
+[rank2]:     main()
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
+[rank2]:     train_loop(config)
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
+[rank2]:     losses = model(observation, actions)
+[rank2]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
+[rank2]:     return self._call_impl(*args, **kwargs)
+[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
+[rank2]:     return forward_call(*args, **kwargs)
+[rank2]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
+[rank2]:     inputs, kwargs = self._pre_forward(*inputs, **kwargs)
+[rank2]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank2]:   File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
+[rank2]:     if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
+[rank2]:                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+[rank2]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
+[rank2]: making sure all `forward` function outputs participate in calculating loss.
+[rank2]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
+[rank2]: Parameter indices which did not receive grad for rank 2: 596 597 598 599 601 602 803
+[rank2]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
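The `RuntimeError` repeated by each rank above is DDP's reducer refusing to start a new iteration because some parameters (indices 596-599, 601, 602, 803) produced no gradient in the previous step. A minimal sketch of that situation and the workaround the message itself names, `find_unused_parameters=True`; the `TwoBranch` module and `wrap` helper are hypothetical illustrations, not openpi code:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP


class TwoBranch(torch.nn.Module):
    """Hypothetical module in which one branch is skipped per forward pass,
    so its parameters receive no gradient -- the condition DDP's reducer
    rejects in the error above."""

    def __init__(self):
        super().__init__()
        self.a = torch.nn.Linear(8, 8)
        self.b = torch.nn.Linear(8, 8)  # unused whenever use_b=False

    def forward(self, x, use_b: bool = False):
        return self.b(x) if use_b else self.a(x)


def wrap(model: torch.nn.Module, device_id: int) -> DDP:
    # find_unused_parameters=True makes DDP walk the autograd graph each
    # iteration and mark non-participating parameters as ready, so the
    # all-reduce can complete. It adds per-step overhead, so routing every
    # parameter into the loss (e.g. via the action-loss mask) is the
    # cheaper fix when possible.
    return DDP(model, device_ids=[device_id], find_unused_parameters=True)
```

Calling `wrap(...)` requires an initialized process group (e.g. under `torchrun`); with the plain module, the unused branch's `weight.grad` simply stays `None` after `backward()`, which is exactly what the reducer detects.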
|
| 172 |
+
[rank0]: Traceback (most recent call last):
|
| 173 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 862, in <module>
|
| 174 |
+
[rank0]: main()
|
| 175 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 858, in main
|
| 176 |
+
[rank0]: train_loop(config)
|
| 177 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/scripts/train_pytorch.py", line 703, in train_loop
|
| 178 |
+
[rank0]: losses = model(observation, actions)
|
| 179 |
+
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 180 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
|
| 181 |
+
[rank0]: return self._call_impl(*args, **kwargs)
|
| 182 |
+
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 183 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
|
| 184 |
+
[rank0]: return forward_call(*args, **kwargs)
|
| 185 |
+
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 186 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1633, in forward
|
| 187 |
+
[rank0]: inputs, kwargs = self._pre_forward(*inputs, **kwargs)
|
| 188 |
+
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 189 |
+
[rank0]: File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/nn/parallel/distributed.py", line 1522, in _pre_forward
|
| 190 |
+
[rank0]: if torch.is_grad_enabled() and self.reducer._rebuild_buckets():
|
| 191 |
+
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 192 |
+
[rank0]: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
|
| 193 |
+
[rank0]: making sure all `forward` function outputs participate in calculating loss.
|
| 194 |
+
[rank0]: If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
|
| 195 |
+
[rank0]: Parameter indices which did not receive grad for rank 0: 596 597 598 599 601 602 803
|
| 196 |
+
[rank0]: In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error
|
| 197 |
+
|
| 198 |
+
[rank0]:[W308 23:51:13.598698202 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
|
| 199 |
+
W0308 23:51:15.249000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25644 closing signal SIGTERM
|
| 200 |
+
W0308 23:51:15.305000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25645 closing signal SIGTERM
|
| 201 |
+
W0308 23:51:15.328000 25558 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 25646 closing signal SIGTERM
|
| 202 |
+
E0308 23:51:16.314000 25558 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 25643) of binary: /workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/python
|
| 203 |
+
Traceback (most recent call last):
|
| 204 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/bin/torchrun", line 10, in <module>
|
| 205 |
+
sys.exit(main())
|
| 206 |
+
^^^^^^
|
| 207 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
|
| 208 |
+
return f(*args, **kwargs)
|
| 209 |
+
^^^^^^^^^^^^^^^^^^
|
| 210 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in main
|
| 211 |
+
run(args)
|
| 212 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/run.py", line 883, in run
|
| 213 |
+
elastic_launch(
|
| 214 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
|
| 215 |
+
return launch_agent(self._config, self._entrypoint, list(args))
|
| 216 |
+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
| 217 |
+
File "/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
|
| 218 |
+
raise ChildFailedError(
|
| 219 |
+
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
|
| 220 |
+
============================================================
|
| 221 |
+
scripts/train_pytorch.py FAILED
|
| 222 |
+
------------------------------------------------------------
|
| 223 |
+
Failures:
|
| 224 |
+
<NO_OTHER_FAILURES>
|
| 225 |
+
------------------------------------------------------------
|
| 226 |
+
Root Cause (first observed failure):
|
| 227 |
+
[0]:
|
| 228 |
+
time : 2026-03-08_23:51:15
|
| 229 |
+
host : 9e9e564d5d6e
|
| 230 |
+
rank : 0 (local_rank: 0)
|
| 231 |
+
exitcode : 1 (pid: 25643)
|
| 232 |
+
error_file: <N/A>
|
| 233 |
+
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
|
| 234 |
+
============================================================
|
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_baseline_20l.log
ADDED
@@ -0,0 +1,141 @@
W0308 23:57:51.073000 28870 torch/distributed/run.py:766]
W0308 23:57:51.073000 28870 torch/distributed/run.py:766] *****************************************
W0308 23:57:51.073000 28870 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0308 23:57:51.073000 28870 torch/distributed/run.py:766] *****************************************
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank1]:[W309 00:00:38.424269437 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank2]:[W309 00:00:39.886552746 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank3]:[W309 00:00:48.235773018 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
00:00:50.394 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20l (28954:train_pytorch.py:478)
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank0]:[W309 00:00:50.868996725 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
00:00:52.168 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (28954:train_pytorch.py:497)
00:00:52.345 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train (28954:config.py:234)
00:00:52.350 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x7fceff4c7710>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (28954:data_loader.py:283)
00:00:52.360 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (28954:data_loader.py:149)
00:00:59.307 [I] local_batch_size: 4 (28954:data_loader.py:364)
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
00:02:31.673 [I] Enabled gradient checkpointing for PI0Pytorch model (28954:pi0_pytorch.py:150)
00:02:31.680 [I] Enabled gradient checkpointing for memory optimization (28954:train_pytorch.py:569)
00:02:31.681 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.47GB, reserved: 7.48GB, free: 0.01GB, peak_allocated: 7.47GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (28954:train_pytorch.py:438)
00:02:46.133 [I] Loading weights from: /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:598)
00:02:48.254 [I] Weight loading missing key count: 0 (28954:train_pytorch.py:606)
00:02:48.254 [I] Weight loading missing keys: set() (28954:train_pytorch.py:607)
00:02:48.255 [I] Weight loading unexpected key count: 0 (28954:train_pytorch.py:608)
00:02:48.255 [I] Weight loading unexpected keys: [] (28954:train_pytorch.py:609)
00:02:48.255 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:610)
00:02:48.259 [I] Running on: 9e9e564d5d6e | world_size=4 (28954:train_pytorch.py:650)
00:02:48.259 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (28954:train_pytorch.py:651)
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
00:02:48.260 [I] Memory optimizations: gradient_checkpointing=True (28954:train_pytorch.py:654)
00:02:48.261 [I] DDP settings: find_unused_parameters=False, gradient_as_bucket_view=True, static_graph=True (28954:train_pytorch.py:655)
00:02:48.261 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (28954:train_pytorch.py:656)
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
00:02:48.261 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (28954:train_pytorch.py:659)
00:02:48.262 [I] EMA is not supported for PyTorch training (28954:train_pytorch.py:662)
00:02:48.262 [I] Training precision: bfloat16 (28954:train_pytorch.py:663)
00:02:48.266 [I] Resolved config name: pi05_twin_handover_256_packed_baseline_pytorch_2k (28954:train_pytorch.py:249)
00:02:48.266 [I] Dataset repo_id: lsnu/twin_handover_256_train (28954:train_pytorch.py:250)
00:02:48.266 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (28954:train_pytorch.py:251)
00:02:48.266 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (28954:train_pytorch.py:252)
00:02:48.266 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_single_pytorch (28954:train_pytorch.py:253)
00:02:48.267 [I] Model type: baseline (28954:train_pytorch.py:254)
00:02:48.267 [I] Packed transforms active: True (28954:train_pytorch.py:255)
00:02:48.267 [I] World size: 4 (28954:train_pytorch.py:256)
00:02:48.267 [I] Batch size: local=4, global=16 (28954:train_pytorch.py:257)
00:02:48.267 [I] num_workers: 8 (28954:train_pytorch.py:258)
00:02:48.267 [I] Precision: bfloat16 (28954:train_pytorch.py:259)
00:02:48.268 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (28954:train_pytorch.py:260)
00:02:48.268 [I] Save/log intervals: save_interval=250, log_interval=10 (28954:train_pytorch.py:267)
00:02:48.268 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (28954:train_pytorch.py:268)
00:02:48.268 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (28954:train_pytorch.py:269)
00:02:48.268 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (28954:train_pytorch.py:270)

self.pid = os.fork()
00:02:51.626 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (28954:train_pytorch.py:763)
00:02:51.627 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
00:02:51.627 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (28954:train_pytorch.py:770)
00:02:51.627 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (28954:train_pytorch.py:771)
00:02:51.627 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (28954:train_pytorch.py:774)
00:02:51.628 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
00:02:51.645 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
00:02:51.645 [I] debug_step=1 lr=1.24e-07 grad_norm=15.9656 data_time=1.1114s step_time=2.2178s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)

00:02:52.155 [I] debug_step=2 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
00:02:52.156 [I] debug_step=2 prompt_token_lengths=[79, 76, 69, 69] (28954:train_pytorch.py:770)
00:02:52.157 [I] debug_step=2 state_stats min=-1.0000 max=1.0004 mean=0.0430 std=0.4223 (28954:train_pytorch.py:771)
00:02:52.157 [I] debug_step=2 action_stats min=-1.0000 max=1.0071 mean=0.0532 std=0.4394 (28954:train_pytorch.py:774)
00:02:52.158 [I] debug_step=2 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
00:02:52.159 [I] debug_step=2 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
00:02:52.159 [I] debug_step=2 lr=2.49e-07 grad_norm=7.5785 data_time=0.0858s step_time=0.4435s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)

00:02:52.947 [I] debug_step=3 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
00:02:52.948 [I] debug_step=3 prompt_token_lengths=[74, 68, 72, 73] (28954:train_pytorch.py:770)
00:02:52.949 [I] debug_step=3 state_stats min=-1.1677 max=1.0004 mean=0.0099 std=0.5093 (28954:train_pytorch.py:771)
00:02:52.949 [I] debug_step=3 action_stats min=-1.1487 max=1.1439 mean=0.0173 std=0.4079 (28954:train_pytorch.py:774)
00:02:52.950 [I] debug_step=3 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
00:02:52.951 [I] debug_step=3 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
00:02:52.951 [I] debug_step=3 lr=3.73e-07 grad_norm=10.5944 data_time=0.1892s step_time=0.6031s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)

00:02:53.749 [I] debug_step=4 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
00:02:53.750 [I] debug_step=4 prompt_token_lengths=[75, 73, 76, 71] (28954:train_pytorch.py:770)
00:02:53.750 [I] debug_step=4 state_stats min=-1.0000 max=1.0708 mean=0.0711 std=0.4551 (28954:train_pytorch.py:771)
00:02:53.750 [I] debug_step=4 action_stats min=-1.0000 max=1.4460 mean=0.0674 std=0.4311 (28954:train_pytorch.py:774)
00:02:53.751 [I] debug_step=4 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
00:02:53.752 [I] debug_step=4 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
00:02:53.752 [I] debug_step=4 lr=4.98e-07 grad_norm=13.1086 data_time=0.1977s step_time=0.6039s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)

00:02:54.234 [I] debug_step=5 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (28954:train_pytorch.py:767)
00:02:54.234 [I] debug_step=5 prompt_token_lengths=[73, 75, 70, 73] (28954:train_pytorch.py:770)
00:02:54.234 [I] debug_step=5 state_stats min=-1.0000 max=1.0004 mean=0.0188 std=0.4734 (28954:train_pytorch.py:771)
00:02:54.235 [I] debug_step=5 action_stats min=-1.0000 max=1.0647 mean=0.0147 std=0.3985 (28954:train_pytorch.py:774)
00:02:54.235 [I] debug_step=5 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (28954:train_pytorch.py:777)
00:02:54.235 [I] debug_step=5 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (28954:train_pytorch.py:781)
00:02:54.236 [I] debug_step=5 lr=6.22e-07 grad_norm=21.4053 data_time=0.0611s step_time=0.4238s gpu_mem_allocated=28.49GB gpu_mem_reserved=35.24GB gpu_mem_max_allocated=35.23GB gpu_mem_max_reserved=35.24GB (28954:train_pytorch.py:786)


/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
00:04:31.529 [I] Saved checkpoint at step 20 -> /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/smoke_handover_packed_baseline_20l/20 (28954:train_pytorch.py:323)

/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
artifacts/twin_handover_packed_parallelization_20260309/run_logs/smoke_handover_packed_parallel_20a.log
ADDED
@@ -0,0 +1,141 @@
| 1 |
+
W0309 00:05:58.586000 31870 torch/distributed/run.py:766]
|
| 2 |
+
W0309 00:05:58.586000 31870 torch/distributed/run.py:766] *****************************************
|
| 3 |
+
W0309 00:05:58.586000 31870 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
|
| 4 |
+
W0309 00:05:58.586000 31870 torch/distributed/run.py:766] *****************************************
|
| 5 |
+
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
|
| 6 |
+
warnings.warn( # warn only once
|
| 7 |
+
[rank3]:[W309 00:07:35.438460211 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
|
| 8 |
+
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
|
| 9 |
+
warnings.warn( # warn only once
|
| 10 |
+
[rank2]:[W309 00:07:38.377129614 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
|
| 11 |
+
00:07:39.654 [I] Created experiment checkpoint directory: /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/smoke_handover_packed_parallel_20a (31952:train_pytorch.py:478)
|
| 12 |
+
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank0]:[W309 00:07:39.073712842 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
[rank1]:[W309 00:07:43.016127248 ProcessGroupNCCL.cpp:4718] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can specify device_id in init_process_group() to force use of a particular device.
00:07:45.272 [I] Using batch size per GPU: 4 (total batch size across 4 GPUs: 16) (31952:train_pytorch.py:497)
00:07:45.376 [I] Loaded norm stats from /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train (31952:config.py:234)
00:07:45.378 [I] data_config: DataConfig(repo_id='lsnu/twin_handover_256_train', asset_id='lsnu/twin_handover_256_train', norm_stats={'state': NormStats(mean=array([ 0.40321857, 0.17899239, -0.07588876, -2.06326795, -0.46418607,
1.79356563, 0.70229131, 0.48194093, 0.93952829, 0.86693275,
-1.03168762, -1.9056077 , -0.53421056, 1.87584054, 2.36738205,
0.91249251]), std=array([0.73344636, 0.47653052, 0.72710407, 0.42399687, 0.63613892,
0.61144608, 1.11724186, 0.49967375, 0.86981195, 0.75071597,
0.90787333, 0.35008711, 0.51183224, 0.36600712, 0.56947577,
0.28257725]), q01=array([-1.52408956, -1.32446341, -1.91092197, -2.89885788, -1.66315554,
0.59010215, -2.27611645, 0. , -1.77352981, -1.62131719,
-1.77092851, -2.19172778, -2.03159353, 0.55409113, 0.79255736,
0. ]), q99=array([ 2.16638614, 1.38857444, 1.93436338, -0.88548369, 1.39976143,
2.99162304, 2.8194857 , 0.9998 , 1.46557211, 1.74660106,
1.58644652, -0.87876934, 2.25910752, 2.54628449, 2.89347284,
0.9998 ])), 'actions': NormStats(mean=array([ 0.05879939, -0.00704042, -0.02719213, -0.07685276, -0.07520971,
-0.00498583, 0.03577602, 0.48164892, 0.06564316, 0.06023132,
-0.10068271, -0.09547432, -0.0526481 , 0.08205888, 0.13954687,
0.88333535]), std=array([0.18337056, 0.28128958, 0.18525195, 0.29767084, 0.22944973,
0.40312037, 0.3896611 , 0.49966311, 0.21938531, 0.16883859,
0.20206179, 0.14864719, 0.12629333, 0.15546791, 0.23423795,
0.32102022]), q01=array([-0.34140511, -0.71597991, -0.55301429, -0.8233152 , -0.68097536,
-0.87723451, -0.86000918, 0. , -0.53261366, -0.49289397,
-0.48524564, -0.35752607, -0.42426748, -0.18230745, -0.09212705,
0. ]), q99=array([0.55444025, 0.69361174, 0.44115428, 0.550829 , 0.49707318,
0.68353445, 0.82907713, 0.9998 , 0.42654409, 0.44255511,
0.4114292 , 0.01550327, 0.38038206, 0.71452535, 0.62808441,
0.9998 ]))}, repack_transforms=Group(inputs=[RepackTransform(structure={'images': {'cam_high': 'front_image', 'cam_left_wrist': 'wrist_left_image', 'cam_right_wrist': 'wrist_right_image'}, 'state': 'state', 'actions': 'action', 'prompt': 'task'})], outputs=()), data_transforms=Group(inputs=[AlohaInputs(adapt_to_pi=False)], outputs=[]), model_transforms=Group(inputs=[InjectDefaultPrompt(prompt=None), ResizeImages(height=224, width=224), TokenizePrompt(tokenizer=<openpi.models.tokenizer.PaligemmaTokenizer object at 0x70ac18e479d0>, discrete_state_input=True), PackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))], outputs=[UnpackPerArmBlocks(real_arm_dims=(8, 8), block_dims=(16, 16))]), use_quantile_norm=True, action_sequence_keys=('action',), prompt_from_task=False, rlds_data_dir=None, action_space=None, datasets=()) (31952:data_loader.py:283)
00:07:45.381 [I] Using existing local LeRobot dataset mirror for lsnu/twin_handover_256_train: /workspace/lerobot/lsnu/twin_handover_256_train (31952:data_loader.py:149)
00:07:51.404 [I] local_batch_size: 4 (31952:data_loader.py:364)
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
00:09:48.120 [I] Enabled gradient checkpointing for PI0Pytorch model (31952:pi0_pytorch.py:150)
00:09:48.121 [I] Enabled gradient checkpointing for memory optimization (31952:train_pytorch.py:569)
00:09:48.122 [I] Step 0 (after_model_creation): GPU memory - allocated: 7.48GB, reserved: 7.48GB, free: 0.00GB, peak_allocated: 7.48GB, peak_reserved: 7.48GB | DDP: rank=0, world_size=4 (31952:train_pytorch.py:438)
00:10:05.891 [I] Loading weights from: /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:598)
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
00:12:47.760 [I] Weight loading missing key count: 0 (31952:train_pytorch.py:606)
00:12:47.761 [I] Weight loading missing keys: set() (31952:train_pytorch.py:607)
00:12:47.761 [I] Weight loading unexpected key count: 0 (31952:train_pytorch.py:608)
00:12:47.761 [I] Weight loading unexpected keys: [] (31952:train_pytorch.py:609)
00:12:47.762 [I] Loaded PyTorch weights from /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:610)
00:12:47.766 [I] Running on: 9e9e564d5d6e | world_size=4 (31952:train_pytorch.py:650)
00:12:47.766 [I] Training config: batch_size=16, effective_batch_size=4, num_train_steps=20 (31952:train_pytorch.py:651)
00:12:47.766 [I] Memory optimizations: gradient_checkpointing=True (31952:train_pytorch.py:654)
00:12:47.766 [I] DDP settings: find_unused_parameters=False, gradient_as_bucket_view=True, static_graph=True (31952:train_pytorch.py:655)
00:12:47.767 [I] LR schedule: warmup=200, peak_lr=2.50e-05, decay_steps=2000, end_lr=2.50e-06 (31952:train_pytorch.py:656)
00:12:47.767 [I] Optimizer: AdamW, weight_decay=1e-10, clip_norm=1.0 (31952:train_pytorch.py:659)
00:12:47.767 [I] EMA is not supported for PyTorch training (31952:train_pytorch.py:662)
00:12:47.767 [I] Training precision: bfloat16 (31952:train_pytorch.py:663)
00:12:47.771 [I] Resolved config name: pi05_twin_handover_256_packed_parallel_pytorch_2k (31952:train_pytorch.py:249)
00:12:47.771 [I] Dataset repo_id: lsnu/twin_handover_256_train (31952:train_pytorch.py:250)
00:12:47.771 [I] Norm-stats file path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_parallel_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json (31952:train_pytorch.py:251)
00:12:47.771 [I] Norm-stats summary: {'keys': ['actions', 'state'], 'state_mean_len': 16, 'state_std_len': 16, 'actions_mean_len': 16, 'actions_std_len': 16} (31952:train_pytorch.py:252)
00:12:47.771 [I] Checkpoint source path: /workspace/checkpoints/pi05_base_parallel_packed_from_single (31952:train_pytorch.py:253)
00:12:47.771 [I] Model type: parallel (31952:train_pytorch.py:254)
00:12:47.771 [I] Packed transforms active: True (31952:train_pytorch.py:255)
00:12:47.772 [I] World size: 4 (31952:train_pytorch.py:256)
00:12:47.772 [I] Batch size: local=4, global=16 (31952:train_pytorch.py:257)
00:12:47.772 [I] num_workers: 8 (31952:train_pytorch.py:258)
00:12:47.772 [I] Precision: bfloat16 (31952:train_pytorch.py:259)
00:12:47.772 [I] LR schedule summary: warmup_steps=200, peak_lr=2.50e-05, decay_steps=2000, decay_lr=2.50e-06 (31952:train_pytorch.py:260)
00:12:47.772 [I] Save/log intervals: save_interval=250, log_interval=10 (31952:train_pytorch.py:267)
00:12:47.772 [I] Action-loss mask: (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0) (31952:train_pytorch.py:268)
00:12:47.772 [I] Active mask dims: [0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] (31952:train_pytorch.py:269)
00:12:47.772 [I] Masked dims: [8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] (31952:train_pytorch.py:270)

self.pid = os.fork()
/usr/lib/python3.11/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
self.pid = os.fork()
00:12:51.535 [I] debug_step=1 observation.state shape=(4, 32) dtype=torch.float64 actions shape=(4, 16, 32) dtype=torch.float32 (31952:train_pytorch.py:763)
00:12:51.536 [I] debug_step=1 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
00:12:51.536 [I] debug_step=1 prompt_token_lengths=[74, 72, 76, 78] (31952:train_pytorch.py:770)
00:12:51.536 [I] debug_step=1 state_stats min=-1.0000 max=1.0004 mean=0.0715 std=0.4362 (31952:train_pytorch.py:771)
00:12:51.536 [I] debug_step=1 action_stats min=-1.0000 max=1.0947 mean=0.0331 std=0.4134 (31952:train_pytorch.py:774)
00:12:51.537 [I] debug_step=1 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
00:12:51.560 [I] debug_step=1 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
00:12:51.560 [I] debug_step=1 lr=1.24e-07 grad_norm=16.1250 data_time=1.1500s step_time=2.5752s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)

00:12:52.214 [I] debug_step=2 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
00:12:52.214 [I] debug_step=2 prompt_token_lengths=[79, 76, 69, 69] (31952:train_pytorch.py:770)
00:12:52.214 [I] debug_step=2 state_stats min=-1.0000 max=1.0004 mean=0.0430 std=0.4223 (31952:train_pytorch.py:771)
00:12:52.215 [I] debug_step=2 action_stats min=-1.0000 max=1.0071 mean=0.0532 std=0.4394 (31952:train_pytorch.py:774)
00:12:52.215 [I] debug_step=2 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
00:12:52.216 [I] debug_step=2 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
00:12:52.216 [I] debug_step=2 lr=2.49e-07 grad_norm=7.6422 data_time=0.1756s step_time=0.5095s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)

00:12:52.866 [I] debug_step=3 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
00:12:52.867 [I] debug_step=3 prompt_token_lengths=[74, 68, 72, 73] (31952:train_pytorch.py:770)
00:12:52.868 [I] debug_step=3 state_stats min=-1.1677 max=1.0004 mean=0.0099 std=0.5093 (31952:train_pytorch.py:771)
00:12:52.868 [I] debug_step=3 action_stats min=-1.1487 max=1.1439 mean=0.0173 std=0.4079 (31952:train_pytorch.py:774)
00:12:52.870 [I] debug_step=3 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
00:12:52.871 [I] debug_step=3 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
00:12:52.871 [I] debug_step=3 lr=3.73e-07 grad_norm=10.7104 data_time=0.1504s step_time=0.5022s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)

00:12:53.506 [I] debug_step=4 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
00:12:53.507 [I] debug_step=4 prompt_token_lengths=[75, 73, 76, 71] (31952:train_pytorch.py:770)
00:12:53.507 [I] debug_step=4 state_stats min=-1.0000 max=1.0708 mean=0.0711 std=0.4551 (31952:train_pytorch.py:771)
00:12:53.507 [I] debug_step=4 action_stats min=-1.0000 max=1.4460 mean=0.0674 std=0.4311 (31952:train_pytorch.py:774)
00:12:53.508 [I] debug_step=4 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
00:12:53.509 [I] debug_step=4 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
00:12:53.509 [I] debug_step=4 lr=4.98e-07 grad_norm=13.2371 data_time=0.1376s step_time=0.5020s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)

00:12:54.201 [I] debug_step=5 image_keys=['base_0_rgb', 'left_wrist_0_rgb', 'right_wrist_0_rgb'] image_shapes={'base_0_rgb': (4, 3, 224, 224), 'left_wrist_0_rgb': (4, 3, 224, 224), 'right_wrist_0_rgb': (4, 3, 224, 224)} (31952:train_pytorch.py:767)
00:12:54.202 [I] debug_step=5 prompt_token_lengths=[73, 75, 70, 73] (31952:train_pytorch.py:770)
00:12:54.203 [I] debug_step=5 state_stats min=-1.0000 max=1.0004 mean=0.0188 std=0.4734 (31952:train_pytorch.py:771)
00:12:54.203 [I] debug_step=5 action_stats min=-1.0000 max=1.0647 mean=0.0147 std=0.3985 (31952:train_pytorch.py:774)
00:12:54.203 [I] debug_step=5 state_nonzero_counts_8d_blocks=[32, 0, 32, 0] action_nonzero_counts_8d_blocks=[512, 0, 512, 0] (31952:train_pytorch.py:777)
00:12:54.204 [I] debug_step=5 masked_dims=[8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 27, 28, 29, 30, 31] active_dims=[0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23] masked_zero_counts state=64 actions=1024 (31952:train_pytorch.py:781)
00:12:54.204 [I] debug_step=5 lr=6.22e-07 grad_norm=21.7693 data_time=0.1479s step_time=0.5475s gpu_mem_allocated=28.53GB gpu_mem_reserved=35.28GB gpu_mem_max_allocated=35.27GB gpu_mem_max_reserved=35.28GB (31952:train_pytorch.py:786)


/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
00:14:36.586 [I] Saved checkpoint at step 20 -> /workspace/pi05tests-openpi-multiarm/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/smoke_handover_packed_parallel_20a/20 (31952:train_pytorch.py:323)

/workspace/pi05tests-openpi-multiarm/openpi/.venv/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:4631: UserWarning: No device id is provided via `init_process_group` or `barrier `. Using the current device set by the user.
warnings.warn( # warn only once
artifacts/twin_handover_packed_parallelization_20260309/run_logs/twin_handover_followup.log
ADDED
@@ -0,0 +1,37 @@
[2026-03-09 00:31:32 UTC] follow-up runner started
[2026-03-09 00:31:32 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:32:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:33:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:34:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:35:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:36:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:37:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:38:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:39:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:40:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:41:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:42:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:43:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:44:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:45:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:46:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:47:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:48:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:49:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:50:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:51:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:52:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:53:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:54:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:55:33 UTC] waiting for processes matching: scripts/train_pytorch.py pi05_twin_handover_256_packed_baseline_pytorch_2k --exp_name handover_packed_baseline_2k
[2026-03-09 00:56:33 UTC] eval start config=pi05_twin_handover_256_packed_baseline_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/1000 batches=50
[2026-03-09 01:01:47 UTC] eval done log=/workspace/run_logs/handover_packed_baseline_2k_val_1000.log
[2026-03-09 01:01:47 UTC] eval start config=pi05_twin_handover_256_packed_baseline_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_baseline_pytorch_2k/handover_packed_baseline_2k/2000 batches=100
[2026-03-09 01:07:06 UTC] eval done log=/workspace/run_logs/handover_packed_baseline_2k_val_2000.log
[2026-03-09 01:07:06 UTC] launching parallel run
[2026-03-09 01:42:23 UTC] parallel run finished
[2026-03-09 01:42:23 UTC] eval start config=pi05_twin_handover_256_packed_parallel_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/1000 batches=50
[2026-03-09 01:45:46 UTC] eval done log=/workspace/run_logs/handover_packed_parallel_2k_val_1000.log
[2026-03-09 01:45:46 UTC] eval start config=pi05_twin_handover_256_packed_parallel_pytorch_2k ckpt=/workspace/openpi/checkpoints/pi05_twin_handover_256_packed_parallel_pytorch_2k/handover_packed_parallel_2k/2000 batches=100
[2026-03-09 01:49:19 UTC] eval done log=/workspace/run_logs/handover_packed_parallel_2k_val_2000.log
[2026-03-09 01:49:19 UTC] follow-up runner finished
artifacts/twin_handover_packed_parallelization_20260309/sanity_checks/inspect_twin_packed_batch_handover_train.log
ADDED
@@ -0,0 +1,176 @@
config_name: pi05_twin_handover_256_packed_baseline_pytorch_2k
repo_id: lsnu/twin_handover_256_train
sample_index: 0
norm_stats_path: /workspace/pi05tests-openpi-multiarm/openpi/assets/pi05_twin_handover_256_packed_baseline_pytorch_2k/lsnu/twin_handover_256_train/norm_stats.json
norm_stats_keys: ['actions', 'state']
norm_stats_lengths: state_mean=16 state_std=16 action_mean=16 action_std=16
block_boundaries: [0:8] [8:16] [16:24] [24:32]
raw_state_16d_shape: (16,)
raw_state_16d:
[ 7.1883e-07 1.7515e-01 -5.6890e-06 -8.7299e-01 -6.3130e-06 1.2216e+00
7.8540e-01 1.0000e+00 1.1957e-06 1.7514e-01 -9.2062e-07 -8.7312e-01
1.6098e-05 1.2216e+00 7.8539e-01 1.0000e+00]
raw_actions_16d_shape: (16, 16)
raw_actions_16d:
[[ 2.3842e-05 -8.2493e-04 -5.7220e-05 3.9577e-04 2.8610e-05 7.8201e-04
-1.2398e-04 1.0000e+00 9.5367e-05 4.0293e-03 9.5367e-06 7.2479e-04
1.8120e-04 -1.4305e-05 -2.2411e-04 1.0000e+00]
[ 5.0068e-04 -1.5645e-02 2.6083e-03 -5.5575e-02 1.8883e-03 2.5430e-02
-1.9326e-02 1.0000e+00 2.7800e-02 2.4877e-02 -2.7924e-02 -2.7843e-02
-1.6832e-02 1.0629e-02 3.8543e-02 1.0000e+00]
[ 1.7738e-03 -7.6041e-02 8.9645e-03 -1.7257e-01 6.0558e-03 8.7943e-02
-6.4831e-02 1.0000e+00 9.2287e-02 5.8761e-02 -9.3136e-02 -7.6413e-02
-5.3630e-02 4.2353e-02 1.2606e-01 1.0000e+00]
[ 3.2425e-03 -1.3747e-01 1.5845e-02 -3.1527e-01 1.0653e-02 1.6477e-01
-1.1840e-01 1.0000e+00 1.7036e-01 1.0629e-01 -1.7153e-01 -1.4015e-01
-9.7461e-02 7.8468e-02 2.3009e-01 1.0000e+00]
[ 5.5885e-03 -2.1545e-01 2.4767e-02 -4.6663e-01 1.6103e-02 2.4452e-01
-1.7446e-01 1.0000e+00 2.5305e-01 1.5107e-01 -2.5392e-01 -2.1260e-01
-1.4490e-01 1.1766e-01 3.4122e-01 1.0000e+00]
[ 6.1035e-03 -2.8390e-01 3.3288e-02 -6.1909e-01 2.1739e-02 3.2683e-01
-2.3199e-01 1.0000e+00 3.3677e-01 1.9970e-01 -3.3804e-01 -2.8173e-01
-1.9161e-01 1.5831e-01 4.5282e-01 1.0000e+00]
[ 9.3937e-03 -3.1736e-01 3.8815e-02 -7.2264e-01 2.9097e-02 3.8407e-01
-2.9788e-01 1.0000e+00 3.9431e-01 2.3764e-01 -3.9650e-01 -3.2045e-01
-2.2884e-01 1.8487e-01 5.3961e-01 1.0000e+00]
[ 1.1177e-02 -3.3051e-01 4.2367e-02 -7.4072e-01 3.5295e-02 4.0234e-01
-3.4810e-01 1.0000e+00 4.1353e-01 2.4687e-01 -4.1600e-01 -3.4033e-01
-2.4390e-01 1.9067e-01 5.7513e-01 1.0000e+00]
[ 1.2674e-02 -3.1841e-01 4.3559e-02 -7.5366e-01 3.7665e-02 4.1035e-01
-3.7488e-01 1.0000e+00 4.2095e-01 2.5672e-01 -4.2238e-01 -3.4335e-01
-2.4950e-01 1.9567e-01 5.8634e-01 1.0000e+00]
[ 1.5645e-02 -3.0324e-01 4.3592e-02 -7.4167e-01 4.2624e-02 4.1367e-01
-4.1199e-01 1.0000e+00 4.2353e-01 2.6254e-01 -4.2444e-01 -3.4899e-01
-2.5064e-01 1.9762e-01 5.8977e-01 1.0000e+00]
[ 1.6398e-02 -2.9560e-01 4.2553e-02 -7.3503e-01 4.5595e-02 4.1383e-01
-4.3354e-01 1.0000e+00 4.2382e-01 2.5776e-01 -4.2612e-01 -3.5491e-01
-2.5177e-01 1.9462e-01 5.9134e-01 1.0000e+00]
[ 2.0757e-02 -2.9058e-01 4.2739e-02 -7.3133e-01 4.6840e-02 4.1339e-01
-4.5310e-01 1.0000e+00 4.2468e-01 2.5057e-01 -4.2498e-01 -3.4835e-01
-2.5149e-01 2.0029e-01 5.9138e-01 1.0000e+00]
[ 2.3303e-02 -2.7753e-01 4.1437e-02 -7.2254e-01 4.8075e-02 4.1380e-01
-4.7155e-01 1.0000e+00 4.2468e-01 2.5254e-01 -4.2522e-01 -3.4195e-01
-2.5130e-01 1.9623e-01 5.9127e-01 1.0000e+00]
[ 2.7924e-02 -2.5505e-01 4.0684e-02 -7.0069e-01 5.3768e-02 4.1076e-01
-5.1048e-01 1.0000e+00 4.2446e-01 2.5574e-01 -4.2656e-01 -3.5101e-01
-2.5181e-01 1.9645e-01 5.9101e-01 1.0000e+00]
[ 3.2401e-02 -2.4053e-01 4.1451e-02 -6.8364e-01 5.6882e-02 4.1132e-01
-5.4158e-01 1.0000e+00 4.2435e-01 2.5109e-01 -4.2632e-01 -3.5082e-01
-2.5095e-01 1.9805e-01 5.9107e-01 1.0000e+00]
[ 3.4809e-02 -2.2431e-01 4.0565e-02 -6.7288e-01 5.6076e-02 4.0839e-01
-5.6400e-01 1.0000e+00 4.2504e-01 2.5486e-01 -4.2588e-01 -3.4874e-01
-2.5139e-01 1.9783e-01 5.9183e-01 1.0000e+00]]
normalized_state_16d_shape: (16,)
normalized_state_16d:
[-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.0951
0.0668 0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004]
normalized_actions_16d_shape: (16, 16)
normalized_actions_16d:
[[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.1108
0.0624 0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004]
[-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.1686
0.107 0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004]
[-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.303
0.1794 -0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004]
[-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.4658
0.2811 -0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004]
[-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.6382
0.3768 -0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004]
[-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.8128
0.4808 -0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004]
[-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.9328
0.5619 -0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004]
[-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.9729
0.5816 -0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004]
[-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.9883
0.6027 -0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004]
[-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.9937
0.6151 -0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004]
[-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.9943
0.6049 -0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004]
[-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.9961
0.5895 -0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004]
[-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.9961
0.5937 -0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004]
[-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.9956
0.6006 -0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004]
[-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.9954
0.5907 -0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004]
[-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.9969
0.5987 -0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004]]
packed_state_32d_shape: (32,)
packed_state_32d:
[-0.174 0.1055 -0.0061 1.0124 0.086 -0.4741 0.2016 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.0951 0.0668
0.0549 1.0086 -0.053 -0.3299 -1.0068 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
packed_actions_32d_shape: (16, 32)
packed_actions_32d:
[[-0.2378 0.0147 0.1124 0.1989 0.1562 0.1251 0.0182 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.1108 0.0624
0.0823 0.9208 0.055 -0.5935 -0.7448 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2367 -0.0063 0.1178 0.1174 0.1593 0.1567 -0.0046 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.1686 0.107
0.02 0.7676 0.0127 -0.5697 -0.6371 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2338 -0.092 0.1305 -0.0529 0.1664 0.2368 -0.0585 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.303 0.1794
-0.1254 0.5072 -0.0788 -0.499 -0.3941 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2306 -0.1792 0.1444 -0.2606 0.1742 0.3352 -0.1219 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.4658 0.2811
-0.3003 0.1655 -0.1877 -0.4185 -0.1052 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2253 -0.2898 0.1623 -0.4809 0.1834 0.4374 -0.1883 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.6382 0.3768
-0.484 -0.223 -0.3056 -0.3311 0.2034 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2242 -0.3869 0.1795 -0.7028 0.193 0.5429 -0.2564 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.8128 0.4808
-0.6717 -0.5936 -0.4217 -0.2404 0.5133 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2168 -0.4344 0.1906 -0.8535 0.2055 0.6163 -0.3344 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9328 0.5619
-0.8021 -0.8012 -0.5143 -0.1812 0.7543 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2129 -0.4531 0.1977 -0.8798 0.216 0.6397 -0.3939 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9729 0.5816
-0.8455 -0.9078 -0.5517 -0.1682 0.8529 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2095 -0.4359 0.2001 -0.8986 0.2201 0.6499 -0.4256 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9883 0.6027
-0.8598 -0.924 -0.5656 -0.1571 0.8841 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2029 -0.4144 0.2002 -0.8812 0.2285 0.6542 -0.4695 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9937 0.6151
-0.8644 -0.9542 -0.5684 -0.1527 0.8936 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.2012 -0.4035 0.1981 -0.8715 0.2335 0.6544 -0.495 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9943 0.6049
-0.8681 -0.986 -0.5713 -0.1594 0.8979 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.1915 -0.3964 0.1985 -0.8661 0.2356 0.6538 -0.5182 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9961 0.5895
-0.8656 -0.9508 -0.5705 -0.1468 0.8981 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.1858 -0.3779 0.1959 -0.8533 0.2377 0.6544 -0.54 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9961 0.5937
-0.8661 -0.9165 -0.5701 -0.1558 0.8978 1.0004 0. 0. 0.
0. 0. 0. 0. 0. ]
[-0.1755 -0.346 0.1944 -0.8215 0.2474 0.6505 -0.5861 1.0004 0.
0. 0. 0. 0. 0. 0. 0. 0.9956 0.6006
|
| 163 |
+
-0.8691 -0.9651 -0.5713 -0.1554 0.897 1.0004 0. 0. 0.
|
| 164 |
+
0. 0. 0. 0. 0. ]
|
| 165 |
+
[-0.1655 -0.3254 0.1959 -0.7967 0.2527 0.6512 -0.623 1.0004 0.
|
| 166 |
+
0. 0. 0. 0. 0. 0. 0. 0.9954 0.5907
|
| 167 |
+
-0.8686 -0.9641 -0.5692 -0.1518 0.8972 1.0004 0. 0. 0.
|
| 168 |
+
0. 0. 0. 0. 0. ]
|
| 169 |
+
[-0.1601 -0.3024 0.1941 -0.7811 0.2513 0.6474 -0.6495 1.0004 0.
|
| 170 |
+
0. 0. 0. 0. 0. 0. 0. 0.9969 0.5987
|
| 171 |
+
-0.8676 -0.9529 -0.5703 -0.1523 0.8993 1.0004 0. 0. 0.
|
| 172 |
+
0. 0. 0. 0. 0. ]]
|
| 173 |
+
state_padded_zero_count: 16 / 16
|
| 174 |
+
actions_padded_zero_count: 256 / 256
|
| 175 |
+
state_padded_exact_zero: True
|
| 176 |
+
actions_padded_exact_zero: True
|