| # Modification Summary (branch `0502_mp_process`) |
|
|
| Snapshot of every edit in this session, grouped by theme. Use this as a |
| code-review checklist and as a record of what changed *behaviorally* in the |
| training, evaluation, and data-prep pipelines. |
|
|
| ## High-level themes |
|
|
| 1. **Multi-process dataset builder + per-build diagnostics** — cleaning & |
| replay summaries written to `meta/`, with per-speed integrated-motion |
| error reporting. |
| 2. **Action-norm profiler** for data-driven `clean_*_eps` calibration. |
| 3. **Speed-integration ablation framework** — three model-side strategies |
| (`text` / `modulation` / `soft_prompt`) selectable via a single config |
| field, plus a P-length sweep on the soft-prompt arm. |
| 4. **Ablation orchestration** — `scripts/run_ablations.py` (end-to-end) and |
| `scripts/build_ablation_datasets.py` (data-prep only) with build / |
| norm-stats dedup. |
| 5. **8-GPU LIBERO evaluation** — partitioned eval driver that runs |
| `libero_spatial / goal / object` plus a 5-way split of `libero_10`, |
| tracks per-episode step counts, and reports per-suite + global rollups. |
| 6. **`train_pytorch.py` cleanup** — removed silent NFS hang (wandb sample |
| image block), removed hardcoded WANDB API key, made per-speed wandb |
| breakdown config-driven. |
| 7. **Four latent bugs fixed** that were silently degrading training or |
| evaluation; see §7. |
| |
| ## 1. Data processing |
| |
| ### New files |
| |
| | File | Purpose | |
| |---|---| |
| | `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. | |
| | `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests the 1st / 5th percentile (P1 / P5) values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. | |
| |
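A minimal sketch of the worker-plus-fix-up pattern described for `build_libero_speed_dataset_mp.py`. The names `process_episode`, `assign_global_index`, and `build` are hypothetical, and the frame dictionaries are stand-ins for the real dataset rows; only the `ProcessPoolExecutor` / `spawn` combination and the sequential `index` fix-up come from the description above.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def process_episode(source_episode_index: int) -> list:
    """Hypothetical worker: returns one episode's frames without a global index."""
    num_frames = 2 + source_episode_index  # stand-in for real episode data
    return [
        {"episode_index": source_episode_index, "frame_index": i}
        for i in range(num_frames)
    ]


def assign_global_index(episodes: list) -> list:
    """Sequential fix-up in the main process: number frames across all episodes."""
    rows = []
    for frames in episodes:
        for frame in frames:
            rows.append({**frame, "index": len(rows)})
    return rows


def build(episode_ids: list, num_workers: int) -> list:
    """Fan episodes out to spawned workers, then fill the global index."""
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=num_workers, mp_context=ctx) as pool:
        # Executor.map yields results in input order, so episode order is stable.
        episodes = list(pool.map(process_episode, episode_ids))
    return assign_global_index(episodes)
```

With the `spawn` start method, the worker function has to live at module top level and any call to `build` should sit behind an `if __name__ == "__main__":` guard, because each child process re-imports the main module.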
| ### Modified files |
| |
| | File | Change | |
| |---|---| |
| | `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. | |
| | `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). | |
| | `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. | |
| |
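The nested `dataclasses.replace` pattern used for the `--repo-id` / `--asset-id` overrides can be sketched as follows. The three classes are simplified stand-ins (the real `TrainConfig`, `LeRobotVariousSpeedLiberoDataConfig`, and `AssetsConfig` carry many more fields), and `apply_overrides` is a hypothetical helper name.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # simplified stand-in
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataConfig:  # stand-in for LeRobotVariousSpeedLiberoDataConfig
    repo_id: str = "base/repo"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # simplified stand-in
    name: str = "base"
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(cfg, repo_id=None, asset_id=None):
    """Return a copy of cfg with the overrides applied; cfg itself is untouched."""
    data = cfg.data
    if asset_id is not None:
        # Rebuild from the innermost frozen object outwards.
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(cfg, data=data)
```

Because every level is frozen, each `replace` call creates a new object, which is what lets one registered config serve several ablations.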
| ## 2. `train_pytorch.py` cleanup |
| |
| - Removed the wandb sample-image logging block. It created a second |
| DataLoader and fetched 256 samples on the first batch, hanging silently |
| for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb |
| logging is unaffected. |
| - Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real |
| key was committed to the repo). Auth now uses the standard wandb |
| resolution order. **The leaked key is in git history; rotate it on |
| wandb**. |
| - Removed the `"EMA is not supported for PyTorch training"` log line |
| (noise). |
| - `speed_specs` (per-speed wandb loss breakdown) and the `avg_flow_metrics` key |
| list are now derived from `config.eval_speed_set` rather than hardcoded to |
| `0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.) |
| |
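One way to derive the `0p5 / 1p0 / 2p0` style tags from `eval_speed_set` is sketched below. `speed_tag`, `per_speed_metric_keys`, and the `loss_speed_` prefix are hypothetical names, and the handling of values such as `0.75` is an assumption rather than the behavior of `train_pytorch.py`.

```python
def speed_tag(speed: float) -> str:
    """Format a speed as a metric-safe tag, e.g. 0.5 becomes '0p5'."""
    return str(float(speed)).replace(".", "p")


def per_speed_metric_keys(eval_speed_set, prefix="loss_speed_"):
    """Build one metric key per configured evaluation speed."""
    return [f"{prefix}{speed_tag(s)}" for s in eval_speed_set]
```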
| ## 3. Speed-integration ablation: model + config |
| |
| ### New TrainConfig field (`src/openpi/training/config.py`) |
| |
| ```python |
| eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0) |
| ``` |
| |
| Drives the per-speed wandb breakdown in `train_pytorch.py`. |
| |
| ### New LeRobotVariousSpeedLiberoDataConfig field |
| |
| ```python |
| speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto" |
| ``` |
| |
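| A high-level switch. The default `auto` preserves the legacy behavior of the existing LIBERO speed configs (see §8); the explicit values are: |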
| |
| | value | behavior | requirement | |
| |---|---|---| |
| | `text` | adds `SpeedConditionedPrompt` to data transforms | none | |
| | `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` | |
| | `soft_prompt` | learns K × P tokens (K anchor speeds, P tokens each) and inserts the nearest anchor's P tokens between vision and instruction | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty | |
| |
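The requirement column can be read as a small consistency check between the data-config switch and the model config. The sketch below is an illustration only: `check_speed_integration` is a hypothetical helper, and whether the real code validates these combinations (or where) is not stated here.

```python
VALID_MODES = ("text", "modulation", "soft_prompt", "auto")


def check_speed_integration(mode, *, speed_modulation=False,
                            soft_prompt_p=0, soft_prompt_speeds=()):
    """Raise ValueError if the model config cannot support the chosen mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown speed_integration: {mode!r}")
    if mode == "modulation" and not speed_modulation:
        raise ValueError("modulation requires Pi0Config.speed_modulation=True")
    if mode == "soft_prompt":
        if soft_prompt_p < 1:
            raise ValueError("soft_prompt requires Pi0Config.soft_prompt_p >= 1")
        if not soft_prompt_speeds:
            raise ValueError("soft_prompt requires non-empty soft_prompt_speeds")
```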
| ### New Pi0Config fields (`src/openpi/models/pi0_config.py`) |
| |
| ```python |
| speed_modulation: bool = False |
| soft_prompt_speeds: tuple[float, ...] = () |
| soft_prompt_p: int = 0 |
| ``` |
| |
| `flow_control_dim` was removed. `inputs_spec` now declares |
| `speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR |
| soft_prompt is enabled. |
| |
| ### Observation schema (`src/openpi/models/model.py`) |
| |
| Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict` |
| and `preprocess_observation` (JAX) propagate it. |
| |
| ### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`) |
| |
| - `__init__`: |
| - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and |
| `speed_condition_mlp_in/out` (replaces the old `flow_control_*` / |
| `flow_condition_*` MLPs). Reads raw `observation.speed` |
| (shape `(B, 1)`); no log transform is applied. |
| - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter` |
| of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a |
| non-persistent buffer `soft_prompt_anchors: tensor(K,)`. |
| - `_preprocess_observation`: returns a 6-tuple |
| `(images, image_masks, lang_tokens, lang_masks, state, speed)`. |
| - `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled, |
| computes `argmin |speed − anchors|` per batch element and inserts |
| `(B, P, hidden)` tokens between image and language tokens with full |
| attention. OOD speeds fall back to the nearest training anchor. |
| - `embed_suffix`: accepts `speed=None`; when modulation is enabled, |
| pushes raw speed through `speed_mod_mlp` and fuses with the timestep |
| embedding via `speed_condition_mlp`. |
| - `forward`, `sample_actions`, and `denoise_step` plumb `speed` through. |
| |
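The nearest-anchor lookup and the token ordering in `embed_prefix` can be illustrated without any tensor library. This is a framework-free sketch with hypothetical helper names; the real code operates on batched tensors, where indexing the `(K, P, hidden)` parameter with the selected indices yields the `(B, P, hidden)` block.

```python
def nearest_anchor_indices(speeds, anchors):
    """For each speed, return the index of the closest anchor (argmin |speed - anchor|)."""
    if not anchors:
        raise ValueError("anchors must be non-empty")
    return [
        min(range(len(anchors)), key=lambda k: abs(s - anchors[k]))
        for s in speeds
    ]


def build_prefix(image_tokens, soft_prompt_bank, anchor_index, lang_tokens):
    """Place the selected anchor's P tokens between image and language tokens."""
    return list(image_tokens) + list(soft_prompt_bank[anchor_index]) + list(lang_tokens)
```

An out-of-distribution speed such as `2.0` with anchors `(0.75, 1.0, 1.25, 1.5)` resolves to the last anchor, which is the fallback behavior described above.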
| JAX `Pi0` (`src/openpi/models/pi0.py`) was renamed in the same way for |
| consistency: `flow_control_dim → speed_modulation`, `flow_control_mlp_* |
| → speed_mod_mlp_*`, `flow_condition_mlp_* → speed_condition_mlp_*`, |
| reads `obs.speed`. |
| |
| ### Policy passthrough (`src/openpi/policies/libero_policy.py`) |
| |
| `LiberoInputs` now passes `data["speed"]` through, alongside the existing |
| `flow_control` passthrough. |
| |
| ### Smoke tests (`tests/test_soft_prompt_smoke.py`) |
| |
| Light tests for `Pi0Config` field acceptance and the argmin |
| nearest-neighbor logic. The full end-to-end forward-pass test requires |
| PaliGemma weights and is gated to a manual GPU run. |
| |
| ## 4. Ablation orchestration |
| |
| ### `scripts/run_ablations.py` (new) |
| |
| End-to-end orchestrator: build → norm-stats → train, per ablation. |
| |
| - `Ablation` dataclass: `name`, `speeds`, `speed_integration`, |
| `extra_train_args`, `shared_norm_key`. |
| - 12 default ablations: |
| - **Speed-set sweep** (5): `g1_baseline`, `g2_coarse`, `g3a_step025`, |
| `g4_narrow`, `g5_extreme`. All use `text` integration. |
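| - **Speed-integration sweep** (3): `speedint_text`, |
| `speedint_modulation` (listed with `--model.flow-control-dim=1`, although |
| §3 removes `flow_control_dim` in favor of `speed_modulation=True`; |
| confirm the emitted flag in the `--dry-run` output), and |
| `softprompt_p8` (reused from the P-sweep). |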
| - **Soft-prompt P-length sweep** (5): `softprompt_p{1,4,8,16,32}`. All |
| declare `shared_norm_key="softprompt_shared"` so they reuse one |
| `norm_stats.json`. |
| - For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats, |
| 12 train runs**. |
| - `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`, |
| `--dry-run` for scoped runs. |
| |
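The dedup arithmetic can be sketched on the subset of the table whose speed sets are stated in this document. The sketch assumes builds are keyed on the speed set and norm-stats on `shared_norm_key` (falling back to the ablation name); the real `Ablation` dataclass also has `extra_train_args`, omitted here, and `plan` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ablation:  # simplified stand-in
    name: str
    speeds: tuple
    speed_integration: str
    shared_norm_key: Optional[str] = None


NARROW = (0.75, 1.0, 1.25, 1.5)  # g4_narrow speeds, shared with both sweeps below

ABLATIONS = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
    *[
        Ablation(f"softprompt_p{p}", NARROW, "soft_prompt", "softprompt_shared")
        for p in (1, 4, 8, 16, 32)
    ],
]


def plan(ablations):
    """Return (dataset builds, norm-stats runs, train runs) after dedup."""
    builds = {a.speeds for a in ablations}
    norm_stats = {a.shared_norm_key or a.name for a in ablations}
    return len(builds), len(norm_stats), len(ablations)
```

For this nine-entry subset the plan is 2 builds, 5 norm-stats, and 9 train runs. Adding the three remaining speed-set entries, each with its own distinct speed set, gives the 5 / 8 / 12 totals quoted above.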
| ### `scripts/build_ablation_datasets.py` (new) |
|
|
| Thin focused wrapper for the data-prep stage only. Imports the same |
| `ABLATIONS` table from `run_ablations.py`, applies the same build dedup, |
| exits with a summary mapping ablation names to dataset paths. |
|
|
| ## 5. 8-GPU LIBERO evaluation |
|
|
| ### New files |
|
|
| | File | Purpose | |
| |---|---| |
| | `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. | |
| | `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is **GPU 0 → spatial / 1 → goal / 2 → object** (full suites) and **GPU 3-7 → libero_10 split into pairs of tasks**, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). | |
| |
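The 8-way partition can be written down as a small job table. This is a Python illustration of the layout described for the shell driver, not a transcription of `eval_libero_8gpu.sh`; the dictionary keys and `eval_partition` are hypothetical, while the labels follow the output file names listed under "Output".

```python
def eval_partition():
    """Return one job description per GPU rank (8 in total)."""
    jobs = [
        {"gpu": 0, "suite": "libero_spatial", "task_ids": None, "label": "spatial"},
        {"gpu": 1, "suite": "libero_goal", "task_ids": None, "label": "goal"},
        {"gpu": 2, "suite": "libero_object", "task_ids": None, "label": "object"},
    ]
    # GPUs 3-7 each take two consecutive libero_10 tasks: (0,1), (2,3), ..., (8,9).
    for i in range(5):
        first = 2 * i
        jobs.append({
            "gpu": 3 + i,
            "suite": "libero_10",
            "task_ids": [first, first + 1],
            "label": f"long_t{first}_{first + 1}",
        })
    return jobs
```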
| ### Per-episode tracking |
| |
| `eval_libero_speed.py` records the **policy steps actually executed** |
| (excluding the `num_steps_wait` warmup), so: |
| |
| - successes terminate early when `env.step` returns `done=True` and report |
| the true step count; |
| - failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for |
| spatial / object / goal / 10 / 90 respectively). |
| |
| The gap between `mean_steps_success` and `mean_steps_all` is a quick |
| proxy for the failure rate: every failed episode contributes the full |
| `max_steps` cap, so `mean_steps_all` rises sharply as failures accumulate. |
| |
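A minimal sketch of how the two step averages relate, assuming each episode record carries the `success` and `steps` fields mentioned above. The `summarize` helper and the exact keys of its return value are hypothetical.

```python
def summarize(episodes):
    """Compute success rate and step averages from per-episode records."""
    n = len(episodes)
    success_steps = [e["steps"] for e in episodes if e["success"]]
    all_steps = [e["steps"] for e in episodes]
    return {
        "success_rate": len(success_steps) / n if n else 0.0,
        # None when there were no successes, to avoid dividing by zero.
        "mean_steps_success": (
            sum(success_steps) / len(success_steps) if success_steps else None
        ),
        "mean_steps_all": sum(all_steps) / n if n else None,
    }
```

With two successes at 100 and 120 steps and one failure at the 220-step cap, `mean_steps_success` is 110 while `mean_steps_all` is about 146.7; the difference comes entirely from the failed episode.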
| ### Output |
| |
| ``` |
| results/libero_eval_<speed>x_<ts>/ |
| spatial_<speed>x.json |
| goal_<speed>x.json |
| object_<speed>x.json |
| long_t0_1_<speed>x.json long_t2_3_<speed>x.json ... long_t8_9_<speed>x.json |
| logs/<...>.log |
| videos/<...>/<rollout>.mp4 |
| ``` |
| |
| Each per-rank JSON contains a `summary` block (success rate, step |
| statistics, summary line) and a per-episode list. The driver's final |
| output is a per-suite rollup and a global line. |
| |
| ## 6. Documentation |
| |
| - `VARIOUS_SPEED_README.md` — added §2 (action-norm profiling) and §4 |
| (multi-process build + cleaning/replay summary outputs); §8 notes the |
| wandb-image-logging removal. |
| - `README_ablation.md` (new) — full ablation workflow doc, including: |
| the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation |
| workflow, and the soft-prompt implementation notes. |
| - `modification_summary.md` (this file). |
| - `VARIOUS_SPEED_CURRENT_PIPELINE.md` — deleted (superseded). |
| |
| ## 7. Bug fixes worth flagging |
| |
| ### 7.1 `flow_control` was being silently dropped |
| |
| `src/openpi/models_pytorch/preprocessing_pytorch.py` constructed |
| `SimpleProcessedObservation` without copying `flow_control`. Result: in |
| PyTorch training, `observation.flow_control` was always `None`, so the |
| action expert's modulation MLP always received zeros. |
| |
| **Implication**: any prior PyTorch run with the (now-removed) |
| `pi05_libero_various_speed_all_flow_prompt` config did **not** actually |
| use modulation — it was equivalent to the text-only path. After the |
| later refactor, the `flow_control` field was eliminated entirely; the |
| modulation path now reads `observation.speed` directly. The new |
| `pi05_libero_various_speed_all_modulation` config replaces it. |
| |
| Fix: pass `speed` through to `SimpleProcessedObservation` (the |
| `flow_control` field has since been removed). |
| |
| ### 7.2 JAX `preprocess_observation` did not pass `speed` |
| |
| `src/openpi/models/model.py:preprocess_observation` (JAX path) didn't |
| propagate the new `speed` field. Even though the JAX trainer is not used |
| for the soft_prompt sweep, the field should round-trip cleanly to keep |
| both backends consistent. Fixed. |
| |
| ### 7.3 `--model.soft-prompt-speeds` CLI syntax |
| |
| `scripts/run_ablations.py` initially emitted |
| `--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses |
| `tuple[float, ...]` from space-separated argv elements (matching |
| `--eval-speed-set` style). Fixed: emit the flag and each value as |
| separate argv elements. |
| |
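The difference between the two argv shapes can be sketched as below. `tuple_flag` is a hypothetical helper, and the `:g` formatting is just one way to reproduce the `0.75 1 1.25 1.5` rendering; `run_ablations.py` may format values differently.

```python
def tuple_flag(flag, values):
    """Emit a flag followed by one argv element per tuple value."""
    return [flag, *(f"{v:g}" for v in values)]


def tuple_flag_comma_joined(flag, values):
    """The original, broken shape: a single comma-joined argv element."""
    return [f"{flag}={','.join(f'{v:g}' for v in values)}"]
```

Passing the first form to `subprocess.run` as a list keeps each value a separate argv element, which is the shape the fix describes.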
| ### 7.4 Hardcoded WANDB API key in `init_wandb` |
| |
| A live key was hardcoded and unconditionally written to |
| `os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and |
| attributing all runs to one account. Removed; wandb now uses its standard |
| auth resolution order. **The committed key is exposed in git history; |
| revoke and rotate**. |
| |
| ## 8. Behavioral changes you should be aware of |
| |
| - **`speed_integration` defaults to `"auto"`**, which preserves legacy |
| behavior of the existing 3 LIBERO speed configs in `config.py`. New |
| ablation configs should set `speed_integration` explicitly. |
| - **The legacy `pi05_libero_various_speed_all_flow_prompt` and |
| `pi05_libero_various_speed_all_flow_noprompt` configs were removed** |
| (replaced by `pi05_libero_various_speed_all_modulation`). Because of the |
| §7.1 bug, the old configs were equivalent to text-only training, so |
| checkpoints saved under those names do not exhibit the modulation |
| behavior their names suggest. |
| - **wandb no longer logs sample camera images on first batch**. If you |
| relied on that for debugging data inputs, run |
| `scripts/visualize_speed_dataset.py` separately. |
| - **Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`** |
| are new artifacts. Existing downstream consumers should ignore unknown |
| meta files; verify if you have custom tooling that reads `meta/*.json`. |
| - **`g2_coarse` and `g4_narrow` speeds were updated mid-session**: |
| - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]` |
| - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]` |
| - `g4_narrow` now shares its dataset with the entire speed-integration |
| sweep, so the runner builds it only once. |
| |
| ## 9. Files added / modified / deleted |
|
|
| ``` |
| # New |
| A README_ablation.md |
| A modification_summary.md |
| A scripts/build_ablation_datasets.py |
| A scripts/build_libero_speed_dataset_mp.py |
| A scripts/eval_libero_8gpu.sh |
| A scripts/eval_libero_speed.py |
| A scripts/profile_action_norms.py |
| A scripts/run_ablations.py |
| A tests/test_soft_prompt_smoke.py |
| |
| # Modified |
| M src/openpi/models/model.py |
| M src/openpi/models/pi0.py |
| M src/openpi/models/pi0_config.py |
| M src/openpi/models_pytorch/pi0_pytorch.py |
| M src/openpi/models_pytorch/preprocessing_pytorch.py |
| M src/openpi/policies/libero_policy.py |
| M src/openpi/transforms.py |
| M src/openpi/training/config.py |
| M src/various_speed/core.py |
| M scripts/build_libero_speed_dataset.py |
| M scripts/compute_norm_stats.py |
| M scripts/train_pytorch.py |
| M VARIOUS_SPEED_README.md |
| |
| # Deleted |
| D VARIOUS_SPEED_CURRENT_PIPELINE.md |
| ``` |
|
|
| ## 10. Verification still needed (manual, on GPU host) |
|
|
| 1. `uv run pytest tests/test_soft_prompt_smoke.py -v` — config validation |
| and nearest-neighbor logic. CPU-only, fast. |
| 2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled |
| (see docstring on |
| `tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`). |
| 3. `uv run python scripts/run_ablations.py ... --dry-run` — visually |
| confirm the printed CLI commands look correct, especially that |
| `--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated. |
| 4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model |
| trains without shape / mask / dtype errors. |
| 5. `profile_action_norms.py` on the source dataset, then update |
| `--clean-transl-eps` / `--clean-rot-eps` in build commands to |
| data-driven values before kicking off the full sweep. |
| 6. `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint: |
| run `SPEED=1.0` (in-distribution) first to confirm 8-rank |
| coordination works, then iterate over OOD speeds. |
|
|