# Modification Summary (branch `0502_mp_process`)

Snapshot of every edit in this session, grouped by theme. Use this as a code-review checklist and as a record of what changed *behaviorally* in the training, evaluation, and data-prep pipelines.

## High-level themes

1. **Multi-process dataset builder + per-build diagnostics**: cleaning & replay summaries written to `meta/`, with per-speed integrated-motion error reporting.
2. **Action-norm profiler** for data-driven `clean_*_eps` calibration.
3. **Speed-integration ablation framework**: three model-side strategies (`text` / `modulation` / `soft_prompt`) selectable via a single config field, plus a P-length sweep on the soft-prompt arm.
4. **Ablation orchestration**: `scripts/run_ablations.py` (end-to-end) and `scripts/build_ablation_datasets.py` (data-prep only) with build / norm-stats dedup.
5. **8-GPU LIBERO evaluation**: partitioned eval driver that runs `libero_spatial / goal / object` plus a 5-way split of `libero_10`, tracks per-episode step counts, and reports per-suite + global rollups.
6. **`train_pytorch.py` cleanup**: removed silent NFS hang (wandb sample image block), removed hardcoded WANDB API key, made per-speed wandb breakdown config-driven.
7. **Four latent bugs fixed** that were silently degrading training or evaluation, see §7.

## 1. Data processing

### New files

| File | Purpose |
|---|---|
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |

### Modified files

| File | Change |
|---|---|
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |

## 2. `train_pytorch.py` cleanup

- Removed the wandb sample-image logging block. It created a second DataLoader and fetched 256 samples on the first batch, hanging silently for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb logging is unaffected.
- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real key was committed to the repo). Auth now uses the standard wandb resolution order. **The leaked key is in git history; rotate it on wandb**.
- Removed the `"EMA is not supported for PyTorch training"` log line (noise).
- `speed_specs` (per-speed wandb loss breakdown) and `avg_flow_metrics` key list are now derived from `config.eval_speed_set`, no longer hardcoded `0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)

## 3. Speed-integration ablation: model + config

### New TrainConfig field (`src/openpi/training/config.py`)

```python
eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
```

Drives the per-speed wandb breakdown in `train_pytorch.py`.
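
The derivation of per-speed keys from `eval_speed_set` is described but not shown. Below is a minimal sketch of one way to turn the configured speeds into wandb key suffixes. It assumes the `0p5 / 1p0 / 2p0` naming that the old hardcoded list used. `speed_key` and `build_speed_specs` are hypothetical helpers, not the functions in `train_pytorch.py`.

```python
def speed_key(speed: float) -> str:
    """Hypothetical helper: format a speed as a wandb-safe key suffix (0.5 -> '0p5')."""
    # str() of a float gives the shortest round-tripping form: 1.0 -> '1.0', 0.75 -> '0.75'.
    return str(float(speed)).replace(".", "p")


def build_speed_specs(eval_speed_set: tuple[float, ...]) -> dict[str, float]:
    """Hypothetical helper: map each key suffix to its speed, preserving config order."""
    specs: dict[str, float] = {}
    for speed in eval_speed_set:
        key = speed_key(speed)
        if key in specs:
            # Two entries mapping to one key would silently merge their wandb curves.
            raise ValueError(f"duplicate speed in eval_speed_set: {speed}")
        specs[key] = float(speed)
    return specs


if __name__ == "__main__":
    # The default TrainConfig.eval_speed_set reproduces the old hardcoded keys.
    print(list(build_speed_specs((0.5, 1.0, 2.0))))  # ['0p5', '1p0', '2p0']
    # Finer speed sets get their own keys.
    print(list(build_speed_specs((0.75, 1.0, 1.25, 1.5))))  # ['0p75', '1p0', '1p25', '1p5']
```

Using the shortest float representation keeps a speed such as 0.75 distinct from 0.8. A fixed one-decimal format would not.
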

### New LeRobotVariousSpeedLiberoDataConfig field

```python
speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
```

A high-level switch:

| Value | Behavior | Requirement |
|---|---|---|
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
| `soft_prompt` | holds K = `len(soft_prompt_speeds)` learnable prompts of P = `soft_prompt_p` tokens each. Per sample, it inserts the P tokens of the prompt whose speed anchor is nearest, between vision and instruction tokens. | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
| `auto` (default) | preserves the legacy behavior of the existing LIBERO speed configs (see §8) | n/a |

### New Pi0Config fields (`src/openpi/models/pi0_config.py`)

```python
speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0
```

`flow_control_dim` was removed. `inputs_spec` now declares `speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR soft_prompt is enabled.

### Observation schema (`src/openpi/models/model.py`)

Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict` and `preprocess_observation` (JAX) propagate it.

### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)

- `__init__`:
  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and `speed_condition_mlp_in/out` (replaces the old `flow_control_*` / `flow_condition_*` MLPs). Reads raw `observation.speed` (shape `(B, 1)`); no log transform is applied.
  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter` of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
- `_preprocess_observation`: returns a 6-tuple `(images, image_masks, lang_tokens, lang_masks, state, speed)`.
- `embed_prefix`: accepts `speed=None`. When soft_prompt is enabled, it computes `argmin |speed − anchors|` per batch element and inserts the selected `(B, P, hidden)` tokens between image and language tokens with full attention. OOD speeds fall back to the nearest training anchor.
- `embed_suffix`: accepts `speed=None`. When modulation is enabled, it pushes raw speed through `speed_mod_mlp` and fuses it with the timestep embedding via `speed_condition_mlp`.
- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.

JAX `Pi0` (`src/openpi/models/pi0.py`) received the same renames for consistency: `flow_control_dim → speed_modulation`, `flow_control_mlp_* → speed_mod_mlp_*`, `flow_condition_mlp_* → speed_condition_mlp_*`. It now reads `obs.speed`.

### Policy passthrough (`src/openpi/policies/libero_policy.py`)

`LiberoInputs` now passes `data["speed"]` through, alongside the existing `flow_control` passthrough.

### Smoke tests (`tests/test_soft_prompt_smoke.py`)

Light tests for `Pi0Config` field acceptance and the argmin nearest-neighbor logic. The full end-to-end forward-pass test requires PaliGemma weights and is gated to a manual GPU run.

## 4. Ablation orchestration

### `scripts/run_ablations.py` (new)

End-to-end orchestrator: build → norm-stats → train, per ablation.

- `Ablation` dataclass: `name`, `speeds`, `speed_integration`, `extra_train_args`, `shared_norm_key`.
- 12 default ablations (13 sweep entries, with `softprompt_p8` counted in two sweeps):
  - **Speed-set sweep** (5): `g1_baseline`, `g2_coarse`, `g3a_step025`, `g4_narrow`, `g5_extreme`. All use `text` integration.
  - **Speed-integration sweep** (3): `speedint_text`, `speedint_modulation` (with `Pi0Config.speed_modulation` enabled), and `softprompt_p8` (reused from the P-sweep).
  - **Soft-prompt P-length sweep** (5): `softprompt_p{1,4,8,16,32}`. All declare `shared_norm_key="softprompt_shared"` so they reuse one `norm_stats.json`.
- For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats, 12 train runs**.
- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`, `--dry-run` for scoped runs.

### `scripts/build_ablation_datasets.py` (new)

Thin focused wrapper for the data-prep stage only.
Imports the same `ABLATIONS` table from `run_ablations.py`, applies the same build dedup, and exits with a summary mapping ablation names to dataset paths.

## 5. 8-GPU LIBERO evaluation

### New files

| File | Purpose |
|---|---|
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is **GPU 0 → spatial / 1 → goal / 2 → object** (full suites) and **GPU 3-7 → libero_10 split into pairs of tasks**, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |

### Per-episode tracking

`eval_libero_speed.py` records the **policy steps actually executed** (excluding the `num_steps_wait` warmup), so:

- successes terminate early when `env.step` returns `done=True` and report the true step count;
- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for spatial / object / goal / 10 / 90 respectively).

Every failure runs to the `max_steps` cap. As a result, the gap between `mean_steps_success` and `mean_steps_all` gives a quick read on the failure rate: the more failures, the further `mean_steps_all` rises above `mean_steps_success`.

### Output

```
results/libero_eval_x_/
  spatial_x.json
  goal_x.json
  object_x.json
  long_t0_1_x.json
  long_t2_3_x.json
  ...
  long_t8_9_x.json
  logs/<...>.log
  videos/<...>/.mp4
```

Each per-rank JSON contains a `summary` block (success rate, step statistics, summary line) and a per-episode list. The driver's final output is a per-suite rollup and a global line.

## 6. Documentation

- `VARIOUS_SPEED_README.md`: added §2 (action-norm profiling) and §4 (multi-process build + cleaning/replay summary outputs); §8 notes the wandb-image-logging removal.
- `README_ablation.md` (new): full ablation workflow doc, including the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation workflow, and the soft-prompt implementation notes.
- `modification_summary.md` (this file).
- `VARIOUS_SPEED_CURRENT_PIPELINE.md`: deleted (superseded).

## 7. Bug fixes worth flagging

### 7.1 `flow_control` was being silently dropped

`src/openpi/models_pytorch/preprocessing_pytorch.py` constructed `SimpleProcessedObservation` without copying `flow_control`. Result: in PyTorch training, `observation.flow_control` was always `None`, so the action expert's modulation MLP always received zeros.

**Implication**: any prior PyTorch run with the (now-removed) `pi05_libero_various_speed_all_flow_prompt` config did **not** actually use modulation; it was equivalent to the text-only path.

After the later refactor, the `flow_control` field was eliminated entirely; the modulation path now reads `observation.speed` directly. The new `pi05_libero_various_speed_all_modulation` config replaces it.

Fix: pass `speed` through to `SimpleProcessedObservation` (the `flow_control` field has since been removed).

### 7.2 JAX `preprocess_observation` did not pass `speed`

`src/openpi/models/model.py:preprocess_observation` (JAX path) didn't propagate the new `speed` field. Even though the JAX trainer is not used for the soft_prompt sweep, the field should round-trip cleanly to keep both backends consistent. Fixed.

### 7.3 `--model.soft-prompt-speeds` CLI syntax

`scripts/run_ablations.py` initially emitted `--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses `tuple[float, ...]` from space-separated argv elements (matching `--eval-speed-set` style). Fixed: emit the flag and each value as separate argv elements.

### 7.4 Hardcoded WANDB API key in `init_wandb`

A live key was hardcoded and unconditionally written to `os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and attributing all runs to one account.
Removed; wandb now uses its standard auth resolution order. **The committed key is exposed in git history; revoke and rotate**.

## 8. Behavioral changes you should be aware of

- **`speed_integration` defaults to `"auto"`**, which preserves legacy behavior of the existing 3 LIBERO speed configs in `config.py`. New ablation configs should set `speed_integration` explicitly.
- **The legacy `pi05_libero_various_speed_all_flow_prompt` and `pi05_libero_various_speed_all_flow_noprompt` configs were removed** (replaced by `pi05_libero_various_speed_all_modulation`). Because of the §7.1 bug, neither old config actually exercised modulation (the `flow_prompt` variant was equivalent to text-only training). Checkpoints saved under those names therefore do not reflect the modulation behavior they appeared to.
- **wandb no longer logs sample camera images on the first batch**. If you relied on that for debugging data inputs, run `scripts/visualize_speed_dataset.py` separately.
- **Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`** are new artifacts. Downstream consumers should ignore unknown meta files; check any custom tooling that reads `meta/*.json`.
- **`g2_coarse` and `g4_narrow` speeds were updated mid-session**:
  - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]`
  - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]`
  - `g4_narrow` now shares its dataset with the entire speed-integration sweep, so the runner builds it only once.

## 9. Files added / modified / deleted

```
# New
A README_ablation.md
A modification_summary.md
A scripts/build_ablation_datasets.py
A scripts/eval_libero_8gpu.sh
A scripts/eval_libero_speed.py
A scripts/profile_action_norms.py
A scripts/run_ablations.py
A tests/test_soft_prompt_smoke.py

# Modified
M src/openpi/models/model.py
M src/openpi/models/pi0.py
M src/openpi/models/pi0_config.py
M src/openpi/models_pytorch/pi0_pytorch.py
M src/openpi/models_pytorch/preprocessing_pytorch.py
M src/openpi/policies/libero_policy.py
M src/openpi/transforms.py
M src/openpi/training/config.py
M src/various_speed/core.py
M scripts/build_libero_speed_dataset.py
M scripts/build_libero_speed_dataset_mp.py
M scripts/compute_norm_stats.py
M scripts/train_pytorch.py
M VARIOUS_SPEED_README.md

# Deleted
D VARIOUS_SPEED_CURRENT_PIPELINE.md
```

## 10. Verification still needed (manual, on GPU host)

1. `uv run pytest tests/test_soft_prompt_smoke.py -v`: config validation and nearest-neighbor logic. CPU-only, fast.
2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled (see docstring on `tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
3. `uv run python scripts/run_ablations.py ... --dry-run`: visually confirm the printed CLI commands look correct, especially that `--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model trains without shape / mask / dtype errors.
5. Run `profile_action_norms.py` on the source dataset, then update `--clean-transl-eps` / `--clean-rot-eps` in build commands to data-driven values before kicking off the full sweep.
6. Run `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint. Start at the in-distribution speed (`SPEED=1.0`) to confirm 8-rank coordination works, then iterate over OOD speeds.
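
Steps 1 and 2 exercise the nearest-anchor lookup that `embed_prefix` uses to choose which of the K soft prompts to insert. The sketch below restates that rule (`argmin |speed − anchors|`, with OOD speeds falling back to the nearest training anchor) in plain Python, so its edge cases can be checked without PaliGemma weights. It is a sketch under assumptions:

- `nearest_anchor_index` and `select_soft_prompts` are hypothetical names, not the functions in `pi0_pytorch.py` or the smoke test.
- Nested lists stand in for the `(K, P, width)` parameter tensor.

```python
from collections.abc import Sequence


def nearest_anchor_index(speed: float, anchors: Sequence[float]) -> int:
    """Hypothetical helper: index k minimizing |speed - anchors[k]|; ties go to the lowest k."""
    if not anchors:
        raise ValueError("soft_prompt_speeds must be non-empty")
    # min() returns the first minimizer, so exact ties resolve to the lower index.
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def select_soft_prompts(speeds, anchors, prompt_bank):
    """Hypothetical helper: pick one (P, width) prompt per batch element.

    `speeds` is a list of [speed] rows (shape (B, 1)); `prompt_bank` is a nested
    list standing in for the (K, P, width) parameter. Returns a (B, P, width) list.
    """
    if len(prompt_bank) != len(anchors):
        raise ValueError("prompt_bank must hold exactly one prompt per anchor")
    return [prompt_bank[nearest_anchor_index(row[0], anchors)] for row in speeds]


if __name__ == "__main__":
    anchors = (0.75, 1.0, 1.25, 1.5)  # soft_prompt_speeds used by the P-sweep
    # In-distribution speeds map to their own anchor.
    print([nearest_anchor_index(s, anchors) for s in anchors])  # [0, 1, 2, 3]
    # OOD speeds fall back to the nearest training anchor.
    print(nearest_anchor_index(0.5, anchors), nearest_anchor_index(2.0, anchors))  # 0 3
    # Toy bank with K=4, P=2, width=1: prompt k is filled with the value k.
    bank = [[[float(k)], [float(k)]] for k in range(len(anchors))]
    out = select_soft_prompts([[0.9], [1.4], [3.0]], anchors, bank)
    print(len(out), len(out[0]), len(out[0][0]))  # 3 2 1
```

Speeds far outside the anchor range (3.0 above) always reuse the edge prompt. An OOD eval therefore measures how well that single prompt extrapolates; the model does not interpolate between prompts.
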