Modification Summary (branch `0502_mp_process`)

Snapshot of every edit in this session, grouped by theme. Use this as a code-review checklist and as a record of what changed behaviorally in the training, evaluation, and data-prep pipelines.

High-level themes

Multi-process dataset builder + per-build diagnostics — cleaning & replay summaries written to meta/, with per-speed integrated-motion error reporting.
Action-norm profiler for data-driven clean_*_eps calibration.
Speed-integration ablation framework — three strategies for feeding speed to the model (text / modulation / soft_prompt) selectable via a single config field, plus a P-length sweep on the soft-prompt arm.
Ablation orchestration — scripts/run_ablations.py (end-to-end) and scripts/build_ablation_datasets.py (data-prep only) with build / norm-stats dedup.
8-GPU LIBERO evaluation — partitioned eval driver that runs libero_spatial / goal / object plus a 5-way split of libero_10, tracks per-episode step counts, and reports per-suite + global rollups.
train_pytorch.py cleanup — removed silent NFS hang (wandb sample image block), removed hardcoded WANDB API key, made per-speed wandb breakdown config-driven.
Four latent bugs fixed that were silently degrading training or evaluation (see §7).

1. Data processing

New files

| File | Purpose |
| --- | --- |
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
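
The percentile-based suggestion made by the profiler can be sketched as follows. This is a minimal standard-library illustration, not the code in `scripts/profile_action_norms.py`: the function names, the nearest-rank percentile rule, and the toy action rows are all assumptions made for the example.

```python
import math


def l2_norm(values):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(v * v for v in values))


def percentile(sorted_vals, q):
    """Nearest-rank percentile (q in [0, 100]) of an already sorted list."""
    if not sorted_vals:
        raise ValueError("empty input")
    rank = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def suggest_eps(actions, percentiles=(1, 5)):
    """Return candidate thresholds per percentile for translation and rotation.

    Each action row is assumed to start with 3 translation deltas
    followed by 3 rotation deltas (action[:3] and action[3:6]).
    """
    transl = sorted(l2_norm(a[0:3]) for a in actions)
    rot = sorted(l2_norm(a[3:6]) for a in actions)
    return {
        "transl": {q: percentile(transl, q) for q in percentiles},
        "rot": {q: percentile(rot, q) for q in percentiles},
    }


# Toy data (hypothetical): 100 rows with linearly growing norms.
toy_actions = [[i / 100, 0.0, 0.0, 0.0, 0.0, i / 200, 0.0] for i in range(1, 101)]
suggestion = suggest_eps(toy_actions)
```

The P1 and P5 entries of `suggestion` are the kind of values one would then try for `--clean-transl-eps` and `--clean-rot-eps`.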

Modified files

| File | Change |
| --- | --- |
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`_translation_ratio`, `_rotation_ratio`, `_any_ratio`, `_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |
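
The `dataclasses.replace` pattern for overriding nested frozen configs looks roughly like this. The three classes below are simplified, hypothetical stand-ins (the real `TrainConfig`, data config, and `AssetsConfig` carry many more fields), and `apply_overrides` is an illustrative name rather than a function from the repository.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsConfig:
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataConfig:
    repo_id: str = "placeholder/source_dataset"  # hypothetical default
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:
    name: str
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(config, repo_id=None, asset_id=None):
    """Return a new config with the overrides applied; the input is untouched."""
    data = config.data
    if asset_id is not None:
        # Rebuild the innermost frozen object first, then its parent.
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(config, data=data)


base = TrainConfig(name="base")
overridden = apply_overrides(base, repo_id="placeholder/ablation_dataset", asset_id="ablation_a")
```

Because every level is rebuilt rather than mutated, one registered config can serve any number of ablations without shared state leaking between them.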

2. `train_pytorch.py` cleanup

Removed the wandb sample-image logging block. It created a second DataLoader and fetched 256 samples on the first batch, hanging silently for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb logging is unaffected.
Removed the hardcoded WANDB_API_KEY env-var assignment (security: a real key was committed to the repo). Auth now uses the standard wandb resolution order. The leaked key is in git history; rotate it on wandb.
Removed the "EMA is not supported for PyTorch training" log line (noise).
speed_specs (per-speed wandb loss breakdown) and the avg_flow_metrics key list are now derived from config.eval_speed_set instead of the hardcoded 0p5 / 1p0 / 2p0. (The breakdown only runs when observation.flow_control is not None.)

3. Speed-integration ablation: model + config

New TrainConfig field (`src/openpi/training/config.py`)

eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)

Drives the per-speed wandb breakdown in train_pytorch.py.
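
As an illustration of deriving per-speed labels from `eval_speed_set` rather than hardcoding them, the sketch below turns each speed into the `0p5` / `1p0` / `2p0` style used in the metric names. The helper names and the `loss_speed_` prefix are hypothetical; the actual key format in `train_pytorch.py` may differ.

```python
def speed_label(speed):
    """Format a speed as a metric-safe label: 0.5 -> '0p5', 1.0 -> '1p0'."""
    return str(float(speed)).replace(".", "p")


def per_speed_metric_keys(eval_speed_set, prefix="loss_speed_"):
    """One metric key per configured evaluation speed (prefix is an assumption)."""
    return [prefix + speed_label(s) for s in eval_speed_set]


default_keys = per_speed_metric_keys((0.5, 1.0, 2.0))
narrow_keys = per_speed_metric_keys((0.75, 1.0, 1.25, 1.5))
```

With this shape, changing the speed set in the config is enough to change the logged breakdown.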

New LeRobotVariousSpeedLiberoDataConfig field

speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"

A high-level switch:

| value | behavior | requirement |
| --- | --- | --- |
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
| `soft_prompt` | inserts K × P learnable tokens between vision and instruction | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
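
The requirements in the table can be expressed as a small consistency check. This is a sketch under stated assumptions: `ModelFlags` is a hypothetical stand-in for the relevant `Pi0Config` fields, and `check_speed_integration` is not a function from the repository.

```python
from dataclasses import dataclass
from typing import Tuple

VALID_MODES = ("text", "modulation", "soft_prompt", "auto")


@dataclass(frozen=True)
class ModelFlags:
    """Hypothetical stand-in for the speed-related Pi0Config fields."""
    speed_modulation: bool = False
    soft_prompt_speeds: Tuple[float, ...] = ()
    soft_prompt_p: int = 0


def check_speed_integration(mode, model):
    """Raise ValueError if the model flags do not satisfy the chosen mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown speed_integration: {mode!r}")
    if mode == "modulation" and not model.speed_modulation:
        raise ValueError("modulation requires speed_modulation=True")
    if mode == "soft_prompt":
        if model.soft_prompt_p < 1:
            raise ValueError("soft_prompt requires soft_prompt_p >= 1")
        if not model.soft_prompt_speeds:
            raise ValueError("soft_prompt requires non-empty soft_prompt_speeds")


# A valid soft-prompt combination passes silently.
check_speed_integration(
    "soft_prompt",
    ModelFlags(soft_prompt_speeds=(0.75, 1.0, 1.25, 1.5), soft_prompt_p=8),
)
```

Failing early on a mismatched pair is cheaper than discovering mid-run that the speed signal never reached the model.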

New Pi0Config fields (`src/openpi/models/pi0_config.py`)

speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0

flow_control_dim was removed. inputs_spec now declares speed=ShapeDtypeStruct([B, 1], float32) whenever modulation OR soft_prompt is enabled.

Observation schema (`src/openpi/models/model.py`)

Added speed: at.Float[ArrayT, "*b 1"] | None = None. Both from_dict and preprocess_observation (JAX) propagate it.

PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)

__init__:
- When speed_modulation=True: registers speed_mod_mlp_in/out and speed_condition_mlp_in/out (replaces the old flow_control_* / flow_condition_* MLPs). Reads raw observation.speed (shape (B, 1)); no log transform is applied.
- When soft_prompt_p > 0: registers soft_prompt_tokens: nn.Parameter of shape (K, P, paligemma_width) with N(0, 0.02) init, plus a non-persistent buffer soft_prompt_anchors: tensor(K,).
_preprocess_observation: returns a 6-tuple (images, image_masks, lang_tokens, lang_masks, state, speed).
embed_prefix: accepts speed=None; when soft_prompt is enabled, computes argmin |speed − anchors| per batch element and inserts (B, P, hidden) tokens between image and language tokens with full attention. OOD speeds fall back to the nearest training anchor.
embed_suffix: accepts speed=None; when modulation is enabled, pushes raw speed through speed_mod_mlp and fuses with the timestep embedding via speed_condition_mlp.
forward, sample_actions, and denoise_step plumb speed through.
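
The soft-prompt selection in `embed_prefix` can be illustrated without tensors. The sketch below handles a single batch element and uses plain lists as stand-ins for embedding rows; the function names and token placeholders are hypothetical, and the real implementation operates on batched tensors.

```python
def nearest_anchor_index(speed, anchors):
    """argmin over |speed - anchor|; ties resolve to the lowest index."""
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def build_prefix(image_tokens, lang_tokens, soft_prompt_tokens, anchors, speed):
    """Insert the P tokens of the selected anchor between image and language tokens."""
    k = nearest_anchor_index(speed, anchors)
    return image_tokens + soft_prompt_tokens[k] + lang_tokens


anchors = (0.75, 1.0, 1.25, 1.5)  # K = 4 training speeds
P = 2                             # soft-prompt length per anchor
# soft_prompt_tokens[k] holds the P placeholder tokens for anchor k.
soft_prompt_tokens = [[f"sp{k}_{p}" for p in range(P)] for k in range(len(anchors))]

image_tokens = ["img0", "img1", "img2"]
lang_tokens = ["pick", "up", "bowl"]

in_dist = build_prefix(image_tokens, lang_tokens, soft_prompt_tokens, anchors, speed=1.2)
ood = build_prefix(image_tokens, lang_tokens, soft_prompt_tokens, anchors, speed=3.0)
```

Here `speed=3.0` lies outside the training set and falls back to the `1.5` anchor, which matches the nearest-anchor behavior described above for OOD speeds.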

JAX Pi0 (src/openpi/models/pi0.py) was renamed the same way for consistency: flow_control_dim → speed_modulation, flow_control_mlp_* → speed_mod_mlp_*, flow_condition_mlp_* → speed_condition_mlp_*. It now reads obs.speed.

Policy passthrough (`src/openpi/policies/libero_policy.py`)

LiberoInputs now passes data["speed"] through, alongside the existing flow_control passthrough.

Smoke tests (`tests/test_soft_prompt_smoke.py`)

Light tests for Pi0Config field acceptance and the argmin nearest-neighbor logic. The full end-to-end forward-pass test requires PaliGemma weights and is gated to a manual GPU run.

4. Ablation orchestration

`scripts/run_ablations.py` (new)

End-to-end orchestrator: build → norm-stats → train, per ablation.

Ablation dataclass: name, speeds, speed_integration, extra_train_args, shared_norm_key.
12 default ablations:
- Speed-set sweep (5): g1_baseline, g2_coarse, g3a_step025, g4_narrow, g5_extreme. All use text integration.
- Speed-integration sweep (3): speedint_text, speedint_modulation (with the model's speed_modulation enabled, formerly --model.flow-control-dim=1), and softprompt_p8 (reused from the P-sweep).
- Soft-prompt P-length sweep (5): softprompt_p{1,4,8,16,32}. All declare shared_norm_key="softprompt_shared" so they reuse one norm_stats.json.
For the default 12-ablation table, dedup gives 5 builds, 8 norm-stats, 12 train runs.
--only, --skip-build, --skip-norm-stats, --skip-train, --dry-run for scoped runs.
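
The dedup counting can be sketched as below. The rule used here (one build per distinct speed set, one norm-stats run per `shared_norm_key` or, failing that, per ablation name) is an assumption inferred from this summary, and only the ablations whose speed sets are stated in this document are included, so the counts are for this subset rather than the full 12-ablation table.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: Tuple[float, ...]
    speed_integration: str
    extra_train_args: Tuple[str, ...] = ()
    shared_norm_key: Optional[str] = None


def plan(ablations):
    """Group ablations into unique builds and norm-stats runs (assumed dedup rule)."""
    builds = {}      # speed set -> first ablation that needs this dataset
    norm_stats = {}  # norm key  -> first ablation that computes these stats
    for ab in ablations:
        builds.setdefault(tuple(ab.speeds), ab.name)
        norm_stats.setdefault(ab.shared_norm_key or ab.name, ab.name)
    return {"builds": builds, "norm_stats": norm_stats, "train": [ab.name for ab in ablations]}


NARROW = (0.75, 1.0, 1.25, 1.5)
subset = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
] + [
    Ablation(f"softprompt_p{p}", NARROW, "soft_prompt", shared_norm_key="softprompt_shared")
    for p in (1, 4, 8, 16, 32)
]

result = plan(subset)
counts = {key: len(value) for key, value in result.items()}
```

For this subset the sketch yields 2 builds, 5 norm-stats runs, and 9 training runs; every ablation still trains, only the shared preparation steps collapse.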

`scripts/build_ablation_datasets.py` (new)

Thin wrapper for the data-prep stage only. It imports the same ABLATIONS table from run_ablations.py, applies the same build dedup, and exits with a summary mapping ablation names to dataset paths.

5. 8-GPU LIBERO evaluation

New files

| File | Purpose |
| --- | --- |
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is GPU 0 → spatial / 1 → goal / 2 → object (full suites) and GPU 3-7 → libero_10 split into pairs of tasks, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
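
The GPU partition can be written out explicitly. The actual driver is a shell script; the Python sketch below only models the assignment, and the dictionary layout and the `gpu_partition` name are assumptions made for illustration.

```python
def gpu_partition(num_long_tasks=10, tasks_per_gpu=2):
    """Assign three full suites to GPUs 0-2 and libero_10 task pairs to the rest."""
    plan = [
        {"gpu": 0, "suite": "libero_spatial", "task_ids": None, "tag": "spatial"},
        {"gpu": 1, "suite": "libero_goal", "task_ids": None, "tag": "goal"},
        {"gpu": 2, "suite": "libero_object", "task_ids": None, "tag": "object"},
    ]
    for start in range(0, num_long_tasks, tasks_per_gpu):
        ids = list(range(start, min(start + tasks_per_gpu, num_long_tasks)))
        plan.append(
            {
                "gpu": len(plan),  # next free GPU index
                "suite": "libero_10",
                "task_ids": ids,
                "tag": "long_t" + "_".join(str(i) for i in ids),
            }
        )
    return plan


partition = gpu_partition()
```

With the defaults this produces eight entries, and the `tag` values line up with the per-rank JSON file names (`long_t0_1`, ..., `long_t8_9`).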

Per-episode tracking

eval_libero_speed.py records the policy steps actually executed (excluding the num_steps_wait warmup), so:

successes terminate early when env.step returns done=True and report the true step count;
failures hit max_steps for the suite (220 / 280 / 300 / 520 / 400 for spatial / object / goal / 10 / 90 respectively).

The gap between mean_steps_success and mean_steps_all gives a quick read on the failure rate: every failure contributes the full max_steps, so mean_steps_all climbs toward the time-limit cap as failures accumulate.
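
A small worked example makes the relationship concrete. The record layout follows the per-episode fields named above (`success`, `steps`, `task_id`); the `summarize` function and the episode values are hypothetical.

```python
def summarize(episodes):
    """Success rate and step statistics for a list of per-episode records."""
    total = len(episodes)
    if total == 0:
        raise ValueError("no episodes to summarize")
    success_steps = [ep["steps"] for ep in episodes if ep["success"]]
    return {
        "episodes": total,
        "success_rate": len(success_steps) / total,
        # None when nothing succeeded, so the mean is never computed over zero items.
        "mean_steps_success": sum(success_steps) / len(success_steps) if success_steps else None,
        "mean_steps_all": sum(ep["steps"] for ep in episodes) / total,
    }


MAX_STEPS_SPATIAL = 220  # time-limit cap for libero_spatial

episodes = [
    {"task_id": 0, "success": True, "steps": 100},
    {"task_id": 0, "success": True, "steps": 120},
    {"task_id": 1, "success": True, "steps": 140},
    {"task_id": 1, "success": False, "steps": MAX_STEPS_SPATIAL},  # failure runs to the cap
]
summary = summarize(episodes)
```

Here three successes average 120 steps, while the single capped failure lifts the overall mean to 145.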

Output

results/libero_eval_<speed>x_<ts>/
  spatial_<speed>x.json
  goal_<speed>x.json
  object_<speed>x.json
  long_t0_1_<speed>x.json   long_t2_3_<speed>x.json   ...   long_t8_9_<speed>x.json
  logs/<...>.log
  videos/<...>/<rollout>.mp4

Each per-rank JSON contains a summary block (success rate, step statistics, summary line) and a per-episode list. The driver's final output is a per-suite rollup and a global line.

6. Documentation

VARIOUS_SPEED_README.md — added §2 (action-norm profiling) and §4 (multi-process build + cleaning/replay summary outputs); §8 notes the wandb-image-logging removal.
README_ablation.md (new) — full ablation workflow doc, including: the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation workflow, and the soft-prompt implementation notes.
modification_summary.md (this file).
VARIOUS_SPEED_CURRENT_PIPELINE.md — deleted (superseded).

7. Bug fixes worth flagging

7.1 `flow_control` was being silently dropped

src/openpi/models_pytorch/preprocessing_pytorch.py constructed SimpleProcessedObservation without copying flow_control. Result: in PyTorch training, observation.flow_control was always None, so the action expert's modulation MLP always received zeros.

Implication: any prior PyTorch run with the (now-removed) pi05_libero_various_speed_all_flow_prompt config did not actually use modulation — it was equivalent to the text-only path. After the later refactor, the flow_control field was eliminated entirely; the modulation path now reads observation.speed directly. The new pi05_libero_various_speed_all_modulation config replaces it.

Fix: pass speed through to SimpleProcessedObservation (the flow_control field has since been removed).
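
A regression check for this class of bug could look like the sketch below. The two dataclasses are simplified, hypothetical stand-ins for the real observation types, and `silently_dropped` is an illustrative helper, not part of the codebase.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class Observation:
    state: List[float]
    speed: Optional[List[float]] = None


@dataclasses.dataclass
class ProcessedObservation:
    state: List[float]
    speed: Optional[List[float]] = None


def preprocess_buggy(obs):
    # Mirrors the failure mode: the optional field is never copied.
    return ProcessedObservation(state=list(obs.state))


def preprocess_fixed(obs):
    return ProcessedObservation(state=list(obs.state), speed=obs.speed)


def silently_dropped(raw, processed):
    """Names of fields that were set on `raw` but are None on `processed`."""
    return [
        f.name
        for f in dataclasses.fields(raw)
        if getattr(raw, f.name) is not None and getattr(processed, f.name, None) is None
    ]


obs = Observation(state=[0.0, 1.0], speed=[1.5])
dropped_before = silently_dropped(obs, preprocess_buggy(obs))
dropped_after = silently_dropped(obs, preprocess_fixed(obs))
```

Because the dropped field defaults to `None`, nothing raises at runtime; only an explicit comparison like this one surfaces the loss.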

7.2 JAX `preprocess_observation` did not pass `speed`

src/openpi/models/model.py:preprocess_observation (JAX path) didn't propagate the new speed field. Even though the JAX trainer is not used for the soft_prompt sweep, the field should round-trip cleanly to keep both backends consistent. Fixed.

7.3 `--model.soft-prompt-speeds` CLI syntax

scripts/run_ablations.py initially emitted --model.soft-prompt-speeds=0.75,1,1.25,1.5 (comma-joined). Tyro parses tuple[float, ...] from space-separated argv elements (matching --eval-speed-set style). Fixed: emit the flag and each value as separate argv elements.
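
The difference between the two argv shapes is easy to show. The helper name below is hypothetical, and the `:g` formatting is just one way to obtain `1` rather than `1.0`.

```python
def tuple_flag(flag, values):
    """Emit a flag followed by each value as its own argv element."""
    return [flag, *(f"{v:g}" for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Fixed form: five separate argv elements.
argv_fixed = tuple_flag("--model.soft-prompt-speeds", speeds)

# Original form: a single comma-joined element.
argv_comma_joined = ["--model.soft-prompt-speeds=" + ",".join(f"{v:g}" for v in speeds)]
```

Passing `argv_fixed` as part of a list to `subprocess.run` keeps each value a distinct argument, with no shell quoting involved.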

7.4 Hardcoded WANDB API key in `init_wandb`

A live key was hardcoded and unconditionally written to os.environ["WANDB_API_KEY"], overriding each user's own credentials and attributing all runs to one account. Removed; wandb now uses its standard auth resolution order. The committed key is exposed in git history; revoke and rotate.

8. Behavioral changes you should be aware of

speed_integration defaults to "auto", which preserves legacy behavior of the existing 3 LIBERO speed configs in config.py. New ablation configs should set speed_integration explicitly.
The legacy pi05_libero_various_speed_all_flow_prompt and pi05_libero_various_speed_all_flow_noprompt configs were removed (replaced by pi05_libero_various_speed_all_modulation). The old configs were equivalent to text-only training due to the §7.1 bug, so any checkpoints from those names are not the modulation behavior they appeared to be.
wandb no longer logs sample camera images on first batch. If you relied on that for debugging data inputs, run scripts/visualize_speed_dataset.py separately.
Per-build meta/cleaning_summary.json and meta/replay_summary.json are new artifacts. Existing downstream consumers should ignore unknown meta files; verify if you have custom tooling that reads meta/*.json.
g2_coarse and g4_narrow speeds were updated mid-session:
- g2_coarse: [0.5, 1.0, 2.0] → [0.5, 1.0, 1.5, 2.0]
- g4_narrow: [0.75, 1.0, 1.25] → [0.75, 1.0, 1.25, 1.5]
- g4_narrow now shares its dataset with the entire speed-integration sweep, so the runner builds it only once.

9. Files added / modified / deleted

# New
A  README_ablation.md
A  modification_summary.md
A  scripts/build_ablation_datasets.py
A  scripts/build_libero_speed_dataset_mp.py
A  scripts/eval_libero_8gpu.sh
A  scripts/eval_libero_speed.py
A  scripts/profile_action_norms.py
A  scripts/run_ablations.py
A  tests/test_soft_prompt_smoke.py

# Modified
M  src/openpi/models/model.py
M  src/openpi/models/pi0.py
M  src/openpi/models/pi0_config.py
M  src/openpi/models_pytorch/pi0_pytorch.py
M  src/openpi/models_pytorch/preprocessing_pytorch.py
M  src/openpi/policies/libero_policy.py
M  src/openpi/transforms.py
M  src/openpi/training/config.py
M  src/various_speed/core.py
M  scripts/build_libero_speed_dataset.py
M  scripts/compute_norm_stats.py
M  scripts/train_pytorch.py
M  VARIOUS_SPEED_README.md

# Deleted
D  VARIOUS_SPEED_CURRENT_PIPELINE.md

10. Verification still needed (manual, on GPU host)

uv run pytest tests/test_soft_prompt_smoke.py -v — config validation and nearest-neighbor logic. CPU-only, fast.
Single-batch forward pass of PI0Pytorch with soft_prompt enabled (see docstring on tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only).
uv run python scripts/run_ablations.py ... --dry-run — visually confirm the printed CLI commands look correct, especially that --model.soft-prompt-speeds 0.75 1 1.25 1.5 is space-separated.
~50-step smoke run of softprompt_p8 on 1 GPU to confirm the model trains without shape / mask / dtype errors.
profile_action_norms.py on the source dataset, then update --clean-transl-eps / --clean-rot-eps in build commands to data-driven values before kicking off the full sweep.
eval_libero_8gpu.sh end-to-end with a single trained checkpoint: run SPEED=1.0 (in-distribution) first to confirm that 8-rank coordination works, then iterate over OOD speeds.
