	# Modification Summary (branch `0502_mp_process`)

	Snapshot of every edit in this session, grouped by theme. Use this as a
	code-review checklist and as a record of what changed behaviorally in the
	training, evaluation, and data-prep pipelines.

	## High-level themes

	1. Multi-process dataset builder + per-build diagnostics — cleaning &
	replay summaries written to `meta/`, with per-speed integrated-motion
	error reporting.
	2. Action-norm profiler for data-driven `clean_*_eps` calibration.
	3. Speed-integration ablation framework — three model-side strategies
	(`text` / `modulation` / `soft_prompt`) selectable via a single config
	field, plus a P-length sweep on the soft-prompt arm.
	4. Ablation orchestration — `scripts/run_ablations.py` (end-to-end) and
	`scripts/build_ablation_datasets.py` (data-prep only) with build /
	norm-stats dedup.
	5. 8-GPU LIBERO evaluation — partitioned eval driver that runs
	`libero_spatial / goal / object` plus a 5-way split of `libero_10`,
	tracks per-episode step counts, and reports per-suite + global rollups.
	6. `train_pytorch.py` cleanup — removed silent NFS hang (wandb sample
	image block), removed hardcoded WANDB API key, made per-speed wandb
	breakdown config-driven.
	7. Four latent bugs fixed that were silently degrading training or
	evaluation, see §7.

	## 1. Data processing

	### New files

	| File | Purpose |
	|---|---|
	| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
	| `scripts/profile_action_norms.py` | Profiles the `‖action[:, :3]‖` (translation) and `‖action[:, 3:6]‖` (rotation) distributions on the source dataset. Prints percentiles + threshold table; suggests the 1st / 5th percentile (P1 / P5) values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
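
The profiling idea can be sketched in a few lines. This is a minimal standard-library illustration, not the code in `scripts/profile_action_norms.py`: the function names are hypothetical, and it assumes each action is a flat sequence whose first three entries are translation and next three are rotation, as the slices above imply.

```python
import math


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("cannot take a percentile of an empty list")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def profile_action_norms(actions, percentiles=(1, 5, 25, 50)):
    """Return per-percentile norms of the translation and rotation parts."""
    # Euclidean norm of action[:3] and action[3:6] for every frame.
    transl = sorted(math.hypot(*a[:3]) for a in actions)
    rot = sorted(math.hypot(*a[3:6]) for a in actions)
    return {
        "translation": {q: percentile(transl, q) for q in percentiles},
        "rotation": {q: percentile(rot, q) for q in percentiles},
    }


# Synthetic data only: translation norms 0..100, rotation norms 0..200.
demo_actions = [[i, 0, 0, 0, 2 * i, 0, 0] for i in range(101)]
stats = profile_action_norms(demo_actions)
print("candidate --clean-transl-eps (P1, P5):", stats["translation"][1], stats["translation"][5])
print("candidate --clean-rot-eps (P1, P5):", stats["rotation"][1], stats["rotation"][5])
```

Picking P1 or P5 as the threshold means roughly 1% or 5% of frames would fall below it and be treated as near-stationary.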

	### Modified files

	| File | Change |
	|---|---|
	| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
	| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`_translation_ratio`, `_rotation_ratio`, `_any_ratio`, `_both_ratio`). |
	| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |
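
The nested `dataclasses.replace` pattern looks roughly like the sketch below. The three classes are simplified stand-ins for the real config types, and the field names (`repo_id`, `assets`, `asset_id`, `data`) and the helper `apply_overrides` are assumptions made for illustration.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # stand-in for the real AssetsConfig
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataConfig:  # stand-in for LeRobotVariousSpeedLiberoDataConfig
    repo_id: str = "default/repo"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # stand-in for the real TrainConfig
    name: str
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(config, repo_id=None, asset_id=None):
    """Return a copy of `config` with the overrides applied, innermost first."""
    data = config.data
    if asset_id is not None:
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(config, data=data)


base = TrainConfig(name="example")
overridden = apply_overrides(base, repo_id="user/ablation_ds", asset_id="ablation_assets")
print(overridden)
```

Because every level is frozen, each `replace` builds a new object and the registered base config is never mutated, which is what lets one registered config serve many ablations.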

	## 2. `train_pytorch.py` cleanup

	- Removed the wandb sample-image logging block. It created a second
	DataLoader and fetched 256 samples on the first batch, hanging silently
	for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb
	logging is unaffected.
	- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real
	key was committed to the repo). Auth now uses the standard wandb
	resolution order. **The leaked key is in git history; rotate it on
	wandb**.
	- Removed the `"EMA is not supported for PyTorch training"` log line
	(noise).
	- `speed_specs` (per-speed wandb loss breakdown) and the `avg_flow_metrics`
	key list are now derived from `config.eval_speed_set` instead of the
	hardcoded `0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)
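
One way to derive the per-speed keys from a speed tuple is sketched below. The label format follows the `0p5 / 1p0 / 2p0` pattern quoted above; the helper names and the `loss_speed_` prefix are hypothetical and may not match `train_pytorch.py`.

```python
def speed_label(speed):
    """Format a speed multiplier as a key-safe label, replacing '.' with 'p'."""
    return str(float(speed)).replace(".", "p")


def per_speed_metric_keys(eval_speed_set, prefix="loss_speed_"):
    """Build one metric key per configured speed (prefix is an assumption)."""
    return [prefix + speed_label(s) for s in eval_speed_set]


default_keys = per_speed_metric_keys((0.5, 1.0, 2.0))
narrow_keys = per_speed_metric_keys((0.75, 1.0, 1.25, 1.5))
print(default_keys)
print(narrow_keys)
```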

	## 3. Speed-integration ablation: model + config

	### New TrainConfig field (`src/openpi/training/config.py`)

	```python
	eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
	```

	Drives the per-speed wandb breakdown in `train_pytorch.py`.

	### New LeRobotVariousSpeedLiberoDataConfig field

	```python
	speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
	```

	A high-level switch:

	| value | behavior | requirement |
	|---|---|---|
	| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
	| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
	| `soft_prompt` | learns K × P tokens (K anchor speeds, P tokens each) and inserts the P tokens of the nearest anchor between vision and instruction tokens | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
	| `auto` (default) | preserves the legacy behavior of the existing LIBERO speed configs | see §8 |

	### New Pi0Config fields (`src/openpi/models/pi0_config.py`)

	```python
	speed_modulation: bool = False
	soft_prompt_speeds: tuple[float, ...] = ()
	soft_prompt_p: int = 0
	```

	`flow_control_dim` was removed. `inputs_spec` now declares
	`speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR
	soft_prompt is enabled.

	### Observation schema (`src/openpi/models/model.py`)

	Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict`
	and `preprocess_observation` (JAX) propagate it.

	### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)

	- `__init__`:
	  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and
	    `speed_condition_mlp_in/out` (replaces the old `flow_control_*` /
	    `flow_condition_*` MLPs). Reads raw `observation.speed`
	    (shape `(B, 1)`); no log transform is applied.
	  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter`
	    of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a
	    non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
	- `_preprocess_observation`: returns a 6-tuple
	`(images, image_masks, lang_tokens, lang_masks, state, speed)`.
	- `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled,
	computes `argmin |speed − anchors|` per batch element and inserts
	`(B, P, hidden)` tokens between image and language tokens with full
	attention. OOD speeds fall back to the nearest training anchor.
	- `embed_suffix`: accepts `speed=None`; when modulation is enabled,
	pushes raw speed through `speed_mod_mlp` and fuses with the timestep
	embedding via `speed_condition_mlp`.
	- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.
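
The nearest-anchor lookup in `embed_prefix` can be illustrated without any tensor library. The sketch below uses plain Python lists and hypothetical function names; the model code presumably does the same selection with batched tensor ops.

```python
def nearest_anchor_index(speed, anchors):
    """Index of the anchor closest to `speed` (ties resolve to the first one)."""
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def select_soft_prompts(speeds, anchors, tokens):
    """Pick the P-token block of the nearest anchor for each batch element.

    `tokens` is indexed [K][P]; the result is indexed [B][P].
    """
    return [tokens[nearest_anchor_index(s, anchors)] for s in speeds]


anchors = (0.75, 1.0, 1.25, 1.5)
# K=4 anchors, P=2 placeholder "tokens" per anchor.
tokens = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"], ["d0", "d1"]]

# 0.8 snaps to 0.75; 2.0 is out of distribution and falls back to 1.5.
picked = select_soft_prompts([0.8, 2.0], anchors, tokens)
print(picked)
```

Snapping to the nearest anchor means an out-of-distribution speed never produces an untrained embedding, at the cost of treating every speed beyond the last anchor identically.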

	JAX `Pi0` (`src/openpi/models/pi0.py`) was renamed in the same way for
	consistency: `flow_control_dim` → `speed_modulation`,
	`flow_control_mlp_*` → `speed_mod_mlp_*`, and
	`flow_condition_mlp_*` → `speed_condition_mlp_*`. It now reads `obs.speed`.

	### Policy passthrough (`src/openpi/policies/libero_policy.py`)

	`LiberoInputs` now passes `data["speed"]` through, alongside the existing
	`flow_control` passthrough.

	### Smoke tests (`tests/test_soft_prompt_smoke.py`)

	Light tests for `Pi0Config` field acceptance and the argmin
	nearest-neighbor logic. The full end-to-end forward-pass test requires
	PaliGemma weights and is gated to a manual GPU run.

	## 4. Ablation orchestration

	### `scripts/run_ablations.py` (new)

	End-to-end orchestrator: build → norm-stats → train, per ablation.

	- `Ablation` dataclass: `name`, `speeds`, `speed_integration`,
	`extra_train_args`, `shared_norm_key`.
	- 12 default ablations:
	  - Speed-set sweep (5): `g1_baseline`, `g2_coarse`, `g3a_step025`,
	    `g4_narrow`, `g5_extreme`. All use `text` integration.
	  - Speed-integration sweep (3): `speedint_text`,
	    `speedint_modulation` (with `Pi0Config.speed_modulation` enabled, since
	    `flow_control_dim` was removed), and `softprompt_p8` (reused from the
	    P-sweep, so it is counted once).
	  - Soft-prompt P-length sweep (5): `softprompt_p{1,4,8,16,32}`. All
	    declare `shared_norm_key="softprompt_shared"` so they reuse one
	    `norm_stats.json`.
	- For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats,
	12 train runs**.
	- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`,
	`--dry-run` for scoped runs.
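
The dedup logic can be sketched as two keyed lookups. This is an illustration under stated assumptions, not the code in `run_ablations.py`: it assumes the build key is the speed tuple and the norm-stats key is `shared_norm_key`, falling back to the ablation name. It uses only a subset of the table, restricted to ablations whose speeds appear in this summary, and it assumes the soft-prompt runs use the `g4_narrow` speeds.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: Tuple[float, ...]
    speed_integration: str
    extra_train_args: Tuple[str, ...] = ()
    shared_norm_key: Optional[str] = None


def plan(ablations):
    """Return (builds, norm_stats, train_runs) after deduplication."""
    builds = {}      # speed tuple -> first ablation that needs that dataset
    norm_stats = {}  # norm key    -> first ablation that computes the stats
    for ab in ablations:
        builds.setdefault(tuple(ab.speeds), ab.name)
        norm_stats.setdefault(ab.shared_norm_key or ab.name, ab.name)
    return builds, norm_stats, [ab.name for ab in ablations]


NARROW = (0.75, 1.0, 1.25, 1.5)
SUBSET = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
    Ablation("softprompt_p4", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
    Ablation("softprompt_p8", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
]

builds, norm_stats, train_runs = plan(SUBSET)
print(len(builds), "builds,", len(norm_stats), "norm-stats,", len(train_runs), "train runs")
```

Under the same keying assumptions, the full 12-ablation table would give five distinct speed sets and eight norm keys (five speed-set runs, two speed-integration runs, one shared soft-prompt key), matching the counts above.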

	### `scripts/build_ablation_datasets.py` (new)

	Thin focused wrapper for the data-prep stage only. Imports the same
	`ABLATIONS` table from `run_ablations.py`, applies the same build dedup,
	exits with a summary mapping ablation names to dataset paths.

	## 5. 8-GPU LIBERO evaluation

	### New files

	| File | Purpose |
	|---|---|
	| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation payload, records per-episode `success` / `steps` / `task_id`, prints a per-rank summary, and writes a JSON. |
	| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is GPU 0 → spatial / 1 → goal / 2 → object (full suites) and GPU 3-7 → libero_10 split into pairs of tasks, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
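
The partition described above can be written out as a small mapping. The driver itself is a shell script; the Python below is only an illustrative sketch with a hypothetical function name, using `None` to mean "all tasks in the suite".

```python
def build_partition(num_long_tasks=10, tasks_per_gpu=2):
    """Map GPU index -> (suite name, task ids or None for the full suite)."""
    partition = {
        0: ("libero_spatial", None),
        1: ("libero_goal", None),
        2: ("libero_object", None),
    }
    gpu = 3
    # Split libero_10 into consecutive task pairs, one pair per remaining GPU.
    for start in range(0, num_long_tasks, tasks_per_gpu):
        stop = min(start + tasks_per_gpu, num_long_tasks)
        partition[gpu] = ("libero_10", list(range(start, stop)))
        gpu += 1
    return partition


partition = build_partition()
for gpu, (suite, task_ids) in sorted(partition.items()):
    print(f"GPU {gpu}: {suite} tasks={'all' if task_ids is None else task_ids}")
```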

	### Per-episode tracking

	`eval_libero_speed.py` records the policy steps actually executed
	(excluding the `num_steps_wait` warmup), so:

	- successes terminate early when `env.step` returns `done=True` and report
	the true step count;
	- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for
	spatial / object / goal / 10 / 90 respectively).

	The gap between `mean_steps_success` and `mean_steps_all` gives a quick
	read on the failure rate: every failure contributes the full `max_steps`,
	so `mean_steps_all` rises sharply as failures accumulate.
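
A minimal sketch of the per-rank summary makes the relationship concrete. The episode records use the `success` / `steps` / `task_id` fields named above; the function name and the returned key `success_rate` are assumptions, and the numbers are made up for illustration.

```python
def summarize(episodes):
    """Compute success rate and step statistics from per-episode records."""
    if not episodes:
        raise ValueError("no episodes to summarize")
    success_steps = [e["steps"] for e in episodes if e["success"]]
    return {
        "success_rate": len(success_steps) / len(episodes),
        "mean_steps_success": (
            sum(success_steps) / len(success_steps) if success_steps else None
        ),
        "mean_steps_all": sum(e["steps"] for e in episodes) / len(episodes),
    }


MAX_STEPS = 220  # libero_spatial cap
episodes = [
    {"task_id": 0, "success": True, "steps": 100},
    {"task_id": 0, "success": True, "steps": 120},
    {"task_id": 1, "success": False, "steps": MAX_STEPS},
    {"task_id": 1, "success": False, "steps": MAX_STEPS},
]
summary = summarize(episodes)
print(summary)

# When every failure runs to the cap, the overall mean is a weighted average
# of the success mean and max_steps, weighted by the success rate.
r = summary["success_rate"]
expected_all = r * summary["mean_steps_success"] + (1 - r) * MAX_STEPS
print(expected_all)
```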

	### Output

	```
	results/libero_eval_<speed>x_<ts>/
	spatial_<speed>x.json
	goal_<speed>x.json
	object_<speed>x.json
	long_t0_1_<speed>x.json long_t2_3_<speed>x.json ... long_t8_9_<speed>x.json
	logs/<...>.log
	videos/<...>/<rollout>.mp4
	```

	Each per-rank JSON contains a `summary` block (success rate, step
	statistics, summary line) and a per-episode list. The driver's final
	output is a per-suite rollup and a global line.

	## 6. Documentation

	- `VARIOUS_SPEED_README.md` — added §2 (action-norm profiling) and §4
	(multi-process build + cleaning/replay summary outputs); §8 notes the
	wandb-image-logging removal.
	- `README_ablation.md` (new) — full ablation workflow doc, including:
	the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation
	workflow, and the soft-prompt implementation notes.
	- `modification_summary.md` (this file).
	- `VARIOUS_SPEED_CURRENT_PIPELINE.md` — deleted (superseded).

	## 7. Bug fixes worth flagging

	### 7.1 `flow_control` was being silently dropped

	`src/openpi/models_pytorch/preprocessing_pytorch.py` constructed
	`SimpleProcessedObservation` without copying `flow_control`. Result: in
	PyTorch training, `observation.flow_control` was always `None`, so the
	action expert's modulation MLP always received zeros.

	Implication: any prior PyTorch run with the (now-removed)
	`pi05_libero_various_speed_all_flow_prompt` config did not actually
	use modulation — it was equivalent to the text-only path. After the
	later refactor, the `flow_control` field was eliminated entirely; the
	modulation path now reads `observation.speed` directly. The new
	`pi05_libero_various_speed_all_modulation` config replaces it.

	Fix: pass `speed` through to `SimpleProcessedObservation` (the
	`flow_control` field has since been removed).

	### 7.2 JAX `preprocess_observation` did not pass `speed`

	`src/openpi/models/model.py:preprocess_observation` (JAX path) didn't
	propagate the new `speed` field. Even though the JAX trainer is not used
	for the soft_prompt sweep, the field should round-trip cleanly to keep
	both backends consistent. Fixed.

	### 7.3 `--model.soft-prompt-speeds` CLI syntax

	`scripts/run_ablations.py` initially emitted
	`--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses
	`tuple[float, ...]` from space-separated argv elements (matching
	`--eval-speed-set` style). Fixed: emit the flag and each value as
	separate argv elements.
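
The difference between the two argv shapes is easy to see side by side. The helper name `tuple_flag` is hypothetical; the flag and values are the ones quoted above.

```python
import shlex


def tuple_flag(flag, values):
    """Return argv elements for a tuple field: the flag, then one element per value."""
    return [flag, *(f"{v:g}" for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Fixed form: five separate argv elements.
good = tuple_flag("--model.soft-prompt-speeds", speeds)

# Original form: a single comma-joined element, which is not split into floats.
bad = ["--model.soft-prompt-speeds=" + ",".join(f"{v:g}" for v in speeds)]

print(shlex.join(good))
print(shlex.join(bad))
```

Building the command as a list and passing it to `subprocess` without a shell keeps each value as its own argv element, so no quoting step can re-join them.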

	### 7.4 Hardcoded WANDB API key in `init_wandb`

	A live key was hardcoded and unconditionally written to
	`os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and
	attributing all runs to one account. Removed; wandb now uses its standard
	auth resolution order. **The committed key is exposed in git history;
	revoke and rotate**.

	## 8. Behavioral changes you should be aware of

	- `speed_integration` defaults to `"auto"`, which preserves legacy
	behavior of the existing 3 LIBERO speed configs in `config.py`. New
	ablation configs should set `speed_integration` explicitly.
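	- **The legacy `pi05_libero_various_speed_all_flow_prompt` and
	`pi05_libero_various_speed_all_flow_noprompt` configs were removed**
	(replaced by `pi05_libero_various_speed_all_modulation`). Because of the
	§7.1 bug, neither config actually applied modulation in PyTorch training:
	`..._flow_prompt` was effectively text-only, and `..._flow_noprompt`, if it
	omitted the speed prompt as its name suggests, had no effective speed
	conditioning. Checkpoints trained under either name do not have the
	modulation behavior they appeared to have.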
	- wandb no longer logs sample camera images on first batch. If you
	relied on that for debugging data inputs, run
	`scripts/visualize_speed_dataset.py` separately.
	- Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`
	are new artifacts. Existing downstream consumers should ignore unknown
	meta files; verify if you have custom tooling that reads `meta/*.json`.
	- `g2_coarse` and `g4_narrow` speeds were updated mid-session:
	  - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]`
	  - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]`
	  - `g4_narrow` now shares its dataset with the entire speed-integration
	    sweep, so the runner builds it only once.

	## 9. Files added / modified / deleted

	```
	# New
	A README_ablation.md
	A modification_summary.md
	A scripts/build_ablation_datasets.py
	A scripts/build_libero_speed_dataset_mp.py
	A scripts/eval_libero_8gpu.sh
	A scripts/eval_libero_speed.py
	A scripts/profile_action_norms.py
	A scripts/run_ablations.py
	A tests/test_soft_prompt_smoke.py

	# Modified
	M src/openpi/models/model.py
	M src/openpi/models/pi0.py
	M src/openpi/models/pi0_config.py
	M src/openpi/models_pytorch/pi0_pytorch.py
	M src/openpi/models_pytorch/preprocessing_pytorch.py
	M src/openpi/policies/libero_policy.py
	M src/openpi/transforms.py
	M src/openpi/training/config.py
	M src/various_speed/core.py
	M scripts/build_libero_speed_dataset.py
	M scripts/compute_norm_stats.py
	M scripts/train_pytorch.py
	M VARIOUS_SPEED_README.md
	M VARIOUS_SPEED_README.md

	# Deleted
	D VARIOUS_SPEED_CURRENT_PIPELINE.md
	```

	## 10. Verification still needed (manual, on GPU host)

	1. `uv run pytest tests/test_soft_prompt_smoke.py -v` — config validation
	and nearest-neighbor logic. CPU-only, fast.
	2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled
	(see docstring on
	`tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
	3. `uv run python scripts/run_ablations.py ... --dry-run` — visually
	confirm the printed CLI commands look correct, especially that
	`--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
	4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model
	trains without shape / mask / dtype errors.
	5. `profile_action_norms.py` on the source dataset, then update
	`--clean-transl-eps` / `--clean-rot-eps` in build commands to
	data-driven values before kicking off the full sweep.
	6. `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint:
	run the in-distribution speed (`SPEED=1.0`) first to confirm 8-rank
	coordination works, then iterate over OOD speeds.