# Modification Summary (branch `0502_mp_process`)
Snapshot of every edit in this session, grouped by theme. Use this as a
code-review checklist and as a record of what changed *behaviorally* in the
training, evaluation, and data-prep pipelines.
## High-level themes
1. **Multi-process dataset builder + per-build diagnostics** — cleaning &
replay summaries written to `meta/`, with per-speed integrated-motion
error reporting.
2. **Action-norm profiler** for data-driven `clean_*_eps` calibration.
3. **Speed-integration ablation framework** — three model-side strategies
(`text` / `modulation` / `soft_prompt`) selectable via a single config
field, plus a P-length sweep on the soft-prompt arm.
4. **Ablation orchestration**: `scripts/run_ablations.py` (end-to-end) and
`scripts/build_ablation_datasets.py` (data-prep only) with build /
norm-stats dedup.
5. **8-GPU LIBERO evaluation** — partitioned eval driver that runs
`libero_spatial / goal / object` plus a 5-way split of `libero_10`,
tracks per-episode step counts, and reports per-suite + global rollups.
6. **`train_pytorch.py` cleanup** — removed silent NFS hang (wandb sample
image block), removed hardcoded WANDB API key, made per-speed wandb
breakdown config-driven.
7. **Four latent bugs fixed**, at least one of which was silently degrading
   training (see §7).
## 1. Data processing
### New files
| File | Purpose |
|---|---|
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
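
A minimal sketch of the profiling idea follows. It assumes the source actions are loaded as a NumPy array of shape `(N, 7)`, with translation in columns 0-2 and rotation in columns 3-5 (the slices named above). `norm_percentiles` and `threshold_table` are hypothetical helpers, not the script's actual functions.

```python
import numpy as np


def norm_percentiles(actions: np.ndarray, percentiles=(1, 5, 50)) -> dict:
    """Percentiles of the per-frame translation and rotation norms."""
    transl = np.linalg.norm(actions[:, :3], axis=1)  # ||action[:, :3]||
    rot = np.linalg.norm(actions[:, 3:6], axis=1)    # ||action[:, 3:6]||
    out = {}
    for p in percentiles:
        out[f"transl_p{p}"] = float(np.percentile(transl, p))
        out[f"rot_p{p}"] = float(np.percentile(rot, p))
    return out


def threshold_table(norms: np.ndarray, thresholds) -> dict:
    """Fraction of frames whose norm falls strictly below each candidate eps."""
    return {t: float(np.mean(norms < t)) for t in thresholds}


# Synthetic data only, to show the call pattern.
rng = np.random.default_rng(0)
actions = rng.normal(scale=0.3, size=(10_000, 7))
stats = norm_percentiles(actions)
print(f"--clean-transl-eps {stats['transl_p1']:.4g} --clean-rot-eps {stats['rot_p1']:.4g}")
```

If frames below eps are the ones the cleaner treats as idle (an assumption here), choosing P1 as eps flags roughly 1% of frames on that axis, and P5 roughly 5%.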
### Modified files
| File | Change |
|---|---|
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |
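
The override pattern can be sketched with the standard library alone. The three frozen dataclasses below are simplified stand-ins: field names such as `data`, `assets`, `repo_id`, and `asset_id` are assumptions, not copied from `config.py`. The point is that each nesting level is rebuilt with `dataclasses.replace` instead of being mutated.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # stand-in for AssetsConfig
    asset_id: str | None = None


@dataclasses.dataclass(frozen=True)
class DataConfig:  # stand-in for LeRobotVariousSpeedLiberoDataConfig
    repo_id: str = "example/base_repo"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # stand-in for the frozen TrainConfig
    name: str = "example_config"
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(cfg: TrainConfig, repo_id: str | None = None,
                    asset_id: str | None = None) -> TrainConfig:
    """Return a new config with the CLI overrides applied; `cfg` is left untouched."""
    data = cfg.data
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    if asset_id is not None:
        # Innermost level first, then rebuild each parent around the new child.
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    return dataclasses.replace(cfg, data=data)
```

Because every level is rebuilt rather than mutated, one registered config can serve all ablations, with each `compute_norm_stats.py` invocation supplying its own `--repo-id` / `--asset-id`.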
## 2. `train_pytorch.py` cleanup
- Removed the wandb sample-image logging block. It created a second
DataLoader and fetched 256 samples on the first batch, hanging silently
for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb
logging is unaffected.
- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real
key was committed to the repo). Auth now uses the standard wandb
resolution order. **The leaked key is in git history; rotate it on
wandb**.
- Removed the `"EMA is not supported for PyTorch training"` log line
(noise).
- `speed_specs` (per-speed wandb loss breakdown) and `avg_flow_metrics` key
list are now derived from `config.eval_speed_set`, no longer hardcoded
`0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)
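
One way to derive the per-speed keys from `config.eval_speed_set` so that the default `(0.5, 1.0, 2.0)` reproduces the old `0p5 / 1p0 / 2p0` names is sketched below. `speed_key` and the `loss_speed_` prefix are hypothetical; the exact metric names live in `train_pytorch.py`.

```python
def speed_key(speed: float) -> str:
    """0.5 -> '0p5', 1.0 -> '1p0', 1.25 -> '1p25' (repr keeps the '.0' on whole numbers)."""
    return repr(float(speed)).replace(".", "p")


def per_speed_metric_keys(eval_speed_set: tuple, prefix: str = "loss_speed_") -> list:
    # One wandb key per configured evaluation speed, in config order.
    return [prefix + speed_key(s) for s in eval_speed_set]


print(per_speed_metric_keys((0.5, 1.0, 2.0)))
# ['loss_speed_0p5', 'loss_speed_1p0', 'loss_speed_2p0']
```

`repr` is used rather than `f"{s:g}"` because `:g` would render `1.0` as `1` and break the `1p0` naming.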
## 3. Speed-integration ablation: model + config
### New TrainConfig field (`src/openpi/training/config.py`)
```python
eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
```
Drives the per-speed wandb breakdown in `train_pytorch.py`.
### New LeRobotVariousSpeedLiberoDataConfig field
```python
speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
```
A high-level switch:
| value | behavior | requirement |
|---|---|---|
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
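| `soft_prompt` | keeps a bank of K × P learnable tokens (P per training speed) and inserts the P tokens of the nearest speed anchor between vision and instruction tokens | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
| `auto` (default) | preserves the legacy behavior of the existing LIBERO speed configs (see §8) | n/a |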
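
The requirement column can be enforced as a fail-fast check when a config is built. This is a sketch: `check_speed_integration` and `needs_speed_input` are hypothetical helpers, and the `auto` branch is deliberately left unresolved because its legacy mapping lives in the existing configs. The second helper mirrors the rule that `inputs_spec` declares `speed` whenever modulation or soft prompts are enabled.

```python
from typing import Literal

SpeedIntegration = Literal["text", "modulation", "soft_prompt", "auto"]


def check_speed_integration(mode: SpeedIntegration, *, speed_modulation: bool,
                            soft_prompt_p: int, soft_prompt_speeds: tuple) -> bool:
    """Validate the model fields for `mode`; return True if SpeedConditionedPrompt is needed."""
    if mode == "text":
        return True
    if mode == "modulation":
        if not speed_modulation:
            raise ValueError("speed_integration='modulation' needs Pi0Config.speed_modulation=True")
        return False
    if mode == "soft_prompt":
        if soft_prompt_p < 1 or not soft_prompt_speeds:
            raise ValueError("speed_integration='soft_prompt' needs soft_prompt_p >= 1 "
                             "and a non-empty soft_prompt_speeds")
        return False
    if mode == "auto":
        raise NotImplementedError("legacy 'auto' resolution is not modeled in this sketch")
    raise ValueError(f"unknown speed_integration: {mode!r}")


def needs_speed_input(speed_modulation: bool, soft_prompt_p: int) -> bool:
    """Whether the model input spec should include a (B, 1) float32 `speed` entry."""
    return bool(speed_modulation or soft_prompt_p > 0)
```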
### New Pi0Config fields (`src/openpi/models/pi0_config.py`)
```python
speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0
```
`flow_control_dim` was removed. `inputs_spec` now declares
`speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR
soft_prompt is enabled.
### Observation schema (`src/openpi/models/model.py`)
Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict`
and `preprocess_observation` (JAX) propagate it.
### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)
- `__init__`:
  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and
    `speed_condition_mlp_in/out` (replaces the old `flow_control_*` /
    `flow_condition_*` MLPs). Reads raw `observation.speed`
    (shape `(B, 1)`); no log transform is applied.
  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter`
    of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a
    non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
- `_preprocess_observation`: returns a 6-tuple
`(images, image_masks, lang_tokens, lang_masks, state, speed)`.
- `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled,
computes `argmin |speed − anchors|` per batch element and inserts
`(B, P, hidden)` tokens between image and language tokens with full
attention. OOD speeds fall back to the nearest training anchor.
- `embed_suffix`: accepts `speed=None`; when modulation is enabled,
pushes raw speed through `speed_mod_mlp` and fuses with the timestep
embedding via `speed_condition_mlp`.
- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.
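
The nearest-anchor selection and prefix layout described above can be sketched in NumPy (the real module is PyTorch). Shapes follow the text: a `(K, P, D)` token bank, `(K,)` anchors, and `(B, 1)` speeds. The function names are hypothetical, and the boolean masks are a simplified stand-in for the model's padding and attention masks.

```python
import numpy as np


def init_soft_prompt(speeds, p: int, width: int, seed: int = 0):
    """(K, P, width) tokens drawn from N(0, 0.02), plus a (K,) anchor vector."""
    rng = np.random.default_rng(seed)
    anchors = np.asarray(speeds, dtype=np.float32)
    tokens = rng.normal(0.0, 0.02, size=(len(anchors), p, width)).astype(np.float32)
    return tokens, anchors


def select_soft_prompt(speed: np.ndarray, anchors: np.ndarray, tokens: np.ndarray):
    """speed (B, 1) -> tokens (B, P, D) of the nearest anchor, plus the chosen indices."""
    dist = np.abs(speed - anchors[None, :])  # (B, K) via broadcasting
    idx = np.argmin(dist, axis=1)            # OOD speeds snap to the closest anchor
    return tokens[idx], idx


def build_prefix(img_emb, img_mask, soft_emb, lang_emb, lang_mask):
    """Concatenate [image | soft prompt | language] along the sequence axis."""
    b, p, _ = soft_emb.shape
    emb = np.concatenate([img_emb, soft_emb, lang_emb], axis=1)
    pad_mask = np.concatenate([img_mask, np.ones((b, p), dtype=bool), lang_mask], axis=1)
    # All-False "causal break" flags: every prefix token attends to every other one,
    # i.e. the soft-prompt tokens sit inside the full-attention prefix.
    ar_mask = np.zeros(emb.shape[1], dtype=bool)
    return emb, pad_mask, ar_mask


tokens, anchors = init_soft_prompt((0.75, 1.0, 1.25, 1.5), p=8, width=16)
speed = np.array([[0.7], [1.1], [3.0]], dtype=np.float32)
soft, idx = select_soft_prompt(speed, anchors, tokens)
print(idx)  # [0 1 3]: 3.0 is out of distribution and falls back to the 1.5 anchor
```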
JAX `Pi0` (`src/openpi/models/pi0.py`) received the same renames for
consistency: `flow_control_dim → speed_modulation`,
`flow_control_mlp_* → speed_mod_mlp_*`,
`flow_condition_mlp_* → speed_condition_mlp_*`; it now reads `obs.speed`.
### Policy passthrough (`src/openpi/policies/libero_policy.py`)
`LiberoInputs` now passes `data["speed"]` through, alongside the existing
`flow_control` passthrough.
### Smoke tests (`tests/test_soft_prompt_smoke.py`)
Light tests for `Pi0Config` field acceptance and the argmin
nearest-neighbor logic. The full end-to-end forward-pass test requires
PaliGemma weights and is gated to a manual GPU run.
## 4. Ablation orchestration
### `scripts/run_ablations.py` (new)
End-to-end orchestrator: build → norm-stats → train, per ablation.
- `Ablation` dataclass: `name`, `speeds`, `speed_integration`,
`extra_train_args`, `shared_norm_key`.
- 12 default ablations:
  - **Speed-set sweep** (5): `g1_baseline`, `g2_coarse`, `g3a_step025`,
    `g4_narrow`, `g5_extreme`. All use `text` integration.
  - **Speed-integration sweep** (3): `speedint_text`,
    `speedint_modulation` (emitted with `--model.flow-control-dim=1`; since
    §3 removed `flow_control_dim`, this argument needs to enable
    `Pi0Config.speed_modulation` instead), and `softprompt_p8` (reused
    from the P-sweep).
  - **Soft-prompt P-length sweep** (5): `softprompt_p{1,4,8,16,32}`. All
    declare `shared_norm_key="softprompt_shared"` so they reuse one
    `norm_stats.json`.
- For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats,
12 train runs**.
- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`,
`--dry-run` for scoped runs.
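
The dedup rule can be expressed as two dictionaries keyed on what each stage depends on. This is a sketch under two assumptions that agree with the counts above but are not taken from the script: datasets are keyed by the speed tuple, and norm stats by `shared_norm_key` when set, else by the ablation name. The `Ablation` class here drops `extra_train_args`, and the toy table is illustrative, not the real `ABLATIONS`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Ablation:  # simplified: extra_train_args omitted
    name: str
    speeds: tuple[float, ...]
    speed_integration: str = "text"
    shared_norm_key: str | None = None


def plan(ablations: list[Ablation]):
    """Return (builds, norm_stats, train_runs) after dedup."""
    builds = {}      # speeds -> first ablation that builds this dataset
    norm_stats = {}  # norm key -> first ablation that computes it
    for ab in ablations:
        builds.setdefault(ab.speeds, ab.name)
        norm_stats.setdefault(ab.shared_norm_key or ab.name, ab.name)
    train_runs = [ab.name for ab in ablations]  # training is never deduped
    return builds, norm_stats, train_runs


NARROW = (0.75, 1.0, 1.25, 1.5)
toy = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0)),
    Ablation("g4_narrow", NARROW),
    Ablation("speedint_modulation", NARROW, "modulation"),
    Ablation("softprompt_p4", NARROW, "soft_prompt", "softprompt_shared"),
    Ablation("softprompt_p8", NARROW, "soft_prompt", "softprompt_shared"),
]
builds, norm_stats, train_runs = plan(toy)
print(len(builds), len(norm_stats), len(train_runs))  # 2 4 5
```

Iteration order only decides which ablation "owns" a shared artifact; the counts do not depend on it.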
### `scripts/build_ablation_datasets.py` (new)
Thin wrapper for the data-prep stage only. It imports the same
`ABLATIONS` table from `run_ablations.py`, applies the same build dedup,
and exits after printing a summary that maps ablation names to dataset paths.
## 5. 8-GPU LIBERO evaluation
### New files
| File | Purpose |
|---|---|
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is **GPU 0 → spatial / 1 → goal / 2 → object** (full suites) and **GPU 3-7 → libero_10 split into pairs of tasks**, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
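
The GPU-to-work mapping can be written out as data. This Python sketch only reproduces the partition described above (the driver itself is a shell script); `partition_jobs` is a hypothetical name, and it assumes `libero_10` task ids 0-9 split evenly across GPUs 3-7, which matches the `long_t0_1` ... `long_t8_9` output names.

```python
def partition_jobs(long_gpus=(3, 4, 5, 6, 7), num_long_tasks: int = 10) -> dict:
    """Map GPU id -> (suite, task ids, or None for the full suite)."""
    jobs = {0: ("libero_spatial", None), 1: ("libero_goal", None), 2: ("libero_object", None)}
    if num_long_tasks % len(long_gpus):
        raise ValueError("this sketch assumes libero_10 splits evenly across GPUs")
    per_gpu = num_long_tasks // len(long_gpus)  # 10 tasks / 5 GPUs = 2 tasks each
    for i, gpu in enumerate(long_gpus):
        jobs[gpu] = ("libero_10", list(range(i * per_gpu, (i + 1) * per_gpu)))
    return jobs


for gpu, (suite, task_ids) in sorted(partition_jobs().items()):
    print(gpu, suite, task_ids if task_ids is not None else "all tasks")
```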
### Per-episode tracking
`eval_libero_speed.py` records the **policy steps actually executed**
(excluding the `num_steps_wait` warmup), so:
- successes terminate early when `env.step` returns `done=True` and report
the true step count;
- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for
spatial / object / goal / 10 / 90 respectively).
The gap between `mean_steps_success` and `mean_steps_all` is a quick
read-out of the failure rate: because every failure runs to the suite's
`max_steps` cap, `mean_steps_all` rises sharply as failures accumulate.
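
Under the tracking rule above (every failure runs to the cap), the two means are tied together: with success rate `p`, `mean_steps_all = p * mean_steps_success + (1 - p) * max_steps`, so `p = (max_steps - mean_steps_all) / (max_steps - mean_steps_success)`. A sketch of the summary and of that inversion follows, assuming each episode record carries the `success` and `steps` fields described for `eval_libero_speed.py`; both function names are hypothetical.

```python
def summarize(episodes: list) -> dict:
    """Per-rank summary from episode records with `success` and `steps` fields."""
    steps = [ep["steps"] for ep in episodes]
    ok = [ep["steps"] for ep in episodes if ep["success"]]
    return {
        "success_rate": len(ok) / len(episodes),
        "mean_steps_success": sum(ok) / len(ok) if ok else float("nan"),
        "mean_steps_all": sum(steps) / len(steps),
    }


def implied_success_rate(mean_steps_all: float, mean_steps_success: float, max_steps: int) -> float:
    """Invert mean_all = p * mean_success + (1 - p) * max_steps (valid only if every failure times out)."""
    return (max_steps - mean_steps_all) / (max_steps - mean_steps_success)


# libero_spatial cap is 220: three successes and one timeout.
eps = [{"success": True, "steps": s} for s in (100, 120, 110)] + [{"success": False, "steps": 220}]
s = summarize(eps)
print(s)  # {'success_rate': 0.75, 'mean_steps_success': 110.0, 'mean_steps_all': 137.5}
print(implied_success_rate(s["mean_steps_all"], s["mean_steps_success"], 220))  # 0.75
```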
### Output
```
results/libero_eval_<speed>x_<ts>/
  spatial_<speed>x.json
  goal_<speed>x.json
  object_<speed>x.json
  long_t0_1_<speed>x.json  long_t2_3_<speed>x.json  ...  long_t8_9_<speed>x.json
  logs/<...>.log
  videos/<...>/<rollout>.mp4
```
Each per-rank JSON contains a `summary` block (success rate, step
statistics, summary line) and a per-episode list. The driver's final
output is a per-suite rollup and a global line.
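
A sketch of the rollup step follows. It assumes each per-rank JSON stores its per-episode list under an `episodes` key (a hypothetical name) and maps file-name prefixes to suites, folding every `long_t*` file into `libero_10`. The global line here pools episodes, which weights suites by episode count; averaging the per-suite rates is an equally defensible choice.

```python
import json
from collections import defaultdict
from pathlib import Path

SUITE_BY_PREFIX = {"spatial": "libero_spatial", "goal": "libero_goal",
                   "object": "libero_object", "long": "libero_10"}


def load_results(result_dir: str) -> dict:
    """File name -> parsed per-rank JSON, for every *.json directly under result_dir."""
    return {p.name: json.loads(p.read_text()) for p in sorted(Path(result_dir).glob("*.json"))}


def _stats(episodes: list) -> dict:
    n = len(episodes)
    return {"episodes": n,
            "success_rate": sum(ep["success"] for ep in episodes) / n,
            "mean_steps_all": sum(ep["steps"] for ep in episodes) / n}


def rollup(results: dict) -> dict:
    per_suite = defaultdict(list)
    for name, data in results.items():
        suite = SUITE_BY_PREFIX[name.split("_", 1)[0]]
        per_suite[suite].extend(data["episodes"])  # hypothetical key
    out = {suite: _stats(eps) for suite, eps in sorted(per_suite.items())}
    out["global"] = _stats([ep for eps in per_suite.values() for ep in eps])
    return out
```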
## 6. Documentation
- `VARIOUS_SPEED_README.md` — added §2 (action-norm profiling) and §4
(multi-process build + cleaning/replay summary outputs); §8 notes the
wandb-image-logging removal.
- `README_ablation.md` (new) — full ablation workflow doc, including:
the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation
workflow, and the soft-prompt implementation notes.
- `modification_summary.md` (this file).
- `VARIOUS_SPEED_CURRENT_PIPELINE.md` — deleted (superseded).
## 7. Bug fixes worth flagging
### 7.1 `flow_control` was being silently dropped
`src/openpi/models_pytorch/preprocessing_pytorch.py` constructed
`SimpleProcessedObservation` without copying `flow_control`. Result: in
PyTorch training, `observation.flow_control` was always `None`, so the
action expert's modulation MLP always received zeros.
**Implication**: any prior PyTorch run with the (now-removed)
`pi05_libero_various_speed_all_flow_prompt` config did **not** actually
use modulation — it was equivalent to the text-only path. After the
later refactor, the `flow_control` field was eliminated entirely; the
modulation path now reads `observation.speed` directly. The new
`pi05_libero_various_speed_all_modulation` config replaces it.
Fix: pass `speed` through to `SimpleProcessedObservation` (the
`flow_control` field has since been removed).
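
The failure mode generalizes: any preprocessing step that rebuilds an observation by naming fields one by one silently drops fields added later, and a small regression check catches it. The dataclasses below are hypothetical stand-ins, not openpi's `Observation` or `SimpleProcessedObservation`.

```python
import dataclasses
from typing import Any, Optional


@dataclasses.dataclass(frozen=True)
class RawObs:  # stand-in for the model observation
    state: Any
    speed: Optional[Any] = None


@dataclasses.dataclass(frozen=True)
class ProcessedObs:  # stand-in for SimpleProcessedObservation
    state: Any
    speed: Optional[Any] = None


def preprocess_buggy(obs: RawObs) -> ProcessedObs:
    # Fields are copied by hand; `speed` is forgotten and defaults to None.
    return ProcessedObs(state=obs.state)


def preprocess_fixed(obs: RawObs) -> ProcessedObs:
    return ProcessedObs(state=obs.state, speed=obs.speed)


def dropped_fields(src: Any, dst: Any) -> list:
    """Fields that are set on `src` but came out as None on `dst`."""
    return [f.name for f in dataclasses.fields(dst)
            if getattr(src, f.name, None) is not None and getattr(dst, f.name) is None]


obs = RawObs(state=[0.0] * 8, speed=[[2.0]])
print(dropped_fields(obs, preprocess_buggy(obs)))  # ['speed']
print(dropped_fields(obs, preprocess_fixed(obs)))  # []
```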
### 7.2 JAX `preprocess_observation` did not pass `speed`
`src/openpi/models/model.py:preprocess_observation` (JAX path) didn't
propagate the new `speed` field. Even though the JAX trainer is not used
for the soft_prompt sweep, the field should round-trip cleanly to keep
both backends consistent. Fixed.
### 7.3 `--model.soft-prompt-speeds` CLI syntax
`scripts/run_ablations.py` initially emitted
`--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses
`tuple[float, ...]` from space-separated argv elements (matching
`--eval-speed-set` style). Fixed: emit the flag and each value as
separate argv elements.
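
A sketch of the corrected argv construction: building the command as a list keeps each value a separate argv element, and `shlex.join` gives a copy-pasteable line for `--dry-run`. `soft_prompt_speed_args` is a hypothetical helper, and the command prefix is illustrative with other arguments omitted.

```python
import shlex


def soft_prompt_speed_args(speeds) -> list:
    # Flag first, then one argv element per value; ":g" renders 1.0 as "1".
    return ["--model.soft-prompt-speeds", *(f"{s:g}" for s in speeds)]


cmd = ["uv", "run", "scripts/train_pytorch.py", *soft_prompt_speed_args((0.75, 1.0, 1.25, 1.5))]
print(shlex.join(cmd))
# uv run scripts/train_pytorch.py --model.soft-prompt-speeds 0.75 1 1.25 1.5
```

The broken variant produced the single element `--model.soft-prompt-speeds=0.75,1,1.25,1.5`, which tyro cannot split into a `tuple[float, ...]`.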
### 7.4 Hardcoded WANDB API key in `init_wandb`
A live key was hardcoded and unconditionally written to
`os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and
attributing all runs to one account. Removed; wandb now uses its standard
auth resolution order. **The committed key is exposed in git history;
revoke and rotate**.
## 8. Behavioral changes you should be aware of
- **`speed_integration` defaults to `"auto"`**, which preserves the legacy
behavior of the three existing LIBERO speed configs in `config.py`. New
ablation configs should set `speed_integration` explicitly.
- **The legacy `pi05_libero_various_speed_all_flow_prompt` and
`pi05_libero_various_speed_all_flow_noprompt` configs were removed**
(replaced by `pi05_libero_various_speed_all_modulation`). The old
configs were equivalent to text-only training due to the §7.1 bug, so
checkpoints trained under those names do not reflect the modulation
behavior the names imply.
- **wandb no longer logs sample camera images on first batch**. If you
relied on that for debugging data inputs, run
`scripts/visualize_speed_dataset.py` separately.
- **Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`**
are new artifacts. Downstream consumers should ignore unknown meta files,
but check any custom tooling that reads `meta/*.json`.
- **`g2_coarse` and `g4_narrow` speeds were updated mid-session**:
  - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]`
  - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]`
  - `g4_narrow` now shares its dataset with the entire speed-integration
    and soft-prompt P-length sweeps, so the runner builds it only once.
## 9. Files added / modified / deleted
```
# New
A README_ablation.md
A modification_summary.md
A scripts/build_ablation_datasets.py
A scripts/eval_libero_8gpu.sh
A scripts/eval_libero_speed.py
A scripts/profile_action_norms.py
A scripts/run_ablations.py
A tests/test_soft_prompt_smoke.py
# Modified
M src/openpi/models/model.py
M src/openpi/models/pi0.py
M src/openpi/models/pi0_config.py
M src/openpi/models_pytorch/pi0_pytorch.py
M src/openpi/models_pytorch/preprocessing_pytorch.py
M src/openpi/policies/libero_policy.py
M src/openpi/transforms.py
M src/openpi/training/config.py
M src/various_speed/core.py
M scripts/build_libero_speed_dataset.py
M scripts/build_libero_speed_dataset_mp.py
M scripts/compute_norm_stats.py
M scripts/train_pytorch.py
M VARIOUS_SPEED_README.md
# Deleted
D VARIOUS_SPEED_CURRENT_PIPELINE.md
```
## 10. Verification still needed (manual, on GPU host)
1. `uv run pytest tests/test_soft_prompt_smoke.py -v` — config validation
and nearest-neighbor logic. CPU-only, fast.
2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled
(see docstring on
`tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
3. `uv run python scripts/run_ablations.py ... --dry-run` — visually
confirm the printed CLI commands look correct, especially that
`--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model
trains without shape / mask / dtype errors.
5. `profile_action_norms.py` on the source dataset, then update
`--clean-transl-eps` / `--clean-rot-eps` in build commands to
data-driven values before kicking off the full sweep.
6. `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint:
first at `SPEED=1.0` (in-distribution) to confirm the 8-rank
coordination works, then iterate over OOD speeds.