# Modification Summary (branch `0502_mp_process`)

Snapshot of every edit in this session, grouped by theme. Use this as a
code-review checklist and as a record of what changed *behaviorally* in the
training, evaluation, and data-prep pipelines.

## High-level themes

1. **Multi-process dataset builder + per-build diagnostics**: cleaning &
   replay summaries written to `meta/`, with per-speed integrated-motion
   error reporting.
2. **Action-norm profiler** for data-driven `clean_*_eps` calibration.
3. **Speed-integration ablation framework**: three model-side strategies
   (`text` / `modulation` / `soft_prompt`) selectable via a single config
   field, plus a P-length sweep on the soft-prompt arm.
4. **Ablation orchestration**: `scripts/run_ablations.py` (end-to-end) and
   `scripts/build_ablation_datasets.py` (data-prep only) with build /
   norm-stats dedup.
5. **8-GPU LIBERO evaluation**: partitioned eval driver that runs
   `libero_spatial / goal / object` plus a 5-way split of `libero_10`,
   tracks per-episode step counts, and reports per-suite + global rollups.
6. **`train_pytorch.py` cleanup**: removed silent NFS hang (wandb sample
   image block), removed hardcoded WANDB API key, made per-speed wandb
   breakdown config-driven.
7. **Four latent bugs fixed** that were silently degrading training or
   evaluation; see §7.

## 1. Data processing

### New files

| File | Purpose |
|---|---|
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
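
The profiling idea can be sketched with the standard library alone. This is a minimal stand-in, not the script itself: the `suggest_eps` name and the toy action rows are hypothetical, and it assumes each action row stores translation in its first three components and rotation in the next three.

```python
import math
import statistics


def norm(values):
    """Euclidean norm of a short sequence of floats."""
    return math.sqrt(sum(v * v for v in values))


def suggest_eps(actions, percentiles=(1, 5)):
    """Return percentile thresholds for translation and rotation norms.

    Assumes each row looks like [dx, dy, dz, droll, dpitch, dyaw, ...].
    """
    groups = {
        "transl": [norm(a[:3]) for a in actions],
        "rot": [norm(a[3:6]) for a in actions],
    }
    out = {}
    for name, norms in groups.items():
        # quantiles(n=100) returns the 99 cut points P1..P99.
        cuts = statistics.quantiles(norms, n=100, method="inclusive")
        out[name] = {p: cuts[p - 1] for p in percentiles}
    return out


# Toy data (hypothetical): 101 frames with linearly growing motion.
actions = [[0.01 * i, 0.0, 0.0, 0.0, 0.002 * i, 0.0, 1.0] for i in range(101)]
eps = suggest_eps(actions)
print(eps["transl"], eps["rot"])
```

Frames whose norm falls below the chosen percentile are candidates for cleaning, so P1 is the more conservative threshold and P5 the more aggressive one.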

### Modified files

| File | Change |
|---|---|
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |
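
The override pattern for nested frozen dataclasses can be sketched as follows. The three classes below are simplified, hypothetical stand-ins for `TrainConfig`, the LIBERO data config, and `AssetsConfig`; only `dataclasses.replace` itself is the real mechanism named above.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsCfg:  # hypothetical stand-in for AssetsConfig
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataCfg:  # hypothetical stand-in for the LIBERO data config
    repo_id: str = "base/repo"
    assets: AssetsCfg = dataclasses.field(default_factory=AssetsCfg)


@dataclasses.dataclass(frozen=True)
class TrainCfg:  # hypothetical stand-in for TrainConfig
    name: str
    data: DataCfg = dataclasses.field(default_factory=DataCfg)


def with_overrides(cfg, repo_id=None, asset_id=None):
    """Return a copy of cfg with the given fields overridden; cfg is untouched."""
    data = cfg.data
    if asset_id is not None:
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(cfg, data=data)


base = TrainCfg(name="demo")
patched = with_overrides(base, repo_id="local/g4_narrow", asset_id="g4_narrow")
print(patched.data.repo_id, patched.data.assets.asset_id)
```

Because every level is rebuilt with `replace`, the registered base config stays immutable and one config entry can serve many ablations.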

## 2. `train_pytorch.py` cleanup

- Removed the wandb sample-image logging block. It created a second
  DataLoader and fetched 256 samples on the first batch, hanging silently
  for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb
  logging is unaffected.
- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real
  key was committed to the repo). Auth now uses the standard wandb
  resolution order. **The leaked key is in git history; rotate it on
  wandb**.
- Removed the `"EMA is not supported for PyTorch training"` log line
  (noise).
- `speed_specs` (per-speed wandb loss breakdown) and `avg_flow_metrics` key
  list are now derived from `config.eval_speed_set`, no longer hardcoded
  `0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)
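
A small sketch of how tags such as `0p5` could be derived from `eval_speed_set` instead of being hardcoded. The `speed_tag` helper and the `loss_..._x` key pattern are hypothetical; the trainer's actual key names may differ.

```python
def speed_tag(speed: float) -> str:
    """Format a speed multiplier as a metric-safe tag, e.g. 0.5 -> "0p5"."""
    text = f"{speed:g}"  # 1.0 -> "1", 1.25 -> "1.25"
    if "." not in text:
        text += ".0"  # keep one decimal so 1.0 -> "1p0"
    return text.replace(".", "p")


eval_speed_set = (0.5, 1.0, 2.0)
# Hypothetical metric key pattern, used only for illustration.
metric_keys = [f"loss_{speed_tag(s)}x" for s in eval_speed_set]
print(metric_keys)
```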

## 3. Speed-integration ablation: model + config

### New TrainConfig field (`src/openpi/training/config.py`)

```python
eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
```

Drives the per-speed wandb breakdown in `train_pytorch.py`.

### New LeRobotVariousSpeedLiberoDataConfig field

```python
speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
```

A high-level switch:

| value | behavior | requirement |
|---|---|---|
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
| `soft_prompt` | inserts K × P learnable tokens between vision and instruction | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
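
The requirement column can be read as a precondition check. The function below is a hypothetical sketch of such a check; this summary does not say whether or where the repository validates these combinations, and `"auto"` is passed through untouched because its resolution rules are not described here.

```python
VALID_MODES = ("text", "modulation", "soft_prompt", "auto")


def check_speed_integration(mode, *, speed_modulation=False,
                            soft_prompt_p=0, soft_prompt_speeds=()):
    """Raise ValueError when the model settings cannot serve the chosen mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown speed_integration: {mode!r}")
    if mode == "modulation" and not speed_modulation:
        raise ValueError("modulation requires speed_modulation=True")
    if mode == "soft_prompt":
        if soft_prompt_p < 1:
            raise ValueError("soft_prompt requires soft_prompt_p >= 1")
        if not soft_prompt_speeds:
            raise ValueError("soft_prompt requires non-empty soft_prompt_speeds")


# Valid combinations pass silently.
check_speed_integration("text")
check_speed_integration("soft_prompt", soft_prompt_p=8,
                        soft_prompt_speeds=(0.75, 1.0, 1.25, 1.5))
```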

### New Pi0Config fields (`src/openpi/models/pi0_config.py`)

```python
speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0
```

`flow_control_dim` was removed. `inputs_spec` now declares
`speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR
soft_prompt is enabled.

### Observation schema (`src/openpi/models/model.py`)

Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict`
and `preprocess_observation` (JAX) propagate it.

### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)

- `__init__`:
  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and
    `speed_condition_mlp_in/out` (replaces the old `flow_control_*` /
    `flow_condition_*` MLPs). Reads raw `observation.speed`
    (shape `(B, 1)`); no log transform is applied.
  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter`
    of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a
    non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
- `_preprocess_observation`: returns a 6-tuple
  `(images, image_masks, lang_tokens, lang_masks, state, speed)`.
- `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled,
  computes `argmin |speed − anchors|` per batch element and inserts
  `(B, P, hidden)` tokens between image and language tokens with full
  attention. OOD speeds fall back to the nearest training anchor.
- `embed_suffix`: accepts `speed=None`; when modulation is enabled,
  pushes raw speed through `speed_mod_mlp` and fuses with the timestep
  embedding via `speed_condition_mlp`.
- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.
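
The nearest-anchor lookup used by the soft-prompt path can be illustrated without any tensor library. This is a plain-Python sketch of the selection logic only: the helper names and toy token blocks are hypothetical, and the real module performs the equivalent argmin and gather on batched tensors.

```python
def nearest_anchor_index(speed, anchors):
    """Index of the anchor closest to speed (ties go to the lower index)."""
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def select_soft_prompts(speeds, anchors, tokens):
    """Pick one (P, hidden) token block per batch element.

    speeds: B floats; tokens: K blocks, one per anchor.
    """
    return [tokens[nearest_anchor_index(s, anchors)] for s in speeds]


anchors = (0.75, 1.0, 1.25, 1.5)
# Toy token blocks: P=2 tokens of width 3, filled with the anchor index.
tokens = [[[float(k)] * 3 for _ in range(2)] for k in range(len(anchors))]

# 0.5 and 2.0 lie outside the anchor range and snap to the nearest end.
batch = select_soft_prompts([0.5, 1.3, 2.0], anchors, tokens)
print([block[0][0] for block in batch])
```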

JAX `Pi0` (`src/openpi/models/pi0.py`) was renamed in the same way for
consistency: `flow_control_dim → speed_modulation`, `flow_control_mlp_*
→ speed_mod_mlp_*`, `flow_condition_mlp_* → speed_condition_mlp_*`,
reads `obs.speed`.

### Policy passthrough (`src/openpi/policies/libero_policy.py`)

`LiberoInputs` now passes `data["speed"]` through, alongside the existing
`flow_control` passthrough.

### Smoke tests (`tests/test_soft_prompt_smoke.py`)

Light tests for `Pi0Config` field acceptance and the argmin
nearest-neighbor logic. The full end-to-end forward-pass test requires
PaliGemma weights and is gated to a manual GPU run.

## 4. Ablation orchestration

### `scripts/run_ablations.py` (new)

End-to-end orchestrator: build → norm-stats → train, per ablation.

- `Ablation` dataclass: `name`, `speeds`, `speed_integration`,
  `extra_train_args`, `shared_norm_key`.
- 12 default ablations:
  - **Speed-set sweep** (5): `g1_baseline`, `g2_coarse`, `g3a_step025`,
    `g4_narrow`, `g5_extreme`. All use `text` integration.
  - **Speed-integration sweep** (3): `speedint_text`,
    `speedint_modulation` (with `--model.flow-control-dim=1`), and
    `softprompt_p8` (reused from the P-sweep).
  - **Soft-prompt P-length sweep** (5): `softprompt_p{1,4,8,16,32}`. All
    declare `shared_norm_key="softprompt_shared"` so they reuse one
    `norm_stats.json`.
- For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats,
  12 train runs**.
- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`,
  `--dry-run` for scoped runs.
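
The dedup accounting can be sketched on a trimmed table. The field names follow the `Ablation` dataclass above, but the field types, the dedup keys (one build per distinct speed set, one norm-stats run per `shared_norm_key` or, failing that, per name), and the five-row table are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: tuple
    speed_integration: str = "text"
    extra_train_args: tuple = ()
    shared_norm_key: Optional[str] = None


def plan(ablations):
    """Return (build keys, norm-stats keys, train names) after dedup."""
    builds, norms = {}, {}
    for a in ablations:
        # Assumption: one dataset build per distinct speed set.
        builds.setdefault(tuple(a.speeds), a.name)
        # Assumption: ablations sharing a norm key reuse one norm_stats.json.
        norms.setdefault(a.shared_norm_key or a.name, a.name)
    return list(builds), list(norms), [a.name for a in ablations]


NARROW = (0.75, 1.0, 1.25, 1.5)
table = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0)),
    Ablation("g4_narrow", NARROW),
    Ablation("speedint_text", NARROW),
    Ablation("softprompt_p4", NARROW, "soft_prompt",
             shared_norm_key="softprompt_shared"),
    Ablation("softprompt_p8", NARROW, "soft_prompt",
             shared_norm_key="softprompt_shared"),
]
builds, norms, trains = plan(table)
print(len(builds), len(norms), len(trains))  # prints: 2 4 5
```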

### `scripts/build_ablation_datasets.py` (new)

Thin focused wrapper for the data-prep stage only. Imports the same
`ABLATIONS` table from `run_ablations.py`, applies the same build dedup,
exits with a summary mapping ablation names to dataset paths.

## 5. 8-GPU LIBERO evaluation

### New files

| File | Purpose |
|---|---|
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is **GPU 0 → spatial / 1 → goal / 2 → object** (full suites) and **GPU 3-7 → libero_10 split into pairs of tasks**, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
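
The partition can be written down as a small mapping. The driver itself is a shell script, so this Python sketch only illustrates the layout; the `gpu_partition` helper is hypothetical, and the assignment of consecutive task pairs to GPUs 3 through 7 in ascending order is an assumption.

```python
def gpu_partition():
    """Map GPU id -> (suite, task ids), with None meaning the full suite."""
    plan = {
        0: ("libero_spatial", None),
        1: ("libero_goal", None),
        2: ("libero_object", None),
    }
    # libero_10 has ten tasks; give each of GPUs 3..7 one pair of them.
    for i, gpu in enumerate(range(3, 8)):
        plan[gpu] = ("libero_10", [2 * i, 2 * i + 1])
    return plan


for gpu, (suite, tasks) in gpu_partition().items():
    print(gpu, suite, "all" if tasks is None else tasks)
```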

### Per-episode tracking

`eval_libero_speed.py` records the **policy steps actually executed**
(excluding the `num_steps_wait` warmup), so:

- successes terminate early when `env.step` returns `done=True` and report
  the true step count;
- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for
  spatial / object / goal / 10 / 90 respectively).

The gap between `mean_steps_success` and `mean_steps_all` gives a quick
read-out of the failure rate: every failed episode contributes the full
`max_steps` cap, so `mean_steps_all` rises sharply as failures accumulate.
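
A minimal sketch of the per-rank step statistics, assuming each episode record carries `success` and `steps` fields as described above. The `summarize` helper and the sample episodes are hypothetical.

```python
def summarize(episodes):
    """episodes: list of {"success": bool, "steps": int} records."""
    if not episodes:
        return {"success_rate": 0.0,
                "mean_steps_success": None,
                "mean_steps_all": None}
    wins = [e["steps"] for e in episodes if e["success"]]
    all_steps = [e["steps"] for e in episodes]
    return {
        "success_rate": len(wins) / len(episodes),
        "mean_steps_success": sum(wins) / len(wins) if wins else None,
        "mean_steps_all": sum(all_steps) / len(all_steps),
    }


episodes = [
    {"success": True, "steps": 100},
    {"success": True, "steps": 120},
    {"success": False, "steps": 220},  # failed: ran to the libero_spatial cap
    {"success": False, "steps": 220},
]
stats = summarize(episodes)
print(stats)
```

In this toy run half of the episodes fail, and the two means sit 55 steps apart (110 for successes, 165 overall).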

### Output

```
results/libero_eval_<speed>x_<ts>/
  spatial_<speed>x.json
  goal_<speed>x.json
  object_<speed>x.json
  long_t0_1_<speed>x.json   long_t2_3_<speed>x.json   ...   long_t8_9_<speed>x.json
  logs/<...>.log
  videos/<...>/<rollout>.mp4
```

Each per-rank JSON contains a `summary` block (success rate, step
statistics, summary line) and a per-episode list. The driver's final
output is a per-suite rollup and a global line.

## 6. Documentation

- `VARIOUS_SPEED_README.md`: added §2 (action-norm profiling) and §4
  (multi-process build + cleaning/replay summary outputs); §8 notes the
  wandb-image-logging removal.
- `README_ablation.md` (new): full ablation workflow doc, including
  the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation
  workflow, and the soft-prompt implementation notes.
- `modification_summary.md` (this file).
- `VARIOUS_SPEED_CURRENT_PIPELINE.md`: deleted (superseded).

## 7. Bug fixes worth flagging

### 7.1 `flow_control` was being silently dropped

`src/openpi/models_pytorch/preprocessing_pytorch.py` constructed
`SimpleProcessedObservation` without copying `flow_control`. Result: in
PyTorch training, `observation.flow_control` was always `None`, so the
action expert's modulation MLP always received zeros.

**Implication**: any prior PyTorch run with the (now-removed)
`pi05_libero_various_speed_all_flow_prompt` config did **not** actually
use modulation; it was equivalent to the text-only path. After the
later refactor, the `flow_control` field was eliminated entirely; the
modulation path now reads `observation.speed` directly. The new
`pi05_libero_various_speed_all_modulation` config replaces it.

Fix: pass `speed` through to `SimpleProcessedObservation` (the
`flow_control` field has since been removed).

### 7.2 JAX `preprocess_observation` did not pass `speed`

`src/openpi/models/model.py:preprocess_observation` (JAX path) didn't
propagate the new `speed` field. Even though the JAX trainer is not used
for the soft_prompt sweep, the field should round-trip cleanly to keep
both backends consistent. Fixed.

### 7.3 `--model.soft-prompt-speeds` CLI syntax

`scripts/run_ablations.py` initially emitted
`--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses
`tuple[float, ...]` from space-separated argv elements (matching
`--eval-speed-set` style). Fixed: emit the flag and each value as
separate argv elements.
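
The corrected emission can be sketched as a small helper that returns one argv element per value. The `tuple_flag` name is hypothetical, and the `:g` formatting is an assumption chosen to reproduce the `0.75 1 1.25 1.5` spelling shown in this document.

```python
def tuple_flag(flag, values):
    """Emit a flag followed by one argv element per value."""
    return [flag, *(f"{v:g}" for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Fixed form: five separate argv elements.
argv = tuple_flag("--model.soft-prompt-speeds", speeds)

# Earlier, broken form: a single comma-joined element.
broken = ["--model.soft-prompt-speeds=" + ",".join(f"{v:g}" for v in speeds)]

print(argv)
print(broken)
```

Passing `argv` as a list to `subprocess.run` (without `shell=True`) keeps each value as its own element, which is what the parser expects.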

### 7.4 Hardcoded WANDB API key in `init_wandb`

A live key was hardcoded and unconditionally written to
`os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and
attributing all runs to one account. Removed; wandb now uses its standard
auth resolution order. **The committed key is exposed in git history;
revoke and rotate**.

## 8. Behavioral changes you should be aware of

- **`speed_integration` defaults to `"auto"`**, which preserves legacy
  behavior of the existing 3 LIBERO speed configs in `config.py`. New
  ablation configs should set `speed_integration` explicitly.
- **The legacy `pi05_libero_various_speed_all_flow_prompt` and
  `pi05_libero_various_speed_all_flow_noprompt` configs were removed**
  (replaced by `pi05_libero_various_speed_all_modulation`). The old
  configs were equivalent to text-only training due to the §7.1 bug, so
  any checkpoints from those names are not the modulation behavior they
  appeared to be.
- **wandb no longer logs sample camera images on first batch**. If you
  relied on that for debugging data inputs, run
  `scripts/visualize_speed_dataset.py` separately.
- **Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`**
  are new artifacts. Existing downstream consumers should ignore unknown
  meta files; verify if you have custom tooling that reads `meta/*.json`.
- **`g2_coarse` and `g4_narrow` speeds were updated mid-session**:
  - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]`
  - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]`
  - `g4_narrow` now shares its dataset with the entire speed-integration
    sweep, so the runner builds it only once.

## 9. Files added / modified / deleted

```
# New
A  README_ablation.md
A  modification_summary.md
A  scripts/build_ablation_datasets.py
A  scripts/build_libero_speed_dataset_mp.py
A  scripts/eval_libero_8gpu.sh
A  scripts/eval_libero_speed.py
A  scripts/profile_action_norms.py
A  scripts/run_ablations.py
A  tests/test_soft_prompt_smoke.py

# Modified
M  src/openpi/models/model.py
M  src/openpi/models/pi0.py
M  src/openpi/models/pi0_config.py
M  src/openpi/models_pytorch/pi0_pytorch.py
M  src/openpi/models_pytorch/preprocessing_pytorch.py
M  src/openpi/policies/libero_policy.py
M  src/openpi/transforms.py
M  src/openpi/training/config.py
M  src/various_speed/core.py
M  scripts/build_libero_speed_dataset.py
M  scripts/compute_norm_stats.py
M  scripts/train_pytorch.py
M  VARIOUS_SPEED_README.md

# Deleted
D  VARIOUS_SPEED_CURRENT_PIPELINE.md
```

## 10. Verification still needed (manual, on GPU host)

1. `uv run pytest tests/test_soft_prompt_smoke.py -v`: config validation
   and nearest-neighbor logic. CPU-only, fast.
2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled
   (see docstring on
   `tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
3. `uv run python scripts/run_ablations.py ... --dry-run`: visually
   confirm the printed CLI commands look correct, especially that
   `--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model
   trains without shape / mask / dtype errors.
5. `profile_action_norms.py` on the source dataset, then update
   `--clean-transl-eps` / `--clean-rot-eps` in build commands to
   data-driven values before kicking off the full sweep.
6. `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint:
   run `SPEED=1.0` (in-distribution) first to confirm 8-rank
   coordination works, then iterate over OOD speeds.