# Modification Summary (branch `0502_mp_process`)
Snapshot of every edit in this session, grouped by theme. Use this as a
code-review checklist and as a record of what changed *behaviorally* in the
training, evaluation, and data-prep pipelines.
## High-level themes
1. **Multi-process dataset builder + per-build diagnostics**: cleaning &
   replay summaries written to `meta/`, with per-speed integrated-motion
   error reporting.
2. **Action-norm profiler** for data-driven `clean_*_eps` calibration.
3. **Speed-integration ablation framework**: three model-side strategies
   (`text` / `modulation` / `soft_prompt`) selectable via a single config
   field, plus a P-length sweep on the soft-prompt arm.
4. **Ablation orchestration**: `scripts/run_ablations.py` (end-to-end) and
   `scripts/build_ablation_datasets.py` (data-prep only) with build /
   norm-stats dedup.
5. **8-GPU LIBERO evaluation**: partitioned eval driver that runs
   `libero_spatial / goal / object` plus a 5-way split of `libero_10`,
   tracks per-episode step counts, and reports per-suite + global rollups.
6. **`train_pytorch.py` cleanup**: removed silent NFS hang (wandb sample
   image block), removed hardcoded WANDB API key, made per-speed wandb
   breakdown config-driven.
7. **Four latent bugs fixed** that were silently degrading training or
   evaluation; see §7.
## 1. Data processing
### New files
| File | Purpose |
|---|---|
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (`spawn`); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
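
The per-episode fan-out and sequential index fix-up described for `build_libero_speed_dataset_mp.py` can be pictured with the sketch below. It is not the script's code: `process_episode`, `fix_up_global_index`, `build`, and the row layout are hypothetical stand-ins, and only the use of `ProcessPoolExecutor` with the `spawn` start method is taken from the table above.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def process_episode(source_episode_index: int) -> list:
    """Hypothetical worker: emits frames that only know their per-episode position."""
    num_frames = 3 + source_episode_index  # stand-in for real episode lengths
    return [
        {"episode_index": source_episode_index, "frame_index": i, "index": None}
        for i in range(num_frames)
    ]


def fix_up_global_index(episodes: list) -> list:
    """Sequential pass in the main process: fill a contiguous global `index`."""
    rows = []
    for frames in episodes:  # episodes are expected in source order
        for frame in frames:
            rows.append({**frame, "index": len(rows)})
    return rows


def build(num_episodes: int, num_workers: int) -> list:
    """Fan episodes out to spawned workers, then fix up the index column."""
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=num_workers, mp_context=ctx) as pool:
        # pool.map yields results in input order, so no re-sorting is needed.
        episodes = list(pool.map(process_episode, range(num_episodes)))
    return fix_up_global_index(episodes)
```

With `spawn`, workers re-import the main module, so a call such as `build(100, num_workers=8)` belongs under an `if __name__ == "__main__":` guard.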
### Modified files
| File | Change |
|---|---|
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one TrainConfig per ablation. |
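
The `dataclasses.replace` pattern mentioned for `compute_norm_stats.py` looks roughly like the sketch below. The three classes are simplified stand-ins for the real frozen configs (their actual fields are not shown in this summary), and `apply_overrides` is a hypothetical helper name.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # simplified stand-in
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataConfig:  # stand-in for LeRobotVariousSpeedLiberoDataConfig
    repo_id: str = "base/dataset"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # simplified stand-in
    name: str = "base"
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(config, repo_id=None, asset_id=None):
    """Return a copy of `config` with the CLI overrides applied.

    Frozen dataclasses cannot be mutated, so each nesting level is rebuilt
    from the inside out with dataclasses.replace.
    """
    data = config.data
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    if asset_id is not None:
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    return dataclasses.replace(config, data=data)
```

The original config object is left untouched, which is what lets one registered `TrainConfig` serve every ablation.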
## 2. `train_pytorch.py` cleanup
- Removed the wandb sample-image logging block. It created a second
DataLoader and fetched 256 samples on the first batch, hanging silently
for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb
logging is unaffected.
- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real
key was committed to the repo). Auth now uses the standard wandb
resolution order. **The leaked key is in git history; rotate it on
wandb**.
- Removed the `"EMA is not supported for PyTorch training"` log line
(noise).
- `speed_specs` (per-speed wandb loss breakdown) and `avg_flow_metrics` key
list are now derived from `config.eval_speed_set`, no longer hardcoded
`0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)
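
As an illustration of deriving the per-speed keys from `config.eval_speed_set`, here is a minimal sketch. The tag format is inferred from the old hardcoded `0p5 / 1p0 / 2p0` names; `speed_tag`, `build_speed_specs`, and the `loss/speed_<tag>` metric name are assumptions, not identifiers from `train_pytorch.py`.

```python
def speed_tag(speed: float) -> str:
    """Format 0.5 as '0p5' and 1.0 as '1p0' (inferred naming)."""
    return repr(float(speed)).replace(".", "p")


def build_speed_specs(eval_speed_set):
    """Return (tag, speed) pairs in the order given by the config."""
    return [(speed_tag(s), float(s)) for s in eval_speed_set]


def metric_keys(eval_speed_set):
    """Hypothetical wandb key names, one per configured speed."""
    return [f"loss/speed_{tag}" for tag, _ in build_speed_specs(eval_speed_set)]
```

With the default `(0.5, 1.0, 2.0)` this reproduces the previous three tags, while a set such as `(0.75, 1.0, 1.25, 1.5)` yields four keys without a code change.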
## 3. Speed-integration ablation: model + config
### New TrainConfig field (`src/openpi/training/config.py`)
```python
eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
```
Drives the per-speed wandb breakdown in `train_pytorch.py`.
### New LeRobotVariousSpeedLiberoDataConfig field
```python
speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
```
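A high-level switch. The default `auto` preserves the legacy behavior of the existing LIBERO speed configs (see §8); the explicit values are: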
| value | behavior | requirement |
|---|---|---|
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
| `soft_prompt` | inserts K × P learnable tokens between vision and instruction | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
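
The requirement column can be read as a validation rule. The sketch below encodes it under the assumption that a mismatch should raise; `check_speed_integration` is a hypothetical helper, and whether the real config performs such a check is not stated here.

```python
def check_speed_integration(
    mode,
    *,
    speed_modulation=False,
    soft_prompt_p=0,
    soft_prompt_speeds=(),
):
    """Raise ValueError when the model config cannot serve the chosen mode."""
    if mode in ("text", "auto"):
        return  # no model-side requirement listed in the table
    if mode == "modulation":
        if not speed_modulation:
            raise ValueError("modulation requires Pi0Config.speed_modulation=True")
        return
    if mode == "soft_prompt":
        if soft_prompt_p < 1 or len(soft_prompt_speeds) == 0:
            raise ValueError(
                "soft_prompt requires soft_prompt_p >= 1 and non-empty soft_prompt_speeds"
            )
        return
    raise ValueError(f"unknown speed_integration: {mode!r}")
```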
### New Pi0Config fields (`src/openpi/models/pi0_config.py`)
```python
speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0
```
`flow_control_dim` was removed. `inputs_spec` now declares
`speed=ShapeDtypeStruct([B, 1], float32)` whenever modulation OR
soft_prompt is enabled.
### Observation schema (`src/openpi/models/model.py`)
Added `speed: at.Float[ArrayT, "*b 1"] | None = None`. Both `from_dict`
and `preprocess_observation` (JAX) propagate it.
### PI0Pytorch surgery (`src/openpi/models_pytorch/pi0_pytorch.py`)
- `__init__`:
  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and
    `speed_condition_mlp_in/out` (replaces the old `flow_control_*` /
    `flow_condition_*` MLPs). Reads raw `observation.speed`
    (shape `(B, 1)`); no log transform is applied.
  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter`
    of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a
    non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
- `_preprocess_observation`: returns a 6-tuple
  `(images, image_masks, lang_tokens, lang_masks, state, speed)`.
- `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled,
  computes `argmin |speed - anchors|` per batch element and inserts
  `(B, P, hidden)` tokens between image and language tokens with full
  attention. OOD speeds fall back to the nearest training anchor.
- `embed_suffix`: accepts `speed=None`; when modulation is enabled,
  pushes raw speed through `speed_mod_mlp` and fuses with the timestep
  embedding via `speed_condition_mlp`.
- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.
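
The nearest-anchor selection in `embed_prefix` can be illustrated without tensors. The sketch below uses plain lists in place of the `(K, P, hidden)` parameter and handles a single batch element; `nearest_anchor_index` and `build_prefix` are hypothetical names, and tie-breaking here (first anchor wins) is a property of this sketch rather than a documented behavior of the model code.

```python
def nearest_anchor_index(speed, anchors):
    """Index k minimizing |speed - anchors[k]|; OOD speeds snap to the closest anchor."""
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def build_prefix(image_tokens, lang_tokens, soft_prompt_tokens, anchors, speed):
    """Place the selected bank of P soft-prompt tokens between image and language tokens."""
    k = nearest_anchor_index(speed, anchors)
    return image_tokens + soft_prompt_tokens[k] + lang_tokens


# Toy setup: K = 4 anchors, P = 2 tokens per anchor, tokens shown as labels.
anchors = (0.75, 1.0, 1.25, 1.5)
banks = [[f"sp{k}_{p}" for p in range(2)] for k in range(len(anchors))]
prefix = build_prefix(["img0", "img1"], ["lang0"], banks, anchors, speed=1.3)
```

A speed of `3.0`, outside the anchor range, selects the last bank, which is the fallback behavior described above.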
JAX `Pi0` (`src/openpi/models/pi0.py`) was renamed in the same way for
consistency: `flow_control_dim` → `speed_modulation`,
`flow_control_mlp_*` → `speed_mod_mlp_*`, and
`flow_condition_mlp_*` → `speed_condition_mlp_*`. It now reads `obs.speed`.
### Policy passthrough (`src/openpi/policies/libero_policy.py`)
`LiberoInputs` now passes `data["speed"]` through, alongside the existing
`flow_control` passthrough.
### Smoke tests (`tests/test_soft_prompt_smoke.py`)
Light tests for `Pi0Config` field acceptance and the argmin
nearest-neighbor logic. The full end-to-end forward-pass test requires
PaliGemma weights and is gated to a manual GPU run.
## 4. Ablation orchestration
### `scripts/run_ablations.py` (new)
End-to-end orchestrator: build → norm-stats → train, per ablation.
- `Ablation` dataclass: `name`, `speeds`, `speed_integration`,
`extra_train_args`, `shared_norm_key`.
- 12 default ablations:
  - **Speed-set sweep** (5): `g1_baseline`, `g2_coarse`, `g3a_step025`,
    `g4_narrow`, `g5_extreme`. All use `text` integration.
  - **Speed-integration sweep** (3): `speedint_text`,
    `speedint_modulation` (with `Pi0Config.speed_modulation=True`, which
    replaces the removed `flow_control_dim`), and `softprompt_p8` (reused
    from the P-sweep).
  - **Soft-prompt P-length sweep** (5): `softprompt_p{1,4,8,16,32}`. All
    declare `shared_norm_key="softprompt_shared"` so they reuse one
    `norm_stats.json`.
- For the default 12-ablation table, dedup gives **5 builds, 8 norm-stats,
12 train runs**.
- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`,
`--dry-run` for scoped runs.
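
The 5 / 8 / 12 counts follow from two dedup keys. The sketch below reproduces them under stated assumptions: builds are keyed by the speed set, norm-stats by `shared_norm_key` when present and by `name` otherwise. The speeds for `g1_baseline`, `g3a_step025`, and `g5_extreme` are placeholders chosen only to be distinct; the real values are not listed in this summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: tuple
    speed_integration: str
    extra_train_args: tuple = ()
    shared_norm_key: Optional[str] = None


NARROW = (0.75, 1.0, 1.25, 1.5)  # g4_narrow speeds as listed in section 8

ABLATIONS = [
    Ablation("g1_baseline", (1.0,), "text"),             # placeholder speeds
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g3a_step025", (0.5, 0.75, 1.0), "text"),   # placeholder speeds
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("g5_extreme", (0.25, 4.0), "text"),         # placeholder speeds
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
] + [
    Ablation(f"softprompt_p{p}", NARROW, "soft_prompt",
             shared_norm_key="softprompt_shared")
    for p in (1, 4, 8, 16, 32)
]


def plan(ablations):
    """Count unique builds, unique norm-stats jobs, and train runs."""
    build_keys = {a.speeds for a in ablations}
    norm_keys = {a.shared_norm_key or a.name for a in ablations}
    return len(build_keys), len(norm_keys), len(ablations)


builds, norm_stats, train_runs = plan(ABLATIONS)
```

Under these assumptions the seven ablations that reuse the `g4_narrow` speeds collapse into one build, and the five P-sweep entries collapse into one norm-stats job.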
### `scripts/build_ablation_datasets.py` (new)
Thin focused wrapper for the data-prep stage only. Imports the same
`ABLATIONS` table from `run_ablations.py`, applies the same build dedup,
exits with a summary mapping ablation names to dataset paths.
## 5. 8-GPU LIBERO evaluation
### New files
| File | Purpose |
|---|---|
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode `success` / `steps` / `task_id`, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is **GPU 0 → spatial / 1 → goal / 2 → object** (full suites) and **GPU 3-7 → libero_10 split into pairs of tasks**, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
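
The partition the driver uses can be written down as data. The real driver is a shell script; this Python sketch only mirrors the GPU-to-work mapping from the table, and `make_partition` plus the dictionary keys are hypothetical.

```python
def make_partition():
    """One job per GPU: three full suites, then libero_10 in five task pairs."""
    jobs = [
        {"gpu": 0, "suite": "libero_spatial", "task_ids": None},  # None = full suite
        {"gpu": 1, "suite": "libero_goal", "task_ids": None},
        {"gpu": 2, "suite": "libero_object", "task_ids": None},
    ]
    for i in range(5):
        # GPUs 3-7 each take two consecutive libero_10 tasks: (0,1), (2,3), ...
        jobs.append(
            {"gpu": 3 + i, "suite": "libero_10", "task_ids": [2 * i, 2 * i + 1]}
        )
    return jobs
```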
### Per-episode tracking
`eval_libero_speed.py` records the **policy steps actually executed**
(excluding the `num_steps_wait` warmup), so:
- successes terminate early when `env.step` returns `done=True` and report
the true step count;
- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for
spatial / object / goal / 10 / 90 respectively).
The gap between `mean_steps_success` and `mean_steps_all` gives a quick
read on the failure rate: every failure contributes the suite's full
`max_steps`, so `mean_steps_all` rises sharply as failures accumulate.
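
A small sketch of how the two step statistics relate, assuming each episode record carries the `success` and `steps` fields named above. `summarize` is a hypothetical helper, not the function in `eval_libero_speed.py`.

```python
def summarize(episodes):
    """Compute success rate and the two mean-step statistics for one rank."""
    def mean(values):
        return sum(values) / len(values) if values else None

    success_steps = [e["steps"] for e in episodes if e["success"]]
    all_steps = [e["steps"] for e in episodes]
    return {
        "success_rate": len(success_steps) / len(episodes) if episodes else None,
        "mean_steps_success": mean(success_steps),
        "mean_steps_all": mean(all_steps),
    }


# Two early successes and two failures capped at a 220-step limit.
summary = summarize([
    {"success": True, "steps": 100},
    {"success": True, "steps": 120},
    {"success": False, "steps": 220},
    {"success": False, "steps": 220},
])
```

In this toy case the two failures lift the overall mean well above the success-only mean, which is the gap the text refers to.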
### Output
```
results/libero_eval_<speed>x_<ts>/
  spatial_<speed>x.json
  goal_<speed>x.json
  object_<speed>x.json
  long_t0_1_<speed>x.json long_t2_3_<speed>x.json ... long_t8_9_<speed>x.json
  logs/<...>.log
  videos/<...>/<rollout>.mp4
```
Each per-rank JSON contains a `summary` block (success rate, step
statistics, summary line) and a per-episode list. The driver's final
output is a per-suite rollup and a global line.
## 6. Documentation
- `VARIOUS_SPEED_README.md`: added §2 (action-norm profiling) and §4
  (multi-process build + cleaning/replay summary outputs); §8 notes the
  wandb-image-logging removal.
- `README_ablation.md` (new): full ablation workflow doc, including
  the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation
  workflow, and the soft-prompt implementation notes.
- `modification_summary.md` (this file).
- `VARIOUS_SPEED_CURRENT_PIPELINE.md`: deleted (superseded).
## 7. Bug fixes worth flagging
### 7.1 `flow_control` was being silently dropped
`src/openpi/models_pytorch/preprocessing_pytorch.py` constructed
`SimpleProcessedObservation` without copying `flow_control`. Result: in
PyTorch training, `observation.flow_control` was always `None`, so the
action expert's modulation MLP always received zeros.
**Implication**: any prior PyTorch run with the (now-removed)
`pi05_libero_various_speed_all_flow_prompt` config did **not** actually
use modulation; it was equivalent to the text-only path. After the
later refactor, the `flow_control` field was eliminated entirely; the
modulation path now reads `observation.speed` directly. The new
`pi05_libero_various_speed_all_modulation` config replaces it.
Fix: pass `speed` through to `SimpleProcessedObservation` (the
`flow_control` field has since been removed).
### 7.2 JAX `preprocess_observation` did not pass `speed`
`src/openpi/models/model.py:preprocess_observation` (JAX path) didn't
propagate the new `speed` field. Even though the JAX trainer is not used
for the soft_prompt sweep, the field should round-trip cleanly to keep
both backends consistent. Fixed.
### 7.3 `--model.soft-prompt-speeds` CLI syntax
`scripts/run_ablations.py` initially emitted
`--model.soft-prompt-speeds=0.75,1,1.25,1.5` (comma-joined). Tyro parses
`tuple[float, ...]` from space-separated argv elements (matching
`--eval-speed-set` style). Fixed: emit the flag and each value as
separate argv elements.
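
The difference between the two emission styles is easiest to see as argv lists. This is a sketch of the idea only; `tuple_flag` is a hypothetical helper and the `:g` number formatting is an assumption chosen to match the `0.75 1 1.25 1.5` form shown in this document.

```python
def tuple_flag(flag, values):
    """Emit a flag followed by one argv element per value."""
    return [flag, *(f"{v:g}" for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Comma-joined form that was emitted initially: a single argv element.
broken = ["--model.soft-prompt-speeds=" + ",".join(f"{v:g}" for v in speeds)]

# Fixed form: the flag, then each value as its own argv element.
fixed = tuple_flag("--model.soft-prompt-speeds", speeds)
```

Passing `fixed` as part of a list to `subprocess.run` (without `shell=True`) keeps each value a separate element all the way to the parser.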
### 7.4 Hardcoded WANDB API key in `init_wandb`
A live key was hardcoded and unconditionally written to
`os.environ["WANDB_API_KEY"]`, overriding each user's own credentials and
attributing all runs to one account. Removed; wandb now uses its standard
auth resolution order. **The committed key is exposed in git history;
revoke and rotate**.
## 8. Behavioral changes you should be aware of
- **`speed_integration` defaults to `"auto"`**, which preserves legacy
behavior of the existing 3 LIBERO speed configs in `config.py`. New
ablation configs should set `speed_integration` explicitly.
- **The legacy `pi05_libero_various_speed_all_flow_prompt` and
`pi05_libero_various_speed_all_flow_noprompt` configs were removed**
(replaced by `pi05_libero_various_speed_all_modulation`). The old
configs were equivalent to text-only training due to the §7.1 bug, so
any checkpoints from those names are not the modulation behavior they
appeared to be.
- **wandb no longer logs sample camera images on first batch**. If you
relied on that for debugging data inputs, run
`scripts/visualize_speed_dataset.py` separately.
- **Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json`**
are new artifacts. Existing downstream consumers should ignore unknown
meta files; verify if you have custom tooling that reads `meta/*.json`.
- **`g2_coarse` and `g4_narrow` speeds were updated mid-session**:
  - `g2_coarse`: `[0.5, 1.0, 2.0]` → `[0.5, 1.0, 1.5, 2.0]`
  - `g4_narrow`: `[0.75, 1.0, 1.25]` → `[0.75, 1.0, 1.25, 1.5]`
  - `g4_narrow` now shares its dataset with the entire speed-integration
    sweep, so the runner builds it only once.
## 9. Files added / modified / deleted
```
# New
A README_ablation.md
A modification_summary.md
A scripts/build_ablation_datasets.py
A scripts/eval_libero_8gpu.sh
A scripts/eval_libero_speed.py
A scripts/profile_action_norms.py
A scripts/run_ablations.py
A tests/test_soft_prompt_smoke.py
# Modified
M src/openpi/models/model.py
M src/openpi/models/pi0.py
M src/openpi/models/pi0_config.py
M src/openpi/models_pytorch/pi0_pytorch.py
M src/openpi/models_pytorch/preprocessing_pytorch.py
M src/openpi/policies/libero_policy.py
M src/openpi/transforms.py
M src/openpi/training/config.py
M src/various_speed/core.py
M scripts/build_libero_speed_dataset.py
M scripts/build_libero_speed_dataset_mp.py
M scripts/compute_norm_stats.py
M scripts/train_pytorch.py
M VARIOUS_SPEED_README.md
# Deleted
D VARIOUS_SPEED_CURRENT_PIPELINE.md
```
## 10. Verification still needed (manual, on GPU host)
1. `uv run pytest tests/test_soft_prompt_smoke.py -v`: config validation
   and nearest-neighbor logic. CPU-only, fast.
2. Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled
   (see docstring on
   `tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
3. `uv run python scripts/run_ablations.py ... --dry-run`: visually
   confirm the printed CLI commands look correct, especially that
   `--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
4. ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model
   trains without shape / mask / dtype errors.
5. `profile_action_norms.py` on the source dataset, then update
   `--clean-transl-eps` / `--clean-rot-eps` in build commands to
   data-driven values before kicking off the full sweep.
6. `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint.
   Start with `SPEED=1.0` (in-distribution) to confirm 8-rank
   coordination works, then iterate over OOD speeds.
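
For step 5, the P1 / P5 suggestion can be pictured with the standard-library sketch below. It is not the profiler's implementation: `norms`, `percentile`, and `suggest_eps` are hypothetical names, actions are plain sequences of at least six floats, and the nearest-rank percentile definition is an assumption (the script may interpolate differently).

```python
import math


def norms(actions, start, stop):
    """Euclidean norm of action[start:stop] for every action."""
    return [math.sqrt(sum(x * x for x in a[start:stop])) for a in actions]


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_eps(actions):
    """P1 / P5 candidates for the translation (dims 0-2) and rotation (dims 3-5) norms."""
    transl = norms(actions, 0, 3)
    rot = norms(actions, 3, 6)
    return {
        "clean_transl_eps": {"p1": percentile(transl, 1), "p5": percentile(transl, 5)},
        "clean_rot_eps": {"p1": percentile(rot, 1), "p5": percentile(rot, 5)},
    }
```

Whether P1 or P5 is the better threshold depends on how much near-zero motion the source data contains, so the printed table is a starting point for the build flags rather than a final answer.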