Modification Summary (branch 0502_mp_process)
Snapshot of every edit in this session, grouped by theme. Use this as a code-review checklist and as a record of what changed behaviorally in the training, evaluation, and data-prep pipelines.
High-level themes
- Multi-process dataset builder + per-build diagnostics: cleaning and replay summaries written to `meta/`, with per-speed integrated-motion error reporting.
- Action-norm profiler for data-driven `clean_*_eps` calibration.
- Speed-integration ablation framework: three model-side strategies (`text` / `modulation` / `soft_prompt`) selectable via a single config field, plus a P-length sweep on the soft-prompt arm.
- Ablation orchestration: `scripts/run_ablations.py` (end-to-end) and `scripts/build_ablation_datasets.py` (data-prep only) with build / norm-stats dedup.
- 8-GPU LIBERO evaluation: partitioned eval driver that runs `libero_spatial / goal / object` plus a 5-way split of `libero_10`, tracks per-episode step counts, and reports per-suite + global rollups.
- `train_pytorch.py` cleanup: removed silent NFS hang (wandb sample image block), removed hardcoded WANDB API key, made per-speed wandb breakdown config-driven.
- Four latent bugs fixed that were silently degrading training or evaluation, see §7.
1. Data processing
New files
| File | Purpose |
|---|---|
| `scripts/build_libero_speed_dataset_mp.py` | Multi-process build. Per-source-episode workers via `ProcessPoolExecutor` (spawn); main process does a sequential fix-up to fill the global `index` column. Same output schema as the single-process variant. CLI adds `--num-workers`. |
| `scripts/profile_action_norms.py` | Profile `‖action[:, :3]‖` and `‖action[:, 3:6]‖` distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for `--clean-transl-eps` / `--clean-rot-eps`. Optional JSON output. |
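The thresholding idea behind `scripts/profile_action_norms.py` can be sketched roughly as below. This is a minimal, dependency-free illustration and not the script itself. The helper names (`l2_norm`, `percentile`, `suggest_eps`) and the synthetic actions are hypothetical. Only the translation / rotation slicing and the use of P1 / P5 come from the table above.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(v * v for v in vec))


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def suggest_eps(actions, qs=(1, 5)):
    """Return candidate thresholds per component, keyed by percentile."""
    transl = [l2_norm(a[:3]) for a in actions]   # translation part
    rot = [l2_norm(a[3:6]) for a in actions]     # rotation part
    return {
        "transl": {f"P{q}": percentile(transl, q) for q in qs},
        "rot": {f"P{q}": percentile(rot, q) for q in qs},
    }


# Synthetic 7-D actions (hypothetical data, for illustration only).
actions = [[0.01 * i, 0.0, 0.0, 0.0, 0.002 * i, 0.0, 1.0] for i in range(101)]
suggestion = suggest_eps(actions)
print(suggestion)
```

The P1 or P5 value for each component would then be a candidate for `--clean-transl-eps` / `--clean-rot-eps`, subject to inspecting the full distribution.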
Modified files
| File | Change |
|---|---|
| `scripts/build_libero_speed_dataset.py` | New `_aggregate_cleaning_stats` (deduped by `source_episode_index`) and `_aggregate_replay_metrics` (per target speed). Writes `meta/cleaning_summary.json` and `meta/replay_summary.json`; prints both at end of build. |
| `src/various_speed/core.py` | `transform_episode` metrics now include `cleaned_any_frames`, `cleaned_both_frames`, plus the four ratios (`*_translation_ratio`, `*_rotation_ratio`, `*_any_ratio`, `*_both_ratio`). |
| `scripts/compute_norm_stats.py` | `main()` accepts `--repo-id` and `--asset-id` overrides. Uses `dataclasses.replace` on frozen `TrainConfig` / nested `LeRobotVariousSpeedLiberoDataConfig` / `AssetsConfig` to apply overrides without registering one `TrainConfig` per ablation. |
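The nested-override pattern used in `compute_norm_stats.py` can be sketched as follows. The three dataclasses here are simplified, hypothetical stand-ins for `TrainConfig`, the data config, and `AssetsConfig` (the real classes have many more fields), and `apply_overrides` is an assumed helper name. The point is only that `dataclasses.replace` rebuilds frozen objects from the inside out.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # hypothetical stand-in
    asset_id: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class DataConfig:  # hypothetical stand-in for the LIBERO data config
    repo_id: str = "base/repo"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # hypothetical stand-in
    name: str = "base"
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(cfg, repo_id=None, asset_id=None):
    """Return a new config with overrides applied; cfg itself is untouched."""
    data = cfg.data
    if asset_id is not None:
        new_assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=new_assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(cfg, data=data)


base = TrainConfig()
patched = apply_overrides(base, repo_id="ablation/repo", asset_id="ablation_assets")
print(patched)
```

Because every level is frozen, the base config stays valid and can be reused for the next ablation without re-registration.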
2. train_pytorch.py cleanup
- Removed the wandb sample-image logging block. It created a second DataLoader and fetched 256 samples on the first batch, hanging silently for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb logging is unaffected.
- Removed the hardcoded `WANDB_API_KEY` env-var assignment (security: a real key was committed to the repo). Auth now uses the standard wandb resolution order. The leaked key is in git history; rotate it on wandb.
- Removed the `"EMA is not supported for PyTorch training"` log line (noise).
- `speed_specs` (per-speed wandb loss breakdown) and the `avg_flow_metrics` key list are now derived from `config.eval_speed_set`, no longer hardcoded `0p5 / 1p0 / 2p0`. (Only fires when `observation.flow_control is not None`.)
3. Speed-integration ablation: model + config
New TrainConfig field (src/openpi/training/config.py)
eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)
Drives the per-speed wandb breakdown in train_pytorch.py.
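As a rough illustration of deriving the per-speed keys from `eval_speed_set` instead of hardcoding them, consider the sketch below. The helper names `speed_tag` and `build_speed_specs`, and the `(tag, speed)` tuple layout, are assumptions. Only the `0p5 / 1p0 / 2p0` tag style is taken from the previous hardcoded keys.

```python
def speed_tag(speed: float) -> str:
    """Format a speed as a metric-key fragment, e.g. 0.5 -> '0p5'."""
    return str(float(speed)).replace(".", "p")


def build_speed_specs(eval_speed_set):
    """Return (tag, speed) pairs in the order given by the config."""
    return [(speed_tag(s), float(s)) for s in eval_speed_set]


specs = build_speed_specs((0.5, 1.0, 2.0))
print([tag for tag, _ in specs])  # ['0p5', '1p0', '2p0']
```

With this shape, adding a speed to `eval_speed_set` extends the breakdown without touching the training script.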
New LeRobotVariousSpeedLiberoDataConfig field
speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"
A high-level switch:
| value | behavior | requirement |
|---|---|---|
| `text` | adds `SpeedConditionedPrompt` to data transforms | none |
| `modulation` | model reads raw `observation.speed` -> MLP -> adaRMS in action expert | `Pi0Config.speed_modulation=True` |
| `soft_prompt` | inserts K × P learnable tokens between vision and instruction | `Pi0Config.soft_prompt_p >= 1` and `soft_prompt_speeds` non-empty |
New Pi0Config fields (src/openpi/models/pi0_config.py)
speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0
flow_control_dim was removed. inputs_spec now declares
speed=ShapeDtypeStruct([B, 1], float32) whenever modulation OR
soft_prompt is enabled.
Observation schema (src/openpi/models/model.py)
Added speed: at.Float[ArrayT, "*b 1"] | None = None. Both from_dict
and preprocess_observation (JAX) propagate it.
PI0Pytorch surgery (src/openpi/models_pytorch/pi0_pytorch.py)
- `__init__`:
  - When `speed_modulation=True`: registers `speed_mod_mlp_in/out` and `speed_condition_mlp_in/out` (replaces the old `flow_control_*` / `flow_condition_*` MLPs). Reads raw `observation.speed` (shape `(B, 1)`); no log transform is applied.
  - When `soft_prompt_p > 0`: registers `soft_prompt_tokens: nn.Parameter` of shape `(K, P, paligemma_width)` with `N(0, 0.02)` init, plus a non-persistent buffer `soft_prompt_anchors: tensor(K,)`.
- `_preprocess_observation`: returns a 6-tuple `(images, image_masks, lang_tokens, lang_masks, state, speed)`.
- `embed_prefix`: accepts `speed=None`; when soft_prompt is enabled, computes `argmin |speed - anchors|` per batch element and inserts `(B, P, hidden)` tokens between image and language tokens with full attention. OOD speeds fall back to the nearest training anchor.
- `embed_suffix`: accepts `speed=None`; when modulation is enabled, pushes raw speed through `speed_mod_mlp` and fuses with the timestep embedding via `speed_condition_mlp`.
- `forward`, `sample_actions`, and `denoise_step` plumb `speed` through.
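The nearest-anchor selection used by the soft-prompt path can be illustrated in plain Python. This is a sketch of the selection logic only, not the PyTorch implementation: `nearest_anchor_index`, `select_soft_prompts`, and `build_prefix` are hypothetical names, and strings stand in for token embeddings.

```python
def nearest_anchor_index(speed, anchors):
    """Index k minimizing |speed - anchors[k]|; ties go to the lower index here."""
    if not anchors:
        raise ValueError("soft_prompt_speeds must be non-empty")
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def select_soft_prompts(speeds, anchors, tokens):
    """tokens[k] holds the P tokens for anchor k; returns one block per batch element."""
    return [tokens[nearest_anchor_index(s, anchors)] for s in speeds]


def build_prefix(image_tokens, soft_tokens, lang_tokens):
    """Soft-prompt tokens sit between the image and language tokens."""
    return image_tokens + soft_tokens + lang_tokens


anchors = (0.75, 1.0, 1.25, 1.5)
# K = 4 anchors, P = 2 tokens each (placeholder strings, not embeddings).
tokens = [[f"sp{k}_{p}" for p in range(2)] for k in range(len(anchors))]

batch_speeds = [1.1, 3.0, 0.5]  # 3.0 and 0.5 are out of distribution
chosen = select_soft_prompts(batch_speeds, anchors, tokens)
prefix = build_prefix(["img0", "img1"], chosen[0], ["lang0"])
print(prefix)
```

Out-of-distribution speeds never fail in this scheme. They simply reuse the tokens of the closest training anchor, which matches the fallback described above.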
JAX Pi0 (src/openpi/models/pi0.py) was renamed in the same way for
consistency: flow_control_dim -> speed_modulation, flow_control_mlp_* -> speed_mod_mlp_*, flow_condition_mlp_* -> speed_condition_mlp_*,
and it now reads obs.speed.
Policy passthrough (src/openpi/policies/libero_policy.py)
LiberoInputs now passes data["speed"] through, alongside the existing
flow_control passthrough.
Smoke tests (tests/test_soft_prompt_smoke.py)
Light tests for Pi0Config field acceptance and the argmin
nearest-neighbor logic. The full end-to-end forward-pass test requires
PaliGemma weights and is gated to a manual GPU run.
4. Ablation orchestration
scripts/run_ablations.py (new)
End-to-end orchestrator: build -> norm-stats -> train, per ablation.
- `Ablation` dataclass: `name`, `speeds`, `speed_integration`, `extra_train_args`, `shared_norm_key`.
- 12 default ablations:
  - Speed-set sweep (5): `g1_baseline`, `g2_coarse`, `g3a_step025`, `g4_narrow`, `g5_extreme`. All use `text` integration.
  - Speed-integration sweep (3): `speedint_text`, `speedint_modulation` (with `--model.flow-control-dim=1`), and `softprompt_p8` (reused from the P-sweep).
  - Soft-prompt P-length sweep (5): `softprompt_p{1,4,8,16,32}`. All declare `shared_norm_key="softprompt_shared"` so they reuse one `norm_stats.json`.
- For the default 12-ablation table, dedup gives 5 builds, 8 norm-stats, 12 train runs.
- `--only`, `--skip-build`, `--skip-norm-stats`, `--skip-train`, `--dry-run` for scoped runs.
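One way the build / norm-stats dedup could work is sketched below. The keying rules are assumptions inferred from the counts above, not read from `run_ablations.py`: builds are keyed by the speed set, and norm-stats by `shared_norm_key` when set, otherwise by the ablation name. The table is a reduced subset of the 12 defaults, using only speed sets stated elsewhere in this file.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: Tuple[float, ...]
    speed_integration: str
    extra_train_args: Tuple[str, ...] = ()
    shared_norm_key: Optional[str] = None


def build_key(ab):
    """Assumed rule: one dataset build per distinct speed set."""
    return tuple(sorted(ab.speeds))


def norm_key(ab):
    """Assumed rule: shared_norm_key wins, else one norm-stats run per ablation."""
    return ab.shared_norm_key or ab.name


def plan(ablations):
    builds, norms = {}, {}
    for ab in ablations:
        builds.setdefault(build_key(ab), []).append(ab.name)
        norms.setdefault(norm_key(ab), []).append(ab.name)
    return builds, norms


NARROW = (0.75, 1.0, 1.25, 1.5)
subset = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
    Ablation("softprompt_p4", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
    Ablation("softprompt_p8", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
]

builds, norms = plan(subset)
print(len(builds), "builds,", len(norms), "norm-stats,", len(subset), "train runs")
```

For this six-entry subset the plan has 2 builds, 5 norm-stats runs, and 6 train runs, the same pattern that yields 5 / 8 / 12 on the full table.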
scripts/build_ablation_datasets.py (new)
Thin focused wrapper for the data-prep stage only. Imports the same
ABLATIONS table from run_ablations.py, applies the same build dedup,
exits with a summary mapping ablation names to dataset paths.
5. 8-GPU LIBERO evaluation
New files
| File | Purpose |
|---|---|
| `scripts/eval_libero_speed.py` | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends `speed` and `speed_label` in the observation element, records per-episode success / steps / task_id, prints per-rank summary, and writes a JSON. |
| `scripts/eval_libero_8gpu.sh` | Driver: dispatches 8 parallel `eval_libero_speed.py` clients. Partition is GPU 0 -> spatial / 1 -> goal / 2 -> object (full suites) and GPU 3-7 -> `libero_10` split into pairs of tasks, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
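The 8-way partition can be written out as data. The driver itself is a shell script, so the Python below is only an illustrative sketch: `build_partition` and the dict layout are hypothetical, while the GPU-to-suite mapping and the `long_t<a>_<b>` naming follow the table above and the output file names.

```python
def build_partition(num_long_tasks=10, tasks_per_rank=2):
    """Return one entry per GPU rank: three full suites plus libero_10 in pairs."""
    ranks = [
        {"gpu": 0, "suite": "libero_spatial", "task_ids": None, "tag": "spatial"},
        {"gpu": 1, "suite": "libero_goal", "task_ids": None, "tag": "goal"},
        {"gpu": 2, "suite": "libero_object", "task_ids": None, "tag": "object"},
    ]
    for i, start in enumerate(range(0, num_long_tasks, tasks_per_rank)):
        ids = list(range(start, min(start + tasks_per_rank, num_long_tasks)))
        ranks.append(
            {
                "gpu": 3 + i,
                "suite": "libero_10",
                "task_ids": ids,  # task_ids=None above means "whole suite"
                "tag": f"long_t{ids[0]}_{ids[-1]}",
            }
        )
    return ranks


partition = build_partition()
for rank in partition:
    print(rank["gpu"], rank["suite"], rank["task_ids"], rank["tag"])
```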
Per-episode tracking
eval_libero_speed.py records the policy steps actually executed
(excluding the num_steps_wait warmup), so:
- successes terminate early when `env.step` returns `done=True` and report the true step count;
- failures hit `max_steps` for the suite (220 / 280 / 300 / 520 / 400 for spatial / object / goal / 10 / 90 respectively).
The gap between `mean_steps_success` and `mean_steps_all` is a quick
read-out of the failure rate: every failure contributes the full
`max_steps` cap, so `mean_steps_all` rises sharply as failures accumulate.
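A small worked example makes the relationship between the two step metrics concrete. The `summarize` helper and the episode records are hypothetical. The field names (`success`, `steps`, `task_id`) follow the per-episode fields listed earlier, and the failure step counts use the 220 (spatial) and 300 (goal) caps.

```python
def summarize(episodes):
    """Success rate plus mean steps over successes and over all episodes."""
    n = len(episodes)
    success_steps = [e["steps"] for e in episodes if e["success"]]
    return {
        "episodes": n,
        "success_rate": len(success_steps) / n if n else 0.0,
        "mean_steps_success": (
            sum(success_steps) / len(success_steps) if success_steps else None
        ),
        "mean_steps_all": sum(e["steps"] for e in episodes) / n if n else None,
    }


spatial = [
    {"task_id": 0, "success": True, "steps": 100},
    {"task_id": 0, "success": True, "steps": 120},
    {"task_id": 1, "success": False, "steps": 220},  # hit the spatial cap
    {"task_id": 1, "success": False, "steps": 220},
]
goal = [
    {"task_id": 0, "success": True, "steps": 150},
    {"task_id": 1, "success": False, "steps": 300},  # hit the goal cap
]

per_suite = {"spatial": summarize(spatial), "goal": summarize(goal)}
global_rollup = summarize(spatial + goal)  # pooled over all episodes
print(per_suite["spatial"], global_rollup)
```

In the spatial sample, successes average 110 steps while the overall mean is 165, and the 55-step gap comes entirely from the two capped failures. Pooling episodes for the global line, rather than averaging per-suite rates, is a design choice assumed here.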
Output
results/libero_eval_<speed>x_<ts>/
spatial_<speed>x.json
goal_<speed>x.json
object_<speed>x.json
long_t0_1_<speed>x.json long_t2_3_<speed>x.json ... long_t8_9_<speed>x.json
logs/<...>.log
videos/<...>/<rollout>.mp4
Each per-rank JSON contains a summary block (success rate, step
statistics, summary line) and a per-episode list. The driver's final
output is a per-suite rollup and a global line.
6. Documentation
- `VARIOUS_SPEED_README.md`: added §2 (action-norm profiling) and §4 (multi-process build + cleaning/replay summary outputs); §8 notes the wandb-image-logging removal.
- `README_ablation.md` (new): full ablation workflow doc, including the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation workflow, and the soft-prompt implementation notes.
- `modification_summary.md` (this file).
- `VARIOUS_SPEED_CURRENT_PIPELINE.md`: deleted (superseded).
7. Bug fixes worth flagging
7.1 flow_control was being silently dropped
src/openpi/models_pytorch/preprocessing_pytorch.py constructed
SimpleProcessedObservation without copying flow_control. Result: in
PyTorch training, observation.flow_control was always None, so the
action expert's modulation MLP always received zeros.
Implication: any prior PyTorch run with the (now-removed)
pi05_libero_various_speed_all_flow_prompt config did not actually
use modulation; it was equivalent to the text-only path. After the
later refactor, the flow_control field was eliminated entirely; the
modulation path now reads observation.speed directly. The new
pi05_libero_various_speed_all_modulation config replaces it.
Fix: pass speed through to SimpleProcessedObservation (the
flow_control field has since been removed).
7.2 JAX preprocess_observation did not pass speed
src/openpi/models/model.py:preprocess_observation (JAX path) didn't
propagate the new speed field. Even though the JAX trainer is not used
for the soft_prompt sweep, the field should round-trip cleanly to keep
both backends consistent. Fixed.
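Both §7.1 and §7.2 are instances of the same failure mode: a preprocessing step rebuilds an observation object and silently omits a field. A generic guard for this class of bug might look like the sketch below. The `Obs` / `ProcessedObs` classes and the `dropped_fields` helper are hypothetical stand-ins, not the real `Observation` or `SimpleProcessedObservation`.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class Obs:  # hypothetical stand-in for the input observation
    state: List[float]
    speed: Optional[List[float]] = None


@dataclasses.dataclass
class ProcessedObs:  # hypothetical stand-in for the processed observation
    state: List[float]
    speed: Optional[List[float]] = None


def preprocess_buggy(obs):
    # Forgets to copy `speed`, so it silently falls back to None.
    return ProcessedObs(state=list(obs.state))


def preprocess_fixed(obs):
    return ProcessedObs(state=list(obs.state), speed=obs.speed)


def dropped_fields(src, dst):
    """Names of fields that are set on src but missing or None on dst."""
    return [
        f.name
        for f in dataclasses.fields(src)
        if getattr(src, f.name) is not None and getattr(dst, f.name, None) is None
    ]


obs = Obs(state=[0.0, 1.0], speed=[1.5])
print(dropped_fields(obs, preprocess_buggy(obs)))  # ['speed']
print(dropped_fields(obs, preprocess_fixed(obs)))  # []
```

A check of this kind in a unit test would flag the next optional field that is added to the schema but not to a preprocessing path.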
7.3 --model.soft-prompt-speeds CLI syntax
scripts/run_ablations.py initially emitted
--model.soft-prompt-speeds=0.75,1,1.25,1.5 (comma-joined). Tyro parses
tuple[float, ...] from space-separated argv elements (matching
--eval-speed-set style). Fixed: emit the flag and each value as
separate argv elements.
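The difference between the two emission styles is easy to see when the command is built as an argv list. This is a minimal sketch: `tuple_flag` is a hypothetical helper name, and the `g` format is an assumption chosen to reproduce the `0.75 1 1.25 1.5` rendering.

```python
def tuple_flag(flag, values):
    """Emit a flag followed by one argv element per tuple value."""
    return [flag, *(format(v, "g") for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Old, broken form: a single comma-joined argv element.
broken = ["--model.soft-prompt-speeds=" + ",".join(format(v, "g") for v in speeds)]

# Fixed form: the flag and each value as separate argv elements.
fixed = tuple_flag("--model.soft-prompt-speeds", speeds)

print(broken)
print(fixed)
```

Passing `fixed` as part of a list to `subprocess.run` (rather than a shell string) keeps each value a distinct argv element regardless of quoting.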
7.4 Hardcoded WANDB API key in init_wandb
A live key was hardcoded and unconditionally written to
os.environ["WANDB_API_KEY"], overriding each user's own credentials and
attributing all runs to one account. Removed; wandb now uses its standard
auth resolution order. The committed key is exposed in git history;
revoke and rotate.
8. Behavioral changes you should be aware of
- `speed_integration` defaults to `"auto"`, which preserves legacy behavior of the existing 3 LIBERO speed configs in `config.py`. New ablation configs should set `speed_integration` explicitly.
- The legacy `pi05_libero_various_speed_all_flow_prompt` and `pi05_libero_various_speed_all_flow_noprompt` configs were removed (replaced by `pi05_libero_various_speed_all_modulation`). The old configs were equivalent to text-only training due to the §7.1 bug, so checkpoints trained under those names do not have the modulation behavior they appeared to have.
- wandb no longer logs sample camera images on first batch. If you relied on that for debugging data inputs, run `scripts/visualize_speed_dataset.py` separately.
- Per-build `meta/cleaning_summary.json` and `meta/replay_summary.json` are new artifacts. Existing downstream consumers should ignore unknown meta files; verify this if you have custom tooling that reads `meta/*.json`.
- `g2_coarse` and `g4_narrow` speeds were updated mid-session:
  - `g2_coarse`: `[0.5, 1.0, 2.0]` -> `[0.5, 1.0, 1.5, 2.0]`
  - `g4_narrow`: `[0.75, 1.0, 1.25]` -> `[0.75, 1.0, 1.25, 1.5]`
  - `g4_narrow` now shares its dataset with the entire speed-integration sweep, so the runner builds it only once.
9. Files added / modified / deleted
# New
A README_ablation.md
A modification_summary.md
A scripts/build_ablation_datasets.py
A scripts/eval_libero_8gpu.sh
A scripts/eval_libero_speed.py
A scripts/profile_action_norms.py
A scripts/run_ablations.py
A tests/test_soft_prompt_smoke.py
# Modified
M src/openpi/models/model.py
M src/openpi/models/pi0.py
M src/openpi/models/pi0_config.py
M src/openpi/models_pytorch/pi0_pytorch.py
M src/openpi/models_pytorch/preprocessing_pytorch.py
M src/openpi/policies/libero_policy.py
M src/openpi/transforms.py
M src/openpi/training/config.py
M src/various_speed/core.py
M scripts/build_libero_speed_dataset.py
M scripts/build_libero_speed_dataset_mp.py
M scripts/compute_norm_stats.py
M scripts/train_pytorch.py
M VARIOUS_SPEED_README.md
# Deleted
D VARIOUS_SPEED_CURRENT_PIPELINE.md
10. Verification still needed (manual, on GPU host)
- `uv run pytest tests/test_soft_prompt_smoke.py -v`: config validation and nearest-neighbor logic. CPU-only, fast.
- Single-batch forward pass of `PI0Pytorch` with soft_prompt enabled (see docstring on `tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only`).
- `uv run python scripts/run_ablations.py ... --dry-run`: visually confirm the printed CLI commands look correct, especially that `--model.soft-prompt-speeds 0.75 1 1.25 1.5` is space-separated.
- ~50-step smoke run of `softprompt_p8` on 1 GPU to confirm the model trains without shape / mask / dtype errors.
- `profile_action_norms.py` on the source dataset, then update `--clean-transl-eps` / `--clean-rot-eps` in build commands to data-driven values before kicking off the full sweep.
- `eval_libero_8gpu.sh` end-to-end with a single trained checkpoint, with `SPEED=1.0` (the in-distribution speed) first to confirm 8-rank coordination works, then iterate over OOD speeds.