Modification Summary (branch 0502_mp_process)

Snapshot of every edit in this session, grouped by theme. Use this as a code-review checklist and as a record of what changed behaviorally in the training, evaluation, and data-prep pipelines.

High-level themes

  1. Multi-process dataset builder + per-build diagnostics: cleaning & replay summaries written to meta/, with per-speed integrated-motion error reporting.
  2. Action-norm profiler for data-driven clean_*_eps calibration.
  3. Speed-integration ablation framework β€” three model-side strategies (text / modulation / soft_prompt) selectable via a single config field, plus a P-length sweep on the soft-prompt arm.
  4. Ablation orchestration β€” scripts/run_ablations.py (end-to-end) and scripts/build_ablation_datasets.py (data-prep only) with build / norm-stats dedup.
  5. 8-GPU LIBERO evaluation β€” partitioned eval driver that runs libero_spatial / goal / object plus a 5-way split of libero_10, tracks per-episode step counts, and reports per-suite + global rollups.
  6. train_pytorch.py cleanup β€” removed silent NFS hang (wandb sample image block), removed hardcoded WANDB API key, made per-speed wandb breakdown config-driven.
  7. Four latent bugs fixed that were silently degrading training or evaluation; see §7.

1. Data processing

New files

| File | Purpose |
| --- | --- |
| scripts/build_libero_speed_dataset_mp.py | Multi-process build. Per-source-episode workers via ProcessPoolExecutor (spawn); main process does a sequential fix-up to fill the global index column. Same output schema as the single-process variant. CLI adds --num-workers. |
| scripts/profile_action_norms.py | Profiles the ‖action[:, :3]‖ and ‖action[:, 3:6]‖ distributions on the source dataset. Prints percentiles + threshold table; suggests P1 / P5 values for --clean-transl-eps / --clean-rot-eps. Optional JSON output. |
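The worker / fix-up split described for build_libero_speed_dataset_mp.py can be sketched as follows. This is a minimal illustration, not the script itself: process_episode, assign_global_index, build, and the row layout are hypothetical names. Only the use of ProcessPoolExecutor with a spawn context and the sequential index fix-up in the main process come from the description above.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def process_episode(episode_index: int) -> list[dict]:
    """Hypothetical worker: returns the frames of one source episode.

    The global index is left as None because a worker cannot know how many
    frames the episodes before it produced.
    """
    num_frames = 2 + episode_index  # stand-in for real per-episode work
    return [
        {"episode_index": episode_index, "frame_index": i, "index": None}
        for i in range(num_frames)
    ]


def assign_global_index(episodes: list[list[dict]]) -> list[dict]:
    """Sequential fix-up in the main process: fill the global index column."""
    rows, next_index = [], 0
    for frames in episodes:  # episodes must already be in source order
        for frame in frames:
            rows.append({**frame, "index": next_index})
            next_index += 1
    return rows


def build(episode_indices: list[int], num_workers: int) -> list[dict]:
    # "spawn" starts every worker in a fresh interpreter.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=num_workers, mp_context=ctx) as pool:
        # pool.map yields results in input order, whatever order workers finish in.
        episodes = list(pool.map(process_episode, episode_indices))
    return assign_global_index(episodes)
```

With the spawn start method, build(...) should be called from under an `if __name__ == "__main__":` guard so that workers can re-import the module safely.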

Modified files

| File | Change |
| --- | --- |
| scripts/build_libero_speed_dataset.py | New _aggregate_cleaning_stats (deduped by source_episode_index) and _aggregate_replay_metrics (per target speed). Writes meta/cleaning_summary.json and meta/replay_summary.json; prints both at end of build. |
| src/various_speed/core.py | transform_episode metrics now include cleaned_any_frames, cleaned_both_frames, plus the four ratios (*_translation_ratio, *_rotation_ratio, *_any_ratio, *_both_ratio). |
| scripts/compute_norm_stats.py | main() accepts --repo-id and --asset-id overrides. Uses dataclasses.replace on frozen TrainConfig / nested LeRobotVariousSpeedLiberoDataConfig / AssetsConfig to apply overrides without registering one TrainConfig per ablation. |
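The dataclasses.replace pattern used for the --repo-id / --asset-id overrides can be illustrated with simplified stand-in classes. The class bodies, default values, and the apply_overrides helper below are assumptions made for the sketch; the real configs carry many more fields.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class AssetsConfig:  # simplified stand-in
    asset_id: str = "base_assets"


@dataclasses.dataclass(frozen=True)
class DataConfig:  # stand-in for LeRobotVariousSpeedLiberoDataConfig
    repo_id: str = "base/repo"
    assets: AssetsConfig = dataclasses.field(default_factory=AssetsConfig)


@dataclasses.dataclass(frozen=True)
class TrainConfig:  # simplified stand-in
    name: str = "base"
    data: DataConfig = dataclasses.field(default_factory=DataConfig)


def apply_overrides(cfg, repo_id=None, asset_id=None):
    """Return a new config with overrides applied; the original is untouched."""
    data = cfg.data
    if asset_id is not None:
        # Frozen dataclasses cannot be mutated, so rebuild from the inside out.
        assets = dataclasses.replace(data.assets, asset_id=asset_id)
        data = dataclasses.replace(data, assets=assets)
    if repo_id is not None:
        data = dataclasses.replace(data, repo_id=repo_id)
    return dataclasses.replace(cfg, data=data)
```

Because every level is rebuilt rather than mutated, one registered base config can serve all ablations without aliasing between runs.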

2. train_pytorch.py cleanup

  • Removed the wandb sample-image logging block. It created a second DataLoader and fetched 256 samples on the first batch, hanging silently for many minutes on NFS-backed datasets. Loss / lr / grad-norm wandb logging is unaffected.
  • Removed the hardcoded WANDB_API_KEY env-var assignment (security: a real key was committed to the repo). Auth now uses the standard wandb resolution order. The leaked key is in git history; rotate it on wandb.
  • Removed the "EMA is not supported for PyTorch training" log line (noise).
  • speed_specs (per-speed wandb loss breakdown) and avg_flow_metrics key list are now derived from config.eval_speed_set, no longer hardcoded 0p5 / 1p0 / 2p0. (Only fires when observation.flow_control is not None.)
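Deriving the per-speed labels from config.eval_speed_set, instead of hardcoding 0p5 / 1p0 / 2p0, might look like the sketch below. The helper names and the metric key pattern are hypothetical; only the label style and the eval_speed_set source come from the text above.

```python
def speed_label(speed: float) -> str:
    """Hypothetical helper: 0.5 -> "0p5", 1.0 -> "1p0", 1.25 -> "1p25"."""
    return str(float(speed)).replace(".", "p")


def build_speed_specs(eval_speed_set: tuple[float, ...]) -> list[tuple[str, float]]:
    """Pair each configured speed with its wandb-friendly label."""
    return [(speed_label(s), s) for s in eval_speed_set]


def flow_metric_keys(eval_speed_set: tuple[float, ...]) -> list[str]:
    """Assumed key pattern; the real key names may differ."""
    return [f"loss_speed_{label}" for label, _ in build_speed_specs(eval_speed_set)]
```

With the default eval_speed_set of (0.5, 1.0, 2.0), this reproduces the three labels that were previously hardcoded.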

3. Speed-integration ablation: model + config

New TrainConfig field (src/openpi/training/config.py)

eval_speed_set: tuple[float, ...] = (0.5, 1.0, 2.0)

Drives the per-speed wandb breakdown in train_pytorch.py.

New LeRobotVariousSpeedLiberoDataConfig field

speed_integration: Literal["text", "modulation", "soft_prompt", "auto"] = "auto"

A high-level switch:

| value | behavior | requirement |
| --- | --- | --- |
| text | adds SpeedConditionedPrompt to data transforms | none |
| modulation | model reads raw observation.speed -> MLP -> adaRMS in action expert | Pi0Config.speed_modulation=True |
| soft_prompt | inserts K × P learnable tokens between vision and instruction | Pi0Config.soft_prompt_p >= 1 and soft_prompt_speeds non-empty |

New Pi0Config fields (src/openpi/models/pi0_config.py)

speed_modulation: bool = False
soft_prompt_speeds: tuple[float, ...] = ()
soft_prompt_p: int = 0

flow_control_dim was removed. inputs_spec now declares speed=ShapeDtypeStruct([B, 1], float32) whenever modulation OR soft_prompt is enabled.
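The requirements listed for each speed_integration value, together with the inputs_spec rule, can be expressed as a small consistency check. This is a sketch under stated assumptions: the Pi0Config here is a stand-in with only the three new fields, check_speed_integration and needs_speed_input are hypothetical helpers, and "auto" is treated as having no model-side requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pi0Config:
    """Stand-in holding only the three speed-related fields."""
    speed_modulation: bool = False
    soft_prompt_speeds: tuple[float, ...] = ()
    soft_prompt_p: int = 0


def check_speed_integration(mode: str, model: Pi0Config) -> None:
    """Raise ValueError if the model config cannot support the chosen mode."""
    if mode in ("text", "auto"):
        return  # no model-side requirement in this sketch
    if mode == "modulation":
        if not model.speed_modulation:
            raise ValueError("modulation requires speed_modulation=True")
        return
    if mode == "soft_prompt":
        if model.soft_prompt_p < 1 or not model.soft_prompt_speeds:
            raise ValueError(
                "soft_prompt requires soft_prompt_p >= 1 and non-empty soft_prompt_speeds"
            )
        return
    raise ValueError(f"unknown speed_integration: {mode!r}")


def needs_speed_input(model: Pi0Config) -> bool:
    """Mirrors the inputs_spec rule: speed is declared for modulation OR soft_prompt."""
    return model.speed_modulation or model.soft_prompt_p > 0
```

Failing fast at config time is cheaper than discovering a mismatched data / model pairing after a dataset build and norm-stats pass.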

Observation schema (src/openpi/models/model.py)

Added speed: at.Float[ArrayT, "*b 1"] | None = None. Both from_dict and preprocess_observation (JAX) propagate it.

PI0Pytorch surgery (src/openpi/models_pytorch/pi0_pytorch.py)

  • __init__:
    • When speed_modulation=True: registers speed_mod_mlp_in/out and speed_condition_mlp_in/out (replaces the old flow_control_* / flow_condition_* MLPs). Reads raw observation.speed (shape (B, 1)); no log transform is applied.
    • When soft_prompt_p > 0: registers soft_prompt_tokens: nn.Parameter of shape (K, P, paligemma_width) with N(0, 0.02) init, plus a non-persistent buffer soft_prompt_anchors: tensor(K,).
  • _preprocess_observation: returns a 6-tuple (images, image_masks, lang_tokens, lang_masks, state, speed).
  • embed_prefix: accepts speed=None; when soft_prompt is enabled, computes argmin |speed − anchors| per batch element and inserts (B, P, hidden) tokens between image and language tokens with full attention. Out-of-distribution (OOD) speeds fall back to the nearest training anchor.
  • embed_suffix: accepts speed=None; when modulation is enabled, pushes raw speed through speed_mod_mlp and fuses with the timestep embedding via speed_condition_mlp.
  • forward, sample_actions, and denoise_step plumb speed through.
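The nearest-anchor lookup and the prefix ordering used by the soft-prompt path can be shown without any tensor library. The sketch below is a framework-free stand-in for a single batch element: token lists replace embeddings, and both function names are hypothetical. The actual model works on batched tensors.

```python
def nearest_anchor_index(speed: float, anchors: tuple[float, ...]) -> int:
    """argmin over |speed - anchor|; ties resolve to the lowest index here."""
    return min(range(len(anchors)), key=lambda k: abs(speed - anchors[k]))


def build_prefix(image_tokens, lang_tokens, soft_prompt_tokens, speed, anchors):
    """Place the P soft-prompt tokens of the selected anchor between image and language.

    soft_prompt_tokens is a list of K entries (one per anchor), each a list of P tokens.
    """
    k = nearest_anchor_index(speed, anchors)
    return image_tokens + soft_prompt_tokens[k] + lang_tokens


anchors = (0.75, 1.0, 1.25, 1.5)
soft = [[f"s{k}_{p}" for p in range(2)] for k in range(len(anchors))]  # K=4, P=2
prefix = build_prefix(["img0", "img1"], ["pick", "up"], soft, speed=1.3, anchors=anchors)
```

An out-of-range speed such as 3.0 selects the last anchor (1.5), which is the nearest-anchor fallback described for OOD speeds.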

JAX Pi0 (src/openpi/models/pi0.py) was renamed in the same way for consistency: flow_control_dim → speed_modulation, flow_control_mlp_* → speed_mod_mlp_*, flow_condition_mlp_* → speed_condition_mlp_*; it now reads obs.speed.

Policy passthrough (src/openpi/policies/libero_policy.py)

LiberoInputs now passes data["speed"] through, alongside the existing flow_control passthrough.

Smoke tests (tests/test_soft_prompt_smoke.py)

Light tests for Pi0Config field acceptance and the argmin nearest-neighbor logic. The full end-to-end forward-pass test requires PaliGemma weights and is gated to a manual GPU run.

4. Ablation orchestration

scripts/run_ablations.py (new)

End-to-end orchestrator: build → norm-stats → train, per ablation.

  • Ablation dataclass: name, speeds, speed_integration, extra_train_args, shared_norm_key.
  • 12 default ablations:
    • Speed-set sweep (5): g1_baseline, g2_coarse, g3a_step025, g4_narrow, g5_extreme. All use text integration.
    • Speed-integration sweep (3): speedint_text, speedint_modulation (with --model.flow-control-dim=1), and softprompt_p8 (reused from the P-sweep).
    • Soft-prompt P-length sweep (5): softprompt_p{1,4,8,16,32}. All declare shared_norm_key="softprompt_shared" so they reuse one norm_stats.json.
  • For the default 12-ablation table, dedup gives 5 builds, 8 norm-stats, 12 train runs.
  • --only, --skip-build, --skip-norm-stats, --skip-train, --dry-run for scoped runs.
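The build / norm-stats dedup can be sketched with a reduced ablation table. Several things here are assumptions: that builds are keyed on the speed set, that norm-stats are keyed on shared_norm_key with a fallback to the ablation name, and that the soft-prompt arms use the g4_narrow speed set. The plan helper is hypothetical, and only six of the twelve ablations are listed because the remaining speed sets are not given in this summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ablation:
    name: str
    speeds: tuple[float, ...]
    speed_integration: str
    extra_train_args: tuple[str, ...] = ()
    shared_norm_key: Optional[str] = None


NARROW = (0.75, 1.0, 1.25, 1.5)
ABLATIONS = [
    Ablation("g2_coarse", (0.5, 1.0, 1.5, 2.0), "text"),
    Ablation("g4_narrow", NARROW, "text"),
    Ablation("speedint_text", NARROW, "text"),
    Ablation("speedint_modulation", NARROW, "modulation"),
    Ablation("softprompt_p4", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
    Ablation("softprompt_p8", NARROW, "soft_prompt", shared_norm_key="softprompt_shared"),
]


def plan(ablations):
    """Map each dedup key to the first ablation that triggers the work."""
    builds, norms = {}, {}
    for a in ablations:
        builds.setdefault(a.speeds, a.name)  # one dataset per distinct speed set
        norms.setdefault(a.shared_norm_key or a.name, a.name)  # one norm_stats.json per key
    return builds, norms


builds, norms = plan(ABLATIONS)
```

For this reduced table the plan is 2 builds, 5 norm-stats passes, and 6 train runs; the same keying applied to the full table is what would yield the 5 / 8 / 12 split quoted above.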

scripts/build_ablation_datasets.py (new)

Thin focused wrapper for the data-prep stage only. Imports the same ABLATIONS table from run_ablations.py, applies the same build dedup, exits with a summary mapping ablation names to dataset paths.

5. 8-GPU LIBERO evaluation

New files

| File | Purpose |
| --- | --- |
| scripts/eval_libero_speed.py | Single-GPU LIBERO eval client: connects to a websocket policy server, runs rollouts on a chosen suite or task-id subset, sends speed and speed_label in the observation element, records per-episode success / steps / task_id, prints per-rank summary, and writes a JSON. |
| scripts/eval_libero_8gpu.sh | Driver: dispatches 8 parallel eval_libero_speed.py clients. Partition is GPU 0 → spatial / 1 → goal / 2 → object (full suites) and GPU 3-7 → libero_10 split into pairs of tasks, balancing wall-clock. After all 8 finish, auto-aggregates per-suite and global rollups (success rate, mean steps for successes, mean steps overall). |
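The 8-way partition is implemented in a shell driver, but its logic is easy to state in Python. The sketch below is illustrative only: partition_for_8_gpus and the dictionary layout are hypothetical, while the suite assignment, the task pairing, and the tag names follow the partition described above and the output file names.

```python
def partition_for_8_gpus() -> dict[int, dict]:
    """GPU id -> suite, optional task-id subset, and output tag."""
    plan = {
        0: {"suite": "libero_spatial", "task_ids": None, "tag": "spatial"},
        1: {"suite": "libero_goal", "task_ids": None, "tag": "goal"},
        2: {"suite": "libero_object", "task_ids": None, "tag": "object"},
    }
    # GPUs 3-7 each take two consecutive libero_10 tasks: (0,1), (2,3), ..., (8,9).
    for i, gpu in enumerate(range(3, 8)):
        first = 2 * i
        plan[gpu] = {
            "suite": "libero_10",
            "task_ids": [first, first + 1],
            "tag": f"long_t{first}_{first + 1}",
        }
    return plan


gpu_plan = partition_for_8_gpus()
```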

Per-episode tracking

eval_libero_speed.py records the policy steps actually executed (excluding the num_steps_wait warmup), so:

  • successes terminate early when env.step returns done=True and report the true step count;
  • failures hit max_steps for the suite (220 / 280 / 300 / 520 / 400 for spatial / object / goal / 10 / 90 respectively).

The gap between mean_steps_success and mean_steps_all gives a quick read on the failure rate: every failed episode contributes the full max_steps for its suite, so mean_steps_all climbs toward that cap as failures accumulate, while mean_steps_success is unaffected.
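A small worked example makes the two step statistics concrete. The summarize helper and the episode dictionaries are assumptions for illustration; the field names follow the per-episode success / steps record described above, and 220 is the libero_spatial cap.

```python
def summarize(episodes: list[dict]) -> dict:
    """Success rate plus mean steps over successes and over all episodes."""
    n = len(episodes)
    success_steps = [e["steps"] for e in episodes if e["success"]]
    return {
        "success_rate": len(success_steps) / n if n else 0.0,
        "mean_steps_success": (
            sum(success_steps) / len(success_steps) if success_steps else None
        ),
        "mean_steps_all": sum(e["steps"] for e in episodes) / n if n else None,
    }


episodes = [
    {"success": True, "steps": 100},
    {"success": True, "steps": 120},
    {"success": True, "steps": 110},
    {"success": False, "steps": 220},  # failure: ran to the libero_spatial cap
]
summary = summarize(episodes)
```

Here three successes average 110 steps, while the single failure pulls the overall mean up to 137.5.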

Output

results/libero_eval_<speed>x_<ts>/
  spatial_<speed>x.json
  goal_<speed>x.json
  object_<speed>x.json
  long_t0_1_<speed>x.json   long_t2_3_<speed>x.json   ...   long_t8_9_<speed>x.json
  logs/<...>.log
  videos/<...>/<rollout>.mp4

Each per-rank JSON contains a summary block (success rate, step statistics, summary line) and a per-episode list. The driver's final output is a per-suite rollup and a global line.

6. Documentation

  • VARIOUS_SPEED_README.md: added §2 (action-norm profiling) and §4 (multi-process build + cleaning/replay summary outputs); §8 notes the wandb-image-logging removal.
  • README_ablation.md (new): full ablation workflow doc, including the four sweep tables, build/norm dedup behavior, the 8-GPU evaluation workflow, and the soft-prompt implementation notes.
  • modification_summary.md (this file).
  • VARIOUS_SPEED_CURRENT_PIPELINE.md: deleted (superseded).

7. Bug fixes worth flagging

7.1 flow_control was being silently dropped

src/openpi/models_pytorch/preprocessing_pytorch.py constructed SimpleProcessedObservation without copying flow_control. Result: in PyTorch training, observation.flow_control was always None, so the action expert's modulation MLP always received zeros.

Implication: any prior PyTorch run with the (now-removed) pi05_libero_various_speed_all_flow_prompt config did not actually use modulation; it was equivalent to the text-only path. After the later refactor, the flow_control field was eliminated entirely; the modulation path now reads observation.speed directly. The new pi05_libero_various_speed_all_modulation config replaces it.

Fix: pass speed through to SimpleProcessedObservation (the flow_control field has since been removed).

7.2 JAX preprocess_observation did not pass speed

src/openpi/models/model.py:preprocess_observation (JAX path) didn't propagate the new speed field. Even though the JAX trainer is not used for the soft_prompt sweep, the field should round-trip cleanly to keep both backends consistent. Fixed.

7.3 --model.soft-prompt-speeds CLI syntax

scripts/run_ablations.py initially emitted --model.soft-prompt-speeds=0.75,1,1.25,1.5 (comma-joined). Tyro parses tuple[float, ...] from space-separated argv elements (matching --eval-speed-set style). Fixed: emit the flag and each value as separate argv elements.
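The difference between the two argv shapes is shown below. tuple_flag is a hypothetical helper, not the function in run_ablations.py; it only demonstrates emitting the flag and each value as separate argv elements.

```python
def tuple_flag(flag: str, values) -> list[str]:
    """Emit a flag followed by one argv element per value."""
    return [flag, *(f"{v:g}" for v in values)]


speeds = (0.75, 1.0, 1.25, 1.5)

# Comma-joined single element: the shape that was initially emitted.
broken = ["--model.soft-prompt-speeds=" + ",".join(f"{v:g}" for v in speeds)]

# Separate argv elements: the corrected shape.
fixed = tuple_flag("--model.soft-prompt-speeds", speeds)
```

When the list is passed to subprocess.run without a shell, each element reaches the trainer as its own argument, so no quoting is needed.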

7.4 Hardcoded WANDB API key in init_wandb

A live key was hardcoded and unconditionally written to os.environ["WANDB_API_KEY"], overriding each user's own credentials and attributing all runs to one account. Removed; wandb now uses its standard auth resolution order. The committed key is exposed in git history; revoke and rotate.

8. Behavioral changes you should be aware of

  • speed_integration defaults to "auto", which preserves legacy behavior of the existing 3 LIBERO speed configs in config.py. New ablation configs should set speed_integration explicitly.
  • The legacy pi05_libero_various_speed_all_flow_prompt and pi05_libero_various_speed_all_flow_noprompt configs were removed (replaced by pi05_libero_various_speed_all_modulation). Because of the §7.1 bug, the old configs were equivalent to text-only training, so checkpoints trained under those names do not have the modulation behavior their names suggest.
  • wandb no longer logs sample camera images on first batch. If you relied on that for debugging data inputs, run scripts/visualize_speed_dataset.py separately.
  • Per-build meta/cleaning_summary.json and meta/replay_summary.json are new artifacts. Existing downstream consumers should ignore unknown meta files; verify if you have custom tooling that reads meta/*.json.
  • g2_coarse and g4_narrow speeds were updated mid-session:
    • g2_coarse: [0.5, 1.0, 2.0] → [0.5, 1.0, 1.5, 2.0]
    • g4_narrow: [0.75, 1.0, 1.25] → [0.75, 1.0, 1.25, 1.5]
    • g4_narrow now shares its dataset with the entire speed-integration sweep, so the runner builds it only once.

9. Files added / modified / deleted

# New
A  README_ablation.md
A  modification_summary.md
A  scripts/build_ablation_datasets.py
A  scripts/build_libero_speed_dataset_mp.py
A  scripts/eval_libero_8gpu.sh
A  scripts/eval_libero_speed.py
A  scripts/profile_action_norms.py
A  scripts/run_ablations.py
A  tests/test_soft_prompt_smoke.py

# Modified
M  src/openpi/models/model.py
M  src/openpi/models/pi0.py
M  src/openpi/models/pi0_config.py
M  src/openpi/models_pytorch/pi0_pytorch.py
M  src/openpi/models_pytorch/preprocessing_pytorch.py
M  src/openpi/policies/libero_policy.py
M  src/openpi/transforms.py
M  src/openpi/training/config.py
M  src/various_speed/core.py
M  scripts/build_libero_speed_dataset.py
M  scripts/compute_norm_stats.py
M  scripts/train_pytorch.py
M  VARIOUS_SPEED_README.md

# Deleted
D  VARIOUS_SPEED_CURRENT_PIPELINE.md

10. Verification still needed (manual, on GPU host)

  1. uv run pytest tests/test_soft_prompt_smoke.py -v: config validation and nearest-neighbor logic. CPU-only, fast.
  2. Single-batch forward pass of PI0Pytorch with soft_prompt enabled (see docstring on tests/test_soft_prompt_smoke.py::test_full_forward_pass_manual_only).
  3. uv run python scripts/run_ablations.py ... --dry-run: visually confirm the printed CLI commands look correct, especially that --model.soft-prompt-speeds 0.75 1 1.25 1.5 is space-separated.
  4. ~50-step smoke run of softprompt_p8 on 1 GPU to confirm the model trains without shape / mask / dtype errors.
  5. profile_action_norms.py on the source dataset, then update --clean-transl-eps / --clean-rot-eps in build commands to data-driven values before kicking off the full sweep.
  6. eval_libero_8gpu.sh end-to-end with a single trained checkpoint. Run SPEED=1.0 (in-distribution) first to confirm 8-rank coordination works, then iterate over OOD speeds.
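For step 5, the kind of calculation behind a P1 / P5 suggestion can be sketched in a few lines. This is not profile_action_norms.py itself: the helper names, the linear-interpolation percentile, and the toy 7-dimensional actions are assumptions. Only the translation slice [:3], the rotation slice [3:6], and the P1 / P5 readout come from the description of the profiler.

```python
import math


def l2_norms(actions, start: int, stop: int) -> list[float]:
    """Per-frame L2 norm of action[start:stop]."""
    return [math.sqrt(sum(x * x for x in a[start:stop])) for a in actions]


def percentile(values, q: float) -> float:
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def suggest_eps(actions) -> dict:
    """Candidate thresholds: P1 and P5 of the translation and rotation norms."""
    transl = l2_norms(actions, 0, 3)
    rot = l2_norms(actions, 3, 6)
    return {
        "transl": {"p1": percentile(transl, 1), "p5": percentile(transl, 5)},
        "rot": {"p1": percentile(rot, 1), "p5": percentile(rot, 5)},
    }


# Toy data: two frames of [dx, dy, dz, droll, dpitch, dyaw, gripper].
toy_actions = [[3, 4, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 2, 1]]
suggestion = suggest_eps(toy_actions)
```

Which percentile to pass to --clean-transl-eps / --clean-rot-eps remains a judgment call; the profiler only reports candidates.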