| --- |
| tags: |
| - robotics |
| - rlbench |
| - benchmarking |
| - label-validation |
| --- |
| |
| # VLAdaptorBench |
|
|
| This repository contains the benchmark setup, metric code, debug history, and validation artifacts for the proposed VLA + adaptor label study on `bimanual_take_tray_out_of_oven`. |
|
|
| This is still a label-validation repository, not a policy repository. No `pi0.5` integration is included here. |
|
|
| ## Current Status |
|
|
| The latest work behind this upload produced: |
|
|
| - `metric_iter30_full100_single_pass_full_logging_fixed_templates_merged` |
|   - merged 100-episode dense/fuller-logging result tree from the single-pass fixed-template run |
|
|
| The current Hub upload includes: |
|
|
| - `artifacts/results/metric_iter31_sample10_all_metrics_verify/` |
|   - compact 10-episode verification subset with `all_metrics` GIFs only |
| - the fast `all_metrics`-only render path in: |
|   - `code/scripts/render_oven_metric_frame.py` |
|   - `code/scripts/render_oven_metric_gifs.py` |
|
|
| The new sample verification bundle is meant to be the quickest remote sanity-check entry point. It includes the sampled dense/keyframe tables, per-episode metrics, fuller debug sidecars, fixed templates, selection metadata, and one compact full-metrics GIF per sampled episode. |
|
|
| The earlier `metric_iter29_ep0_single_pass_full_logging_fixed_templates` validation pass for episode 0 remains the detailed single-episode reference for the fuller debug logging and the debug-aware GIF renderer. |
|
|
| That run keeps the trusted `iter24` template bundle fixed, adds the fuller dense/debug logging in a single pass, and regenerates the episode-0 visualization suite from the richer artifact. It is the current reference for: |
|
|
| - the `episode0.debug.jsonl` sidecar with per-frame `p_pre` and `p_ext` internals |
| - the single-pass dense CSV with fuller logged sub-metrics |
| - the updated `path_quality_focus` GIF that now exposes the `p_ext` milestone search, milestone scores, and planner outcomes directly in the visualization |
|
|
| The earlier `metric_iter24_*_door_contact_geom` reruns for episodes 0 and 1 remain the trusted baseline for the repaired oven metrics. |
|
|
| Those reruns fix the main simulator-state bugs that were still contaminating the oven metrics: |
|
|
| 1. The reveal-to-retrieve transition used to occur too late, effectively at grasp time. |
| 2. The visibility metric used to drop to zero around frame 232 even when the tray grasp region was clearly visible in `wrist_left`. |
| 3. `p_pre` stayed near zero until grasp. |
| 4. Extraction labels could flicker or drift because oracle rollouts were not restoring the simulator state exactly. |
| 5. The old dense runner's restore-heavy path could still bias later frames after an oracle call. |
|
|
| The current code addresses those issues by: |
|
|
| - decoding RLBench mask PNGs correctly before converting them back to simulator handles |
| - scoring visibility directly from mask-handle agreement instead of the old depth/z heuristic |
| - inferring tray mask handles from grasp-region projections |
| - deriving a late-window pregrasp approach template instead of accidentally including frame-8 arm poses |
| - adding explicit `pregrasp_progress`, `pregrasp_distance`, `pregrasp_speed`, and `phase_score` |
| - making the repair path batch frames sequentially per worker so late-frame rows do not drift |
| - snapshotting and restoring exact arm joints, gripper joints, and the full grasped-object subtree |
| - supporting and now preferring `--independent-replay` for the authoritative dense study |
| - tightening `y_pre` so it stays on once the retriever is clearly inside the pregrasp corridor |
| - retuning `phase_score` so it tracks the reveal-to-retrieve handoff instead of generic early motion |
| - recomputing intervention validity from isolated per-frame env replays instead of the old live-cache path |
| - sampling intervention states earlier in the reveal phase so pre-ready extract checks are not contaminated by borderline near-ready states |
| - confirming extraction feasibility with repeated planner checks inside the extract oracle so one lucky planner sample is less likely to flip a label |
|
|
| The old `iter4_*`, `iter6_*`, `iter19_*`, and `iter22_*` outputs are still useful historical checkpoints, but the current authoritative outputs are: |
|
|
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/` |
| - `artifacts/results/metric_iter29_ep0_single_pass_full_logging_fixed_templates/` |
|
|
| The main new fix in `iter24` is the assisted-door contact scoring inside `p_pre`: |
|
|
| - the old `ignore_collisions=True` branch treated oven-door contact as name-whitelisted and only checked the final door angle change |
| - the new scorer traces door contacts step-by-step, estimates the local door-surface normal from simulator geometry, scores whether the retriever is sliding along the door or pushing it open, and penalizes direct head-on contact or door-closing motion |
| - this specifically removes the false closed-door `p_pre` spike in episode 0 around frames `43-56` without collapsing the later pregrasp rise once the door is actually opening |
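
The geometry-aware contact scoring described above can be illustrated with a minimal sketch. It is not the code in `oven_study.py`: the function names, the sign conventions, and the `min_quality` parameter are all assumptions made for illustration. The idea is to split the retriever's contact-point velocity into components normal and tangential to the door surface, then reward sliding or door-opening contact and penalize head-on or door-closing contact.

```python
import numpy as np


def contact_step_score(velocity, surface_normal, door_angle_delta, eps=1e-9):
    """Score one contact step in [-1, 1] (hypothetical scoring rule).

    velocity:         retriever contact-point velocity, shape (3,)
    surface_normal:   door-surface normal pointing away from the door, shape (3,)
    door_angle_delta: change in door angle this step (> 0 assumed to mean opening)
    """
    v = np.asarray(velocity, dtype=float)
    n = np.asarray(surface_normal, dtype=float)
    speed = np.linalg.norm(v)
    n_len = np.linalg.norm(n)
    if speed < eps or n_len < eps:
        return 0.0  # no motion or degenerate normal: neutral
    n = n / n_len
    if door_angle_delta > 0:
        return 1.0  # contact is pushing the door open
    if door_angle_delta < 0:
        return -1.0  # contact is closing the door
    v_normal = float(v @ n)
    into_door = max(0.0, -v_normal) / speed              # head-on fraction
    tangential = np.linalg.norm(v - v_normal * n) / speed  # sliding fraction
    return float(tangential - into_door)


def assisted_contact_ok(step_scores, min_quality):
    """Grant assisted credit only if mean contact quality clears a threshold.

    min_quality is a hypothetical parameter; the repository's value is not shown here.
    """
    if not step_scores:
        return True  # no door contact at all
    return float(np.mean(step_scores)) >= min_quality
```

Under these assumptions, pure sliding scores `1.0`, a head-on push against a stationary door scores `-1.0`, and any step that closes the door scores `-1.0` regardless of direction.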
|
|
| The current repo state should therefore be treated as the repaired benchmark snapshot with geometry-aware door assistance, not the final metric design. |
|
|
| Brief caveat: the current `y_ready` label still gates on low oven-door angular speed after extraction feasibility persists. In this task, the retriever arm can legitimately nudge the door while already committing to retrieval, so `y_ready` can still switch later than the true reveal-to-retrieve boundary. For the current oven benchmark, `y_ready` should therefore not be treated as a decisive validation metric or a trusted phase-switch target. |
|
|
| The oven task also has a highly structured reveal-to-retrieve handoff in the expert demos: both arms reposition, the revealer opens and clears the door, then the retriever commits. Because that phase pattern is so standardized, good results on this task are most useful as a task-specific smoke test or a "does the adaptor beat a base finetune here?" check, not as strong evidence of general reveal-and-retrieve reasoning. |
|
|
| ## What Is In This Upload |
|
|
| - `code/rr_label_study/` |
|   - Core metric code. |
|   - Dense replay, visibility scoring, pregrasp/extraction oracles, keyframe extraction, intervention checks, and summary metric computation. |
| - `code/scripts/` |
|   - Study runners and helpers. |
|   - `run_oven_label_study.py`: dense/keyframe study runner. |
|   - `launch_parallel_oven_label_study.py`: multi-display worker launcher. |
|   - `recompute_oven_pregrasp_parallel.py`: targeted dense rerun for repaired `p_pre` labels. |
|   - `run_oven_pregrasp_batch.py`: sequential per-worker pregrasp recomputation helper. |
|   - `refresh_saved_oven_study.py`: recomputes keyframes, per-episode metrics, intervention stats, and summary JSONs from saved dense CSVs after metric-code changes. |
|   - `run_oven_single_frame.py`: single-frame recomputation helper. |
|   - `run_oven_frame_batch.py`: new sequential batch recomputation helper used to avoid late-frame drift. |
|   - `repair_oven_episode_dense.py`: batched repair pass for suspicious dense rows. |
|   - `render_oven_metric_frame.py`: per-frame visualization renderer. |
|   - `render_oven_metric_gifs.py`: GIF renderer. |
|   - The visualization renderer now accepts either legacy `templates.pkl` files or the newer authoritative `templates.json` bundles. |
| - `artifacts/results/` |
|   - Full debug history, including stale runs and current validation outputs. |
| - `runtime_assets/` |
|   - Archived runtime assets needed to recreate this setup on another machine. |
|   - Includes the local oven-task dataset snapshot and the local `coppelia_sim` extraction used on this machine. |
| - `environment/` |
|   - Machine snapshot, env export, pip freeze, setup helpers, and dataset notes. |
| - `external/` |
|   - Local source snapshots of RLBench, PyRep, PerAct bimanual, and YARR used for this work. |
| - `MANIFEST.txt` |
|   - Flat file listing of the upload contents. |
|
|
| ## Latest Metric Fixes |
|
|
| The latest code changes are in: |
|
|
| - `code/rr_label_study/oven_study.py` |
| - `code/scripts/recompute_oven_pregrasp_parallel.py` |
| - `code/scripts/run_oven_pregrasp_batch.py` |
| - `code/scripts/repair_oven_episode_dense.py` |
| - `code/scripts/run_oven_frame_batch.py` |
| - `code/scripts/render_oven_metric_frame.py` |
|
|
| The important changes are: |
|
|
| ### 1. Visibility metric repair |
|
|
| - `_load_mask()` now rescales stored mask PNGs back to `[0, 1]` before calling `rgb_handles_to_mask`. |
| - Visibility is now computed by projecting grasp-region or whole-tray points into each camera and checking whether the decoded mask handle at the projected pixel matches the inferred tray handles. |
| - Template derivation now infers `mask_handle_ids` from reference frames near the actual pregrasp/grasp window. |
|
|
| This fixes the old failure where visibility dropped to zero even when the tray lip was visibly present in the wrist camera. |
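
As a rough illustration of mask-handle agreement, the sketch below decodes a mask image into integer handles, projects 3D points with a pinhole model, and counts how many land on a tray handle. It is a standalone approximation rather than the repository's `_load_mask()` or visibility code. The handle encoding (`R + 256*G + 65536*B`), the world-to-camera extrinsics convention, and all function names are assumptions. The sketch decodes the integer bytes directly, which should be equivalent to rescaling to `[0, 1]` and letting `rgb_handles_to_mask` scale back up.

```python
import numpy as np


def decode_mask_handles(mask_rgb_uint8):
    """Decode an (H, W, 3) uint8 mask image into an (H, W) integer handle map.

    Assumes handle = R + 256 * G + 65536 * B.
    """
    rgb = np.asarray(mask_rgb_uint8).astype(np.int64)
    return rgb[..., 0] + 256 * rgb[..., 1] + 65536 * rgb[..., 2]


def project_points(points_world, world_to_cam, intrinsics):
    """Pinhole projection of (N, 3) world points to (N, 2) pixels plus (N,) depths."""
    pts = np.asarray(points_world, dtype=float)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    cam = (world_to_cam @ pts_h.T).T[:, :3]
    uvw = (intrinsics @ cam.T).T
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, cam[:, 2]


def visibility_fraction(points_world, handle_map, tray_handles, world_to_cam, intrinsics):
    """Fraction of points whose projected pixel carries one of the tray handles."""
    uv, depth = project_points(points_world, world_to_cam, intrinsics)
    height, width = handle_map.shape
    visible = 0
    for (u, v), z in zip(uv, depth):
        if z <= 0 or not (np.isfinite(u) and np.isfinite(v)):
            continue  # behind the camera or degenerate projection
        col, row = int(round(u)), int(round(v))
        if 0 <= row < height and 0 <= col < width and int(handle_map[row, col]) in tray_handles:
            visible += 1
    return visible / max(len(uv), 1)
```

Because a point only counts when the rendered mask shows a tray handle at its pixel, occlusion by the door or an arm lowers the score without any depth heuristic.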
|
|
| ### 2. Pregrasp/path metric repair |
|
|
| - Template extraction now detects the pregrasp approach onset in a bounded late window before grasp instead of taking the first small negative slope in the entire episode. |
| - The current template approach frames for episode 0 are now: |
| - `177, 187, 197, 208, 218, 229, 232` |
| - `p_pre` now uses the last few approach templates plus explicit geometric progress toward the pregrasp pose instead of only brittle planner success. |
| - `y_pre` now treats "already inside the pregrasp corridor" as success, which is appropriate for this oracle study. |
| - The assisted pregrasp branch no longer treats oven-door collisions as a binary whitelist: |
| - it traces per-step door contacts under `ignore_collisions=True` |
| - estimates a local door-surface normal from the contacted simulator shape |
| - rewards tangential or door-opening contact |
| - penalizes head-on or door-closing contact |
| - requires a minimum geometry-aware door-contact quality before assisted `p_pre` credit is given |
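
The late-window onset detection from the first bullet of this section can be sketched as a backward walk from the grasp frame. This is a minimal illustration under assumptions: the input is taken to be a per-frame distance-to-grasp series, and `late_window_onset`, `window`, and `min_drop` are hypothetical names, not the repository's API.

```python
def late_window_onset(distance, grasp_frame, window, min_drop=0.0):
    """Return the first frame of the final sustained approach before grasp.

    Walks backward from grasp_frame while the distance keeps decreasing
    frame to frame, and never looks earlier than grasp_frame - window.
    """
    start = max(0, grasp_frame - window)
    onset = grasp_frame
    for i in range(grasp_frame, start, -1):
        if distance[i - 1] - distance[i] > min_drop:
            onset = i - 1  # still approaching, extend the onset backward
        else:
            break  # approach is not sustained before this frame
    return onset


# Example: an early dip at frames 0-1 is ignored because the search
# is anchored at the grasp frame and bounded by the window.
example_distance = [5, 4, 5, 5, 5, 4, 3, 2, 1, 0]
example_onset = late_window_onset(example_distance, grasp_frame=9, window=6)
```

Anchoring the search at grasp is what keeps early arm repositioning (such as the frame-8 poses mentioned earlier) out of the approach templates.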
|
|
| ### 3. Replay/repair correctness |
|
|
| - The old isolated repair path replayed every suspicious frame from a fresh reset, which could corrupt late rows. |
| - The new helper `run_oven_frame_batch.py` computes frame rows sequentially inside a single env per worker. |
| - `repair_oven_episode_dense.py` now distributes frame batches, not individual frames, across displays. |
| - `SimulatorSnapshot` now restores: |
| - arm joint trees and explicit joint positions |
| - gripper joint trees and explicit joint positions |
| - the full subtree under any grasped object |
| - grasp attachments with the original release parent |
| - `ReplayCache` now keeps retrying stable grasp attachment while the demo gripper remains closed. |
|
|
| This fixes the major replay bug where post-oracle restores could leave the arm, gripper, or grasped tray in a subtly different state than the true demo frame. |
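
Distributing frame batches rather than individual frames can be sketched as a contiguous partition of the sorted frame list, so each worker steps forward through its own range inside one environment. The helper below is a hypothetical illustration, not the logic in `repair_oven_episode_dense.py`.

```python
def contiguous_batches(frames, num_workers):
    """Split frame indices into at most num_workers contiguous, ordered batches."""
    ordered = sorted(set(frames))
    if not ordered or num_workers < 1:
        return []
    n = min(num_workers, len(ordered))
    base, extra = divmod(len(ordered), n)
    batches, start = [], 0
    for worker in range(n):
        size = base + (1 if worker < extra else 0)  # spread the remainder
        batches.append(ordered[start:start + size])
        start += size
    return batches


# Example: five suspicious frames across two workers.
example_batches = contiguous_batches([5, 1, 3, 2, 4], num_workers=2)
```

Each batch is monotonically increasing, so a worker never has to reset and replay from frame 0 between rows.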
|
|
| ### 4. Earlier phase signal |
|
|
| - The code now records: |
| - `pregrasp_progress` |
| - `pregrasp_distance` |
| - `pregrasp_speed` |
| - `phase_score` |
| - `phase_score` is now dominated by actual approach progress and `p_pre`, with a stricter threshold (`0.5`) so it no longer flips during the early reveal phase. |
| - `y_retrieve` is still oracle-like and monotone, but the metric side now has a cleaner approach-sensitive signal for early switching. |
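
To make the thresholded switch concrete, here is a small sketch of how a first-crossing frame and a reversion count could be derived from a per-frame `phase_score` series. The exact definitions behind `phase_cross_frame`, `single_switch_rate`, and `reversion_rate` in the repository are not shown here, so the `>=` comparison and the helper names are assumptions.

```python
def first_crossing(scores, threshold=0.5):
    """Index of the first frame whose score reaches the threshold, or None."""
    for frame, score in enumerate(scores):
        if score >= threshold:
            return frame
    return None


def count_reversions(scores, threshold=0.5):
    """Number of on-to-off transitions after thresholding the series."""
    on = [score >= threshold for score in scores]
    return sum(1 for prev, cur in zip(on, on[1:]) if prev and not cur)


def is_single_switch(scores, threshold=0.5):
    """True if the series switches on exactly once and never reverts."""
    return first_crossing(scores, threshold) is not None and count_reversions(scores, threshold) == 0
```

Under this reading, a clean episode has one crossing frame and zero reversions.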
|
|
| ### 5. Independent replay |
|
|
| - `run_oven_label_study.py` already exposed `--independent-replay`. |
| - `launch_parallel_oven_label_study.py` now passes that flag through to worker runs. |
| - For the current oven study, independent replay is the trustworthy dense mode because it avoids cross-frame contamination from oracle rollouts. |
|
|
| ### 6. Intervention validity repair |
|
|
| - The old intervention summary reused the dense-study replay cache, which could still corrupt post-ready extract checks. |
| - `_interventional_validity()` now evaluates each sampled intervention state from a fresh env/replay instance. |
| - `refresh_saved_oven_study.py` now supports `--dataset-root` so intervention metrics can be recomputed instead of copied forward from stale JSON. |
| - The refined intervention protocol now samples pre-ready states at `ready_onset-20` and `ready_onset-10` instead of `ready_onset-10` and `ready_onset-5`, which avoids counting borderline almost-ready states as generic reveal-phase interventions. |
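
The sampling and isolation described above can be sketched as follows. The `make_env` and `evaluate` callables are placeholders for whatever builds a fresh env/replay instance and runs an intervention. They are hypothetical and do not reflect the signature of `_interventional_validity()`.

```python
def evaluate_pre_ready_states(ready_onset, make_env, evaluate, offsets=(20, 10)):
    """Evaluate pre-ready intervention states, each in its own fresh environment.

    ready_onset: frame where y_ready first turns on
    make_env:    zero-argument factory returning a new env/replay instance
    evaluate:    callable(env, frame) -> result for that sampled state
    offsets:     frames before ready_onset to sample (20 and 10 per the text above)
    """
    results = {}
    for offset in offsets:
        frame = ready_onset - offset
        if frame < 0:
            continue  # episode too short for this offset
        env = make_env()  # never reuse state across sampled frames
        results[frame] = evaluate(env, frame)
    return results
```

With `ready_onset = 234`, this samples frames 214 and 224, both well before the reveal-to-retrieve boundary.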
|
|
| ### 7. Extraction-oracle hardening |
|
|
| - `_extract_score_and_success()` now uses repeated planner checks before marking a milestone as feasible. |
| - The current configuration is intentionally modest: |
| - `DEFAULT_PLAN_ATTEMPTS = 2` |
| - `DEFAULT_PLAN_MIN_SUCCESSES = 2` |
| - This only hardens the extraction oracle, not the pregrasp score, so the dense study remains tractable while the noisy pre-ready extract successes are suppressed. |
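
The repeated-check rule amounts to a k-of-n confirmation. The sketch below reuses the two constant names and values listed above, but `confirmed_feasible` and the `plan_once` callable are hypothetical stand-ins for the planner call inside `_extract_score_and_success()`.

```python
DEFAULT_PLAN_ATTEMPTS = 2
DEFAULT_PLAN_MIN_SUCCESSES = 2


def confirmed_feasible(plan_once,
                       attempts=DEFAULT_PLAN_ATTEMPTS,
                       min_successes=DEFAULT_PLAN_MIN_SUCCESSES):
    """Return True only if enough independent planner attempts succeed.

    plan_once: zero-argument callable returning True when one planner
               sample finds a path (hypothetical interface).
    """
    successes = 0
    for attempt in range(attempts):
        if plan_once():
            successes += 1
        if successes >= min_successes:
            return True  # already confirmed
        remaining = attempts - attempt - 1
        if successes + remaining < min_successes:
            return False  # cannot reach the quota anymore
    return False
```

With both values at 2, a single failed sample rejects the milestone immediately, so the extra cost is paid only on frames where the first attempt succeeds.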
|
|
| ## Latest Validated Artifacts |
|
|
| The current trustworthy artifacts are: |
|
|
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/episode0.dense.csv` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/episode0.keyframes.csv` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/episode0.metrics.json` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/summary.json` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/visualizations/episode0_all_metrics.gif` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/visualizations/episode0_visibility_focus.gif` |
| - `artifacts/results/metric_iter24_ep0_door_contact_geom/visualizations/episode0_path_quality_focus.gif` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/episode1.dense.csv` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/episode1.keyframes.csv` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/episode1.metrics.json` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/summary.json` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/visualizations/episode1_all_metrics.gif` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/visualizations/episode1_visibility_focus.gif` |
| - `artifacts/results/metric_iter24_ep1_door_contact_geom/visualizations/episode1_path_quality_focus.gif` |
|
|
| - `artifacts/results/oven_episode0_iter4_templates/templates.json` |
| - `artifacts/results/oven_episode0_iter4_templates/templates.pkl` |
| - `artifacts/results/oven_episode0_iter4_batch/iter4_batch_comparison.csv` |
| - `artifacts/results/oven_episode0_iter4_batch/frames/` |
| - `artifacts/results/oven_episode0_iter4_clean/iter4_targeted_comparison.csv` |
| - `artifacts/results/oven_episode0_iter4_dense_geom_170_234.csv` |
| - `artifacts/results/oven_episode0_iter6_visual_checks/boundary_rgb_contact_sheet.png` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.dense.csv` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.keyframes.csv` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.metrics.json` |
| - `artifacts/results/oven_episode0_iter6_independent_full/summary.json` |
| - `artifacts/results/oven_episode0_iter6_visual_checks/early_visibility_contact_sheet.png` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/episode0.dense.csv` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/episode0.metrics.json` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_all_metrics.gif` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_visibility_focus.gif` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_path_quality_focus.gif` |
| - `artifacts/results/manual_metric_checks/episode0_frame210_visibility.png` |
| - `artifacts/results/manual_metric_checks/episode0_frame232_visibility.png` |
| - `artifacts/results/manual_metric_checks/episode0_frame210_path.png` |
| - `artifacts/results/manual_metric_checks/episode6_frame230_path.png` |
| - `artifacts/results/iter12_parallel_smoke_8ep_refined/parallel_summary.json` |
| - `artifacts/results/iter12_parallel_smoke_8ep_refined/parallel_workers.json` |
|
|
| The `iter6_independent_full` CSVs and JSON summaries have been refreshed with the latest `phase_score` logic via `code/scripts/refresh_saved_oven_study.py`. |
|
|
| ## Key Verified Findings |
|
|
| From the current independent-replay validation on episode 0: |
|
|
| - Visibility over the dense 170-234 window is clean: |
| - min `three_view_visibility = 1.0` |
| - min `full_view_visibility = 1.0` |
| - Pregrasp progress now rises well before grasp and stays predictive: |
| - frame `210`: `pregrasp_progress ≈ 0.451`, `p_pre ≈ 0.185`, `y_pre = 0` |
| - frame `215`: `pregrasp_progress ≈ 0.568`, `p_pre ≈ 0.375`, `y_pre = 1` |
| - frame `220`: `pregrasp_progress ≈ 0.702`, `p_pre ≈ 0.496`, `y_pre = 1` |
| - frame `225`: `pregrasp_progress ≈ 0.847`, `p_pre ≈ 0.559`, `y_pre = 1` |
| - frame `230`: `pregrasp_progress ≈ 0.950`, `p_pre ≈ 0.654`, `y_pre = 1` |
| - Extraction feasibility is now separated from pregrasp: |
| - frame `230`: `p_ext ≈ 0.0007`, `y_ext = 0` |
| - frame `232`: `p_ext = 1.0`, `y_ext = 1` |
| - frame `234`: `p_ext = 1.0`, `y_ext = 1` |
| - In the refreshed full independent episode-0 run: |
| - `ppre_cross_frame = 216` |
| - `pext_cross_frame = 232` |
| - `phase_cross_frame = 214` |
| - `retrieve_cross_frame = 215` |
| - `ready_cross_frame = 234` |
| - `single_switch_rate = 1.0` |
| - `reversion_rate = 0.0` |
| - `auroc_ppre_ypre ≈ 0.761` |
| - `auprc_ppre_ypre ≈ 0.903` |
| - `auroc_pext_yext = 1.0` |
| - `auprc_pext_yext = 1.0` |
| - `auroc_phase_yretrieve = 1.0` |
| - `auprc_phase_yretrieve = 1.0` |
| - `f1_phase_yretrieve ≈ 0.996` |
| - `auroc_phase_yready ≈ 0.998` |
| - `f1_phase_yready ≈ 0.905` |
| - In the refreshed isolated intervention check on episode 0: |
| - pre-ready `open_more` increases `p_ext` on `2/2` sampled states |
| - pre-ready `extract` succeeds on `0/2` |
| - post-ready `extract` succeeds on `2/2` |
| - post-ready `open_more` and `hold_open` both have low marginal gain on `2/2` |
| - The refreshed phase columns now place: |
| - `first phase_switch` at frame `214` |
| - `first y_retrieve` at frame `215` |
| - `first y_ready` at frame `234` |
| - The refined 8-episode independent-replay smoke in `artifacts/results/iter12_parallel_smoke_8ep_refined/` shows: |
| - `single_switch_rate = 1.0` |
| - `reversion_rate = 0.0` |
| - mean `auroc_ppre_ypre ≈ 0.809` |
| - mean `auprc_ppre_ypre ≈ 0.924` |
| - mean `auroc_pext_yext = 1.0` |
| - mean `auprc_pext_yext = 1.0` |
| - mean `f1_phase_yretrieve ≈ 0.996` |
| - mean `f1_phase_yready ≈ 0.906` |
| - mean dense boundary error to `y_retrieve` `≈ 0.88` frames |
| - mean pre-ready extract success `= 0.0/2.0` |
| - mean pre-ready wait extract success `= 0.0/2.0` |
| - mean post-ready extract success `≈ 1.625/2.0` |
| - The main remaining limitation on this oven task is not a broken metric but task structure: |
| - the grasp-region visibility metric is visually faithful but only weakly predictive because the tray lip is already visible early in many demos |
| - time remains a very strong trivial baseline for `y_ext` on expert demos |
| - `open_more` improves `p_ext` mainly near the reveal/retrieve boundary, not uniformly throughout the whole pre-ready window |
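
The point about time being a strong trivial baseline can be checked with a rank-based AUROC that uses the frame index as the score. The sketch below is a plain pairwise implementation for illustration and is not the metric code in `rr_label_study`. The label series is synthetic, not taken from the artifacts.

```python
def auroc(scores, labels):
    """Pairwise (Mann-Whitney) AUROC; ties count as half a win.

    Returns None when only one class is present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic monotone label: off for the first 7 frames, on afterwards.
synthetic_labels = [0] * 7 + [1] * 3
frame_index = list(range(len(synthetic_labels)))
time_baseline_auroc = auroc(frame_index, synthetic_labels)
```

Any label that switches on once and stays on is perfectly ranked by the frame index, so a perfect `auroc_pext_yext` on expert demos is not by itself evidence that `p_ext` carries information beyond elapsed time.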
|
|
| See: |
|
|
| - `artifacts/results/oven_episode0_iter4_batch/iter4_batch_comparison.csv` |
| - `artifacts/results/oven_episode0_iter4_dense_geom_170_234.csv` |
| - `artifacts/results/oven_episode0_iter6_visual_checks/boundary_rgb_contact_sheet.png` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.dense.csv` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.metrics.json` |
| - `artifacts/results/manual_metric_checks/episode0_frame210_visibility.png` |
| - `artifacts/results/manual_metric_checks/episode0_frame232_visibility.png` |
| - `artifacts/results/manual_metric_checks/episode0_frame210_path.png` |
| - `artifacts/results/manual_metric_checks/episode6_frame230_path.png` |
| - `artifacts/results/iter12_parallel_smoke_8ep_refined/parallel_summary.json` |
|
|
| ## Artifact Guide |
|
|
| ### Current artifacts |
|
|
| - `oven_episode0_iter3_templates/` |
| - First regenerated template bundle after the mask/approach fixes. |
| - `oven_episode0_iter4_templates/` |
| - Current template bundle with the corrected late-window approach onset. |
| - `oven_episode0_iter4_clean/` |
| - Isolated targeted frame checks used while diagnosing the old per-frame repair drift. |
| - `oven_episode0_iter4_batch/` |
| - Current batched sequential repair validation. |
| - `oven_episode0_iter4_dense_geom_170_234.csv` |
| - Dense sequential geometry and visibility sweep across the reveal-to-retrieve boundary. |
|
|
| ### Historical artifacts |
|
|
| - `oven_episode0_repaired_v1/` |
| - Useful historical reference, but not the current authoritative artifact. |
| - It still contains the old late transition and old visibility/path issues. |
| - `oven_episode0_full*/`, `oven_to240_*/`, `oven_episode0_independent_v*/` |
| - Debugging history from earlier iterations. |
| - `parallel_smoke_2x10/` |
| - Xvfb/worker parallelization smoke test. |
| - `oven_smoke_*` |
| - Early smoke runs. |
|
|
| ## Repository Map |
|
|
| Relevant entry points: |
|
|
| - `code/rr_label_study/oven_study.py` |
| - `code/scripts/run_oven_label_study.py` |
| - `code/scripts/launch_parallel_oven_label_study.py` |
| - `code/scripts/run_oven_single_frame.py` |
| - `code/scripts/run_oven_frame_batch.py` |
| - `code/scripts/repair_oven_episode_dense.py` |
| - `code/scripts/render_oven_metric_frame.py` |
| - `code/scripts/render_oven_metric_gifs.py` |
|
|
| Relevant current artifacts: |
|
|
| - `artifacts/results/oven_episode0_iter4_templates/templates.json` |
| - `artifacts/results/oven_episode0_iter4_batch/iter4_batch_comparison.csv` |
| - `artifacts/results/oven_episode0_iter4_dense_geom_170_234.csv` |
| - `artifacts/results/oven_episode0_iter6_independent_full/episode0.dense.csv` |
| - `artifacts/results/oven_episode0_iter6_independent_full/summary.json` |
| - `artifacts/results/oven_episode0_iter6_visual_checks/boundary_rgb_contact_sheet.png` |
| - `artifacts/results/oven_episode0_iter6_visual_checks/early_visibility_contact_sheet.png` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/episode0.dense.csv` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/episode0.metrics.json` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_all_metrics.gif` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_visibility_focus.gif` |
| - `artifacts/results/oven_episode0_iter16_gif_suite/visualizations/episode0_path_quality_focus.gif` |
|
|
| ## Environment |
|
|
| This was run on: |
|
|
| - Ubuntu `22.04.5` |
| - Kernel `6.8.0-65-generic` |
| - `96` CPU cores visible |
| - `503 GiB` RAM visible |
| - `NVIDIA A40` |
|
|
| See: |
|
|
| - `environment/system_info.txt` |
| - `environment/repo_revisions.txt` |
| - `environment/conda_env_rlbench.yml` |
| - `environment/pip_freeze_rlbench.txt` |
|
|
| ## Upstream Repos Used |
|
|
| Exact revisions are recorded in `environment/repo_revisions.txt`. |
|
|
| The local run used: |
|
|
| - `markusgrotz/RLBench` |
| - `markusgrotz/PyRep` |
| - `markusgrotz/peract_bimanual` |
| - `markusgrotz/YARR` |
|
|
| Those source snapshots are included under `external/`. |
|
|
| ## Reproducing On The Same Hardware Class |
|
|
| 1. Read `environment/dataset_notes.txt`. |
| 2. Run `environment/setup_same_hardware.sh /workspace`. |
| 3. Source `environment/activate_rlbench_runtime.sh /workspace`. |
| 4. Run the dense study: |
|
|
```bash
python /workspace/VLAdaptorBench_upload/code/scripts/run_oven_label_study.py \
  --dataset-root /workspace/data/bimanual_take_tray_out_of_oven_train_128 \
  --result-dir /workspace/tmp_run \
  --max-episodes 1 \
  --checkpoint-stride 16 \
  --template-episode-index 0 \
  --independent-replay
```
|
|
| 5. If you want to repair suspicious frames in parallel with the new batched path: |
|
|
```bash
python /workspace/VLAdaptorBench_upload/code/scripts/repair_oven_episode_dense.py \
  --dataset-root /workspace/data/bimanual_take_tray_out_of_oven_train_128 \
  --episode-dir /workspace/data/bimanual_take_tray_out_of_oven_train_128/all_variations/episodes/episode0 \
  --input-dense-csv /workspace/tmp_run/episode0.dense.csv \
  --output-dir /workspace/tmp_run_repaired \
  --checkpoint-stride 16 \
  --num-workers 4 \
  --base-display 170
```
|
|
| ## Important Note |
|
|
| The full 100-episode independent-replay run is not yet the authoritative artifact in this upload. The current repository state documents the repaired metric code, the exact snapshot/restore fixes, and the episode-0 independent validation that is required before scaling to the full study. |
|
|
| ## Dataset Note |
|
|
| The RLBench demonstration dataset itself is not re-uploaded here. This repository contains the study code and generated artifacts only. The expected dataset path is documented in `environment/dataset_notes.txt`. |
|
|
| CoppeliaSim binaries are not included. The setup helpers expect a local extraction at `/workspace/coppelia_sim`. |
|
|