# Reproduction Guide (Main LIBERO Table)

This guide targets reproduction of the paper's main LIBERO result table:

- LIBERO-Spatial
- LIBERO-Object
- LIBERO-Goal
- LIBERO-Long
- Average across the four suites

## 1) Scope and expected output

- Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
- Log per-suite success and aggregated average.
- Record run metadata for repeatability.

## 2) Required environments

- `evolve-vla` conda env for training/eval.
- `vlac` conda env for reward service.

See `INSTALLATION.md` for full setup.

## 3) Required environment variables

Set these before running training scripts:

```bash
export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
export EVOLVE_ALIGN_JSON=/path/to/align.json
export EVOLVE_REWARD_BACKEND=vlac
```

Set VLAC checkpoint path before launching service:

```bash
export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC
```

Reference-frame roots for LIBERO suites (if required by rollout config):

```bash
export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames
```

Notes:

- Non-wrist runs require only the non-wrist roots (`...LIBERO_10`, `...OBJECT`, `...SPATIAL`, `...GOAL`).
- Wrist roots are required only when using wrist-mode workers (`interval-wrist` / `interval-wrist-only`).
- For VLAC, setting reference frames is strongly recommended: they materially improve reward quality (see section 5.1).

Optional:

```bash
export WANDB_API_KEY=...
export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
export EVOLVE_NCCL_SOCKET_IFNAME=eth0
```
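
Before launching a run, it can help to confirm that the variables listed in this section are actually set. The following is a minimal preflight sketch, not part of the release: the variable names are copied from this guide, while `missing_vars` and the three list names are hypothetical helpers introduced here for illustration.

```python
import os

# Variable names are copied from this guide; the helper itself is hypothetical.
TRAINING_VARS = [
    "EVOLVE_SFT_CHECKPOINT",
    "EVOLVE_OUTPUT_DIR",
    "EVOLVE_ALIGN_JSON",
    "EVOLVE_REWARD_BACKEND",
]
REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_10",
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL",
]
WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST",
]


def missing_vars(env, wrist=False, need_refs=True):
    """Return the names that are unset or empty in the mapping `env`."""
    names = list(TRAINING_VARS)
    if need_refs:
        names += REF_VARS
        if wrist:
            # Wrist roots matter only for wrist-mode workers.
            names += WRIST_REF_VARS
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Unset variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Pass `wrist=True` when using the wrist-mode workers, and `need_refs=False` if the rollout config does not use reference frames. `VLAC_CKPT_PATH` is left out on purpose because it belongs to the reward-service environment rather than the training environment.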

## 4) Start reward service

```bash
conda activate vlac
cd /path/to/EVOLVE-VLA/Release
python reward_model/launch_vlac_servers.py --base-port 8111
```

Health-check example:

```bash
python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
```
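
The `--urls` argument above is a comma-separated list. As a convenience, the sketch below builds such a list and probes each port over TCP. It assumes the servers listen on consecutive ports starting at the base port, which is inferred from the example (8111, 8112) and not confirmed by the launcher; `service_urls` and `port_is_open` are hypothetical helper names. A successful TCP connect only shows that something is listening, not that the reward model is loaded, so it does not replace `scripts/check_vlac_services.py`.

```python
import socket
from urllib.parse import urlparse


def service_urls(base_port, count, host="127.0.0.1"):
    """Build a comma-separated URL list, assuming consecutive ports."""
    return ",".join(f"http://{host}:{base_port + i}" for i in range(count))


def port_is_open(url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for url in service_urls(8111, 2).split(","):
        state = "listening" if port_is_open(url) else "not reachable"
        print(f"{url}: {state}")
```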

## 5) Run training/eval entry scripts

From `Release/`:

```bash
conda activate evolve-vla

# LIBERO-Long
python scripts/train_libero_10-sft_full-ttt.py

# 1-shot setting script (often used for spatial-oriented experiments)
python scripts/train_libero_10-sft_1shot-ttt.py

# Zero-shot transfer path
python scripts/train_libero_object-0shot-ttt.py

# Optional wrist-view finetune variant
python scripts/finetune_libero_object-0shot-ttt_with_wrist.py
```

If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.

## 5.1) Reference-frame preparation (recommended for VLAC)

Reference frames are generated from expert demos and materially improve VLAC quality.

Fast path (one command, all suites):

```bash
python scripts/prepare_reference_frames_pipeline.py \
  --raw-datasets-root /path/to/libero/raw \
  --working-root /path/to/libero/processed \
  --reference-root /path/to/libero/reference_frames \
  --include-wrist \
  --overwrite
```

Use `--skip-regen` if you want to use downloaded low-resolution datasets directly.

Detailed workflow (release-contained scripts):

1. **Optional but recommended: regenerate LIBERO demos at higher resolution**
   - Script: `scripts/regenerate_libero_dataset.py`
   - Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
   - If skipped, the low-resolution source can still be used, at the cost of an expected performance drop.
   - Example:
     ```bash
     python scripts/regenerate_libero_dataset.py \
       --libero_task_suite libero_10 \
       --libero_raw_data_dir /path/to/libero_10_raw \
       --libero_target_dir /path/to/libero_10_regen
     ```

2. **Select/export expert demos to reference-frame folders**
   - Script: `scripts/prepare_expert_demo.py`
   - Behavior: picks the shortest successful demo and exports per-frame PNGs.
   - Default is agent view (`--frame_key agentview_rgb`).
   - For wrist-view references, use `--frame_key eye_in_hand_rgb`.
   - Example (agent view):
     ```bash
     python scripts/prepare_expert_demo.py \
       --libero_task_suite libero_10 \
       --libero_raw_data_dir /path/to/libero_10_regen \
       --output_dir /path/to/reference_frames/agentview \
       --overwrite
     ```
   - Example (wrist view):
     ```bash
     python scripts/prepare_expert_demo.py \
       --libero_task_suite libero_10 \
       --libero_raw_data_dir /path/to/libero_10_regen \
       --output_dir /path/to/reference_frames/wrist \
       --frame_key eye_in_hand_rgb \
       --overwrite
     ```

3. **Set exported directories as `EVOLVE_LIBERO_REF_*` roots**
   - Set non-wrist roots for standard settings.
   - Set wrist roots when running wrist-view worker modes.
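
The selection rule in step 2 (pick the shortest successful demo) can be illustrated with a small sketch. This is not the code in `scripts/prepare_expert_demo.py`, whose internals are not shown in this guide; the `Demo` record, its fields, and `pick_reference_demo` are hypothetical names used only to make the rule concrete.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Demo:
    """Hypothetical summary of one expert demonstration."""
    name: str
    num_frames: int
    success: bool


def pick_reference_demo(demos: List[Demo]) -> Optional[Demo]:
    """Return the successful demo with the fewest frames, or None if none succeeded."""
    successful = [d for d in demos if d.success]
    if not successful:
        return None
    # min() keeps the first demo on ties, so the choice is deterministic.
    return min(successful, key=lambda d: d.num_frames)
```

Filtering on success first matters: a failed demo is often shorter than any successful one, and its frames would make a misleading reference.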

## 6) Logging template (required)

For each run, record:

- date/time
- git commit hash (if repo initialized)
- script name
- backend (`vlac` by default)
- suite name
- seed
- checkpoint path
- success rate
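
One way to capture the fields listed above is to append one JSON line per run. The sketch below is an assumption about how such logging could look, not a script shipped with the release; `git_commit`, `make_run_record`, `append_record`, and the JSON key names are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_commit():
    """Return the current commit hash, or None if git or the repo is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def make_run_record(script, suite, seed, checkpoint, success_rate, backend="vlac"):
    """Collect the per-run fields from the logging template into one dict."""
    return {
        "datetime": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit(),
        "script": script,
        "backend": backend,
        "suite": suite,
        "seed": seed,
        "checkpoint": checkpoint,
        "success_rate": success_rate,
    }


def append_record(record, path):
    """Append the record as one JSON line so runs accumulate in a single file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

A JSON-lines file keeps one record per run and is easy to load later when filling in the summary table.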

Summary table template:

| Suite | Seed | Success (%) | Notes |
|---|---:|---:|---|
| Spatial | | | |
| Object | | | |
| Goal | | | |
| Long | | | |

Average:

```text
Avg = (Spatial + Object + Goal + Long) / 4
```

## 7) Tolerance reporting

After running multiple seeds, report:

- mean and std per suite
- mean and std for the average score
- absolute difference vs. the paper target

Use a project-agreed tolerance (for example, +/- X percentage points) once baseline variance has been measured.
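
The three quantities above can be computed from per-seed results with the standard library. The sketch below assumes each seed's results arrive as a dict mapping suite name to success rate in percent, and that paper targets are supplied by the caller; `summarize` and `_stats` are hypothetical helpers. It uses the sample standard deviation (`statistics.stdev`, n - 1 in the denominator), which is one reasonable choice and not a project convention.

```python
from statistics import mean, stdev

SUITES = ("Spatial", "Object", "Goal", "Long")


def _stats(values, target):
    """Mean, sample std (0.0 for a single seed), and absolute gap to the target."""
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    return {"mean": m, "std": s, "abs_diff": abs(m - target)}


def summarize(runs, paper_target):
    """Summarize per-seed results.

    runs: list of dicts, one per seed, mapping suite name -> success (%).
    paper_target: dict with one entry per suite plus an "Average" entry.
    """
    rows = {}
    for suite in SUITES:
        rows[suite] = _stats([run[suite] for run in runs], paper_target[suite])
    # Average the four suites within each seed first, then aggregate across seeds.
    per_seed_avg = [mean(run[s] for s in SUITES) for run in runs]
    rows["Average"] = _stats(per_seed_avg, paper_target["Average"])
    return rows
```

Averaging within each seed before aggregating gives the standard deviation of the reported average score itself, which is what the tolerance check compares against the paper target.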

## 8) Temporary cross-machine validation plan

For internal transfer and bring-up validation on another server, use:

- `docs/TEMP_OTHER_MACHINE_TEST_PLAN.md`

This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.