# Reproduction Guide (Main LIBERO Table)

This guide targets reproduction of the paper's main LIBERO result table:

- LIBERO-Spatial
- LIBERO-Object
- LIBERO-Goal
- LIBERO-Long
- Average across the four suites

## 1) Scope and expected output

- Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
- Log per-suite success and aggregated average.
- Record run metadata for repeatability.

## 2) Required environments

- `evolve-vla` conda env for training/eval.
- `vlac` conda env for reward service.

See `INSTALLATION.md` for full setup.

## 3) Required environment variables

Set these before running training scripts:

```bash
export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
export EVOLVE_ALIGN_JSON=/path/to/align.json
export EVOLVE_REWARD_BACKEND=vlac
```

Set VLAC checkpoint path before launching service:

```bash
export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC
```

Reference-frame roots for LIBERO suites (if required by rollout config):

```bash
export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames
```

Notes:

- Non-wrist runs only require non-wrist roots (`...LIBERO_10`, `...OBJECT`, `...SPATIAL`, `...GOAL`).
- Wrist roots are required only when using wrist-mode workers (`interval-wrist` / `interval-wrist-only`).
- For VLAC, reference frames are strongly recommended for performance.

Optional:

```bash
export WANDB_API_KEY=...
export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
export EVOLVE_NCCL_SOCKET_IFNAME=eth0
```

## 4) Start reward service

```bash
conda activate vlac
cd /path/to/EVOLVE-VLA/Release
python reward_model/launch_vlac_servers.py --base-port 8111
```

Health-check example:

```bash
python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
```

## 5) Run training/eval entry scripts

From `Release/`:

```bash
conda activate evolve-vla

# LIBERO-Long
python scripts/train_libero_10-sft_full-ttt.py

# 1-shot setting script (often used for spatial-oriented experiments)
python scripts/train_libero_10-sft_1shot-ttt.py

# Zero-shot transfer path
python scripts/train_libero_object-0shot-ttt.py

# Optional wrist-view finetune variant
python scripts/finetune_libero_object-0shot-ttt_with_wrist.py
```

If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.

## 5.1) Reference-frame preparation (recommended for VLAC)

Reference frames are generated from expert demos and materially improve VLAC quality.

Fast path (one command, all suites):

```bash
python scripts/prepare_reference_frames_pipeline.py \
  --raw-datasets-root /path/to/libero/raw \
  --working-root /path/to/libero/processed \
  --reference-root /path/to/libero/reference_frames \
  --include-wrist \
  --overwrite
```

Use `--skip-regen` if you want to use downloaded low-resolution datasets directly.

Detailed workflow (release-contained scripts):

1. **Optional but recommended: regenerate LIBERO demos at higher resolution**
   - Script: `scripts/regenerate_libero_dataset.py`
   - Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
   - If skipped, low-resolution source can still be used with expected performance drop.
   - Example:

   ```bash
   python scripts/regenerate_libero_dataset.py \
     --libero_task_suite libero_10 \
     --libero_raw_data_dir /path/to/libero_10_raw \
     --libero_target_dir /path/to/libero_10_regen
   ```

2. **Select/export expert demos to reference-frame folders**
   - Script: `scripts/prepare_expert_demo.py`
   - Behavior: picks shortest successful demo and exports per-frame PNGs.
   - Default is agent view (`--frame_key agentview_rgb`).
   - For wrist-view references, use `--frame_key eye_in_hand_rgb`.
   - Example (agent view):

   ```bash
   python scripts/prepare_expert_demo.py \
     --libero_task_suite libero_10 \
     --libero_raw_data_dir /path/to/libero_10_regen \
     --output_dir /path/to/reference_frames/agentview \
     --overwrite
   ```

   - Example (wrist view):

   ```bash
   python scripts/prepare_expert_demo.py \
     --libero_task_suite libero_10 \
     --libero_raw_data_dir /path/to/libero_10_regen \
     --output_dir /path/to/reference_frames/wrist \
     --frame_key eye_in_hand_rgb \
     --overwrite
   ```

3. **Set exported directories as `EVOLVE_LIBERO_REF_*` roots**
   - Set non-wrist roots for standard settings.
   - Set wrist roots when running wrist-view worker modes.

## 6) Logging template (required)

For each run, record:

- date/time
- git commit hash (if repo initialized)
- script name
- backend (`vlac` by default)
- suite name
- seed
- checkpoint path
- success rate

Summary table template:

| Suite | Seed | Success (%) | Notes |
|---|---:|---:|---|
| Spatial | | | |
| Object | | | |
| Goal | | | |
| Long | | | |

Average:

```text
Avg = (Spatial + Object + Goal + Long) / 4
```

## 7) Tolerance reporting

After running multiple seeds, report:

- mean and std per suite
- mean and std for average score
- absolute difference vs paper target

Use project-agreed tolerance (for example, +/- X percentage points) once baseline variance is measured.

## 8) Temporary cross-machine validation plan

For internal transfer and bring-up validation on another server, use:

- `docs/TEMP_OTHER_MACHINE_TEST_PLAN.md`

This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.
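
The aggregation described in sections 6 and 7 (per-suite mean and std across seeds, the four-suite average, and the absolute difference vs a paper target) can be sketched in a few lines of Python. This is a minimal illustration, not part of the release scripts: the `summarize` helper, the `runs` layout, and the `paper_targets` argument are hypothetical names, and the sketch assumes the sample standard deviation (n - 1 denominator) is the intended "std". The numbers in the usage section are placeholders, not measured or published results.

```python
from statistics import mean, stdev

# Suite names as used in the summary table template.
SUITES = ("Spatial", "Object", "Goal", "Long")


def summarize(runs, paper_targets=None):
    """Aggregate success rates across seeds.

    runs: {seed: {suite_name: success_percent}} (hypothetical layout).
    paper_targets: optional {name: target_percent}; "Average" is a valid name.
    Returns {name: {"mean": ..., "std": ..., "abs_diff": ...}}.
    """
    seeds = sorted(runs)
    # One column of values per suite, ordered by seed.
    columns = {s: [float(runs[seed][s]) for seed in seeds] for s in SUITES}
    # Per-seed average: Avg = (Spatial + Object + Goal + Long) / 4
    columns["Average"] = [
        mean(float(runs[seed][s]) for s in SUITES) for seed in seeds
    ]

    summary = {}
    for name, values in columns.items():
        row = {
            "mean": mean(values),
            # Sample std needs at least two seeds; report 0.0 otherwise.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
        if paper_targets and name in paper_targets:
            row["abs_diff"] = abs(row["mean"] - paper_targets[name])
        summary[name] = row
    return summary


if __name__ == "__main__":
    # Placeholder values for illustration only; replace with logged results.
    example_runs = {
        0: {"Spatial": 90.0, "Object": 80.0, "Goal": 70.0, "Long": 60.0},
        1: {"Spatial": 92.0, "Object": 82.0, "Goal": 72.0, "Long": 62.0},
    }
    example_targets = {"Average": 80.0}  # placeholder, not the paper's number
    for name, row in summarize(example_runs, example_targets).items():
        print(name, row)
```

Computing the average per seed first, then taking its mean and std, keeps the "mean and std for average score" consistent with the per-seed rows in the summary table.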