Reproduction Guide (Main LIBERO Table)
This guide targets reproduction of the paper's main LIBERO result table:
- LIBERO-Spatial
- LIBERO-Object
- LIBERO-Goal
- LIBERO-Long
- Average across the four suites
1) Scope and expected output
- Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
- Log per-suite success and aggregated average.
- Record run metadata for repeatability.
2) Required environments
- evolve-vla conda env for training/eval.
- vlac conda env for reward service.
See INSTALLATION.md for full setup.
3) Required environment variables
Set these before running training scripts:
export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
export EVOLVE_ALIGN_JSON=/path/to/align.json
export EVOLVE_REWARD_BACKEND=vlac
Set VLAC checkpoint path before launching service:
export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC
Reference-frame roots for LIBERO suites (if required by rollout config):
export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames
Notes:
- Non-wrist runs only require non-wrist roots (...LIBERO_10, ...OBJECT, ...SPATIAL, ...GOAL).
- Wrist roots are required only when using wrist-mode workers (interval-wrist / interval-wrist-only).
- For VLAC, reference frames are strongly recommended for performance.
Optional:
export WANDB_API_KEY=...
export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
export EVOLVE_NCCL_SOCKET_IFNAME=eth0
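A small pre-flight check can catch unset variables before a long run starts. The sketch below is not part of the release; the helper names (`missing_env_vars`, `check_env`) and the grouping of variables into lists are assumptions made for illustration, while the variable names are copied from this section.

```python
import os

# Variable names are copied from this guide; the grouping is an assumption of this sketch.
TRAINING_VARS = [
    "EVOLVE_SFT_CHECKPOINT",
    "EVOLVE_OUTPUT_DIR",
    "EVOLVE_ALIGN_JSON",
    "EVOLVE_REWARD_BACKEND",
]
NON_WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_10",
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL",
]
WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def check_env(use_wrist=False, env=None):
    """Check training variables and reference roots; add wrist roots only on request."""
    names = TRAINING_VARS + NON_WRIST_REF_VARS
    if use_wrist:
        names = names + WRIST_REF_VARS
    return missing_env_vars(names, env)
```

Calling `check_env(use_wrist=True)` before a wrist-mode run returns the list of variables that still need to be exported; an empty list means every checked variable is set. If your rollout config does not need reference frames, check only `TRAINING_VARS`.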
4) Start reward service
conda activate vlac
cd /path/to/EVOLVE-VLA/Release
python reward_model/launch_vlac_servers.py --base-port 8111
Health-check example:
python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
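The health-check example above lists ports 8111 and 8112 for a base port of 8111, which suggests one server per consecutive port. Under that assumption (not confirmed by this guide), the URL list can be built programmatically, and a plain TCP probe gives a quick reachability signal. Both helper names are hypothetical, and the probe only tests that the port accepts connections, not that the reward model is loaded.

```python
import socket


def vlac_urls(base_port, num_servers, host="127.0.0.1"):
    """Build a comma-separated URL list, assuming one server per consecutive port."""
    return ",".join(f"http://{host}:{base_port + i}" for i in range(num_servers))


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

The string returned by `vlac_urls(8111, 2)` can be passed to the `--urls` argument shown above.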
5) Run training/eval entry scripts
From Release/:
conda activate evolve-vla
# LIBERO-Long
python scripts/train_libero_10-sft_full-ttt.py
# 1-shot setting script (often used for spatial-oriented experiments)
python scripts/train_libero_10-sft_1shot-ttt.py
# Zero-shot transfer path
python scripts/train_libero_object-0shot-ttt.py
# Optional wrist-view finetune variant
python scripts/finetune_libero_object-0shot-ttt_with_wrist.py
If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.
5.1) Reference-frame preparation (recommended for VLAC)
Reference frames are generated from expert demos and materially improve VLAC quality.
Fast path (one command, all suites):
python scripts/prepare_reference_frames_pipeline.py \
--raw-datasets-root /path/to/libero/raw \
--working-root /path/to/libero/processed \
--reference-root /path/to/libero/reference_frames \
--include-wrist \
--overwrite
Use --skip-regen if you want to use downloaded low-resolution datasets directly.
Detailed workflow (release-contained scripts):
1. Optional but recommended: regenerate LIBERO demos at higher resolution
- Script: scripts/regenerate_libero_dataset.py
- Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
- If skipped, low-resolution source can still be used with expected performance drop.
- Example:
python scripts/regenerate_libero_dataset.py \
--libero_task_suite libero_10 \
--libero_raw_data_dir /path/to/libero_10_raw \
--libero_target_dir /path/to/libero_10_regen
2. Select/export expert demos to reference-frame folders
- Script: scripts/prepare_expert_demo.py
- Behavior: picks shortest successful demo and exports per-frame PNGs.
- Default is agent view (--frame_key agentview_rgb).
- For wrist-view references, use --frame_key eye_in_hand_rgb.
- Example (agent view):
python scripts/prepare_expert_demo.py \
--libero_task_suite libero_10 \
--libero_raw_data_dir /path/to/libero_10_regen \
--output_dir /path/to/reference_frames/agentview \
--overwrite
- Example (wrist view):
python scripts/prepare_expert_demo.py \
--libero_task_suite libero_10 \
--libero_raw_data_dir /path/to/libero_10_regen \
--output_dir /path/to/reference_frames/wrist \
--frame_key eye_in_hand_rgb \
--overwrite
3. Set exported directories as EVOLVE_LIBERO_REF_* roots
- Set non-wrist roots for standard settings.
- Set wrist roots when running wrist-view worker modes.
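The "shortest successful demo" selection described for the expert-demo export can be illustrated with a few lines of Python. This is only a sketch of the selection rule, not the code in scripts/prepare_expert_demo.py; the dictionary keys `success` and `num_frames` are hypothetical stand-ins for whatever the dataset actually stores.

```python
def pick_shortest_successful(demos):
    """Return the successful demo with the fewest frames, or None if none succeeded.

    Each demo is a dict with hypothetical keys 'success' (bool) and 'num_frames' (int).
    Ties are resolved in favor of the demo that appears first in the list.
    """
    successful = [d for d in demos if d["success"]]
    if not successful:
        return None
    return min(successful, key=lambda d: d["num_frames"])
```

A shorter successful trajectory yields fewer reference frames per task, which keeps the exported folder small while still covering the task from start to completion.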
6) Logging template (required)
For each run, record:
- date/time
- git commit hash (if repo initialized)
- script name
- backend (vlac by default)
- suite name
- seed
- checkpoint path
- success rate
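One way to capture these fields consistently is to write one JSON line per run. The sketch below is an illustration rather than a tool shipped with the release; the function names, field keys, and the JSON Lines file layout are all assumptions.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_commit_hash():
    """Return the current commit hash, or None if git or a repository is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


def make_run_record(script, suite, seed, checkpoint, success_rate, backend="vlac"):
    """Collect the fields listed in the logging template into one dict."""
    return {
        "datetime": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit_hash(),
        "script": script,
        "backend": backend,
        "suite": suite,
        "seed": seed,
        "checkpoint": checkpoint,
        "success_rate": success_rate,
    }


def append_record(path, record):
    """Append the record as one JSON line so earlier runs are never overwritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

An append-only log keeps every run, including failed or discarded ones, which makes later variance estimates easier to audit.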
Summary table template:
| Suite | Seed | Success (%) | Notes |
|---|---|---|---|
| Spatial | | | |
| Object | | | |
| Goal | | | |
| Long | | | |
Average:
Avg = (Spatial + Object + Goal + Long) / 4
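The average is an unweighted mean over the four suites, so a missing suite should be an error rather than a silently smaller denominator. A minimal sketch, with a hypothetical helper name and dict layout:

```python
SUITES = ("Spatial", "Object", "Goal", "Long")


def suite_average(success):
    """Unweighted mean of per-suite success rates (in %), requiring all four suites."""
    missing = [s for s in SUITES if s not in success]
    if missing:
        raise KeyError(f"missing suites: {missing}")
    return sum(success[s] for s in SUITES) / len(SUITES)
```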
7) Tolerance reporting
After running multiple seeds, report:
- mean and std per suite
- mean and std for average score
- absolute difference vs paper target
Use project-agreed tolerance (for example, +/- X percentage points) once baseline variance is measured.
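The three reported quantities can be computed with the standard library. The sketch below assumes success rates are stored per suite as lists aligned by seed (same seeds, same order) and that paper targets are supplied by the user; the function names and dict layout are hypothetical, and the sample standard deviation is one reasonable choice rather than a project requirement.

```python
from statistics import mean, stdev


def mean_std(values):
    """Return (mean, sample std); std is 0.0 when only one seed is available."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)


def tolerance_report(per_seed, paper_targets):
    """Summarize multi-seed results.

    per_seed: {suite: [success % for each seed]}, lists aligned by seed.
    paper_targets: {suite: target %} plus an "Average" entry.
    """
    lengths = {len(v) for v in per_seed.values()}
    if len(lengths) != 1:
        raise ValueError("every suite must report the same number of seeds")
    rows = {}
    for suite, values in per_seed.items():
        m, s = mean_std(values)
        rows[suite] = {"mean": m, "std": s, "abs_diff": abs(m - paper_targets[suite])}
    # Average across suites within each seed, then summarize across seeds.
    per_seed_avg = [mean(vals) for vals in zip(*per_seed.values())]
    m, s = mean_std(per_seed_avg)
    rows["Average"] = {"mean": m, "std": s, "abs_diff": abs(m - paper_targets["Average"])}
    return rows
```

Averaging within each seed first gives a std for the average score that reflects seed-to-seed variation, which is the quantity a tolerance of +/- X percentage points would be compared against.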
8) Temporary cross-machine validation plan
For internal transfer and bring-up validation on another server, use:
docs/TEMP_OTHER_MACHINE_TEST_PLAN.md
This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.