
Reproduction Guide (Main LIBERO Table)

This guide targets reproduction of the paper's main LIBERO result table:

  • LIBERO-Spatial
  • LIBERO-Object
  • LIBERO-Goal
  • LIBERO-Long
  • Average across the four suites

1) Scope and expected output

  • Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
  • Log per-suite success and aggregated average.
  • Record run metadata for repeatability.

2) Required environments

  • evolve-vla conda env for training/eval.
  • vlac conda env for reward service.

See INSTALLATION.md for full setup.

3) Required environment variables

Set these before running training scripts:

export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
export EVOLVE_ALIGN_JSON=/path/to/align.json
export EVOLVE_REWARD_BACKEND=vlac

Set the VLAC checkpoint path before launching the reward service:

export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC

Reference-frame roots for LIBERO suites (if required by rollout config):

export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames

Notes:

  • Non-wrist runs only require non-wrist roots (...LIBERO_10, ...OBJECT, ...SPATIAL, ...GOAL).
  • Wrist roots are required only when using wrist-mode workers (interval-wrist / interval-wrist-only).
  • For VLAC, reference frames are strongly recommended for performance.
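A small pre-flight check can catch unset variables before a long run starts. The sketch below is illustrative only: the variable names come from this guide, but the helper `missing_variables` and its `wrist_mode` / `check_refs` arguments are hypothetical and not part of the release. It treats the reference-frame roots as required by default, which is stricter than "if required by rollout config", and it assumes wrist-mode runs need the wrist roots in addition to the non-wrist ones.

```python
import os

# Variable names are taken from this guide; the helper itself is hypothetical.
REQUIRED = [
    "EVOLVE_SFT_CHECKPOINT",
    "EVOLVE_OUTPUT_DIR",
    "EVOLVE_ALIGN_JSON",
    "EVOLVE_REWARD_BACKEND",
]
REF_PREFIX = "EVOLVE_LIBERO_REF_"
NON_WRIST_SUITES = ["LIBERO_10", "LIBERO_OBJECT", "LIBERO_SPATIAL", "LIBERO_GOAL"]
# Only these three wrist roots are listed in this guide.
WRIST_SUITES = ["LIBERO_OBJECT_WRIST", "LIBERO_SPATIAL_WRIST", "LIBERO_GOAL_WRIST"]


def missing_variables(env, wrist_mode=False, check_refs=True):
    """Return the expected variable names that are unset or empty in env."""
    names = list(REQUIRED)
    if check_refs:
        names += [REF_PREFIX + suite for suite in NON_WRIST_SUITES]
        if wrist_mode:
            # Assumption: wrist-mode workers need wrist roots on top of the others.
            names += [REF_PREFIX + suite for suite in WRIST_SUITES]
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ, wrist_mode=False)
    if missing:
        print("Unset variables:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Pass `check_refs=False` if your rollout config does not use reference frames, and `wrist_mode=True` for the interval-wrist / interval-wrist-only workers.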

Optional:

export WANDB_API_KEY=...
export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
export EVOLVE_NCCL_SOCKET_IFNAME=eth0

4) Start reward service

conda activate vlac
cd /path/to/EVOLVE-VLA/Release
python reward_model/launch_vlac_servers.py --base-port 8111

Health-check example:

python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
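If the health-check script is unavailable on a machine, a plain TCP probe can at least confirm that something is listening on the expected ports. This is a minimal sketch and not a substitute for `check_vlac_services.py`: it only tests that a TCP connection succeeds and makes no assumption about the reward service's HTTP API. The helper name `port_is_open` is hypothetical, and the two URLs are the ones from the example above.

```python
import socket
from urllib.parse import urlparse


def port_is_open(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or 80  # fall back to the HTTP default if no port is given
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for url in ["http://127.0.0.1:8111", "http://127.0.0.1:8112"]:
        print(url, "open" if port_is_open(url) else "unreachable")
```

An open port does not prove the model has finished loading, so treat this only as a first-pass connectivity check.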

5) Run training/eval entry scripts

From Release/:

conda activate evolve-vla

# LIBERO-Long
python scripts/train_libero_10-sft_full-ttt.py

# 1-shot setting script (often used for spatial-oriented experiments)
python scripts/train_libero_10-sft_1shot-ttt.py

# Zero-shot transfer path
python scripts/train_libero_object-0shot-ttt.py

# Optional wrist-view finetune variant
python scripts/finetune_libero_object-0shot-ttt_with_wrist.py

If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.

5.1) Reference-frame preparation (recommended for VLAC)

Reference frames are generated from expert demos and materially improve VLAC quality.

Fast path (one command, all suites):

python scripts/prepare_reference_frames_pipeline.py \
  --raw-datasets-root /path/to/libero/raw \
  --working-root /path/to/libero/processed \
  --reference-root /path/to/libero/reference_frames \
  --include-wrist \
  --overwrite

Use --skip-regen if you want to use downloaded low-resolution datasets directly.

Detailed workflow (release-contained scripts):

  1. Optional but recommended: regenerate LIBERO demos at higher resolution

    • Script: scripts/regenerate_libero_dataset.py
    • Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
    • If skipped, the low-resolution source can still be used, but expect a performance drop.
    • Example:
      python scripts/regenerate_libero_dataset.py \
        --libero_task_suite libero_10 \
        --libero_raw_data_dir /path/to/libero_10_raw \
        --libero_target_dir /path/to/libero_10_regen
      
  2. Select/export expert demos to reference-frame folders

    • Script: scripts/prepare_expert_demo.py
    • Behavior: picks shortest successful demo and exports per-frame PNGs.
    • Default is agent view (--frame_key agentview_rgb).
    • For wrist-view references, use --frame_key eye_in_hand_rgb.
    • Example (agent view):
      python scripts/prepare_expert_demo.py \
        --libero_task_suite libero_10 \
        --libero_raw_data_dir /path/to/libero_10_regen \
        --output_dir /path/to/reference_frames/agentview \
        --overwrite
      
    • Example (wrist view):
      python scripts/prepare_expert_demo.py \
        --libero_task_suite libero_10 \
        --libero_raw_data_dir /path/to/libero_10_regen \
        --output_dir /path/to/reference_frames/wrist \
        --frame_key eye_in_hand_rgb \
        --overwrite
      
  3. Set exported directories as EVOLVE_LIBERO_REF_* roots

    • Set non-wrist roots for standard settings.
    • Set wrist roots when running wrist-view worker modes.
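After exporting, it can help to confirm that each reference root actually contains frames before pointing a run at it. The sketch below simply counts `*.png` files under a root, grouped by containing directory. It assumes only that frames are exported as PNG files somewhere below the root, as described in step 2; the exact folder layout produced by `prepare_expert_demo.py` is not assumed, and `count_frames` is a hypothetical helper.

```python
import os
from collections import Counter
from pathlib import Path


def count_frames(root):
    """Count *.png files under root, grouped by the directory that holds them."""
    counts = Counter()
    for png in Path(root).rglob("*.png"):
        counts[str(png.parent.relative_to(root))] += 1
    return dict(counts)


if __name__ == "__main__":
    root = os.environ.get("EVOLVE_LIBERO_REF_LIBERO_10")
    if not root:
        print("EVOLVE_LIBERO_REF_LIBERO_10 is not set")
    else:
        counts = count_frames(root)
        if not counts:
            print("No PNG frames found under", root)
        for folder, n in sorted(counts.items()):
            print(f"{folder}: {n} frames")
```

An empty result usually means the root points at the wrong directory or the export step did not run.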

6) Logging template (required)

For each run, record:

  • date/time
  • git commit hash (if repo initialized)
  • script name
  • backend (vlac by default)
  • suite name
  • seed
  • checkpoint path
  • success rate
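One way to keep these records consistent is to append one JSON line per run. The sketch below is an assumption about format, not something the release prescribes: the field names mirror the list above, while `make_record`, `append_record`, and the `runs.jsonl` file name are hypothetical. The commit hash is read with `git rev-parse HEAD` and left as `None` when the directory is not a git repository.

```python
import json
import subprocess
from datetime import datetime, timezone


def git_commit():
    """Return the current commit hash, or None if git or a repo is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return None


def make_record(script, suite, seed, checkpoint, success_rate, backend="vlac"):
    """Build one run record with the fields listed in this section."""
    return {
        "datetime": datetime.now(timezone.utc).isoformat(),
        "git_commit": git_commit(),
        "script": script,
        "backend": backend,
        "suite": suite,
        "seed": seed,
        "checkpoint": checkpoint,
        "success_rate": success_rate,
    }


def append_record(path, record):
    """Append the record as one JSON line so earlier runs are never overwritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

A run would then call `append_record("runs.jsonl", make_record(...))` once evaluation finishes, filling in the measured success rate.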

Summary table template:

| Suite   | Seed | Success (%) | Notes |
|---------|------|-------------|-------|
| Spatial |      |             |       |
| Object  |      |             |       |
| Goal    |      |             |       |
| Long    |      |             |       |

Average:

Avg = (Spatial + Object + Goal + Long) / 4

7) Tolerance reporting

After running multiple seeds, report:

  • mean and std per suite
  • mean and std for average score
  • absolute difference vs paper target

Use the project-agreed tolerance (for example, +/- X percentage points) once baseline variance has been measured.
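The reporting items above can be computed with the standard library. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, each suite is assumed to have one success percentage per seed in the same seed order, the average score is computed per seed as (Spatial + Object + Goal + Long) / 4 before taking its mean and std, and the std is the sample standard deviation (n - 1 denominator). Paper targets must be supplied by the caller; none are hard-coded here.

```python
from statistics import mean, stdev

SUITES = ["Spatial", "Object", "Goal", "Long"]


def _stats(values, target):
    """Mean, sample std (0.0 for a single seed), and absolute gap to the target."""
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    return {"mean": m, "std": s, "abs_diff": abs(m - target)}


def summarize(results, paper_targets):
    """results: {suite: [success % per seed]}; paper_targets: {suite or 'Average': %}."""
    summary = {suite: _stats(results[suite], paper_targets[suite]) for suite in SUITES}
    # Average score per seed, following Avg = (Spatial + Object + Goal + Long) / 4.
    per_seed_avg = [mean(vals) for vals in zip(*(results[s] for s in SUITES))]
    summary["Average"] = _stats(per_seed_avg, paper_targets["Average"])
    return summary


def print_summary(summary):
    """Print one line per suite plus the average, in percentage points."""
    for name, row in summary.items():
        print(f"{name}: {row['mean']:.2f} +/- {row['std']:.2f} "
              f"(abs diff vs paper: {row['abs_diff']:.2f})")
```

Comparing each `abs_diff` against the agreed tolerance then gives a simple pass/fail per suite and for the average.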

8) Temporary cross-machine validation plan

For internal transfer and bring-up validation on another server, use:

  • docs/TEMP_OTHER_MACHINE_TEST_PLAN.md

This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.