Reproduction Guide (Main LIBERO Table)

This guide targets reproduction of the paper's main LIBERO result table:

LIBERO-Spatial
LIBERO-Object
LIBERO-Goal
LIBERO-Long
Average across the four suites

1) Scope and expected output

Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
Log per-suite success and aggregated average.
Record run metadata for repeatability.

2) Required environments

evolve-vla conda env for training/eval.
vlac conda env for reward service.

See INSTALLATION.md for full setup.

3) Required environment variables

Set these before running training scripts:

export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
export EVOLVE_ALIGN_JSON=/path/to/align.json
export EVOLVE_REWARD_BACKEND=vlac

Set VLAC checkpoint path before launching service:

export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC

Reference-frame roots for LIBERO suites (if required by rollout config):

export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames

Notes:

Non-wrist runs only require non-wrist roots (...LIBERO_10, ...OBJECT, ...SPATIAL, ...GOAL).
Wrist roots are required only when using wrist-mode workers (interval-wrist / interval-wrist-only).
For VLAC, reference frames are strongly recommended for performance.
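
A pre-flight check can catch unset variables before a long run starts. The following is a minimal sketch and not part of the release: the function name `missing_env_vars` and its `check_refs` / `wrist` flags are hypothetical, and the variable lists are copied from the exports above. Reference-frame roots are checked only on request, since they are needed only if the rollout config requires them.

```python
import os

# Names copied from the export lists above; the grouping is an assumption of this sketch.
TRAINING_VARS = [
    "EVOLVE_SFT_CHECKPOINT",
    "EVOLVE_OUTPUT_DIR",
    "EVOLVE_ALIGN_JSON",
    "EVOLVE_REWARD_BACKEND",
]
NON_WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_10",
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL",
]
WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST",
]


def missing_env_vars(env=None, check_refs=True, wrist=False):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    expected = list(TRAINING_VARS)
    if check_refs:
        expected += NON_WRIST_REF_VARS
        if wrist:
            # Wrist roots matter only for wrist-mode workers.
            expected += WRIST_REF_VARS
    return [name for name in expected if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Passing `wrist=True` adds the wrist roots to the check, matching the note that they are required only for the interval-wrist / interval-wrist-only worker modes.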

Optional:

export WANDB_API_KEY=...
export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
export EVOLVE_NCCL_SOCKET_IFNAME=eth0

4) Start reward service

conda activate vlac
cd /path/to/EVOLVE-VLA/Release
python reward_model/launch_vlac_servers.py --base-port 8111

Health-check example:

python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
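
The health-check example lists ports 8111 and 8112, which is consistent with servers listening on consecutive ports starting at `--base-port`. Under that assumption (how the launcher actually assigns ports is not stated here), the `--urls` value could be built with a small helper. The name `vlac_urls` and its parameters are hypothetical.

```python
def vlac_urls(base_port=8111, num_servers=2, host="127.0.0.1"):
    """Build a comma-separated URL list, ASSUMING one server per consecutive port."""
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    return ",".join(f"http://{host}:{base_port + i}" for i in range(num_servers))


print(vlac_urls())  # http://127.0.0.1:8111,http://127.0.0.1:8112
```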

5) Run training/eval entry scripts

From Release/:

conda activate evolve-vla

# LIBERO-Long
python scripts/train_libero_10-sft_full-ttt.py

# 1-shot setting script (often used for spatial-oriented experiments)
python scripts/train_libero_10-sft_1shot-ttt.py

# Zero-shot transfer path
python scripts/train_libero_object-0shot-ttt.py

# Optional wrist-view finetune variant
python scripts/finetune_libero_object-0shot-ttt_with_wrist.py

If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.

5.1) Reference-frame preparation (recommended for VLAC)

Reference frames are generated from expert demos and materially improve VLAC quality.

Fast path (one command, all suites):

python scripts/prepare_reference_frames_pipeline.py \
  --raw-datasets-root /path/to/libero/raw \
  --working-root /path/to/libero/processed \
  --reference-root /path/to/libero/reference_frames \
  --include-wrist \
  --overwrite

Use --skip-regen if you want to use downloaded low-resolution datasets directly.

Detailed workflow (release-contained scripts):

Optional but recommended: regenerate LIBERO demos at higher resolution
- Script: scripts/regenerate_libero_dataset.py
- Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
- If skipped, the low-resolution source can still be used, with an expected performance drop.
- Example:
```
python scripts/regenerate_libero_dataset.py \
  --libero_task_suite libero_10 \
  --libero_raw_data_dir /path/to/libero_10_raw \
  --libero_target_dir /path/to/libero_10_regen
```

Select/export expert demos to reference-frame folders

Script: scripts/prepare_expert_demo.py
Behavior: picks shortest successful demo and exports per-frame PNGs.
Default is agent view (--frame_key agentview_rgb).
For wrist-view references, use --frame_key eye_in_hand_rgb.
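
The selection rule described above (shortest successful demo) can be sketched as follows. This is an illustration only, not the code in scripts/prepare_expert_demo.py: the function name, the dict keys (`name`, `success`, `num_frames`), and the sample values are all hypothetical.

```python
def pick_shortest_successful(demos):
    """Return the successful demo with the fewest frames, or None if there is none.

    `demos` is a list of dicts with hypothetical keys "name", "success", "num_frames".
    """
    successful = [d for d in demos if d["success"]]
    if not successful:
        return None
    return min(successful, key=lambda d: d["num_frames"])


# Illustrative values only.
demos = [
    {"name": "demo_0", "success": True, "num_frames": 212},
    {"name": "demo_1", "success": False, "num_frames": 90},
    {"name": "demo_2", "success": True, "num_frames": 148},
]
print(pick_shortest_successful(demos)["name"])  # demo_2
```

Note that `demo_1` is the shortest overall but is skipped because it did not succeed.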

Example (agent view):

python scripts/prepare_expert_demo.py \
  --libero_task_suite libero_10 \
  --libero_raw_data_dir /path/to/libero_10_regen \
  --output_dir /path/to/reference_frames/agentview \
  --overwrite

Example (wrist view):

python scripts/prepare_expert_demo.py \
  --libero_task_suite libero_10 \
  --libero_raw_data_dir /path/to/libero_10_regen \
  --output_dir /path/to/reference_frames/wrist \
  --frame_key eye_in_hand_rgb \
  --overwrite

Set exported directories as EVOLVE_LIBERO_REF_* roots
- Set non-wrist roots for standard settings.
- Set wrist roots when running wrist-view worker modes.

6) Logging template (required)

For each run, record:

date/time
git commit hash (if repo initialized)
script name
backend (vlac by default)
suite name
seed
checkpoint path
success rate

Summary table template:

| Suite | Seed | Success (%) | Notes |
| --- | --- | --- | --- |
| Spatial | | | |
| Object | | | |
| Goal | | | |
| Long | | | |

Average:

Avg = (Spatial + Object + Goal + Long) / 4

7) Tolerance reporting

After running multiple seeds, report:

mean and std per suite
mean and std for average score
absolute difference vs paper target
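
The reporting steps above can be sketched with the standard library. This is a minimal example under assumptions: the function name `summarize` and the input layout (one dict per seed, mapping suite name to success rate in percent) are hypothetical, the paper target is supplied by the caller, and the absolute difference is computed for the four-suite average only.

```python
from statistics import mean, stdev

SUITES = ("Spatial", "Object", "Goal", "Long")


def summarize(runs, paper_target):
    """Summarize per-seed results.

    `runs` is a list of dicts (one per seed) mapping suite name to success rate in %.
    `paper_target` is the paper's four-suite average in %, supplied by the caller.
    """
    def spread(values):
        # Sample standard deviation needs at least two seeds.
        return stdev(values) if len(values) > 1 else 0.0

    summary = {}
    for suite in SUITES:
        values = [run[suite] for run in runs]
        summary[suite] = (mean(values), spread(values))

    # Per-seed average over the four suites, then mean/std across seeds.
    averages = [sum(run[s] for s in SUITES) / len(SUITES) for run in runs]
    avg_mean = mean(averages)
    summary["Average"] = (avg_mean, spread(averages))
    summary["abs_diff_vs_paper"] = abs(avg_mean - paper_target)
    return summary
```

Averaging per seed first, then taking the standard deviation across seeds, gives the spread of the reported average itself rather than a combination of per-suite spreads.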

Use a project-agreed tolerance (for example, +/- X percentage points) once baseline variance has been measured.

8) Temporary cross-machine validation plan

For internal transfer and bring-up validation on another server, use:

docs/TEMP_OTHER_MACHINE_TEST_PLAN.md

This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.