	# Reproduction Guide (Main LIBERO Table)

	This guide targets reproduction of the paper's main LIBERO result table:

	- LIBERO-Spatial
	- LIBERO-Object
	- LIBERO-Goal
	- LIBERO-Long
	- Average across the four suites

	## 1) Scope and expected output

	- Reproduce EVOLVE-VLA numbers for the main table (not all ablations).
	- Log per-suite success and aggregated average.
	- Record run metadata for repeatability.

	## 2) Required environments

	- `evolve-vla` conda env for training/eval.
	- `vlac` conda env for reward service.

	See `INSTALLATION.md` for full setup.

	## 3) Required environment variables

	Set these before running training scripts:

	```bash
	export EVOLVE_SFT_CHECKPOINT=/path/to/sft/checkpoint
	export EVOLVE_OUTPUT_DIR=/path/to/output/checkpoints
	export EVOLVE_ALIGN_JSON=/path/to/align.json
	export EVOLVE_REWARD_BACKEND=vlac
	```

	Set VLAC checkpoint path before launching service:

	```bash
	export VLAC_CKPT_PATH=/path/to/EVOLVE-VLA/checkpoints/VLAC
	```

	Reference-frame roots for LIBERO suites (if required by rollout config):

	```bash
	export EVOLVE_LIBERO_REF_LIBERO_10=/path/to/libero_10/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_OBJECT=/path/to/libero_object/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST=/path/to/libero_object_wrist/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_SPATIAL=/path/to/libero_spatial/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST=/path/to/libero_spatial_wrist/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_GOAL=/path/to/libero_goal/reference_frames
	export EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST=/path/to/libero_goal_wrist/reference_frames
	```

	Notes:

	- Non-wrist runs only require non-wrist roots (`...LIBERO_10`, `...OBJECT`, `...SPATIAL`, `...GOAL`).
	- Wrist roots are required only when using wrist-mode workers (`interval-wrist` / `interval-wrist-only`).
	- For VLAC, reference frames are strongly recommended for performance.

	Optional:

	```bash
	export WANDB_API_KEY=...
	export EVOLVE_RAY_ADDRESS=ray://127.0.0.1:10001
	export EVOLVE_NCCL_SOCKET_IFNAME=eth0
	```
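
Before launching a long run, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the release: the variable names are copied from this guide, while `missing_env_vars` and its flags are hypothetical. It assumes wrist-mode runs need the wrist roots in addition to the non-wrist ones, so adjust the lists if your rollout config differs. `VLAC_CKPT_PATH` is left out because it belongs to the reward-service environment.

```python
import os

# Variable names are copied from this guide; the helper itself is hypothetical.
CORE_VARS = [
    "EVOLVE_SFT_CHECKPOINT",
    "EVOLVE_OUTPUT_DIR",
    "EVOLVE_ALIGN_JSON",
    "EVOLVE_REWARD_BACKEND",
]
NON_WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_10",
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL",
]
WRIST_REF_VARS = [
    "EVOLVE_LIBERO_REF_LIBERO_OBJECT_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_SPATIAL_WRIST",
    "EVOLVE_LIBERO_REF_LIBERO_GOAL_WRIST",
]


def missing_env_vars(env=None, use_reference_frames=True, wrist_mode=False):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    expected = list(CORE_VARS)
    if use_reference_frames:
        expected += NON_WRIST_REF_VARS
        # Assumption: wrist modes need wrist roots on top of the non-wrist ones.
        if wrist_mode:
            expected += WRIST_REF_VARS
    return [name for name in expected if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment and returns an empty list when nothing is missing.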

	## 4) Start reward service

	```bash
	conda activate vlac
	cd /path/to/EVOLVE-VLA/Release
	python reward_model/launch_vlac_servers.py --base-port 8111
	```

	Health-check example:

	```bash
	python scripts/check_vlac_services.py --urls http://127.0.0.1:8111,http://127.0.0.1:8112
	```
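
The health-check example suggests that the services listen on consecutive ports starting at `--base-port`. That is an inference from the two URLs shown, not confirmed behavior of `launch_vlac_servers.py`. Under that assumption, the `--urls` value can be built with a small hypothetical helper:

```python
def vlac_service_urls(base_port, num_services, host="127.0.0.1"):
    """Build a comma-separated URL list for services on consecutive ports.

    Assumption: service i listens on base_port + i. Verify this against
    the launcher's output before relying on it.
    """
    if num_services < 1:
        raise ValueError("num_services must be at least 1")
    return ",".join(
        f"http://{host}:{base_port + i}" for i in range(num_services)
    )


print(vlac_service_urls(8111, 2))
# prints http://127.0.0.1:8111,http://127.0.0.1:8112
```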

	## 5) Run training/eval entry scripts

	From `Release/`:

	```bash
	conda activate evolve-vla

	# LIBERO-Long
	python scripts/train_libero_10-sft_full-ttt.py

	# 1-shot setting script (often used for spatial-oriented experiments)
	python scripts/train_libero_10-sft_1shot-ttt.py

	# Zero-shot transfer path
	python scripts/train_libero_object-0shot-ttt.py

	# Optional wrist-view finetune variant
	python scripts/finetune_libero_object-0shot-ttt_with_wrist.py
	```

	If you are assembling strict main-table runs, align each suite's script/config with paper settings and record them in the log template below.

	### 5.1) Reference-frame preparation (recommended for VLAC)

	Reference frames are generated from expert demos and materially improve VLAC quality.

	Fast path (one command, all suites):

	```bash
	python scripts/prepare_reference_frames_pipeline.py \
	--raw-datasets-root /path/to/libero/raw \
	--working-root /path/to/libero/processed \
	--reference-root /path/to/libero/reference_frames \
	--include-wrist \
	--overwrite
	```

	Use `--skip-regen` if you want to use downloaded low-resolution datasets directly.

	Detailed workflow (release-contained scripts):

	1. Optional but recommended: regenerate LIBERO demos at higher resolution
	- Script: `scripts/regenerate_libero_dataset.py`
	- Purpose: generate cleaner 256x256 trajectories (community versions may be low-resolution).
	- If skipped, the low-resolution source can still be used, but expect a performance drop.
	- Example:
	```bash
	python scripts/regenerate_libero_dataset.py \
	--libero_task_suite libero_10 \
	--libero_raw_data_dir /path/to/libero_10_raw \
	--libero_target_dir /path/to/libero_10_regen
	```

	2. Select/export expert demos to reference-frame folders
	- Script: `scripts/prepare_expert_demo.py`
	- Behavior: picks shortest successful demo and exports per-frame PNGs.
	- Default is agent view (`--frame_key agentview_rgb`).
	- For wrist-view references, use `--frame_key eye_in_hand_rgb`.
	- Example (agent view):
	```bash
	python scripts/prepare_expert_demo.py \
	--libero_task_suite libero_10 \
	--libero_raw_data_dir /path/to/libero_10_regen \
	--output_dir /path/to/reference_frames/agentview \
	--overwrite
	```
	- Example (wrist view):
	```bash
	python scripts/prepare_expert_demo.py \
	--libero_task_suite libero_10 \
	--libero_raw_data_dir /path/to/libero_10_regen \
	--output_dir /path/to/reference_frames/wrist \
	--frame_key eye_in_hand_rgb \
	--overwrite
	```

	3. Set exported directories as `EVOLVE_LIBERO_REF_*` roots
	- Set non-wrist roots for standard settings.
	- Set wrist roots when running wrist-view worker modes.
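
The selection rule in step 2 (pick the shortest successful demo) can be illustrated as follows. This is a sketch of the rule only, not the code in `scripts/prepare_expert_demo.py`: the `Demo` record and `pick_shortest_successful` are hypothetical, and the tie-breaking choice (keep the earlier demo) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Demo:
    # Hypothetical record; the real script reads LIBERO demo files.
    name: str
    num_frames: int
    success: bool


def pick_shortest_successful(demos: Sequence[Demo]) -> Optional[Demo]:
    """Return the successful demo with the fewest frames, or None."""
    successful = [d for d in demos if d.success]
    if not successful:
        return None
    # min() returns the first minimal element, so ties keep the earlier demo.
    return min(successful, key=lambda d: d.num_frames)
```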

	## 6) Logging template (required)

	For each run, record:

	- date/time
	- git commit hash (if repo initialized)
	- script name
	- backend (`vlac` by default)
	- suite name
	- seed
	- checkpoint path
	- success rate
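
One way to keep these fields consistent across runs is to write one JSON line per run. The sketch below mirrors the list above. The `RunRecord` class, its field names, and the JSON Lines format are illustrative choices, not something the release prescribes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunRecord:
    # Field names are hypothetical; they mirror the checklist above.
    timestamp: str             # date/time, e.g. ISO 8601
    git_commit: Optional[str]  # None if the repo is not initialized
    script: str
    backend: str
    suite: str
    seed: int
    checkpoint_path: str
    success_rate: float        # percent


def to_json_line(record: RunRecord) -> str:
    """Serialize one run as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Appending each line to a shared file such as `runs.jsonl` (a hypothetical name) keeps the per-run metadata next to the summary table.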

	Summary table template:

	| Suite | Seed | Success (%) | Notes |
	|---|---:|---:|---|
	| Spatial | | | |
	| Object | | | |
	| Goal | | | |
	| Long | | | |

	Average:

	```text
	Avg = (Spatial + Object + Goal + Long) / 4
	```
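
The average and the summary rows can be produced from a per-suite dictionary. This is a minimal sketch with hypothetical helper names. It assumes success rates are stored in percent under the suite names used in the table, and it formats them to one decimal place, which is a presentation choice rather than a requirement.

```python
SUITES = ("Spatial", "Object", "Goal", "Long")


def average_success(per_suite):
    """Unweighted mean over the four suites; KeyError if one is missing."""
    return sum(per_suite[s] for s in SUITES) / len(SUITES)


def summary_rows(per_suite, seed, notes=""):
    """Render one Markdown row per suite in the template's column order."""
    return [
        f"| {s} | {seed} | {per_suite[s]:.1f} | {notes} |" for s in SUITES
    ]
```

Raising `KeyError` on a missing suite is deliberate: an average over fewer than four suites is not comparable to the paper's number.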

	## 7) Tolerance reporting

	After running multiple seeds, report:

	- mean and std per suite
	- mean and std for average score
	- absolute difference vs paper target

	Use a project-agreed tolerance (for example, +/- X percentage points) once baseline variance is measured.
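
The quantities listed in this section can be computed with the standard library. The sketch below is illustrative and uses hypothetical function names. It uses the sample standard deviation (n - 1 denominator), which this guide does not specify, and it leaves both the paper target and the tolerance as inputs, since neither is fixed here.

```python
from statistics import mean, stdev


def summarize_seeds(scores, paper_target=None):
    """Summarize per-seed success rates (percent) for one suite or the average."""
    if not scores:
        raise ValueError("scores must not be empty")
    summary = {
        "mean": mean(scores),
        # Sample std (n - 1) is undefined for a single seed, so report 0.0.
        "std": stdev(scores) if len(scores) > 1 else 0.0,
    }
    if paper_target is not None:
        summary["abs_diff"] = abs(summary["mean"] - paper_target)
    return summary


def within_tolerance(summary, tolerance):
    """True if the absolute difference from the target is within tolerance."""
    return summary["abs_diff"] <= tolerance
```

Apply `summarize_seeds` once per suite and once to the per-seed averages, then compare each `abs_diff` against the agreed tolerance.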

	## 8) Temporary cross-machine validation plan

	For internal transfer and bring-up validation on another server, use:

	- `docs/TEMP_OTHER_MACHINE_TEST_PLAN.md`

	This is a temporary internal checklist file and can be removed after validation/reproduction sign-off.