tags:
  - robotics
  - vision-language-action
  - bimanual-manipulation
  - rlbench
  - rgbd

VLAarchtests

Bundle uploaded from /workspace on RunPod sessions dated 2026-03-25 and 2026-03-26 (UTC).

Top-Level Contents

code/reveal_vla_bimanual/
- project code used for the proxy and RLBench runs in this bundle
artifacts/data/reveal_proxy/
- proxy dataset bundles used by the handoff runs
artifacts/outputs/r3d/
- previously uploaded R3D proxy outputs already present in the bundle
artifacts/outputs/r3d_handoff/
- handoff proxy checkpoints
artifacts/outputs/r3d_handoff_phase/
- phase-supervised handoff proxy checkpoints
artifacts/outputs/rlbench_current/
- RLBench checkpoints from the current session
artifacts/reports/
- proxy and RLBench result files copied from /workspace/reports
environment/
- same-machine setup files and validation helpers
tests/
- local test suite
handoff/instructions.md
- instruction file used for the handoff work
MODEL_INDEX.md
- checkpoint and result index
results/session_results_20260326.md
- raw result tables for the 2026-03-25/26 work

Code Added Or Updated

Core model, memory, planner, and dataset paths

code/reveal_vla_bimanual/models/backbones.py
code/reveal_vla_bimanual/models/multiview_fusion.py
code/reveal_vla_bimanual/models/observation_memory.py
code/reveal_vla_bimanual/models/reveal_head.py
code/reveal_vla_bimanual/models/world_model.py
code/reveal_vla_bimanual/models/action_decoder.py
code/reveal_vla_bimanual/models/planner.py
code/reveal_vla_bimanual/models/policy.py
code/reveal_vla_bimanual/train/losses.py
code/reveal_vla_bimanual/sim_reveal/dataset.py
code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
code/reveal_vla_bimanual/sim_rlbench/dataset.py

Training and evaluation paths

code/reveal_vla_bimanual/train/run_rlbench_experiment.py
code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
code/reveal_vla_bimanual/eval/run_ablations.py
code/reveal_vla_bimanual/eval/run_teacher_audit.py
code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py
code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py

Added or updated training configs

code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml
code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml
code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml
code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml
code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml
code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml

Test files

The staged tests/ directory contains 32 test modules plus conftest.py, including:

geometry and camera rotation coverage
phase-label and candidate-ranking coverage
planner gradient-flow and reocclusion gating coverage
world-model null-rollout, field-consistency, and task-adapter coverage
proxy scripted benchmark and teacher-audit coverage

Verification

local test command:
- `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
result:
- 33 passed
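
The command above hard-codes the /workspace/VLAarchtests_work checkout. A minimal sketch of the same invocation for an arbitrary checkout location follows. It assumes the bundle root holds `code/reveal_vla_bimanual/` and `tests/` as listed under Top-Level Contents; the helper names `build_pytest_invocation` and `run_tests` are hypothetical and not part of the bundle.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_pytest_invocation(bundle_root):
    """Build the argv and environment for the local test command.

    Assumption: bundle_root contains code/reveal_vla_bimanual/ and tests/.
    """
    root = Path(bundle_root)
    env = dict(os.environ)
    # Mirror the README command: PYTHONPATH points at the project code.
    env["PYTHONPATH"] = str(root / "code" / "reveal_vla_bimanual")
    argv = [sys.executable, "-m", "pytest", "-q", str(root / "tests")]
    return argv, env


def run_tests(bundle_root):
    """Run the suite and return the pytest exit code (0 means all passed)."""
    argv, env = build_pytest_invocation(bundle_root)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run_tests("/workspace/VLAarchtests_work")` should be equivalent to the command recorded above, provided pytest is installed in the active environment.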

Raw Result Files

Proxy and handoff results

artifacts/reports/reveal_smoke_mod/reveal_benchmark.json
artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json
artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json
artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json
artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json
artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json
artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json
artifacts/reports/reveal_phase_ablations_compact/ablations.json
artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json

RLBench result files

artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json
artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json
artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json
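
This README does not document the JSON schema of the result files, so the sketch below does not assume any field names. It only reports the top-level keys and value types of each JSON file found one directory below a reports root, which may help when browsing `artifacts/reports/`. The function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Return the top-level keys of a JSON report mapped to their value types."""
    with open(path, "r", encoding="utf-8") as handle:
        payload = json.load(handle)
    if isinstance(payload, dict):
        return {key: type(value).__name__ for key, value in payload.items()}
    # Non-dict payloads (for example a bare list) are reported as one entry.
    return {"<root>": type(payload).__name__}


def summarize_reports(root, pattern="*/*.json"):
    """Summarize every JSON file sitting one directory below `root`."""
    root = Path(root)
    return {
        str(path.relative_to(root)): summarize_report(path)
        for path in sorted(root.glob(pattern))
    }
```

For example, `summarize_reports("artifacts/reports")` returns one entry per report file, keyed by its path relative to the reports directory.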

Raw Result Tables

Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |
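
As a reading aid, the gap between each run and the released baseline can be computed directly from the values in the table. The snippet below is a small sketch with the numbers copied by hand from this README rather than read from the JSON files; the dictionary labels are shorthand introduced here.

```python
# Values copied from the "Proxy serious runs" table in this README.
BASELINE_MEAN_SUCCESS = 0.5833
RUN_MEAN_SUCCESS = {
    "spatial handoff": 0.2167,
    "spatial-trained, compact world model": 0.5200,
    "compact-phase": 0.5133,
    "spatial-phase, compact world model": 0.4933,
}


def deltas_vs_baseline(baseline, runs):
    """Return run minus baseline for each run, rounded to four decimals."""
    return {name: round(value - baseline, 4) for name, value in runs.items()}


DELTAS = deltas_vs_baseline(BASELINE_MEAN_SUCCESS, RUN_MEAN_SUCCESS)
for name, delta in DELTAS.items():
    print(f"{name}: {delta:+.4f}")
```

All four differences are negative: every run in the table scores below the released baseline, with the spatial handoff run furthest behind.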

Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
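
To see which ablations change the score at all, the values above can be split into those that match the full model and those that fall below it. This is a sketch using numbers copied by hand from this README; the function name is hypothetical, and no claim is made about how `ablations.json` is laid out.

```python
# Values copied from the "Proxy ablations" table in this README.
FULL = 0.5133
ABLATIONS = {
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def split_ablations(full, ablations, tol=1e-9):
    """Return (unchanged names, degraded (name, delta) pairs, largest drop first)."""
    unchanged = [name for name, value in ablations.items() if abs(value - full) <= tol]
    degraded = sorted(
        ((name, round(value - full, 4)) for name, value in ablations.items() if value < full - tol),
        key=lambda item: item[1],
    )
    return unchanged, degraded


UNCHANGED, DEGRADED = split_ablations(FULL, ABLATIONS)
print("unchanged:", UNCHANGED)
print("degraded:", DEGRADED)
```

On these numbers, removing the planner gives the largest drop, followed by Gaussian-only candidates and removing spatial memory; the other four ablations report the same mean success as the full model.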

RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |

RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
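
The mean success values in the five-episode rows are the averages of the per-episode success lists. A short check, with the lists copied by hand from this README and labels introduced here for illustration:

```python
from statistics import fmean

# Per-episode successes copied from the "RLBench retrieval runs" table.
EPISODE_SUCCESSES = {
    "bank_stride=4, top_k=5, time_window=8": [0.0, 1.0, 0.0, 0.0, 0.0],
    "bank_stride=1, top_k=1, time_window=4": [0.0, 0.0, 1.0, 1.0, 0.0],
}

# Average each list to recover the reported mean success.
MEAN_SUCCESS = {name: fmean(values) for name, values in EPISODE_SUCCESSES.items()}

for name, mean in MEAN_SUCCESS.items():
    solved = int(sum(EPISODE_SUCCESSES[name]))
    total = len(EPISODE_SUCCESSES[name])
    print(f"{name}: {solved}/{total} episodes, mean success {mean:.1f}")
```

These correspond to 1 of 5 and 2 of 5 successful episodes respectively.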

Environment Recreation Files

environment/setup_same_machine.sh
environment/validate_same_machine.sh
environment/run_peract2_13_rollouts.sh
environment/runtime_env_vars.sh
environment/hardware_snapshot.txt
environment/glxinfo_B.txt
environment/upstream_revisions.txt
environment/system_packages_same_machine.txt
environment/rlbench_env_export.yaml
environment/rlbench_env_explicit.txt
environment/rlbench_pip_freeze.txt
environment/reveal_env_export.yaml
environment/reveal_env_explicit.txt
environment/reveal_pip_freeze.txt

Detailed raw tables for the 2026-03-25/26 work are in results/session_results_20260326.md.