VLAarchtests
Bundle uploaded from /workspace runpod sessions dated 2026-03-25 UTC and 2026-03-26 UTC.
Top-Level Contents
code/reveal_vla_bimanual/
- project code used for the proxy and RLBench runs in this bundle
artifacts/data/reveal_proxy/
- proxy dataset bundles used by the handoff runs
artifacts/outputs/r3d/
- previously uploaded R3D proxy outputs already present in the bundle
artifacts/outputs/r3d_handoff/
- handoff proxy checkpoints
artifacts/outputs/r3d_handoff_phase/
- phase-supervised handoff proxy checkpoints
artifacts/outputs/rlbench_current/
- RLBench checkpoints from the current session
artifacts/reports/
- proxy and RLBench result files copied from /workspace/reports
environment/
- same-machine setup files and validation helpers
tests/
handoff/instructions.md
- instruction file used for the handoff work
MODEL_INDEX.md
- checkpoint and result index
results/session_results_20260326.md
- raw result tables for the 2026-03-25/26 work
Code Added Or Updated
Core model, memory, planner, and dataset paths
code/reveal_vla_bimanual/models/backbones.py
code/reveal_vla_bimanual/models/multiview_fusion.py
code/reveal_vla_bimanual/models/observation_memory.py
code/reveal_vla_bimanual/models/reveal_head.py
code/reveal_vla_bimanual/models/world_model.py
code/reveal_vla_bimanual/models/action_decoder.py
code/reveal_vla_bimanual/models/planner.py
code/reveal_vla_bimanual/models/policy.py
code/reveal_vla_bimanual/train/losses.py
code/reveal_vla_bimanual/sim_reveal/dataset.py
code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
code/reveal_vla_bimanual/sim_rlbench/dataset.py
Training and evaluation paths
code/reveal_vla_bimanual/train/run_rlbench_experiment.py
code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
code/reveal_vla_bimanual/eval/run_ablations.py
code/reveal_vla_bimanual/eval/run_teacher_audit.py
code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py
code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py
Added or updated training configs
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml
code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml
code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml
code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml
code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml
code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml
code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml
Test files
The staged tests/ directory contains 32 test modules plus conftest.py, including:
- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage
Verification
- local test command:
PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests
- result:
Raw Result Files
Proxy and handoff results
artifacts/reports/reveal_smoke_mod/reveal_benchmark.json
artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json
artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json
artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json
artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json
artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json
artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json
artifacts/reports/reveal_phase_ablations_compact/ablations.json
artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json
RLBench result files
artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json
artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json
artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json
artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json
artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json
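This README does not document the JSON layout of the result files, so a first step when reading them is to list what each one contains. The sketch below walks a report directory and prints the top-level keys of every JSON file. The function name `top_level_keys` is hypothetical and is not part of the bundle's code; no field names are assumed.

```python
import json
from pathlib import Path


def top_level_keys(report_root):
    """Map each JSON file under report_root to its sorted top-level keys.

    Hypothetical helper: it assumes nothing about the schema. JSON documents
    that are not objects are reported by their Python type name instead.
    """
    summary = {}
    for path in sorted(Path(report_root).rglob("*.json")):
        with path.open("r", encoding="utf-8") as handle:
            data = json.load(handle)
        rel = path.relative_to(report_root).as_posix()
        summary[rel] = sorted(data) if isinstance(data, dict) else type(data).__name__
    return summary


if __name__ == "__main__":
    # Run from the bundle root so that artifacts/reports resolves.
    for name, keys in top_level_keys("artifacts/reports").items():
        print(name, keys)
```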
Raw Result Tables
Proxy serious runs
| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json | baseline mean success 0.5833, handoff mean success 0.2167 |
| spatial-trained checkpoint with compact world model vs released baseline | artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json | baseline mean success 0.5833, handoff mean success 0.5200 |
| compact-phase vs released baseline | artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json | baseline mean success 0.5833, compact-phase mean success 0.5133 |
| spatial-phase with compact world model vs released baseline | artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json | baseline mean success 0.5833, spatial-phase compact-world-model mean success 0.4933 |
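As a reading aid, the gap between each serious proxy run and the released baseline can be computed directly from the raw values above. This is a minimal sketch: the dictionary labels are shorthand chosen here, not field names taken from reveal_benchmark.json.

```python
# Mean-success values copied from the proxy serious-run table.
BASELINE = 0.5833

RUNS = {
    "spatial_handoff": 0.2167,
    "spatial_trained_compact_wm": 0.5200,
    "compact_phase": 0.5133,
    "spatial_phase_compact_wm": 0.4933,
}


def gap_to_baseline(runs, baseline):
    """Return {label: run mean success minus baseline}, rounded to 4 places."""
    return {name: round(value - baseline, 4) for name, value in runs.items()}


gaps = gap_to_baseline(RUNS, BASELINE)

# Print the runs from largest shortfall to smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:28s} {gap:+.4f}")
```

All four gaps are negative: none of the runs in this table reaches the baseline mean success.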
Proxy ablations
| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | artifacts/reports/reveal_phase_ablations_compact/ablations.json | full 0.5133, no_geometry 0.5133, no_spatial_memory 0.4967, compact_world_model 0.5133, no_planner 0.4333, gaussian_candidates_only 0.4667, no_task_head 0.5133, no_support_mode_conditioning 0.5133 |
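To see which ablations change the score at all, the raw values above can be differenced against the `full` entry. The sketch below hard-codes the numbers from this table; it does not read ablations.json, whose layout is not described here.

```python
# Scores copied from the compact-phase ablation table.
ABLATIONS = {
    "full": 0.5133,
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def drops_from_full(scores):
    """Return {ablation: full score minus ablation score}, rounded to 4 places."""
    full = scores["full"]
    return {
        name: round(full - value, 4)
        for name, value in scores.items()
        if name != "full"
    }


drops = drops_from_full(ABLATIONS)

# Keep only the ablations whose score differs from the full model.
changed = {name: drop for name, drop in drops.items() if drop != 0.0}

for name, drop in sorted(changed.items(), key=lambda item: -item[1]):
    print(f"{name:28s} -{drop:.4f}")
```

In these raw values only no_planner, gaussian_candidates_only, and no_spatial_memory differ from `full`; the other four ablations report the same 0.5133.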
RLBench direct-policy runs
| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [148], noop fallbacks [11] |
| push-box step-1 checkpoint, one-step replanning | artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [177], noop fallbacks [0] |
| push-box step-1 checkpoint, one-step replanning, delta_scale=0.05 | artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [180], noop fallbacks [0] |
RLBench retrieval runs
| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, bank_stride=4, top_k=5, time_window=8, episodes=1 | artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json | mean success 1.0, mean return 1.0, bank size 2815 |
| push-box kNN, bank_stride=4, top_k=5, time_window=8, episodes=5 | artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json | successes [0.0, 1.0, 0.0, 0.0, 0.0], mean success 0.2, bank size 2815 |
| push-box kNN, bank_stride=1, top_k=1, time_window=4, episodes=5 | artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json | successes [0.0, 0.0, 1.0, 1.0, 0.0], mean success 0.4, bank size 11259 |
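The mean success figures for the five-episode retrieval runs follow from the per-episode success lists above. A minimal sketch of that arithmetic, with the lists copied from this table and the dictionary labels chosen here for illustration:

```python
from statistics import mean

# Per-episode successes copied from the RLBench retrieval table.
KNN_RUNS = {
    "bank_stride=4, top_k=5, time_window=8": [0.0, 1.0, 0.0, 0.0, 0.0],
    "bank_stride=1, top_k=1, time_window=4": [0.0, 0.0, 1.0, 1.0, 0.0],
}


def summarize(successes):
    """Return the episode count, number of successes, and mean success."""
    return {
        "episodes": len(successes),
        "successes": int(sum(successes)),
        "mean_success": mean(successes),
    }


summaries = {label: summarize(values) for label, values in KNN_RUNS.items()}

for label, stats in summaries.items():
    print(label, stats)
```

With five episodes per run, a single episode moves the mean by 0.2, so the 0.2 and 0.4 figures differ by one successful episode.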
Environment Recreation Files
environment/setup_same_machine.sh
environment/validate_same_machine.sh
environment/run_peract2_13_rollouts.sh
environment/runtime_env_vars.sh
environment/hardware_snapshot.txt
environment/glxinfo_B.txt
environment/upstream_revisions.txt
environment/system_packages_same_machine.txt
environment/rlbench_env_export.yaml
environment/rlbench_env_explicit.txt
environment/rlbench_pip_freeze.txt
environment/reveal_env_export.yaml
environment/reveal_env_explicit.txt
environment/reveal_pip_freeze.txt
Detailed raw tables for the 2026-03-25/26 work are in results/session_results_20260326.md.