tags:
- robotics
- vision-language-action
- bimanual-manipulation
- rlbench
- rgbd
VLAarchtests
Bundle uploaded from `/workspace` RunPod sessions dated 2026-03-25 UTC and 2026-03-26 UTC.
Top-Level Contents
- `code/reveal_vla_bimanual/`: project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`: proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`: previously uploaded R3D proxy outputs already present in the bundle
- `artifacts/outputs/r3d_handoff/`: handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`: phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`: RLBench checkpoints from the current session
- `artifacts/reports/`: proxy and RLBench result files copied from `/workspace/reports`
- `environment/`: same-machine setup files and validation helpers
- `tests/`: local test suite
- `handoff/instructions.md`: instruction file used for the handoff work
- `MODEL_INDEX.md`: checkpoint and result index
- `results/session_results_20260326.md`: raw result tables for the 2026-03-25/26 work
Code Added Or Updated
Core model, memory, planner, and dataset paths
- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`
Training and evaluation paths
- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
Added or updated training configs
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`
Test files
The staged `tests/` directory contains 32 test modules plus `conftest.py`, including:
- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage
Verification
- local test command: `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result: 33 passed
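The verification command hard-codes `/workspace/VLAarchtests_work`. If the bundle is unpacked under a different root, a small Python helper can rebuild the same invocation. This is a sketch: `pytest_command` is a hypothetical helper, not part of the bundle, and like the shell command it replaces `PYTHONPATH` rather than extending it.

```python
import os
import sys
from pathlib import Path


def pytest_command(bundle_root):
    """Build argv and environment for the bundle's local test run.

    Hypothetical helper that mirrors:
    PYTHONPATH=<root>/code/reveal_vla_bimanual python -m pytest -q <root>/tests
    """
    root = Path(bundle_root)
    env = dict(os.environ)
    # As in the shell command, PYTHONPATH points only at the project code.
    env["PYTHONPATH"] = str(root / "code" / "reveal_vla_bimanual")
    # Use the current interpreter so the test run matches the active environment.
    argv = [sys.executable, "-m", "pytest", "-q", str(root / "tests")]
    return argv, env
```

To run it, call `argv, env = pytest_command(root)` and then `subprocess.run(argv, env=env, check=True)`.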
Raw Result Files
Proxy and handoff results
- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`
RLBench result files
- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`
Raw Result Tables
Proxy serious runs
| Artifact | File | Raw values |
|---|---|---|
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success 0.5833, handoff mean success 0.2167 |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success 0.5833, handoff mean success 0.5200 |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success 0.5833, compact-phase mean success 0.5133 |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success 0.5833, spatial-phase compact-world-model mean success 0.4933 |
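To read the serious-run comparisons as gaps rather than absolute values, the snippet below subtracts the released-baseline mean success (0.5833) from each variant's mean success. The numbers are copied by hand from the proxy serious-runs table; the short labels are illustrative and are not keys from `reveal_benchmark.json`.

```python
# Mean success values copied from the "Proxy serious runs" table.
BASELINE = 0.5833
VARIANTS = {
    "spatial handoff": 0.2167,
    "spatial-trained + compact world model": 0.5200,
    "compact-phase": 0.5133,
    "spatial-phase + compact world model": 0.4933,
}


def gaps_vs_baseline(variants, baseline):
    """Return {label: variant mean success - baseline}, rounded to 4 decimals."""
    return {name: round(score - baseline, 4) for name, score in variants.items()}


for name, gap in gaps_vs_baseline(VARIANTS, BASELINE).items():
    print(f"{name:40s} {gap:+.4f}")
```

On these numbers every handoff and phase variant trails the released baseline, and the spatial handoff checkpoint trails by the widest margin (about 0.37 absolute).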
Proxy ablations
| Artifact | File | Raw values |
|---|---|---|
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full 0.5133, no_geometry 0.5133, no_spatial_memory 0.4967, compact_world_model 0.5133, no_planner 0.4333, gaussian_candidates_only 0.4667, no_task_head 0.5133, no_support_mode_conditioning 0.5133 |
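The ablation row is easier to scan as drops relative to the `full` model. A minimal sketch, with values copied from the row above and the variant names as listed there (whether they match the exact keys in `ablations.json` is an assumption):

```python
# Mean success per ablation, copied from the "Proxy ablations" table.
ABLATIONS = {
    "full": 0.5133,
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def deltas_vs_full(results):
    """Return (name, score - full) pairs, largest drop first, rounded to 4 decimals."""
    full = results["full"]
    deltas = [(name, round(score - full, 4)) for name, score in results.items() if name != "full"]
    # sorted() is stable, so ties keep their table order.
    return sorted(deltas, key=lambda item: item[1])


for name, delta in deltas_vs_full(ABLATIONS):
    print(f"{name:30s} {delta:+.4f}")
```

With these numbers `no_planner` shows the largest drop (-0.08), followed by `gaussian_candidates_only` (-0.0466) and `no_spatial_memory` (-0.0166); the other four ablations score the same as `full`.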
RLBench direct-policy runs
| Artifact | File | Raw values |
|---|---|---|
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success 0.0, mean return 0.0, path recoveries [148], noop fallbacks [11] |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success 0.0, mean return 0.0, path recoveries [177], noop fallbacks [0] |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success 0.0, mean return 0.0, path recoveries [180], noop fallbacks [0] |
RLBench retrieval runs
| Artifact | File | Raw values |
|---|---|---|
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success 1.0, mean return 1.0, bank size 2815 |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes [0.0, 1.0, 0.0, 0.0, 0.0], mean success 0.2, bank size 2815 |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes [0.0, 0.0, 1.0, 1.0, 0.0], mean success 0.4, bank size 11259 |
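The retrieval rows vary `bank_stride`, `top_k`, and `time_window`, but this README does not say how `run_rlbench_knn_eval.py` uses them. The sketch below assumes one plausible reading: the bank keeps every `bank_stride`-th demonstration transition, candidates are restricted to demo timesteps within `time_window` of the current rollout step, and the action is the mean over the `top_k` nearest entries by Euclidean feature distance. `build_bank`, `knn_action`, and `mean_success` are hypothetical helpers, not functions from the bundle.

```python
import math


def build_bank(features, actions, timesteps, bank_stride):
    """Hypothetical: keep every `bank_stride`-th entry of a flat transition list."""
    keep = range(0, len(features), bank_stride)
    return [(features[i], actions[i], timesteps[i]) for i in keep]


def knn_action(bank, query, step, top_k, time_window):
    """Hypothetical kNN retrieval: average the actions of the `top_k` nearest
    bank entries whose demo timestep is within `time_window` of `step`."""
    pool = [e for e in bank if abs(e[2] - step) <= time_window]
    if not pool:  # assumed fallback: search the whole bank if the window is empty
        pool = bank
    nearest = sorted(pool, key=lambda e: math.dist(e[0], query))[:top_k]
    dim = len(nearest[0][1])
    return [sum(e[1][d] for e in nearest) / len(nearest) for d in range(dim)]


def mean_success(successes):
    """Mean of per-episode success flags, as reported in the table."""
    return sum(successes) / len(successes)
```

The reported bank sizes are at least consistent with a flat stride: taking every 4th entry of an 11259-entry bank leaves ceil(11259 / 4) = 2815 entries. The mean-success values are plain averages of the per-episode lists (1/5 = 0.2 and 2/5 = 0.4).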
Environment Recreation Files
- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`
Detailed raw tables for the 2026-03-25/26 work are in `results/session_results_20260326.md`.