---
tags:
- robotics
- vision-language-action
- bimanual-manipulation
- rlbench
- rgbd
---

# VLAarchtests

Bundle uploaded from `/workspace` runpod sessions dated `2026-03-25 UTC` and `2026-03-26 UTC`.

## Top-Level Contents

- `code/reveal_vla_bimanual/`
  - project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`
  - proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`
  - previously uploaded R3D proxy outputs already present in the bundle
- `artifacts/outputs/r3d_handoff/`
  - handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`
  - phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`
  - RLBench checkpoints from the current session
- `artifacts/reports/`
  - proxy and RLBench result files copied from `/workspace/reports`
- `environment/`
  - same-machine setup files and validation helpers
- `tests/`
  - local test suite
- `handoff/instructions.md`
  - instruction file used for the handoff work
- `MODEL_INDEX.md`
  - checkpoint and result index
- `results/session_results_20260326.md`
  - raw result tables for the `2026-03-25/26` work

## Code Added Or Updated

### Core model, memory, planner, and dataset paths

- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`

### Training and evaluation paths

- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`

### Added or updated training configs

- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`

### Test files

The staged `tests/` directory contains `32` test modules plus `conftest.py`, including:

- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage

## Verification

- local test command:
  - `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result:
  - `33 passed`

## Raw Result Files

### Proxy and handoff results

- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`

### RLBench result files

- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`
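
The report files listed above all sit one directory below `artifacts/reports/` and have a `.json` extension. The following is a minimal sketch for collecting them, assuming only that each file parses as JSON. The helper name `load_reports` is hypothetical and is not part of the bundled code. The key layout inside each file is not documented here, so the sketch returns the parsed content without interpreting it.

```python
import json
from pathlib import Path


def load_reports(reports_root, pattern="*/*.json"):
    """Parse every JSON file one level below reports_root.

    Returns a dict mapping "<run_dir>/<file_name>" to the parsed JSON.
    No assumption is made about the keys inside each file.
    """
    root = Path(reports_root)
    reports = {}
    for path in sorted(root.glob(pattern)):
        with path.open("r", encoding="utf-8") as handle:
            reports[f"{path.parent.name}/{path.name}"] = json.load(handle)
    return reports
```

Calling `load_reports("artifacts/reports")` from the bundle root would return one entry per report file, for example under the key `rlbench_push_box_knn_step1_ep5/rollout_eval.json`.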

## Raw Result Tables

### Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |
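
To compare the rows, the gap between each variant and the released baseline can be computed directly from the values in the table. The sketch below hard-codes those values and does not read the JSON files. The names `BASELINE`, `VARIANTS`, and `deltas_vs_baseline` are illustrative only.

```python
# Mean success values copied from the "Proxy serious runs" table.
BASELINE = 0.5833
VARIANTS = {
    "spatial handoff": 0.2167,
    "spatial-trained, compact world model": 0.5200,
    "compact-phase": 0.5133,
    "spatial-phase, compact world model": 0.4933,
}


def deltas_vs_baseline(baseline, variants):
    """Return {name: variant - baseline}, rounded to 4 decimals."""
    return {name: round(value - baseline, 4) for name, value in variants.items()}


for name, delta in deltas_vs_baseline(BASELINE, VARIANTS).items():
    print(f"{name}: {delta:+.4f}")
```

All four differences are negative: every variant in this table scores below the released baseline on mean success.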

### Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
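
The ablation row is easier to read as a drop relative to the full model. The sketch below ranks the ablations by that drop, using the values from the table hard-coded as a dictionary. The names `FULL`, `ABLATIONS`, and `rank_by_drop` are illustrative and do not come from `run_ablations.py`.

```python
# Mean success values copied from the "Proxy ablations" table.
FULL = 0.5133
ABLATIONS = {
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def rank_by_drop(full, ablations):
    """Return (name, full - ablated) pairs, largest drop first."""
    drops = [(name, round(full - value, 4)) for name, value in ablations.items()]
    return sorted(drops, key=lambda item: item[1], reverse=True)


for name, drop in rank_by_drop(FULL, ABLATIONS):
    print(f"{name}: {drop:.4f}")
```

With these values, only `no_planner`, `gaussian_candidates_only`, and `no_spatial_memory` differ from the full model; the other four ablations report the same mean success as the full run.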

### RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |

### RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
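
The mean success values in the two five-episode rows follow from the per-episode success lists. A short sketch of that arithmetic, using the lists copied from the table (the dictionary keys and the helper name `mean_success` are illustrative):

```python
# Per-episode successes copied from the "RLBench retrieval runs" table.
EPISODE_SUCCESSES = {
    "knn_step1_ep5": [0.0, 1.0, 0.0, 0.0, 0.0],
    "knn_step1_ep5_top1_dense": [0.0, 0.0, 1.0, 1.0, 0.0],
}


def mean_success(successes):
    """Average the per-episode success flags (1.0 = success, 0.0 = failure)."""
    return sum(successes) / len(successes)


for run, successes in EPISODE_SUCCESSES.items():
    solved = int(sum(successes))
    print(f"{run}: {solved}/{len(successes)} episodes, mean success {mean_success(successes):.1f}")
```

This gives 1 of 5 episodes (`0.2`) for the `bank_stride=4`, `top_k=5` run and 2 of 5 episodes (`0.4`) for the `bank_stride=1`, `top_k=1` run, matching the table.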

## Environment Recreation Files

- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`

Detailed raw tables for the `2026-03-25/26` work are in `results/session_results_20260326.md`.