---
tags:
- robotics
- vision-language-action
- bimanual-manipulation
- rlbench
- rgbd
---
# VLAarchtests
This bundle was uploaded from `/workspace` RunPod sessions dated `2026-03-25 UTC` and `2026-03-26 UTC`.
## Top-Level Contents
- `code/reveal_vla_bimanual/`
  - project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`
  - proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`
  - previously uploaded R3D proxy outputs already present in the bundle
- `artifacts/outputs/r3d_handoff/`
  - handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`
  - phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`
  - RLBench checkpoints from the current session
- `artifacts/reports/`
  - proxy and RLBench result files copied from `/workspace/reports`
- `environment/`
  - same-machine setup files and validation helpers
- `tests/`
  - local test suite
- `handoff/instructions.md`
  - instruction file used for the handoff work
- `MODEL_INDEX.md`
  - checkpoint and result index
- `results/session_results_20260326.md`
  - raw result tables for the `2026-03-25/26` work
## Code Added Or Updated
### Core model, memory, planner, and dataset paths
- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`
### Training and evaluation paths
- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`
### Added or updated training configs
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`
### Test files
The staged `tests/` directory contains `32` test modules plus `conftest.py`, including:
- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage
## Verification
- local test command:
  - `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result:
  - `33 passed`
## Raw Result Files
### Proxy and handoff results
- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`
### RLBench result files
- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`
## Raw Result Tables
### Proxy serious runs
| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |
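To make the table above easier to compare at a glance, the following minimal sketch computes each run's gap to the released baseline. The numbers are copied from the table; the dictionary labels and the helper name `gap_to_baseline` are illustrative and do not come from the project code.

```python
# Mean success values copied from the "Proxy serious runs" table.
# Labels and helper names are hypothetical, for illustration only.
BASELINE = 0.5833

runs = {
    "spatial handoff": 0.2167,
    "spatial-trained, compact world model": 0.5200,
    "compact-phase": 0.5133,
    "spatial-phase, compact world model": 0.4933,
}


def gap_to_baseline(value, baseline=BASELINE):
    """Return the signed difference (run minus baseline), rounded to 4 places."""
    return round(value - baseline, 4)


gaps = {name: gap_to_baseline(value) for name, value in runs.items()}

# Print runs from smallest to largest shortfall against the baseline.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {gap:+.4f}")
```

Under these values every listed run sits below the released baseline, with the spatial-trained checkpoint using the compact world model closest to it.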
### Proxy ablations
| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
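The ablation row above can be read as drops relative to the full model. The sketch below, using only the values from the table, separates ablations that leave mean success unchanged from those that lower it; the variable names are illustrative assumptions, not part of `run_ablations.py`.

```python
# Values copied from the "Proxy ablations" table; names are illustrative.
full = 0.5133

ablations = {
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}

# Drop in mean success when a component is removed (positive = worse).
drops = {name: round(full - value, 4) for name, value in ablations.items()}

# Ablations with no measured change at this precision.
unchanged = sorted(name for name, drop in drops.items() if drop == 0.0)

# Ablations that reduce mean success, largest drop first.
ranked = sorted(
    ((name, drop) for name, drop in drops.items() if drop > 0),
    key=lambda kv: kv[1],
    reverse=True,
)

for name, drop in ranked:
    print(f"{name}: -{drop:.4f}")
print("unchanged:", ", ".join(unchanged))
```

On these numbers, removing the planner gives the largest drop, followed by `gaussian_candidates_only` and `no_spatial_memory`; the remaining four ablations match the full model's reported value.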
### RLBench direct-policy runs
| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |
### RLBench retrieval runs
| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
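As a sanity check on the retrieval table above, the sketch below recomputes mean success from the per-episode success lists and shows that the two reported bank sizes are consistent with taking every fourth entry of one flat sequence. That flat-sequence striding is an assumption made here for illustration; how `run_rlbench_knn_eval.py` actually builds its bank is not described in this README. The helper names are hypothetical.

```python
def mean_success(successes):
    """Average of per-episode success flags (1.0 = success, 0.0 = failure)."""
    return sum(successes) / len(successes)


def strided_count(n_items, stride):
    """Number of items kept when sampling every `stride`-th entry of a flat sequence."""
    return len(range(0, n_items, stride))


# Per-episode successes copied from the table.
ep5 = [0.0, 1.0, 0.0, 0.0, 0.0]          # bank_stride=4, top_k=5, time_window=8
ep5_dense = [0.0, 0.0, 1.0, 1.0, 0.0]    # bank_stride=1, top_k=1, time_window=4

print(mean_success(ep5))        # compare with the reported 0.2
print(mean_success(ep5_dense))  # compare with the reported 0.4

# Assumption: the stride-4 bank is a flat subsample of the stride-1 bank.
print(strided_count(11259, 4))  # compare with the reported bank size 2815
```

With only five episodes per setting, a single additional success moves mean success by `0.2`, so the difference between the two five-episode runs corresponds to one episode.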
## Environment Recreation Files
- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`
Detailed raw tables for the `2026-03-25/26` work are in `results/session_results_20260326.md`.