---
tags:
  - robotics
  - vision-language-action
  - bimanual-manipulation
  - rlbench
  - rgbd
---

# VLAarchtests

Bundle uploaded from the `/workspace` directory of the RunPod sessions dated `2026-03-25 UTC` and `2026-03-26 UTC`.

## Top-Level Contents

- `code/reveal_vla_bimanual/`
  - project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`
  - proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`
  - R3D proxy outputs from an earlier upload, kept in the bundle unchanged
- `artifacts/outputs/r3d_handoff/`
  - handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`
  - phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`
  - RLBench checkpoints from the current session
- `artifacts/reports/`
  - proxy and RLBench result files copied from `/workspace/reports`
- `environment/`
  - same-machine setup files and validation helpers
- `tests/`
  - local test suite
- `handoff/instructions.md`
  - instruction file used for the handoff work
- `MODEL_INDEX.md`
  - checkpoint and result index
- `results/session_results_20260326.md`
  - raw result tables for the `2026-03-25/26` work

## Code Added Or Updated

### Core model, memory, planner, and dataset paths

- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`

### Training and evaluation paths

- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`

### Added or updated training configs

- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`

### Test files

The staged `tests/` directory contains `32` test modules plus `conftest.py`, including:

- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage

## Verification

- local test command:
  - `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result:
  - `33 passed`

## Raw Result Files

### Proxy and handoff results

- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`

### RLBench result files

- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`

## Raw Result Tables

### Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |

### Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
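To make the ablation row easier to read, the following minimal sketch computes each variant's change in mean success relative to the `full` configuration. The values are copied by hand from the row above; the helper name `deltas_vs_full` is hypothetical, and the sketch does not assume anything about the layout of `ablations.json`.

```python
# Mean success values copied from the compact-phase ablation row above.
ablations = {
    "full": 0.5133,
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def deltas_vs_full(scores):
    """Return each variant's mean success minus the `full` score (hypothetical helper)."""
    full = scores["full"]
    return {
        name: round(value - full, 4)
        for name, value in scores.items()
        if name != "full"
    }


deltas = deltas_vs_full(ablations)

# Print variants from largest drop to smallest.
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:32s} {delta:+.4f}")
```

On these numbers, only `no_planner`, `gaussian_candidates_only`, and `no_spatial_memory` differ from `full`; the other four variants report the same mean success.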

### RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |

### RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
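The mean success values in the two five-episode rows follow directly from the per-episode success lists. A minimal sketch of that arithmetic, using the lists copied from the table (the dictionary keys are just labels chosen here, not fields read from `rollout_eval.json`):

```python
from statistics import mean

# Per-episode successes copied from the two five-episode rows above.
knn_runs = {
    "knn_step1_ep5": [0.0, 1.0, 0.0, 0.0, 0.0],
    "knn_step1_ep5_top1_dense": [0.0, 0.0, 1.0, 1.0, 0.0],
}

# Mean success is the fraction of successful episodes in each run.
mean_success = {name: mean(successes) for name, successes in knn_runs.items()}

for name, value in mean_success.items():
    episodes = len(knn_runs[name])
    print(f"{name}: mean success {value:.1f} over {episodes} episodes")
```

With five episodes per run, the gap between `0.2` and `0.4` corresponds to a single additional successful episode.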

## Environment Recreation Files

- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`

Detailed raw tables for the `2026-03-25/26` work are in `results/session_results_20260326.md`.