---
tags:
  - robotics
  - vision-language-action
  - bimanual-manipulation
  - rlbench
  - rgbd
---

# VLAarchtests

Bundle uploaded from the `/workspace` directory of RunPod sessions dated `2026-03-25 UTC` and `2026-03-26 UTC`.

## Top-Level Contents

- `code/reveal_vla_bimanual/`
  - project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`
  - proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`
  - R3D proxy outputs from an earlier upload, kept in the bundle unchanged
- `artifacts/outputs/r3d_handoff/`
  - handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`
  - phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`
  - RLBench checkpoints from the current session
- `artifacts/reports/`
  - proxy and RLBench result files copied from `/workspace/reports`
- `environment/`
  - same-machine setup files and validation helpers
- `tests/`
  - local test suite
- `handoff/instructions.md`
  - instruction file used for the handoff work
- `MODEL_INDEX.md`
  - checkpoint and result index
- `results/session_results_20260326.md`
  - raw result tables for the `2026-03-25/26` work

## Code Added Or Updated

### Core model, memory, planner, and dataset paths

- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`

### Training and evaluation paths

- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`

### Added or updated training configs

- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`

### Test files

The staged `tests/` directory contains `32` test modules plus `conftest.py`, including:

- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage

## Verification

- local test command:
  - `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result:
  - `33 passed`

## Raw Result Files

### Proxy and handoff results

- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`

### RLBench result files

- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`
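
After downloading the bundle, it can be useful to confirm that the listed result files are actually present. The following is a minimal sketch, not part of the bundle's code: the helper name `missing_files` is hypothetical, the path list is a subset copied from the lists above, and the script assumes it is run from the bundle root.

```python
from pathlib import Path

# Relative paths copied from the lists above (subset shown for brevity).
RESULT_FILES = [
    "artifacts/reports/reveal_phase_ablations_compact/ablations.json",
    "artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json",
    "artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json",
]


def missing_files(bundle_root, relative_paths):
    """Return the relative paths that are not regular files under bundle_root.

    Hypothetical helper for illustration; it is not shipped in the bundle.
    """
    root = Path(bundle_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # "." assumes the current directory is the bundle root; adjust as needed.
    for path in missing_files(".", RESULT_FILES):
        print(f"missing: {path}")
```

Extending `RESULT_FILES` with the remaining paths from both lists gives a full presence check.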

## Raw Result Tables

### Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |

### Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
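
The ablation row is easier to read as drops relative to the full model. The sketch below hard-codes the values from the table rather than parsing `ablations.json`, because the JSON schema is not described here; the function name `drops_from_full` is hypothetical.

```python
# Mean-success values copied from the compact-phase ablation row above.
ABLATIONS = {
    "full": 0.5133,
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def drops_from_full(scores, full_key="full"):
    """Return (ablation, full - ablation) pairs, largest drop first."""
    full = scores[full_key]
    drops = {k: round(full - v, 4) for k, v in scores.items() if k != full_key}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


for name, drop in drops_from_full(ABLATIONS):
    print(f"{name:32s} {drop:+.4f}")
```

On these numbers, `no_planner` shows the largest drop (`0.0800`), followed by `gaussian_candidates_only` (`0.0466`) and `no_spatial_memory` (`0.0166`); the remaining four ablations match the full model to four decimal places.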

### RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |

### RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
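
The reported means and bank sizes can be cross-checked from the table alone. The sketch below recomputes mean success from the per-episode lists and checks that the `bank_stride=4` bank size matches taking every fourth entry of the `bank_stride=1` bank. The flat-stride interpretation is an assumption for illustration (the evaluation script may subsample differently, for example per episode), and `strided_bank_size` is a hypothetical helper.

```python
from statistics import mean

# Per-episode successes copied from the two 5-episode rows above.
EP5_SUCCESSES = [0.0, 1.0, 0.0, 0.0, 0.0]        # bank_stride=4, top_k=5
EP5_DENSE_SUCCESSES = [0.0, 0.0, 1.0, 1.0, 0.0]  # bank_stride=1, top_k=1


def strided_bank_size(dense_size, bank_stride):
    """Entries kept when taking every bank_stride-th item of a flat bank.

    Assumption: the bank is subsampled as one flat sequence, i.e. bank[::stride].
    """
    return len(range(0, dense_size, bank_stride))


print(mean(EP5_SUCCESSES))          # 0.2
print(mean(EP5_DENSE_SUCCESSES))    # 0.4
print(strided_bank_size(11259, 4))  # 2815
```

Both means agree with the table, and `2815` is what a flat stride of 4 over `11259` entries would produce. With one and five episodes respectively, these success rates are small-sample figures.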

## Environment Recreation Files

- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`

Detailed raw tables for the `2026-03-25/26` work are in `results/session_results_20260326.md`.