---
tags:
- robotics
- vision-language-action
- bimanual-manipulation
- rlbench
- rgbd
---

# VLAarchtests

Bundle uploaded from the `/workspace` directories of RunPod sessions dated `2026-03-25 UTC` and `2026-03-26 UTC`.

## Top-Level Contents

- `code/reveal_vla_bimanual/`
  - project code used for the proxy and RLBench runs in this bundle
- `artifacts/data/reveal_proxy/`
  - proxy dataset bundles used by the handoff runs
- `artifacts/outputs/r3d/`
  - previously uploaded R3D proxy outputs already present in the bundle
- `artifacts/outputs/r3d_handoff/`
  - handoff proxy checkpoints
- `artifacts/outputs/r3d_handoff_phase/`
  - phase-supervised handoff proxy checkpoints
- `artifacts/outputs/rlbench_current/`
  - RLBench checkpoints from the current session
- `artifacts/reports/`
  - proxy and RLBench result files copied from `/workspace/reports`
- `environment/`
  - same-machine setup files and validation helpers
- `tests/`
  - local test suite
- `handoff/instructions.md`
  - instruction file used for the handoff work
- `MODEL_INDEX.md`
  - checkpoint and result index
- `results/session_results_20260326.md`
  - raw result tables for the `2026-03-25/26` work
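
You can check a downloaded copy of the bundle against the entries above to confirm nothing is missing. The following is a minimal standard-library sketch. `EXPECTED_ENTRIES` and `missing_entries` are hypothetical helper names and are not part of the bundle. The check only tests whether each entry exists and has the right kind (directory or file); it does not look at contents.

```python
from pathlib import Path

# Entries from "Top-Level Contents"; a trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "code/reveal_vla_bimanual/",
    "artifacts/data/reveal_proxy/",
    "artifacts/outputs/r3d/",
    "artifacts/outputs/r3d_handoff/",
    "artifacts/outputs/r3d_handoff_phase/",
    "artifacts/outputs/rlbench_current/",
    "artifacts/reports/",
    "environment/",
    "tests/",
    "handoff/instructions.md",
    "MODEL_INDEX.md",
    "results/session_results_20260326.md",
]


def missing_entries(bundle_root):
    """Return the listed entries that are absent (or of the wrong kind) under bundle_root."""
    root = Path(bundle_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must be directories; everything else must be a regular file.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

`missing_entries("/path/to/VLAarchtests")` returns an empty list when every listed entry is present.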

## Code Added Or Updated

### Core model, memory, planner, and dataset paths

- `code/reveal_vla_bimanual/models/backbones.py`
- `code/reveal_vla_bimanual/models/multiview_fusion.py`
- `code/reveal_vla_bimanual/models/observation_memory.py`
- `code/reveal_vla_bimanual/models/reveal_head.py`
- `code/reveal_vla_bimanual/models/world_model.py`
- `code/reveal_vla_bimanual/models/action_decoder.py`
- `code/reveal_vla_bimanual/models/planner.py`
- `code/reveal_vla_bimanual/models/policy.py`
- `code/reveal_vla_bimanual/train/losses.py`
- `code/reveal_vla_bimanual/sim_reveal/dataset.py`
- `code/reveal_vla_bimanual/sim_reveal/procedural_envs.py`
- `code/reveal_vla_bimanual/sim_rlbench/dataset.py`

### Training and evaluation paths

- `code/reveal_vla_bimanual/train/run_rlbench_experiment.py`
- `code/reveal_vla_bimanual/eval/run_reveal_benchmark.py`
- `code/reveal_vla_bimanual/eval/run_ablations.py`
- `code/reveal_vla_bimanual/eval/run_teacher_audit.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py`
- `code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py`

### Added or updated training configs

- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml`
- `code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml`

### Test files

The staged `tests/` directory contains `32` test modules plus `conftest.py`, including:

- geometry and camera rotation coverage
- phase-label and candidate-ranking coverage
- planner gradient-flow and reocclusion gating coverage
- world-model null-rollout, field-consistency, and task-adapter coverage
- proxy scripted benchmark and teacher-audit coverage

## Verification

- local test command:
  - `PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests`
- result:
  - `33 passed`
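
The documented command hardcodes `/workspace/VLAarchtests_work`. A checkout may keep the same relative layout, with `code/reveal_vla_bimanual/` and `tests/` under one root as listed in Top-Level Contents. Under that assumption, you can build a portable equivalent as sketched below. `pytest_invocation` is a hypothetical helper. It only assembles the argument list and environment; it does not run anything.

```python
import os
import sys
from pathlib import Path


def pytest_invocation(bundle_root):
    """Return (argv, env) equivalent to the documented command, rooted at bundle_root."""
    root = Path(bundle_root).resolve()
    code_dir = str(root / "code" / "reveal_vla_bimanual")
    env = dict(os.environ)
    # Prepend the project code to PYTHONPATH rather than replacing existing entries.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = code_dir + os.pathsep + existing if existing else code_dir
    argv = [sys.executable, "-m", "pytest", "-q", str(root / "tests")]
    return argv, env
```

To run it, call `argv, env = pytest_invocation("/path/to/VLAarchtests")` and then `subprocess.run(argv, env=env, check=True)`. Using `sys.executable` ensures the suite runs under the same interpreter that built the command.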

## Raw Result Files

### Proxy and handoff results

- `artifacts/reports/reveal_smoke_mod/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json`
- `artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json`
- `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json`
- `artifacts/reports/reveal_phase_ablations_compact/ablations.json`
- `artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json`

### RLBench result files

- `artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json`
- `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json`
- `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json`
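
Each result file sits in its own run directory under `artifacts/reports/`. The sketch below gathers them into one dictionary keyed by `<run directory>/<file name>`. This README does not document the JSON schema, so the sketch assumes none and keeps whatever `json.load` returns. `load_reports` is a hypothetical helper, and the four file names it searches for are the ones listed above.

```python
import json
from pathlib import Path

# File names used by the result files listed in this README.
RESULT_FILE_NAMES = ("reveal_benchmark.json", "ablations.json",
                     "teacher_audit.json", "rollout_eval.json")


def load_reports(reports_dir):
    """Map '<run_dir>/<file>' to parsed JSON for every known result file one level down."""
    root = Path(reports_dir)
    reports = {}
    for name in RESULT_FILE_NAMES:
        for path in sorted(root.glob(f"*/{name}")):
            with path.open() as fh:
                reports[f"{path.parent.name}/{name}"] = json.load(fh)
    return reports
```

`sorted(load_reports("artifacts/reports"))` lists the available runs. Inspect individual entries to find the fields behind the values quoted in the Raw Result Tables section.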

## Raw Result Tables

### Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.2167` |
| spatial-trained checkpoint with compact world model vs released baseline | `artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, handoff mean success `0.5200` |
| compact-phase vs released baseline | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` | baseline mean success `0.5833`, compact-phase mean success `0.5133` |
| spatial-phase with compact world model vs released baseline | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` | baseline mean success `0.5833`, spatial-phase compact-world-model mean success `0.4933` |
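
All four variants score below the released baseline on this proxy benchmark. The arithmetic below restates the table in two forms: absolute gaps (variant minus baseline) and ratios (variant divided by baseline). It uses only the values quoted above; the helper name is illustrative.

```python
# Mean success values copied from the "Proxy serious runs" table.
BASELINE = 0.5833
VARIANTS = {
    "spatial handoff": 0.2167,
    "spatial-trained + compact world model": 0.5200,
    "compact-phase": 0.5133,
    "spatial-phase + compact world model": 0.4933,
}


def gaps_vs_baseline(variants, baseline):
    """Return {name: (variant - baseline, variant / baseline)}, rounded for display."""
    return {
        name: (round(value - baseline, 4), round(value / baseline, 3))
        for name, value in variants.items()
    }


for name, (delta, ratio) in gaps_vs_baseline(VARIANTS, BASELINE).items():
    print(f"{name:40s} delta={delta:+.4f} ratio={ratio:.3f}")
```

This gives gaps of -0.3666, -0.0633, -0.0700 and -0.0900, with ratios of roughly 0.372, 0.891, 0.880 and 0.846, in table order.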

### Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | `artifacts/reports/reveal_phase_ablations_compact/ablations.json` | full `0.5133`, `no_geometry` `0.5133`, `no_spatial_memory` `0.4967`, `compact_world_model` `0.5133`, `no_planner` `0.4333`, `gaussian_candidates_only` `0.4667`, `no_task_head` `0.5133`, `no_support_mode_conditioning` `0.5133` |
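
Compared against the `full` value, only three ablations change mean success; the other four match `full` exactly at `0.5133`. The small sketch below ranks the drops. The values are copied from the row above, and the helper name is illustrative.

```python
ABLATIONS = {
    "full": 0.5133, "no_geometry": 0.5133, "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133, "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667, "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def rank_ablation_drops(results, reference="full"):
    """Return (name, reference - value) pairs sorted from largest drop to smallest."""
    ref = results[reference]
    drops = [(name, round(ref - value, 4))
             for name, value in results.items() if name != reference]
    # sorted() is stable, so ties keep their original table order.
    return sorted(drops, key=lambda item: item[1], reverse=True)


for name, drop in rank_ablation_drops(ABLATIONS):
    print(f"{name:30s} drop={drop:.4f}")
```

`no_planner` drops by 0.08, `gaussian_candidates_only` by 0.0466 and `no_spatial_memory` by 0.0166.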

### RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[148]`, noop fallbacks `[11]` |
| push-box step-1 checkpoint, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[177]`, noop fallbacks `[0]` |
| push-box step-1 checkpoint, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` | mean success `0.0`, mean return `0.0`, path recoveries `[180]`, noop fallbacks `[0]` |

### RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` | mean success `1.0`, mean return `1.0`, bank size `2815` |
| push-box kNN, `bank_stride=4`, `top_k=5`, `time_window=8`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` | successes `[0.0, 1.0, 0.0, 0.0, 0.0]`, mean success `0.2`, bank size `2815` |
| push-box kNN, `bank_stride=1`, `top_k=1`, `time_window=4`, `episodes=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` | successes `[0.0, 0.0, 1.0, 1.0, 0.0]`, mean success `0.4`, bank size `11259` |
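
This README does not define `bank_stride`, `top_k` or `time_window`. The sketch below is one plausible reading of those parameters and should not be taken as the behaviour of the bundled evaluation code. It makes three assumptions:

- The bank keeps every `bank_stride`-th demonstration frame.
- Candidate neighbours are restricted to frames within `time_window` steps of the current step.
- The action is the mean of the `top_k` nearest neighbours by Euclidean feature distance.

`BankEntry`, `build_bank` and `retrieve_action` are hypothetical names, and so is the fallback to the whole bank when the time window is empty. Under this reading, a smaller stride yields a larger bank, in line with the bank sizes above: `11259` at stride 1 versus `2815` at stride 4, roughly a quarter.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class BankEntry:
    feature: list[float]  # hypothetical per-frame observation embedding
    action: list[float]   # action recorded at that frame
    step: int             # timestep of the frame within its demonstration


def build_bank(demos, bank_stride):
    """Keep every `bank_stride`-th frame of each demo (assumed meaning of bank_stride).

    `demos` is a list of demonstrations, each a list of (feature, action) pairs.
    """
    bank = []
    for demo in demos:
        for step in range(0, len(demo), bank_stride):
            feature, action = demo[step]
            bank.append(BankEntry(feature, action, step))
    return bank


def retrieve_action(bank, query, step, top_k, time_window):
    """Average the actions of the top_k nearest entries within +/- time_window steps.

    Falls back to the whole bank if no entry lies inside the time window (assumption).
    """
    candidates = [e for e in bank if abs(e.step - step) <= time_window] or bank
    nearest = sorted(candidates, key=lambda e: math.dist(e.feature, query))[:top_k]
    dim = len(nearest[0].action)
    return [sum(e.action[i] for e in nearest) / len(nearest) for i in range(dim)]
```

With `top_k=1` the policy replays the single closest stored action. Larger `top_k` values average several neighbours, which smooths the output but can blend actions from different phases of a demonstration.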

## Environment Recreation Files

- `environment/setup_same_machine.sh`
- `environment/validate_same_machine.sh`
- `environment/run_peract2_13_rollouts.sh`
- `environment/runtime_env_vars.sh`
- `environment/hardware_snapshot.txt`
- `environment/glxinfo_B.txt`
- `environment/upstream_revisions.txt`
- `environment/system_packages_same_machine.txt`
- `environment/rlbench_env_export.yaml`
- `environment/rlbench_env_explicit.txt`
- `environment/rlbench_pip_freeze.txt`
- `environment/reveal_env_export.yaml`
- `environment/reveal_env_explicit.txt`
- `environment/reveal_pip_freeze.txt`
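
The `*_pip_freeze.txt` files record package versions for the `rlbench` and `reveal` environments. Assuming they are standard `pip freeze` output, the sketch below compares one of them against the active interpreter using `importlib.metadata` from the standard library. It is a quick drift check, not a replacement for `setup_same_machine.sh` or `validate_same_machine.sh`, whose contents are not described here. `parse_pip_freeze` and `diff_against_installed` are hypothetical helper names.

```python
import re
from importlib import metadata


def _normalize(name):
    # PEP 503 normalization: runs of "-", "_" and "." become a single "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pip_freeze(lines):
    """Parse `name==version` lines; comments, `-e` installs and `name @ url` lines are skipped."""
    pinned = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "-")) or " @ " in line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pinned[_normalize(name.strip())] = version.strip()
    return pinned


def diff_against_installed(pinned):
    """Return {name: (pinned_version, installed_version_or_None)} for every mismatch."""
    installed = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            installed[_normalize(name)] = dist.version
    return {
        name: (version, installed.get(name))
        for name, version in pinned.items()
        if installed.get(name) != version
    }
```

For example, open `environment/rlbench_pip_freeze.txt` as `fh` and call `diff_against_installed(parse_pip_freeze(fh))`. An empty result means every pinned `name==version` line matches the active environment; lines skipped by the parser are not checked.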

Detailed raw tables for the `2026-03-25/26` work are in `results/session_results_20260326.md`.