VLAarchtests

Bundle uploaded from the /workspace directory of RunPod sessions dated 2026-03-25 and 2026-03-26 (UTC).

Top-Level Contents

  • code/reveal_vla_bimanual/
    • project code used for the proxy and RLBench runs in this bundle
  • artifacts/data/reveal_proxy/
    • proxy dataset bundles used by the handoff runs
  • artifacts/outputs/r3d/
    • previously uploaded R3D proxy outputs
  • artifacts/outputs/r3d_handoff/
    • handoff proxy checkpoints
  • artifacts/outputs/r3d_handoff_phase/
    • phase-supervised handoff proxy checkpoints
  • artifacts/outputs/rlbench_current/
    • RLBench checkpoints from the current session
  • artifacts/reports/
    • proxy and RLBench result files copied from /workspace/reports
  • environment/
    • same-machine setup files and validation helpers
  • tests/
    • local test suite
  • handoff/instructions.md
    • instruction file used for the handoff work
  • MODEL_INDEX.md
    • checkpoint and result index
  • results/session_results_20260326.md
    • raw result tables for the 2026-03-25/26 work

Code Added Or Updated

Core model, memory, planner, and dataset paths

  • code/reveal_vla_bimanual/models/backbones.py
  • code/reveal_vla_bimanual/models/multiview_fusion.py
  • code/reveal_vla_bimanual/models/observation_memory.py
  • code/reveal_vla_bimanual/models/reveal_head.py
  • code/reveal_vla_bimanual/models/world_model.py
  • code/reveal_vla_bimanual/models/action_decoder.py
  • code/reveal_vla_bimanual/models/planner.py
  • code/reveal_vla_bimanual/models/policy.py
  • code/reveal_vla_bimanual/train/losses.py
  • code/reveal_vla_bimanual/sim_reveal/dataset.py
  • code/reveal_vla_bimanual/sim_reveal/procedural_envs.py
  • code/reveal_vla_bimanual/sim_rlbench/dataset.py

Training and evaluation paths

  • code/reveal_vla_bimanual/train/run_rlbench_experiment.py
  • code/reveal_vla_bimanual/eval/run_reveal_benchmark.py
  • code/reveal_vla_bimanual/eval/run_ablations.py
  • code/reveal_vla_bimanual/eval/run_teacher_audit.py
  • code/reveal_vla_bimanual/eval/run_rlbench_rollout_eval.py
  • code/reveal_vla_bimanual/eval/run_rlbench_knn_eval.py

Added or updated training configs

  • code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact.yaml
  • code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial.yaml
  • code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase.yaml
  • code/reveal_vla_bimanual/train/configs/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase.yaml
  • code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_valid9.yaml
  • code/reveal_vla_bimanual/train/configs/rlbench_subset3_backbone_only_clip_current_common23.yaml
  • code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_current_wide.yaml
  • code/reveal_vla_bimanual/train/configs/rlbench_lift_ball_backbone_only_clip_step1.yaml
  • code/reveal_vla_bimanual/train/configs/rlbench_push_box_backbone_only_clip_step1.yaml

Test files

The staged tests/ directory contains 32 test modules plus conftest.py, including:

  • geometry and camera rotation coverage
  • phase-label and candidate-ranking coverage
  • planner gradient-flow and reocclusion gating coverage
  • world-model null-rollout, field-consistency, and task-adapter coverage
  • proxy scripted benchmark and teacher-audit coverage

Verification

  • local test command:
    • PYTHONPATH=/workspace/VLAarchtests_work/code/reveal_vla_bimanual python -m pytest -q /workspace/VLAarchtests_work/tests
  • result:
    • 33 passed

Raw Result Files

Proxy and handoff results

  • artifacts/reports/reveal_smoke_mod/reveal_benchmark.json
  • artifacts/reports/reveal_smoke_nogeom/reveal_benchmark.json
  • artifacts/reports/reveal_smoke_noplanner/reveal_benchmark.json
  • artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json
  • artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json
  • artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json
  • artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json
  • artifacts/reports/reveal_phase_ablations_compact/ablations.json
  • artifacts/reports/reveal_teacher_audit_serious/teacher_audit.json

RLBench result files

  • artifacts/reports/rlbench_dual_buttons_baseline_len100_ep1_ik_rescale/rollout_eval.json
  • artifacts/reports/rlbench_dual_buttons_common23_len100_ep1_ik_rescale/rollout_eval.json
  • artifacts/reports/rlbench_push_box_common23_len100_ep1_ik_rescale/rollout_eval.json
  • artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json
  • artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json
  • artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json
  • artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json
  • artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json
  • artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json
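
The JSON schema of these result files is not documented here, so the following is only a minimal sketch for inspecting one of them. It assumes that the success metrics are stored as numeric fields whose key contains the substring "success"; the helper names `iter_numeric_fields` and `summarize_report` are hypothetical and are not part of the bundled code.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple


def iter_numeric_fields(node: Any, needle: str, prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted_path, value) for numeric dict fields whose key contains `needle`.

    Lists of bare numbers (for example a per-episode list) are not expanded.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                if needle in str(key):
                    yield path, float(value)
            else:
                yield from iter_numeric_fields(value, needle, path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_numeric_fields(item, needle, f"{prefix}[{index}]")


def summarize_report(report_path: Path, needle: str = "success") -> Dict[str, float]:
    """Load one result file and collect the matching numeric fields."""
    with report_path.open("r", encoding="utf-8") as handle:
        payload = json.load(handle)
    return dict(iter_numeric_fields(payload, needle))


if __name__ == "__main__":
    # Path copied from the list above; run from the bundle root.
    target = Path("artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json")
    if target.exists():
        for field, value in summarize_report(target).items():
            print(f"{field}: {value}")
    else:
        print(f"missing: {target}")
```

If the files use different key names, pass another `needle` (for example "return") or print the top-level keys first.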

Raw Result Tables

Proxy serious runs

| Artifact | File | Raw values |
| --- | --- | --- |
| spatial handoff vs released baseline | artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json | baseline mean success 0.5833, handoff mean success 0.2167 |
| spatial-trained checkpoint with compact world model vs released baseline | artifacts/reports/reveal_handoff_compare_serious_compact/reveal_benchmark.json | baseline mean success 0.5833, handoff mean success 0.5200 |
| compact-phase vs released baseline | artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json | baseline mean success 0.5833, compact-phase mean success 0.5133 |
| spatial-phase with compact world model vs released baseline | artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json | baseline mean success 0.5833, spatial-phase compact-world-model mean success 0.4933 |

Proxy ablations

| Artifact | File | Raw values |
| --- | --- | --- |
| compact-phase ablations | artifacts/reports/reveal_phase_ablations_compact/ablations.json | full 0.5133, no_geometry 0.5133, no_spatial_memory 0.4967, compact_world_model 0.5133, no_planner 0.4333, gaussian_candidates_only 0.4667, no_task_head 0.5133, no_support_mode_conditioning 0.5133 |
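
To read the ablation row as differences from the full model, the short sketch below subtracts the `full` score from each ablation. The numbers are copied from the row above; the function name `deltas_vs_full` is hypothetical and the script does not read `ablations.json`, whose schema is not shown here.

```python
# Mean success values copied from the compact-phase ablation row above.
ABLATION_SUCCESS = {
    "full": 0.5133,
    "no_geometry": 0.5133,
    "no_spatial_memory": 0.4967,
    "compact_world_model": 0.5133,
    "no_planner": 0.4333,
    "gaussian_candidates_only": 0.4667,
    "no_task_head": 0.5133,
    "no_support_mode_conditioning": 0.5133,
}


def deltas_vs_full(scores, reference="full"):
    """Return each ablation's score minus the reference score, rounded to 4 decimals."""
    base = scores[reference]
    return {
        name: round(value - base, 4)
        for name, value in scores.items()
        if name != reference
    }


DELTAS = deltas_vs_full(ABLATION_SUCCESS)

# Print the largest drops first.
for name, delta in sorted(DELTAS.items(), key=lambda item: item[1]):
    print(f"{name:32s} {delta:+.4f}")
```

On these values, only no_planner, gaussian_candidates_only, and no_spatial_memory differ from full at four decimal places; the other four ablations report the same 0.5133.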

RLBench direct-policy runs

| Artifact | File | Raw values |
| --- | --- | --- |
| lift-ball wide checkpoint, one-step replanning | artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [148], noop fallbacks [11] |
| push-box step-1 checkpoint, one-step replanning | artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [177], noop fallbacks [0] |
| push-box step-1 checkpoint, one-step replanning, delta_scale=0.05 | artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json | mean success 0.0, mean return 0.0, path recoveries [180], noop fallbacks [0] |

RLBench retrieval runs

| Artifact | File | Raw values |
| --- | --- | --- |
| push-box kNN, bank_stride=4, top_k=5, time_window=8, episodes=1 | artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json | mean success 1.0, mean return 1.0, bank size 2815 |
| push-box kNN, bank_stride=4, top_k=5, time_window=8, episodes=5 | artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json | successes [0.0, 1.0, 0.0, 0.0, 0.0], mean success 0.2, bank size 2815 |
| push-box kNN, bank_stride=1, top_k=1, time_window=4, episodes=5 | artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json | successes [0.0, 0.0, 1.0, 1.0, 0.0], mean success 0.4, bank size 11259 |
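
The retrieval numbers above can be cross-checked with simple arithmetic. The sketch below recomputes mean success from the per-episode lists and compares the two bank sizes. It assumes that `bank_stride` keeps every stride-th entry of one flat list of bank entries; that subsampling scheme is a guess, not something confirmed by the bundled code, and `strided_bank_size` is a hypothetical helper.

```python
from statistics import mean

# Per-episode successes and reported means copied from the rows above.
RUNS = {
    "rlbench_push_box_knn_step1_ep5": {
        "successes": [0.0, 1.0, 0.0, 0.0, 0.0],
        "reported_mean": 0.2,
    },
    "rlbench_push_box_knn_step1_ep5_top1_dense": {
        "successes": [0.0, 0.0, 1.0, 1.0, 0.0],
        "reported_mean": 0.4,
    },
}


def strided_bank_size(dense_size: int, stride: int) -> int:
    """Entries kept when taking every `stride`-th item of a flat list (assumed scheme)."""
    return len(range(0, dense_size, stride))


for name, run in RUNS.items():
    computed = mean(run["successes"])
    matches = abs(computed - run["reported_mean"]) < 1e-9
    print(f"{name}: computed mean {computed}, matches reported: {matches}")

# Dense bank size (bank_stride=1) subsampled with bank_stride=4.
print(strided_bank_size(11259, 4))
```

Under that assumption the stride-4 size comes out at 2815, which agrees with the reported bank size. The agreement is consistent with the assumed scheme but does not establish how the bank is actually built.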

Environment Recreation Files

  • environment/setup_same_machine.sh
  • environment/validate_same_machine.sh
  • environment/run_peract2_13_rollouts.sh
  • environment/runtime_env_vars.sh
  • environment/hardware_snapshot.txt
  • environment/glxinfo_B.txt
  • environment/upstream_revisions.txt
  • environment/system_packages_same_machine.txt
  • environment/rlbench_env_export.yaml
  • environment/rlbench_env_explicit.txt
  • environment/rlbench_pip_freeze.txt
  • environment/reveal_env_export.yaml
  • environment/reveal_env_explicit.txt
  • environment/reveal_pip_freeze.txt

Detailed raw tables for the 2026-03-25/26 work are in results/session_results_20260326.md.
