Regression Baselines
Snapshot source: /workspace/VLAarchtests/README.md plus the committed artifact JSONs under /workspace/VLAarchtests/artifacts/outputs.
Proxy benchmarks
- Dummy action-history benchmark:
  - interaction: 0.5278 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_actionhist_commit4/reveal_benchmark.json
  - backbone: 0.5556 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json
  - reveal: 0.5417 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json
- CLIP action-history benchmark:
  - interaction_clip: 0.3056 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_clip_commit4_compare/reveal_benchmark.json
  - backbone_clip: 0.3333 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json
  - reveal_clip: 0.2083 from /workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json
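A new evaluation run could be compared against these proxy baselines with a check like the sketch below. The scores are copied from the lists above; the function name `find_regressions` and the `tolerance` default are hypothetical and do not come from the repository, and the sketch does not assume anything about the layout of `reveal_benchmark.json`.

```python
# Baseline scores copied from the proxy benchmark lists in this section.
BASELINES = {
    "interaction": 0.5278,
    "backbone": 0.5556,
    "reveal": 0.5417,
    "interaction_clip": 0.3056,
    "backbone_clip": 0.3333,
    "reveal_clip": 0.2083,
}


def find_regressions(new_scores, baselines=BASELINES, tolerance=0.01):
    """Return {name: (baseline, new)} for scores that dropped by more than `tolerance`.

    `tolerance` is an illustrative threshold (an assumption), not a repository setting.
    """
    regressions = {}
    for name, baseline in baselines.items():
        if name not in new_scores:
            continue  # this model was not evaluated in the new run
        new = new_scores[name]
        if baseline - new > tolerance:
            regressions[name] = (baseline, new)
    return regressions


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(find_regressions({"interaction": 0.50, "backbone": 0.56}))
```

With the made-up scores in the example, only `interaction` is flagged, because `backbone` did not drop below its baseline.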
Action-history ablations
- full_model: 0.5278
- no_interaction_head: 0.3889
- no_world_model: 0.5278
- no_planner: 0.5278
- no_role_tokens: 0.5278
- short_history: 0.5417
JSON path: /workspace/VLAarchtests/artifacts/outputs/interaction_debug/ablation_none_actionhist/ablations.json
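The ablation scores are easier to read as differences from `full_model`. The sketch below computes those differences from the values listed above; the helper name `ablation_deltas` is hypothetical, and the values are hard-coded rather than read from `ablations.json`, whose schema is not described here.

```python
# Ablation scores copied from the list in this section.
ABLATIONS = {
    "full_model": 0.5278,
    "no_interaction_head": 0.3889,
    "no_world_model": 0.5278,
    "no_planner": 0.5278,
    "no_role_tokens": 0.5278,
    "short_history": 0.5417,
}


def ablation_deltas(scores, reference="full_model"):
    """Return each ablation's score minus the reference score, rounded to 4 places."""
    ref = scores[reference]
    return {
        name: round(value - ref, 4)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, delta in ablation_deltas(ABLATIONS).items():
        print(f"{name}: {delta:+.4f}")
```

On these numbers, only `no_interaction_head` scores below `full_model` (by 0.1389), `short_history` is 0.0139 higher, and the remaining three ablations match `full_model` exactly.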
Diagnostics
- planner_top1_accuracy: 0.1985
- planner_regret: 0.2120
JSON path: /workspace/VLAarchtests/artifacts/outputs/interaction_debug/proxy_interaction_state_actionhist/diagnostics/proxy_diagnostics.json
Integration baselines
- RLBench open-drawer rollout:
  - mean_success: 0.0
  - error: "A path could not be found because the target is outside of workspace."
  - JSON: /workspace/VLAarchtests/artifacts/outputs/interaction_debug/rlbench_open_drawer_rollout_eval_commit4_rerun/rollout_eval.json
- PerAct2 13-task sweep:
  - no-plan mean_success: 0.0
  - planner mean_success: 0.0
  - JSON roots:
    - /workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_noplan_split/rollout_eval.json
    - /workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_plan_split/rollout_eval.json