# Regression Baselines

Snapshot source: `/workspace/VLAarchtests/README.md` plus the committed artifact JSONs under `/workspace/VLAarchtests/artifacts/outputs`.

## Proxy benchmarks

- Dummy action-history benchmark:
  - interaction: `0.5278` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_actionhist_commit4/reveal_benchmark.json`
  - backbone: `0.5556` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
  - reveal: `0.5417` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
- CLIP action-history benchmark:
  - interaction_clip: `0.3056` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_clip_commit4_compare/reveal_benchmark.json`
  - backbone_clip: `0.3333` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`
  - reveal_clip: `0.2083` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`

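One way to use the proxy numbers above as a regression gate is to compare a fresh run against them with a small tolerance. The sketch below is illustrative only: `BASELINES` copies the six values listed above, while `find_regressions`, the `tol` value of `0.01`, and the assumption that higher values are better are all hypothetical and not taken from the repository.

```python
# Baseline values copied from the proxy benchmark list above.
BASELINES = {
    "interaction": 0.5278,
    "backbone": 0.5556,
    "reveal": 0.5417,
    "interaction_clip": 0.3056,
    "backbone_clip": 0.3333,
    "reveal_clip": 0.2083,
}


def find_regressions(current, baselines=BASELINES, tol=0.01):
    """Return {name: (baseline, current)} for values that fell by more than tol.

    Assumes higher is better. `tol` is an assumed tolerance, not a value
    taken from the repository. Metrics missing from `current` are skipped
    rather than flagged.
    """
    regressions = {}
    for name, base in baselines.items():
        if name not in current:
            continue
        if base - current[name] > tol:
            regressions[name] = (base, current[name])
    return regressions


if __name__ == "__main__":
    # Hypothetical new run: only `interaction` drops beyond the tolerance.
    new_run = dict(BASELINES, interaction=0.5000)
    print(find_regressions(new_run))
```

A dictionary return value keeps both numbers next to the metric name, so a failing check can report the baseline and the new value together.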

## Action-history ablations

- full_model: `0.5278`
- no_interaction_head: `0.3889`
- no_world_model: `0.5278`
- no_planner: `0.5278`
- no_role_tokens: `0.5278`
- short_history: `0.5417`

JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/ablation_none_actionhist/ablations.json`

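The ablation values are easier to read as differences from `full_model`. The following is a minimal sketch of that arithmetic using only the numbers listed above; the helper name `deltas_vs_full` and its parameters are hypothetical. On these numbers, `no_world_model`, `no_planner`, and `no_role_tokens` match `full_model` exactly, `no_interaction_head` is lower, and `short_history` is slightly higher.

```python
# Values copied from the ablation list above.
ABLATIONS = {
    "full_model": 0.5278,
    "no_interaction_head": 0.3889,
    "no_world_model": 0.5278,
    "no_planner": 0.5278,
    "no_role_tokens": 0.5278,
    "short_history": 0.5417,
}


def deltas_vs_full(scores, reference="full_model", ndigits=4):
    """Return each ablation's value minus the reference value, rounded."""
    ref = scores[reference]
    return {
        name: round(value - ref, ndigits)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, delta in deltas_vs_full(ABLATIONS).items():
        print(f"{name}: {delta:+.4f}")
```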

## Diagnostics

- planner_top1_accuracy: `0.1985`
- planner_regret: `0.2120`

JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/proxy_interaction_state_actionhist/diagnostics/proxy_diagnostics.json`

## Integration baselines

- RLBench open-drawer rollout:
  - mean_success: `0.0`
  - error: `"A path could not be found because the target is outside of workspace."`
  - JSON: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/rlbench_open_drawer_rollout_eval_commit4_rerun/rollout_eval.json`
- PerAct2 13-task sweep:
  - no-plan mean_success: `0.0`
  - planner mean_success: `0.0`
  - JSON roots:
    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_noplan_split/rollout_eval.json`
    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_plan_split/rollout_eval.json`