# Regression Baselines

Snapshot source: `/workspace/VLAarchtests/README.md` plus the committed artifact JSONs under `/workspace/VLAarchtests/artifacts/outputs`.

## Proxy benchmarks

- Dummy action-history benchmark:
  - interaction: `0.5278` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_actionhist_commit4/reveal_benchmark.json`
  - backbone: `0.5556` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
  - reveal: `0.5417` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
- CLIP action-history benchmark:
  - interaction_clip: `0.3056` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_clip_commit4_compare/reveal_benchmark.json`
  - backbone_clip: `0.3333` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`
  - reveal_clip: `0.2083` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`
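
A minimal sketch of how these numbers could serve as a regression gate. The baseline values are copied from the list above. The function name `find_regressions`, the tolerance, and the assumption that a higher score is better are illustrative and not taken from the repository.

```python
# Baselines copied from the proxy benchmark list; keys are the labels used there.
BASELINES = {
    "interaction": 0.5278,
    "backbone": 0.5556,
    "reveal": 0.5417,
    "interaction_clip": 0.3056,
    "backbone_clip": 0.3333,
    "reveal_clip": 0.2083,
}


def find_regressions(current, baselines=BASELINES, tol=1e-4):
    """Return {name: (baseline, current)} for scores below baseline - tol.

    Assumption: higher scores are better. Names missing from `current`
    are skipped rather than treated as failures.
    """
    regressions = {}
    for name, base in baselines.items():
        score = current.get(name)
        if score is None:
            continue  # model was not evaluated in this run
        if score < base - tol:
            regressions[name] = (base, score)
    return regressions


if __name__ == "__main__":
    # Hypothetical fresh run in which only the interaction model got worse.
    fresh = dict(BASELINES, interaction=0.5000)
    print(find_regressions(fresh))
```

An empty result means no tracked model fell below its recorded baseline.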

## Action-history ablations

- full_model: `0.5278`
- no_interaction_head: `0.3889`
- no_world_model: `0.5278`
- no_planner: `0.5278`
- no_role_tokens: `0.5278`
- short_history: `0.5417`

JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/ablation_none_actionhist/ablations.json`
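
The effect of each ablation is easier to read as a difference from `full_model`. The sketch below recomputes those differences from the values listed above. The helper name `ablation_deltas` is illustrative, and the scores are hard-coded rather than read from `ablations.json`, whose schema is not described here.

```python
# Scores copied from the ablation list above.
ABLATIONS = {
    "full_model": 0.5278,
    "no_interaction_head": 0.3889,
    "no_world_model": 0.5278,
    "no_planner": 0.5278,
    "no_role_tokens": 0.5278,
    "short_history": 0.5417,
}


def ablation_deltas(scores, reference="full_model"):
    """Return each ablation's score minus the reference score, rounded to 4 places."""
    ref = scores[reference]
    return {
        name: round(value - ref, 4)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, delta in ablation_deltas(ABLATIONS).items():
        print(f"{name}: {delta:+.4f}")
```

In this snapshot, `no_interaction_head` is the only ablation that lowers the score (by 0.1389). `no_world_model`, `no_planner`, and `no_role_tokens` match `full_model` exactly, and `short_history` is 0.0139 higher.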

## Diagnostics

- planner_top1_accuracy: `0.1985`
- planner_regret: `0.2120`

JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/proxy_interaction_state_actionhist/diagnostics/proxy_diagnostics.json`
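
This document does not define the two diagnostics. The sketch below shows one common reading, stated as an assumption: top-1 accuracy is the fraction of samples where the planner's highest-scored candidate is also the candidate with the highest true utility, and regret is the mean gap between the best true utility and the utility of the chosen candidate. The function name and input layout are hypothetical and may differ from the repository's implementation.

```python
def planner_metrics(pred_scores, true_utils):
    """Compute (top1_accuracy, mean_regret) under the assumed definitions.

    pred_scores, true_utils: one list per sample, each holding one value
    per candidate plan. Both arguments must have the same shape.
    """
    if not pred_scores:
        raise ValueError("need at least one sample")
    hits = 0
    total_regret = 0.0
    for pred, true in zip(pred_scores, true_utils):
        chosen = max(range(len(pred)), key=pred.__getitem__)  # planner's pick
        best = max(range(len(true)), key=true.__getitem__)    # oracle pick
        hits += int(chosen == best)
        total_regret += true[best] - true[chosen]
    n = len(pred_scores)
    return hits / n, total_regret / n


if __name__ == "__main__":
    # Toy data: the planner is right on the first sample and wrong on the second.
    preds = [[0.1, 0.9], [0.8, 0.2]]
    utils = [[0.0, 1.0], [0.5, 1.0]]
    print(planner_metrics(preds, utils))
```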

## Integration baselines

- RLBench open-drawer rollout:
  - mean_success: `0.0`
  - error: `"A path could not be found because the target is outside of workspace."`
  - JSON: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/rlbench_open_drawer_rollout_eval_commit4_rerun/rollout_eval.json`
- PerAct2 13-task sweep:
  - no-plan mean_success: `0.0`
  - planner mean_success: `0.0`
  - JSON paths:
    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_noplan_split/rollout_eval.json`
    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_plan_split/rollout_eval.json`
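
A small sketch for re-reading these rollout results. It assumes that `mean_success` and `error` are top-level keys in each `rollout_eval.json`, which this document does not confirm; the helper name `read_rollout_summary` is illustrative.

```python
import json
from pathlib import Path


def read_rollout_summary(path):
    """Return (mean_success, error) from a rollout_eval.json file.

    Returns None if the file does not exist. Assumption: both fields are
    top-level keys; a missing key yields None for that field.
    """
    p = Path(path)
    if not p.is_file():
        return None
    with p.open() as fh:
        data = json.load(fh)
    return data.get("mean_success"), data.get("error")


if __name__ == "__main__":
    root = Path("/workspace/VLAarchtests/artifacts/outputs/interaction_debug")
    for run in (
        "rlbench_open_drawer_rollout_eval_commit4_rerun",
        "peract2_13_rollout_noplan_split",
        "peract2_13_rollout_plan_split",
    ):
        print(run, read_rollout_summary(root / run / "rollout_eval.json"))
```

Returning the error string alongside the score keeps a `0.0` caused by a planning failure distinguishable from a `0.0` produced by completed but unsuccessful rollouts.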