# Model Index

## 2026-03-25/26 Additions

### Handoff Proxy Checkpoints

| Run | Checkpoint | Summary | Report |
| --- | --- | --- | --- |
| spatial handoff | `artifacts/outputs/r3d_handoff/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_seed17/checkpoint_best.pt` | `artifacts/outputs/r3d_handoff/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_seed17/summary.json` | `artifacts/reports/reveal_handoff_compare_serious/reveal_benchmark.json` |
| compact handoff | `artifacts/outputs/r3d_handoff/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_seed17/checkpoint_best.pt` | `artifacts/outputs/r3d_handoff/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_seed17/summary.json` | `artifacts/reports/reveal_handoff_compact_train_probe/reveal_benchmark.json` |
| compact-phase handoff | `artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/checkpoint_best.pt` | `artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_compact_phase_seed17/summary.json` | `artifacts/reports/reveal_phase_compare_serious_compact/reveal_benchmark.json` |
| spatial-phase handoff | `artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase_seed17/checkpoint_best.pt` | `artifacts/outputs/r3d_handoff_phase/proxy_interaction_r3d_stage3_clip_rgbd_handoff_spatial_phase_seed17/summary.json` | `artifacts/reports/reveal_phase_compare_serious_spatial_compactwm/reveal_benchmark.json` |

### RLBench Current Checkpoints

| Run | Checkpoint | Related files |
| --- | --- | --- |
| subset3 valid9 | `artifacts/outputs/rlbench_current/rlbench_subset3_backbone_only_clip_current_valid9/checkpoint_best.pt` | `artifacts/outputs/rlbench_current/rlbench_subset3_backbone_only_clip_current_valid9/checkpoint_stable.pt` |
| subset3 common23 | `artifacts/outputs/rlbench_current/rlbench_subset3_backbone_only_clip_current_common23/checkpoint_best.pt` | `artifacts/outputs/rlbench_current/rlbench_subset3_backbone_only_clip_current_common23/checkpoint_stable.pt` |
| lift-ball wide | `artifacts/outputs/rlbench_current/rlbench_lift_ball_backbone_only_clip_current_wide/checkpoint_best.pt` | `artifacts/outputs/rlbench_current/rlbench_lift_ball_backbone_only_clip_current_wide/checkpoint_stable.pt` |
| push-box step1 | `artifacts/outputs/rlbench_current/rlbench_push_box_backbone_only_clip_step1/checkpoint_best.pt` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json`, `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` |

### RLBench Result Files

| Artifact | File |
| --- | --- |
| lift-ball wide, one-step replanning | `artifacts/reports/rlbench_lift_ball_wide_len160_ep1_ik_c1/rollout_eval.json` |
| push-box step1, one-step replanning | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1/rollout_eval.json` |
| push-box step1, one-step replanning, `delta_scale=0.05` | `artifacts/reports/rlbench_push_box_step1_ep1_ik_c1_s005/rollout_eval.json` |
| push-box kNN, `episodes=1` | `artifacts/reports/rlbench_push_box_knn_step1_ep1/rollout_eval.json` |
| push-box kNN, `episodes=5`, `top_k=5` | `artifacts/reports/rlbench_push_box_knn_step1_ep5/rollout_eval.json` |
| push-box kNN, `episodes=5`, `top_k=1`, dense bank | `artifacts/reports/rlbench_push_box_knn_step1_ep5_top1_dense/rollout_eval.json` |

## R3D Proxy Runs

| Run | Config | Seed | Checkpoint | Summary | Benchmark | Diagnostics |
| --- | --- | ---: | --- | --- | --- | --- |
| stage1 dummy | `proxy_interaction_r3d_stage1_dummy.yaml` | 13 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/diagnostics_full/proxy_diagnostics.json` |
| stage1 dummy | `proxy_interaction_r3d_stage1_dummy.yaml` | 14 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed14/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed14/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed14/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed14/diagnostics_full/proxy_diagnostics.json` |
| stage1 dummy | `proxy_interaction_r3d_stage1_dummy.yaml` | 15 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed15/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed15/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed15/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed15/diagnostics_full/proxy_diagnostics.json` |
| stage2 dummy | `proxy_interaction_r3d_stage2_dummy.yaml` | 21 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/diagnostics_full/proxy_diagnostics.json` |
| stage2 dummy | `proxy_interaction_r3d_stage2_dummy.yaml` | 22 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed22/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed22/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed22/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed22/diagnostics_full/proxy_diagnostics.json` |
| stage2 dummy | `proxy_interaction_r3d_stage2_dummy.yaml` | 23 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed23/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed23/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed23/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed23/diagnostics_full/proxy_diagnostics.json` |
| stage1 clip | `proxy_interaction_r3d_stage1_clip.yaml` | 7 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed7/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed7/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed7/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed7/diagnostics_full/proxy_diagnostics.json` |
| stage1 clip | `proxy_interaction_r3d_stage1_clip.yaml` | 8 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed8/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed8/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed8/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed8/diagnostics_full/proxy_diagnostics.json` |
| stage1 clip | `proxy_interaction_r3d_stage1_clip.yaml` | 9 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed9/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed9/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed9/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_clip_seed9/diagnostics_full/proxy_diagnostics.json` |
| stage2 clip | `proxy_interaction_r3d_stage2_clip.yaml` | 11 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed11/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed11/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed11/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed11/diagnostics_full/proxy_diagnostics.json` |
| stage2 clip | `proxy_interaction_r3d_stage2_clip.yaml` | 12 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed12/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed12/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed12/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed12/diagnostics_full/proxy_diagnostics.json` |
| stage2 clip | `proxy_interaction_r3d_stage2_clip.yaml` | 13 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed13/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed13/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed13/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_clip_seed13/diagnostics_full/proxy_diagnostics.json` |
| stage3 clip rgbd | `proxy_interaction_r3d_stage3_clip_rgbd.yaml` | 17 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed17/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed17/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed17/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed17/diagnostics_full/proxy_diagnostics.json` |
| stage3 clip rgbd | `proxy_interaction_r3d_stage3_clip_rgbd.yaml` | 18 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed18/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed18/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed18/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed18/diagnostics_full/proxy_diagnostics.json` |
| stage3 clip rgbd | `proxy_interaction_r3d_stage3_clip_rgbd.yaml` | 19 | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed19/checkpoint_best.pt` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed19/summary.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed19/benchmark_full/reveal_benchmark.json` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed19/diagnostics_full/proxy_diagnostics.json` |
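
Every row in the table above follows the same layout: a run directory named after the config stem plus `_seed<N>`, holding the checkpoint, summary, full benchmark, and full diagnostics. As a minimal sketch of that convention, the following Python helper rebuilds the four paths from a config name and seed. The function name `r3d_run_artifacts` and the dictionary keys are hypothetical and are not part of the repository; only the path pattern is taken from the table.

```python
from pathlib import PurePosixPath

# Root directory as it appears in the table above.
R3D_ROOT = PurePosixPath("artifacts/outputs/r3d")


def r3d_run_artifacts(config_name: str, seed: int) -> dict[str, str]:
    """Build the artifact paths for one R3D proxy run (hypothetical helper)."""
    # Drop the ".yaml" extension to get the run-name stem (Python 3.9+).
    stem = config_name.removesuffix(".yaml")
    run_dir = R3D_ROOT / f"{stem}_seed{seed}"
    return {
        "checkpoint": str(run_dir / "checkpoint_best.pt"),
        "summary": str(run_dir / "summary.json"),
        "benchmark": str(run_dir / "benchmark_full" / "reveal_benchmark.json"),
        "diagnostics": str(run_dir / "diagnostics_full" / "proxy_diagnostics.json"),
    }


if __name__ == "__main__":
    paths = r3d_run_artifacts("proxy_interaction_r3d_stage3_clip_rgbd.yaml", 17)
    for kind, path in paths.items():
        print(f"{kind}: {path}")
```

The helper only constructs path strings; it does not check that the files are present on disk.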

## Ablation Benchmark Files

| Ablation | File |
| --- | --- |
| stage1 dummy `no_planner` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/benchmark_no_planner/reveal_benchmark.json` |
| stage1 dummy `no_role_symmetry` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage1_dummy_seed13/benchmark_no_role_symmetry/reveal_benchmark.json` |
| stage2 dummy `no_world_model` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/benchmark_no_world_model/reveal_benchmark.json` |
| stage2 dummy `no_world_model` pre-fix backup | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/benchmark_no_world_model/reveal_benchmark_pre_null_rollout_fix.json` |
| stage2 dummy `short_history` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage2_dummy_seed21/benchmark_short_history/reveal_benchmark.json` |
| stage3 clip RGB-D `no_depth` | `artifacts/outputs/r3d/proxy_interaction_r3d_stage3_clip_rgbd_seed17/benchmark_no_depth/reveal_benchmark.json` |

Equivalent ablation benchmark files exist under the directories for the other seeds of each run.
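
The ablation paths follow the pattern `<run_dir>/benchmark_<ablation>/reveal_benchmark.json`, so the per-seed equivalents can be enumerated rather than listed by hand. The sketch below assumes that each ablation listed in the table applies to all three seeds of its stage, which this index states only in general terms. The `ABLATIONS` mapping and the function name are illustrative, not repository code.

```python
from pathlib import PurePosixPath

R3D_ROOT = PurePosixPath("artifacts/outputs/r3d")

# Run stem -> (ablation names, seeds), both copied from the tables in this index.
# Grouping them this way is an assumption made for illustration.
ABLATIONS = {
    "proxy_interaction_r3d_stage1_dummy": (("no_planner", "no_role_symmetry"), (13, 14, 15)),
    "proxy_interaction_r3d_stage2_dummy": (("no_world_model", "short_history"), (21, 22, 23)),
    "proxy_interaction_r3d_stage3_clip_rgbd": (("no_depth",), (17, 18, 19)),
}


def ablation_benchmark_paths() -> list[str]:
    """List the expected ablation benchmark path for every stage, seed, and ablation."""
    paths = []
    for stem, (ablations, seeds) in ABLATIONS.items():
        for seed in seeds:
            run_dir = R3D_ROOT / f"{stem}_seed{seed}"
            for ablation in ablations:
                paths.append(str(run_dir / f"benchmark_{ablation}" / "reveal_benchmark.json"))
    return paths


if __name__ == "__main__":
    for path in ablation_benchmark_paths():
        print(path)
```

To confirm which of these files are actually present in a checkout, each string can be passed to `pathlib.Path(...).is_file()` from the repository root.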

## Integration Artifacts

| Artifact | File |
| --- | --- |
| RLBench import/config smoke | `artifacts/outputs/r3d/rlbench_smokes/smoke_test_output.txt` |
| RLBench `open_drawer` launch smoke | `artifacts/outputs/r3d/rlbench_smokes/launch_smoke_open_drawer.txt` |
| RLBench `open_drawer` rollout | `artifacts/outputs/r3d/rlbench_open_drawer_r3d_rollout/rollout_eval.json` |
| PerAct2 13-task launch smoke summary | `artifacts/outputs/r3d/peract2_13_launch_smoke/launch_smoke_summary.json` |

## Historical References

| File | Purpose |
| --- | --- |
| `regression/baselines.md` | historical baseline metrics from the downloaded snapshot |
| `results/phase_tracking.md` | phase-by-phase acceptance tracking |