RLBench Custom Subset Eval
Date: 2026-03-23 UTC
Scope:
- Local 3-task RLBench subset: bimanual_lift_ball, bimanual_push_box, bimanual_dual_push_buttons
- Train episodes: episode0
- Validation episodes: episode1
- Observation interface: front, wrist_left, wrist_right at 224x224
- Policy action format: 14-D bimanual delta pose + gripper commands, executed through RLBench bimanual end-effector planning
- Backbone used in these custom runs: the existing 128-d dummy frozen backbone, initialized from the previously trained reveal-proxy checkpoints
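To make the 14-D action format concrete, here is a minimal sketch of how such a vector could be split into per-arm commands. The layout is an assumption, not something this report specifies: two 7-D halves ordered right arm then left arm, each as 3 position deltas, 3 rotation deltas, and 1 gripper value thresholded at 0.5. The function name split_bimanual_action and the dictionary keys are hypothetical.

```python
# Assumed layout (NOT confirmed by this report):
#   [right arm 7-D | left arm 7-D], each arm = [dx, dy, dz, droll, dpitch, dyaw, gripper]
ARM_DIM = 7
ARM_ORDER = ("right", "left")  # hypothetical ordering


def split_bimanual_action(action):
    """Split a flat 14-D action into per-arm delta pose and gripper command."""
    values = [float(v) for v in action]
    if len(values) != ARM_DIM * len(ARM_ORDER):
        raise ValueError(f"expected 14 values, got {len(values)}")

    arms = {}
    for i, name in enumerate(ARM_ORDER):
        chunk = values[i * ARM_DIM:(i + 1) * ARM_DIM]
        arms[name] = {
            "delta_pos": chunk[:3],           # translation delta
            "delta_rot": chunk[3:6],          # rotation delta (assumed 3-D)
            "gripper_open": chunk[6] > 0.5,   # assumed threshold
        }
    return arms


if __name__ == "__main__":
    example = [0.01, 0.0, -0.02, 0.0, 0.0, 0.1, 1.0,
               0.0, 0.03, 0.0, 0.0, -0.1, 0.0, 0.0]
    for arm, cmd in split_bimanual_action(example).items():
        print(arm, cmd)
```

If the actual policy orders the arms differently or encodes rotation another way, only the constants and slice boundaries would need to change.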
Offline training results:
- Backbone-only: train total 0.0079247, val total 0.0056060
- Reveal-state: train total 0.0078287, val total 0.0091639
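One quick way to read the offline numbers above is the ratio of validation loss to train loss for each model. The sketch below uses only the totals reported here. The dictionary keys and the helper val_train_ratio are illustrative names, and with a single train and a single validation episode the ratio is a rough diagnostic at best.

```python
# Totals copied from the offline training results in this report.
losses = {
    "backbone_only": {"train": 0.0079247, "val": 0.0056060},
    "reveal_state": {"train": 0.0078287, "val": 0.0091639},
}


def val_train_ratio(entry):
    """Return val / train; values above 1 mean validation loss exceeds train loss."""
    return entry["val"] / entry["train"]


if __name__ == "__main__":
    for name, entry in losses.items():
        print(f"{name}: val/train = {val_train_ratio(entry):.3f}")
```

On these numbers the backbone-only ratio is below 1 and the reveal-state ratio is above 1, which is consistent with the reveal-state model not being better on this subset.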
Live bounded rollout results:
- Backbone-only, plan=false: mean success 0.000
- Reveal-state, plan=false: mean success 0.000
- Reveal-state, plan=true: mean success 0.000
Per-task live rollout success:
- bimanual_lift_ball: 0.0 for all three runs
- bimanual_push_box: 0.0 for all three runs
- bimanual_dual_push_buttons: 0.0 for all three runs
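The per-task numbers relate to the mean success figures through a simple aggregation, sketched below. This assumes mean success is the unweighted average over the three tasks, which the report does not state explicitly. The run labels and the helper mean_success are hypothetical names.

```python
TASKS = ("bimanual_lift_ball", "bimanual_push_box", "bimanual_dual_push_buttons")

# Run labels are illustrative; per-task values are the 0.0 results reported here.
RUNS = ("backbone_only_plan_false", "reveal_state_plan_false", "reveal_state_plan_true")
per_task_success = {run: {task: 0.0 for task in TASKS} for run in RUNS}


def mean_success(task_scores):
    """Unweighted mean over tasks (assumed aggregation)."""
    if not task_scores:
        raise ValueError("no task scores given")
    return sum(task_scores.values()) / len(task_scores)


summary = {run: mean_success(scores) for run, scores in per_task_success.items()}

if __name__ == "__main__":
    for run, value in summary.items():
        print(f"{run}: mean success {value:.3f}")
```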
Interpretation:
- The missing RLBench-side custom trainer/eval path is now implemented and tested.
- The bounded custom subset runs do not support a go decision. They fit the tiny offline slice but do not produce any short-horizon task success in live RLBench rollouts.
- On this subset, the reveal-state model is not better than the backbone-only model, and enabling planning does not recover success.
- These are not paper-scale results. They are bounded diagnostic runs on a repaired local subset, not a credible full PerAct2 reproduction or a full custom-model benchmark.
Key artifacts:
- Backbone-only summary: /workspace/outputs/rlbench_custom/rlbench_subset3_backbone_only_dummy/summary.json
- Reveal-state summary: /workspace/outputs/rlbench_custom/rlbench_subset3_reveal_state_dummy/summary.json
- Backbone-only rollout: /workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json
- Reveal-state rollout, no plan: /workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json
- Reveal-state rollout, plan enabled: /workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json
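For a quick inspection of the artifacts listed above, a small loader along these lines may be enough. The JSON schema of these files is not described in this report, so the sketch assumes only that each file holds a JSON object and lists its top-level keys. The helper load_artifact is a hypothetical name.

```python
import json
from pathlib import Path

ARTIFACTS = {
    "backbone_only_summary": "/workspace/outputs/rlbench_custom/rlbench_subset3_backbone_only_dummy/summary.json",
    "reveal_state_summary": "/workspace/outputs/rlbench_custom/rlbench_subset3_reveal_state_dummy/summary.json",
    "backbone_only_rollout": "/workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json",
    "reveal_state_rollout_noplan": "/workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json",
    "reveal_state_rollout_plan": "/workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json",
}


def load_artifact(path):
    """Return the parsed JSON content, or None if the file does not exist."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    with file_path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    for label, path in ARTIFACTS.items():
        data = load_artifact(path)
        if data is None:
            print(f"{label}: missing ({path})")
        elif isinstance(data, dict):
            # Schema is unknown here, so only the top-level keys are listed.
            print(f"{label}: keys = {sorted(data)}")
        else:
            print(f"{label}: top-level type = {type(data).__name__}")
```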