# RLBench Custom Subset Eval

Date: 2026-03-23 UTC

Scope:
- Local 3-task RLBench subset: `bimanual_lift_ball`, `bimanual_push_box`, `bimanual_dual_push_buttons`
- Train episodes: `episode0`
- Validation episodes: `episode1`
- Observation interface: `front`, `wrist_left`, `wrist_right` at `224x224`
- Policy action format: 14-D bimanual delta pose + gripper commands, executed through RLBench bimanual end-effector planning
- Backbone used in these custom runs: the existing frozen 128-d dummy backbone, initialized from the previously trained reveal-proxy checkpoints
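
To make the 14-D action format concrete, here is a minimal sketch of splitting one flat action vector into per-arm parts. The layout is an assumption, not something this report specifies: 7 values per arm (3 translation deltas, 3 rotation deltas, 1 gripper command), right arm first. The names `ARM_DIM`, `ARM_ORDER`, and `split_bimanual_action` are hypothetical.

```python
# Assumed layout (not confirmed by this report): each arm contributes
# 3 translation deltas, 3 rotation deltas, and 1 gripper command,
# with the right arm first. Adjust to match the actual policy head.
ARM_DIM = 7
ARM_ORDER = ("right", "left")


def split_bimanual_action(action):
    """Split a flat 14-D action into per-arm delta pose and gripper parts."""
    values = [float(v) for v in action]
    expected = ARM_DIM * len(ARM_ORDER)
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    arms = {}
    for i, name in enumerate(ARM_ORDER):
        chunk = values[i * ARM_DIM:(i + 1) * ARM_DIM]
        arms[name] = {
            "delta_pos": chunk[0:3],  # translation delta
            "delta_rot": chunk[3:6],  # rotation delta (parameterization assumed)
            "gripper": chunk[6],      # gripper command
        }
    return arms
```

If the real policy head orders the arms differently or uses a quaternion delta, only the slicing constants need to change.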

Offline training results:
- Backbone-only: train total `0.0079247`, val total `0.0056060`
- Reveal-state: train total `0.0078287`, val total `0.0091639`
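
One quick way to read the four totals above is the validation-to-train ratio per model. The sketch below uses only the numbers listed in this section. It assumes the totals are loss-like quantities where lower is better, and the names `OFFLINE_TOTALS` and `val_train_ratio` are hypothetical. With a single validation episode per split, the ratio is a rough diagnostic at best.

```python
# Totals copied from the offline results in this report.
# Assumption: these are loss-like values where lower is better.
OFFLINE_TOTALS = {
    "backbone_only": {"train": 0.0079247, "val": 0.0056060},
    "reveal_state": {"train": 0.0078287, "val": 0.0091639},
}


def val_train_ratio(entry):
    """Return val / train; values above 1.0 mean validation is worse than train."""
    return entry["val"] / entry["train"]


ratios = {name: val_train_ratio(entry) for name, entry in OFFLINE_TOTALS.items()}

for name, ratio in ratios.items():
    print(f"{name}: val/train = {ratio:.3f}")
```

The backbone-only ratio falls below 1.0 and the reveal-state ratio falls above it, so only the reveal-state run shows a worse validation total than train total on this slice.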

Live bounded rollout results:
- Backbone-only, `plan=false`: mean success `0.000`
- Reveal-state, `plan=false`: mean success `0.000`
- Reveal-state, `plan=true`: mean success `0.000`

Per-task live rollout success:
- `bimanual_lift_ball`: `0.0` for all three runs
- `bimanual_push_box`: `0.0` for all three runs
- `bimanual_dual_push_buttons`: `0.0` for all three runs
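
The mean success figures follow directly from the per-task values. The sketch below assumes mean success is the unweighted average over the three tasks, which the report does not state explicitly. The run labels and the `mean_success` helper are hypothetical names.

```python
TASKS = ("bimanual_lift_ball", "bimanual_push_box", "bimanual_dual_push_buttons")
RUNS = ("backbone_only_noplan", "reveal_state_noplan", "reveal_state_plan")

# Per-task success as reported above: 0.0 for every task in every run.
PER_TASK_SUCCESS = {run: {task: 0.0 for task in TASKS} for run in RUNS}


def mean_success(per_task):
    """Unweighted mean over tasks (assumed aggregation)."""
    return sum(per_task.values()) / len(per_task)


MEAN_SUCCESS = {run: mean_success(scores) for run, scores in PER_TASK_SUCCESS.items()}

for run, value in MEAN_SUCCESS.items():
    print(f"{run}: mean success {value:.3f}")
```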

Interpretation:
- The previously missing RLBench-side custom trainer/eval path is now implemented and tested.
- The bounded custom subset runs do not support a go decision: both models fit the tiny offline slice, but none of the three live RLBench rollouts achieves any short-horizon task success.
- On this subset, the reveal-state model is no better than the backbone-only model: both score `0.0` in live rollouts, and its validation total is higher (`0.0091639` vs `0.0056060`). Enabling planning does not recover success.
- These are not paper-scale results. They are bounded diagnostic runs on a repaired local subset, not a credible full PerAct2 reproduction or a full custom-model benchmark.

Key artifacts:
- Backbone-only summary: `/workspace/outputs/rlbench_custom/rlbench_subset3_backbone_only_dummy/summary.json`
- Reveal-state summary: `/workspace/outputs/rlbench_custom/rlbench_subset3_reveal_state_dummy/summary.json`
- Backbone-only rollout: `/workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json`
- Reveal-state rollout, no plan: `/workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json`
- Reveal-state rollout, plan enabled: `/workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json`
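
For inspecting these artifacts side by side, a minimal loader sketch follows. It assumes nothing about the JSON schema: it parses whichever files exist and lists their top-level keys. The labels and the `load_json_artifacts` helper are hypothetical.

```python
import json
from pathlib import Path

# Paths copied from the artifact list above; the labels are illustrative.
ARTIFACTS = {
    "backbone_only_summary": "/workspace/outputs/rlbench_custom/rlbench_subset3_backbone_only_dummy/summary.json",
    "reveal_state_summary": "/workspace/outputs/rlbench_custom/rlbench_subset3_reveal_state_dummy/summary.json",
    "backbone_only_rollout": "/workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json",
    "reveal_state_rollout_noplan": "/workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json",
    "reveal_state_rollout_plan": "/workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json",
}


def load_json_artifacts(paths):
    """Parse each JSON file that exists; map missing files to None."""
    loaded = {}
    for label, raw_path in paths.items():
        path = Path(raw_path)
        if path.is_file():
            with path.open("r", encoding="utf-8") as handle:
                loaded[label] = json.load(handle)
        else:
            loaded[label] = None
    return loaded


if __name__ == "__main__":
    for label, payload in load_json_artifacts(ARTIFACTS).items():
        if payload is None:
            print(f"{label}: missing")
        elif isinstance(payload, dict):
            # The schema is not documented here, so only show top-level keys.
            print(f"{label}: keys = {sorted(payload)}")
        else:
            print(f"{label}: {type(payload).__name__}")
```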