# RLBench Custom Subset Eval

Date: 2026-03-23 UTC

Scope:

- Local 3-task RLBench subset: `bimanual_lift_ball`, `bimanual_push_box`, `bimanual_dual_push_buttons`
- Train episodes: `episode0`
- Validation episodes: `episode1`
- Observation interface: `front`, `wrist_left`, `wrist_right` at `224x224`
- Policy action format: 14-D bimanual delta pose + gripper commands, executed through RLBench bimanual end-effector planning
- Backbone used in these custom runs: the existing 128-d dummy frozen backbone, initialized from the previously trained reveal-proxy checkpoints

Offline training results:

- Backbone-only: train total `0.0079247`, val total `0.0056060`
- Reveal-state: train total `0.0078287`, val total `0.0091639`

Live bounded rollout results:

- Backbone-only, `plan=false`: mean success `0.000`
- Reveal-state, `plan=false`: mean success `0.000`
- Reveal-state, `plan=true`: mean success `0.000`

Per-task live rollout success:

- `bimanual_lift_ball`: `0.0` for all three runs
- `bimanual_push_box`: `0.0` for all three runs
- `bimanual_dual_push_buttons`: `0.0` for all three runs

Interpretation:

- The missing RLBench-side custom trainer/eval path is now implemented and tested.
- The bounded custom subset runs do not support a go decision. They fit the tiny offline slice but do not produce any short-horizon task success in live RLBench rollouts.
- On this subset, the reveal-state model is not better than the backbone-only model, and enabling planning does not recover success.
- These are not paper-scale results. They are bounded diagnostic runs on a repaired local subset, not a credible full PerAct2 reproduction or a full custom-model benchmark.

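The offline comparison behind the interpretation can be checked with a few lines of arithmetic. The sketch below uses only the four totals reported above and assumes they are loss values where lower is better; the names `offline` and `val_train_gap` are illustrative and do not come from the trainer.

```python
# Offline totals copied from the report above.
# Assumption: these are loss totals, so lower is better.
offline = {
    "backbone_only": {"train": 0.0079247, "val": 0.0056060},
    "reveal_state": {"train": 0.0078287, "val": 0.0091639},
}


def val_train_gap(totals):
    """Return val minus train for one run (positive means val is worse)."""
    return totals["val"] - totals["train"]


# Gap per run, plus how the two validation totals compare to each other.
gaps = {name: val_train_gap(totals) for name, totals in offline.items()}
val_ratio = offline["reveal_state"]["val"] / offline["backbone_only"]["val"]

for name, gap in gaps.items():
    print(f"{name}: val - train = {gap:+.7f}")
print(f"reveal_state val / backbone_only val = {val_ratio:.3f}")
```

With one training episode and one validation episode per task, these gaps are a sanity check on the numbers, not a statistically meaningful comparison.
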

Key artifacts:

- Backbone-only summary: `/workspace/outputs/rlbench_custom/rlbench_subset3_backbone_only_dummy/summary.json`
- Reveal-state summary: `/workspace/outputs/rlbench_custom/rlbench_subset3_reveal_state_dummy/summary.json`
- Backbone-only rollout: `/workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json`
- Reveal-state rollout, no plan: `/workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json`
- Reveal-state rollout, plan enabled: `/workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json`
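
To re-check the rollout numbers from the artifacts, something like the following sketch could be used. The three paths are the ones listed above. The JSON schema is not documented in this report, so the `mean_success` key is an assumption and may need to be changed to match the real files. The helper names `load_mean_success` and `summarize` are hypothetical.

```python
import json
from pathlib import Path

# Rollout report paths copied from the artifact list above.
ROLLOUT_REPORTS = {
    "backbone_only_noplan": "/workspace/reports/rlbench_custom/backbone_only_rollout/rollout_eval.json",
    "reveal_state_noplan": "/workspace/reports/rlbench_custom/reveal_state_rollout_noplan/rollout_eval.json",
    "reveal_state_plan": "/workspace/reports/rlbench_custom/reveal_state_rollout_plan/rollout_eval.json",
}


def load_mean_success(path, key="mean_success"):
    """Return the value stored under `key`, or None if it cannot be read.

    Assumption: the report is a JSON object with a top-level `key`.
    The key name is a guess, not a documented schema.
    """
    report_path = Path(path)
    if not report_path.is_file():
        return None
    with report_path.open("r", encoding="utf-8") as handle:
        report = json.load(handle)
    if not isinstance(report, dict):
        return None
    return report.get(key)


def summarize(reports=ROLLOUT_REPORTS, key="mean_success"):
    """Map each run name to its reported value (None when unavailable)."""
    return {name: load_mean_success(path, key) for name, path in reports.items()}


if __name__ == "__main__":
    for run_name, value in summarize().items():
        print(f"{run_name}: {value}")
```

A value of `None` means the file or the assumed key was not found. It should not be read as zero success.
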