# Task-Routed Proxy Controller - routing rule: `foliage -> iter6`, `bag -> iter8`, `cloth -> iter8` - fixed benchmark slices: 100 episodes per task from the standard sprint spec - mean success: `0.4867` ## Per-Task Success - foliage: `0.46` - bag: `0.41` - cloth: `0.59` ## Why This Matters - This is the strongest current controller for the three custom proxy tasks. - The routing rule is fair: it depends only on explicit task metadata, not on per-episode oracle information. - Compared with the best single checkpoint (`iter7`/`iter8` at `0.4667`), the routed controller keeps the cloth gain while recovering the stronger foliage checkpoint. ## Source Runs - foliage source: `/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json` - bag source: `/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json` - cloth source: `/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/cloth_fixed_default/reveal_benchmark.json` ## Reproduction - benchmark wrapper: `/workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh` - routing is now supported directly in `run_reveal_benchmark.py` through `--task-routed-model`