Task-Routed Proxy Controller
- routing rule:
foliage -> iter6,bag -> iter8,cloth -> iter8 - fixed benchmark slices: 100 episodes per task from the standard sprint spec
- mean success:
0.4867
Per-Task Success
- foliage:
0.46 - bag:
0.41 - cloth:
0.59
Why This Matters
- This is the strongest current controller for the three custom proxy tasks.
- The routing rule is fair: it depends only on explicit task metadata, not on per-episode oracle information.
- Compared with the best single checkpoint (
iter7/iter8at0.4667), the routed controller keeps the cloth gain while recovering the stronger foliage checkpoint.
Source Runs
- foliage source:
/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json - bag source:
/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json - cloth source:
/workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/cloth_fixed_default/reveal_benchmark.json
Reproduction
- benchmark wrapper:
/workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh - routing is now supported directly in
run_reveal_benchmark.pythrough--task-routed-model