
Task-Routed Proxy Controller

  • routing rule: foliage -> iter6, bag -> iter8, cloth -> iter8
  • fixed benchmark slices: 100 episodes per task from the standard sprint spec
  • mean success: 0.4867 (mean of the three per-task success rates)

Per-Task Success

  • foliage: 0.46
  • bag: 0.41
  • cloth: 0.59
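
The reported mean can be checked against the per-task numbers above. The following is a minimal sketch, not the project's evaluation code: the variable names are illustrative, and the success counts are derived on the assumption that each rate is an exact fraction of the 100 episodes per task.

```python
# Per-task success rates as reported above (100 episodes per task).
per_task_success = {"foliage": 0.46, "bag": 0.41, "cloth": 0.59}
EPISODES_PER_TASK = 100  # from the fixed benchmark slices

# Every task has the same episode count, so the unweighted mean of the
# rates equals the pooled success rate over all episodes.
mean_success = sum(per_task_success.values()) / len(per_task_success)

# Assumption: each rate is an exact fraction of 100 episodes.
total_successes = sum(
    round(rate * EPISODES_PER_TASK) for rate in per_task_success.values()
)
total_episodes = EPISODES_PER_TASK * len(per_task_success)

print(f"mean success: {mean_success:.4f}")                # mean success: 0.4867
print(f"successes: {total_successes}/{total_episodes}")  # successes: 146/300
```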

Why This Matters

  • This is the strongest current controller for the three custom proxy tasks, measured by mean success on the fixed benchmark slices.
  • The routing rule is fair: it depends only on explicit task metadata, not on per-episode oracle information.
  • Compared with the best single checkpoint (iter7/iter8 at 0.4667), the routed controller keeps the iter8 cloth gain while routing foliage to the stronger iter6 checkpoint, raising mean success by 0.02 to 0.4867.
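
The routing rule amounts to a static lookup on the task name, which is why it needs no per-episode information. The sketch below illustrates that idea only. `TASK_TO_CHECKPOINT` and `route_checkpoint` are hypothetical names, not taken from run_reveal_benchmark.py, and the checkpoint labels stand in for whatever identifiers the real tooling uses.

```python
# Routing table from the rule above: task name -> checkpoint label.
# Hypothetical structure; the real script may represent this differently.
TASK_TO_CHECKPOINT = {"foliage": "iter6", "bag": "iter8", "cloth": "iter8"}


def route_checkpoint(task_name: str) -> str:
    """Pick a checkpoint label from explicit task metadata only."""
    try:
        return TASK_TO_CHECKPOINT[task_name]
    except KeyError:
        # Fail loudly rather than silently falling back to a default.
        raise ValueError(f"no route defined for task {task_name!r}") from None


# Mean success figures as reported in this document.
routed_mean = 0.4867
best_single_mean = 0.4667
gain = routed_mean - best_single_mean

print(route_checkpoint("foliage"))  # iter6
print(f"gain over best single checkpoint: {gain:+.4f}")
```

Because the lookup key is the task name alone, the same checkpoint serves every episode of a task, so no episode outcome can influence the choice.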

Source Runs

  • foliage source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json
  • bag source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json
  • cloth source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/cloth_fixed_default/reveal_benchmark.json

Reproduction

  • benchmark wrapper: /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
  • routing is now supported directly in run_reveal_benchmark.py through the --task-routed-model flag