lsnu
/

VLAarchtests2

Model card Files Files and versions

Metrics Training metrics Community

VLAarchtests2 / VLAarchtests /artifacts /reports /task_routed_proxy_v1 /summary.md

lsnu's picture

Add files using upload-large-folder tool

2f28ec8 verified 2 months ago

|

history blame contribute delete

1.26 kB

Task-Routed Proxy Controller

routing rule: foliage -> iter6, bag -> iter8, cloth -> iter8
fixed benchmark slices: 100 episodes per task from the standard sprint spec
mean success: 0.4867

Per-Task Success

foliage: 0.46
bag: 0.41
cloth: 0.59

Why This Matters

This is the strongest current controller for the three custom proxy tasks.
The routing rule is fair: it depends only on explicit task metadata, not on per-episode oracle information.
Compared with the best single checkpoint (iter7/iter8 at 0.4667), the routed controller keeps the cloth gain while recovering the stronger foliage checkpoint.

Source Runs

foliage source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter6/default/reveal_benchmark.json
bag source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/bag_fixed_default/reveal_benchmark.json
cloth source: /workspace/VLAarchtests/artifacts/reports/selector_finetune_v7_iter8/cloth_fixed_default/reveal_benchmark.json

Reproduction

benchmark wrapper: /workspace/VLAarchtests/code/reveal_vla_bimanual/scripts/run_task_routed_proxy_eval.sh
routing is now supported directly in run_reveal_benchmark.py through --task-routed-model