	# Regression Baselines

	Snapshot source: `/workspace/VLAarchtests/README.md` plus the committed artifact JSONs under `/workspace/VLAarchtests/artifacts/outputs`.

	## Proxy benchmarks

	- Dummy action-history benchmark:
	  - interaction: `0.5278` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_actionhist_commit4/reveal_benchmark.json`
	  - backbone: `0.5556` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
	  - reveal: `0.5417` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_old_no_leak_baselines_commit4/reveal_benchmark.json`
	- CLIP action-history benchmark:
	  - interaction_clip: `0.3056` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_interaction_clip_commit4_compare/reveal_benchmark.json`
	  - backbone_clip: `0.3333` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`
	  - reveal_clip: `0.2083` from `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/reveal_eval_clip_baselines_commit4/reveal_benchmark.json`
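
A minimal sketch of how these snapshot values could be used as a regression gate. The numbers are copied from the list above; the function name `find_regressions`, the tolerance, and the assumption that a higher score is better are illustrative and not taken from the repository.

```python
# Baseline values copied from the proxy benchmark list above.
PROXY_BASELINES = {
    "interaction": 0.5278,
    "backbone": 0.5556,
    "reveal": 0.5417,
    "interaction_clip": 0.3056,
    "backbone_clip": 0.3333,
    "reveal_clip": 0.2083,
}


def find_regressions(new_scores, baselines=PROXY_BASELINES, tol=1e-4):
    """Return {name: (baseline, new)} for scores that dropped below baseline.

    Assumption: higher is better. Names missing from new_scores are skipped.
    """
    regressions = {}
    for name, baseline in baselines.items():
        if name not in new_scores:
            continue
        new = new_scores[name]
        if new < baseline - tol:
            regressions[name] = (baseline, new)
    return regressions
```

For example, `find_regressions({"interaction": 0.50, "backbone": 0.5556})` flags only `interaction`, because `backbone` matches its baseline within the tolerance.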

	## Action-history ablations

	- full_model: `0.5278`
	- no_interaction_head: `0.3889`
	- no_world_model: `0.5278`
	- no_planner: `0.5278`
	- no_role_tokens: `0.5278`
	- short_history: `0.5417`

	JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/ablation_none_actionhist/ablations.json`
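
The ablation scores are easier to read as differences from `full_model`. The sketch below computes those deltas from the values listed above; the helper name `ablation_deltas` is hypothetical, and the dictionary is typed in by hand rather than read from `ablations.json`, whose schema is not shown here.

```python
# Scores copied from the ablation list above.
ABLATIONS = {
    "full_model": 0.5278,
    "no_interaction_head": 0.3889,
    "no_world_model": 0.5278,
    "no_planner": 0.5278,
    "no_role_tokens": 0.5278,
    "short_history": 0.5417,
}


def ablation_deltas(scores, reference="full_model"):
    """Return each ablation's score minus the reference score, rounded to 4 places."""
    ref = scores[reference]
    return {
        name: round(value - ref, 4)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, delta in ablation_deltas(ABLATIONS).items():
        print(f"{name}: {delta:+.4f}")
```

On these numbers, `no_interaction_head` is the only ablation below the full model (a difference of `-0.1389`), `short_history` is `0.0139` above it, and the remaining three match it exactly.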

	## Diagnostics

	- planner_top1_accuracy: `0.1985`
	- planner_regret: `0.2120`

	JSON path: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/proxy_interaction_state_actionhist/diagnostics/proxy_diagnostics.json`

	## Integration baselines

	- RLBench open-drawer rollout:
	  - mean_success: `0.0`
	  - error: `"A path could not be found because the target is outside of workspace."`
	  - JSON: `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/rlbench_open_drawer_rollout_eval_commit4_rerun/rollout_eval.json`
	- PerAct2 13-task sweep:
	  - no-plan mean_success: `0.0`
	  - planner mean_success: `0.0`
	  - JSON roots:
	    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_noplan_split/rollout_eval.json`
	    - `/workspace/VLAarchtests/artifacts/outputs/interaction_debug/peract2_13_rollout_plan_split/rollout_eval.json`
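
Because every integration baseline currently sits at `0.0`, any change in a later run is worth surfacing, whether it is a non-zero success rate or a different error message. The sketch below compares a parsed result against the RLBench open-drawer snapshot. It assumes `mean_success` and `error` are top-level keys in `rollout_eval.json`, which this file does not confirm, and `diff_against_baseline` is a hypothetical helper name.

```python
# Snapshot of the RLBench open-drawer baseline listed above.
OPEN_DRAWER_BASELINE = {
    "mean_success": 0.0,
    "error": "A path could not be found because the target is outside of workspace.",
}


def diff_against_baseline(result, baseline=OPEN_DRAWER_BASELINE, tol=1e-9):
    """List the fields of a parsed result dict that differ from the baseline.

    Assumption: result is a flat dict with the same keys as the baseline.
    """
    changes = []
    for key, expected in baseline.items():
        actual = result.get(key)
        if isinstance(expected, float) and isinstance(actual, (int, float)):
            # Numeric fields are compared with a small tolerance.
            if abs(actual - expected) > tol:
                changes.append(f"{key}: {expected} -> {actual}")
        elif actual != expected:
            changes.append(f"{key}: {expected!r} -> {actual!r}")
    return changes
```

An empty list means the rerun reproduces the snapshot. To check a file on disk, the parsed output of `json.load` can be passed in as `result`.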