Reproducible LIBERO-Spatial eval for SmolVLA — 79/100 with full manifest

by indravardhan - opened 30 days ago

I ran SmolVLA (HuggingFaceVLA/smolvla_libero) through a new evaluation harness called roboeval and got 79/100 on LIBERO-Spatial with n_action_steps=1 (the closed-loop setting from lerobot issue #2354, matching what the paper authors use).
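For anyone unfamiliar with the setting, here is a rough sketch of what n_action_steps=1 means in practice, assuming the usual action-chunking behavior where the policy predicts a chunk of actions and only the first n_action_steps of them are executed before it is queried again. The `rollout` and `toy_policy` names are made up for illustration and are not part of roboeval or lerobot.

```python
from typing import Callable, List


def rollout(predict_chunk: Callable[[int], List[int]],
            horizon: int,
            n_action_steps: int) -> int:
    """Count how often the policy is queried over `horizon` env steps.

    Hypothetical helper: a real rollout would call env.step(action)
    and feed fresh observations to the policy on every query.
    """
    queries = 0
    queue: List[int] = []
    for t in range(horizon):
        if not queue:
            # Re-query the policy and keep only the first n_action_steps actions.
            queue = list(predict_chunk(t))[:n_action_steps]
            queries += 1
        action = queue.pop(0)  # env.step(action) would go here
    return queries


def toy_policy(t: int) -> List[int]:
    # Stand-in for a policy that predicts a 50-action chunk.
    return [t + i for i in range(50)]


closed_loop_queries = rollout(toy_policy, horizon=10, n_action_steps=1)
chunked_queries = rollout(toy_policy, horizon=10, n_action_steps=5)
print(closed_loop_queries, chunked_queries)
```

With n_action_steps=1 the policy sees a new observation before every action, which is why it is called closed-loop. Larger values run part of each chunk open-loop.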

The result file includes a full reproducibility manifest: pip freeze, GPU (RTX 5090), driver, CUDA, seed 0, git SHA, and a content-addressed hash. After installing the repo, you can re-verify it in one command:

pip install -e ".[dev]"
roboeval validate results/smolvla_libero_spatial.json
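To give a feel for what a content-addressed hash buys you, here is a minimal sketch of the general idea: hash a canonical serialization of the result, store the digest alongside it, and recompute it at validation time. This is an assumption about the technique in general, not roboeval's actual scheme. The field names (`content_hash`, `score`, `manifest`) and the choice of SHA-256 over sorted-key JSON are hypothetical.

```python
import hashlib
import json


def content_hash(result: dict, hash_field: str = "content_hash") -> str:
    """SHA-256 over a canonical JSON form of the result, excluding the hash itself."""
    payload = {k: v for k, v in result.items() if k != hash_field}
    # Sorted keys and fixed separators make the serialization deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(result: dict, hash_field: str = "content_hash") -> bool:
    """True if the stored hash matches the recomputed one."""
    return result.get(hash_field) == content_hash(result, hash_field)


# Hypothetical result record; the real schema may look quite different.
example = {
    "suite": "libero_spatial",
    "score": "79/100",
    "manifest": {"seed": 0, "gpu": "RTX 5090"},
}
example["content_hash"] = content_hash(example)

ok = validate(example)
tampered_ok = validate(dict(example, score="80/100"))
print(ok, tampered_ok)
```

Any edit to the score or the manifest after the fact changes the digest, so a mismatch flags a result file that no longer corresponds to the run that produced it.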

The repo is at github.com/ActuallyIR/roboeval. It's designed to be the lm-eval-harness equivalent for robot policies — one JSON schema, mandatory manifests, plugin-based policies and suites.

Happy to hear if your internal numbers differ and why. That's exactly the kind of feedback this is built for.
