Reproducible LIBERO-Spatial eval for SmolVLA: 79/100 with full manifest

#11 by indravardhan

I ran SmolVLA (HuggingFaceVLA/smolvla_libero) through roboeval, a new evaluation harness, and got 79/100 on LIBERO-Spatial with n_action_steps=1 (the closed-loop setting from lerobot issue #2354, matching the paper authors' setup).

The result file includes a full reproducibility manifest: the pip freeze, GPU (RTX 5090), driver and CUDA versions, seed (0), git SHA, and a content-addressed hash. You can re-verify it in one command:
```shell
pip install -e ".[dev]"
roboeval validate results/smolvla_libero_spatial.json
```
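For anyone curious what a content-addressed hash over a result file buys you: the idea is to hash a canonical serialization of everything except the hash field itself, so any edit to the scores or manifest invalidates the digest. Here is a minimal sketch of that pattern; the field names (`content_hash`, `manifest`, etc.) are my assumptions for illustration, not roboeval's actual schema.

```python
import hashlib
import json


def content_hash(result: dict) -> str:
    # Hash everything except the stored hash itself, using canonical JSON
    # (sorted keys, compact separators) so the digest is byte-reproducible.
    payload = {k: v for k, v in result.items() if k != "content_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical result file contents, loosely mirroring the manifest fields
# described above (seed, GPU, score); not the real schema.
result = {
    "suite": "libero_spatial",
    "score": 79,
    "episodes": 100,
    "manifest": {"seed": 0, "gpu": "RTX 5090"},
}
result["content_hash"] = content_hash(result)

# Verification: recompute and compare, which is roughly what a
# `validate` command would do.
assert content_hash(result) == result["content_hash"]
```

Tampering with any field (say, bumping `score` to 80) changes the recomputed digest, so the stored hash no longer matches.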
The repo is at github.com/ActuallyIR/roboeval. It's designed to be the lm-eval-harness equivalent for robot policies: one JSON schema, mandatory manifests, plugin-based policies and suites.
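"Plugin-based policies" usually means the harness defines a small interface and each policy backend implements it. Purely as a sketch of what such an interface could look like (the names `Policy`, `reset`, and `select_action` are hypothetical, not roboeval's actual API):

```python
import random
from typing import Any, Protocol


class Policy(Protocol):
    """Hypothetical minimal interface a policy plugin might implement."""

    def reset(self) -> None: ...
    def select_action(self, observation: dict[str, Any]) -> list[float]: ...


class RandomPolicy:
    """Toy baseline: emits a random 7-DoF action regardless of observation."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def reset(self) -> None:
        pass  # no episode state to clear in this toy policy

    def select_action(self, observation: dict[str, Any]) -> list[float]:
        return [self._rng.uniform(-1.0, 1.0) for _ in range(7)]


# A harness can drive any conforming plugin through the same loop.
policy: Policy = RandomPolicy(seed=0)
policy.reset()
action = policy.select_action({"image": None})
```

The Protocol approach keeps plugins decoupled from the harness: any class with matching methods type-checks as a `Policy` without inheriting from a harness base class.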

Happy to hear if your internal numbers differ, and why. That's exactly the kind of feedback this is built for.

--> github.com/ActuallyIR/roboeval
