Reproducible LIBERO-Spatial eval for SmolVLA: 79/100 with full manifest
I ran SmolVLA (HuggingFaceVLA/smolvla_libero) through roboeval, a new evaluation harness, and got 79/100 on LIBERO-Spatial with n_action_steps=1 (the closed-loop setting discussed in lerobot issue #2354, matching the setting the paper authors use).
The result file includes a full reproducibility manifest: pip freeze, GPU (RTX 5090), driver, CUDA, seed 0, git SHA, and a content-addressed hash. You can re-verify it in one command:
```
pip install -e ".[dev]"
roboeval validate results/smolvla_libero_spatial.json
```
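For intuition, here is a minimal sketch of how a content-addressed hash over a result file can be verified. This is an illustration only: the field names (`manifest`, `content_hash`) and the hashing scheme are assumptions, not roboeval's actual schema.

```python
# Hypothetical sketch of content-addressed result verification.
# Field names and scheme are illustrative, not roboeval's real schema.
import hashlib
import json

def content_hash(result: dict) -> str:
    # Hash canonical JSON of everything except the hash field itself.
    payload = {k: v for k, v in result.items() if k != "content_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify(result: dict) -> bool:
    # A result validates iff its stored hash matches a recomputation.
    return result.get("content_hash") == content_hash(result)

result = {
    "suite": "libero_spatial",
    "score": 79,
    "manifest": {"seed": 0, "gpu": "RTX 5090"},
}
result["content_hash"] = content_hash(result)
print(verify(result))  # True
```

The point of canonicalizing (sorted keys, fixed separators) is that any edit to the score or manifest, however small, changes the hash, so a shared result file can't be silently altered after the fact.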
The repo is at github.com/ActuallyIR/roboeval. It's designed to be the lm-eval-harness equivalent for robot policies: one JSON schema, mandatory manifests, plugin-based policies and suites.
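To make "plugin-based" concrete, a registry pattern like the following is one common way to do it. This is a hypothetical sketch; the names (`Policy`, `register_policy`, `POLICIES`) are my illustration, not roboeval's actual API.

```python
# Hypothetical sketch of a plugin-style policy registry.
# All names here are illustrative, not roboeval's real interface.
from typing import Callable, Protocol

class Policy(Protocol):
    def act(self, observation: dict) -> list[float]: ...

# Registry maps a policy name to a zero-arg factory producing a Policy.
POLICIES: dict[str, Callable[[], Policy]] = {}

def register_policy(name: str):
    def wrap(factory):
        POLICIES[name] = factory
        return factory
    return wrap

@register_policy("random_stub")
class RandomStub:
    """Stand-in policy emitting a fixed 7-DoF zero action."""
    def act(self, observation: dict) -> list[float]:
        return [0.0] * 7

# The harness looks policies up by name, never importing them directly.
policy = POLICIES["random_stub"]()
print(policy.act({}))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The appeal of this pattern, as in lm-eval-harness, is that new policies and suites plug in by registration rather than by modifying the harness core.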
Happy to hear if your internal numbers differ and why. That's exactly the kind of feedback this is built for.