
Reproducible LIBERO-Spatial eval for SmolVLA: 79/100 with full manifest

#1
by indravardhan - opened

I ran SmolVLA (HuggingFaceVLA/smolvla_libero) through a new evaluation harness called roboeval and got 79/100 on LIBERO-Spatial with n_action_steps=1 (the closed-loop setting from lerobot issue #2354, matching what the paper authors use).
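For anyone unfamiliar with why n_action_steps matters: the policy predicts a chunk of actions, and n_action_steps controls how many of them are executed before the policy is queried again. Here is a minimal toy sketch of that difference. It is not roboeval or lerobot code; toy_policy, the chunk length of 50, and the 100-step horizon are all made up for illustration.

```python
from typing import Callable, List


def count_policy_calls(
    predict_chunk: Callable[[int], List[int]],
    horizon: int,
    n_action_steps: int,
) -> int:
    """Run `horizon` control steps, re-querying the policy whenever
    the queue of pending actions is empty. Returns the number of
    policy calls made."""
    calls = 0
    queue: List[int] = []
    for t in range(horizon):
        if not queue:
            # Toy stand-in for an observation: just the timestep.
            # Only the first `n_action_steps` actions of the chunk are kept.
            queue = predict_chunk(t)[:n_action_steps]
            calls += 1
        queue.pop(0)  # "execute" the next queued action
    return calls


def toy_policy(obs: int) -> List[int]:
    # Hypothetical policy that always predicts a chunk of 50 actions.
    return [obs + i for i in range(50)]


# n_action_steps=1: the policy sees a fresh observation at every step.
closed_loop_calls = count_policy_calls(toy_policy, horizon=100, n_action_steps=1)
# n_action_steps=50: the whole chunk is executed open-loop before re-querying.
chunked_calls = count_policy_calls(toy_policy, horizon=100, n_action_steps=50)
print(closed_loop_calls, chunked_calls)
```

With n_action_steps=1 the policy runs once per control step, which costs more inference but lets it react to every new observation.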

The result file includes a full reproducibility manifest: pip freeze, GPU (RTX 5090), driver, CUDA, seed 0, git SHA, and a content-addressed hash. Once roboeval is installed, you can re-verify it in one command:

pip install -e ".[dev]"
roboeval validate results/smolvla_libero_spatial.json
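To give a feel for what a content-addressed hash check does, here is a minimal sketch of the general idea. It is an assumption about the technique, not roboeval's actual implementation: the field names (content_hash, suite, score, manifest) and the choice of SHA-256 over canonical JSON are hypothetical.

```python
import hashlib
import json


def content_hash(result: dict) -> str:
    """Hash everything in the result except the stored hash itself."""
    payload = {k: v for k, v in result.items() if k != "content_hash"}
    # Sorted keys and fixed separators make the serialization deterministic,
    # so the same content always yields the same digest.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(result: dict) -> bool:
    """Return True if the stored hash matches the recomputed one."""
    return result.get("content_hash") == content_hash(result)


# Hypothetical result file contents, for illustration only.
result = {
    "suite": "libero_spatial",
    "score": "79/100",
    "manifest": {"seed": 0, "gpu": "RTX 5090"},
}
result["content_hash"] = content_hash(result)
is_valid = validate(result)

# Editing any field after the fact invalidates the stored hash.
result["score"] = "80/100"
is_valid_after_edit = validate(result)
print(is_valid, is_valid_after_edit)
```

The point of the design is that a result cannot be edited after the run without the mismatch showing up at validation time.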

The repo is at github.com/ActuallyIR/roboeval. It's designed to be the lm-eval-harness equivalent for robot policies: one JSON schema, mandatory manifests, plugin-based policies and suites.

Happy to hear if your internal numbers differ and why. That's exactly the kind of feedback this is built for.
