Reproducible LIBERO-Spatial eval for SmolVLA: 79/100 with full manifest

#11 by indravardhan

I ran SmolVLA (HuggingFaceVLA/smolvla_libero) through roboeval, a new evaluation harness, and got 79/100 on LIBERO-Spatial with n_action_steps=1 (the closed-loop setting from lerobot issue #2354, matching the paper authors' setup).

The result file includes a full reproducibility manifest: the pip freeze, GPU (RTX 5090), driver and CUDA versions, seed (0), git SHA, and a content-addressed hash. You can re-verify it in one command:
```shell
pip install -e ".[dev]"
roboeval validate results/smolvla_libero_spatial.json
```
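For anyone curious what a content-addressed hash over a result file buys you: the idea is to hash a canonical serialization of everything except the hash field itself, so any edit to the scores or manifest invalidates the digest. Here is a minimal sketch of that pattern; the field names (`content_hash`, `manifest`, etc.) are my assumptions for illustration, not roboeval's actual schema.

```python
import hashlib
import json


def content_hash(result: dict) -> str:
    # Hash everything except the stored hash itself, using canonical JSON
    # (sorted keys, compact separators) so the digest is byte-reproducible.
    payload = {k: v for k, v in result.items() if k != "content_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical result file contents, loosely mirroring the manifest fields
# described above (seed, GPU, score); not the real schema.
result = {
    "suite": "libero_spatial",
    "score": 79,
    "episodes": 100,
    "manifest": {"seed": 0, "gpu": "RTX 5090"},
}
result["content_hash"] = content_hash(result)

# Verification: recompute and compare, which is roughly what a
# `validate` command would do.
assert content_hash(result) == result["content_hash"]
```

Tampering with any field (say, bumping `score` to 80) changes the recomputed digest, so the stored hash no longer matches.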
The repo is at github.com/ActuallyIR/roboeval. It's designed to be the lm-eval-harness equivalent for robot policies: one JSON schema, mandatory manifests, plugin-based policies and suites.
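"Plugin-based policies" usually means the harness defines a small interface and each policy backend implements it. Purely as a sketch of what such an interface could look like (the names `Policy`, `reset`, and `select_action` are hypothetical, not roboeval's actual API):

```python
import random
from typing import Any, Protocol


class Policy(Protocol):
    """Hypothetical minimal interface a policy plugin might implement."""

    def reset(self) -> None: ...
    def select_action(self, observation: dict[str, Any]) -> list[float]: ...


class RandomPolicy:
    """Toy baseline: emits a random 7-DoF action regardless of observation."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def reset(self) -> None:
        pass  # no episode state to clear in this toy policy

    def select_action(self, observation: dict[str, Any]) -> list[float]:
        return [self._rng.uniform(-1.0, 1.0) for _ in range(7)]


# A harness can drive any conforming plugin through the same loop.
policy: Policy = RandomPolicy(seed=0)
policy.reset()
action = policy.select_action({"image": None})
```

The Protocol approach keeps plugins decoupled from the harness: any class with matching methods type-checks as a `Policy` without inheriting from a harness base class.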

Happy to hear if your internal numbers differ, and why. That's exactly the kind of feedback this is built for.

--> github.com/ActuallyIR/roboeval
