Update README.md
README.md
We trained four models using RLinf:
### Benchmark Results
SFT models for LIBERO-90 and LIBERO-130 are trained by ourselves following the training recipe from [OpenVLA-OFT](https://github.com/moojink/openvla-oft/blob/main/vla-scripts/finetune.py). The other SFT models are from [SimpleVLA-RL](https://huggingface.co/collections/Haozhan72/simplevla-rl-6833311430cd9df52aeb1f86).
> We evaluate each model according to its training configuration, using libero_seed = 0 and evaluating 500 episodes each for the Object, Spatial, Goal, and Long suites, 4,500 episodes for LIBERO-90, and 6,500 episodes for LIBERO-130.
> For the SFT-trained (LoRA-based) models, we set do_sample = False.
> For the RL-trained models, we set do_sample = True, temperature = 1.6, and rollout_epoch = 2; the final results are reported as the average across the two runs.
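
The two sampling setups above can be sketched in plain Python. This is a hypothetical illustration only: the dict keys mirror the parameter names quoted in the notes (do_sample, temperature, rollout_epoch), not RLinf's actual configuration schema, and `averaged_success_rate` is an assumed helper, not part of the codebase.

```python
# Illustrative sketch only: key names mirror the README's notes, but these
# dicts are NOT RLinf's actual configuration schema.
sft_eval = {"do_sample": False}  # greedy decoding for SFT (LoRA-based) models
rl_eval = {"do_sample": True, "temperature": 1.6, "rollout_epoch": 2}

def averaged_success_rate(per_epoch_rates):
    """Mean of a metric over rollout epochs; the RL numbers are reported
    as the average across the two evaluation runs."""
    return sum(per_epoch_rates) / len(per_epoch_rates)
```
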