
OS_libero_traj_strict_rew

OpenSora + OpenVLA-OFT checkpoints trained with GRPO on LIBERO-Spatial using a strict VLM trajectory reward (J_adversarial_fast prompt).

Training Details

  • Base model: RLinf-OpenVLAOFT-LIBERO-130-Base-Lora (OpenVLA-OFT with LoRA)
  • RL algorithm: GRPO (Group Relative Policy Optimization)
  • Environment: LIBERO-Spatial (10 tasks)
  • Reward model: Qwen3-VL-8B-Instruct with J_adversarial_fast prompt
  • Reward type: VLM trajectory reward (4 frames sampled from trajectory)
  • Training framework: RLinf
  • GPUs: 8x NVIDIA B200
  • KL penalty (kl_beta): 0.0
  • Reward coefficient: 5.0
  • VLM max_new_tokens: 32
  • enable_thinking: False
  • val_check_interval: 5
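GRPO computes advantages by standardizing each rollout's reward within its group of rollouts for the same task, rather than using a learned value baseline. A minimal sketch of that group-relative normalization (the function name and epsilon are illustrative, not taken from the RLinf codebase):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: standardize each reward within its group."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

With all rewards equal the advantages collapse to zero, so a group where every rollout succeeds (or every one fails) contributes no gradient signal.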

Checkpoints

| Checkpoint      | Description                          |
|-----------------|--------------------------------------|
| global_step_25/ | Step 25 – peak eval performance window |
| global_step_30/ | Step 30 – peak eval performance window |

Each checkpoint contains:

  • actor/model_state_dict/full_weights.pt – consolidated full model weights (~15GB)
  • actor/dcp_checkpoint/ – distributed checkpoint shards for resuming training (~43GB)

VLM Reward Prompt

See prompt.txt for the full J_adversarial_fast prompt template used during training.
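The reward model scores 4 frames sampled from each trajectory. How RLinf selects those frames is not specified here; a minimal sketch assuming uniform sampling that always keeps the first and last frame (helper name is illustrative):

```python
def sample_frames(frames, num_frames=4):
    """Pick evenly spaced frames, always including the first and last."""
    n = len(frames)
    if n <= num_frames:
        return list(frames)
    # Evenly spaced indices over [0, n-1], rounded to the nearest frame.
    idxs = [round(i * (n - 1) / (num_frames - 1)) for i in range(num_frames)]
    return [frames[i] for i in idxs]
```

Including the final frame matters for trajectory rewards, since task success is usually only visible at the end of the rollout.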

Prompt Metrics

| VLM Model   | Accuracy | Precision | Recall | F1    | FP Rate |
|-------------|----------|-----------|--------|-------|---------|
| Qwen3-VL-8B | 68.3%    | 90.6%     | 59.0%  | 0.714 | 12.8%   |
| Qwen3.5-9B  | 85.8%    | 97.1%     | 81.5%  | 0.886 | 5.1%    |
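The F1 column is the harmonic mean of precision and recall, which can be spot-checked directly against the table rows:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# f1(0.906, 0.590) and f1(0.971, 0.815) reproduce the table's
# 0.714 and 0.886 to within rounding of the reported percentages.
```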

Training Eval Curve (VLM success)

Training ran for 95 steps with periodic validation every 5 steps.

  • Peak VLM success ~85% around steps 20-25
  • Gradual decline after step 30

Usage

To load the full weights for inference:

```python
import torch

# Consolidated full model weights (~15GB); load onto CPU first.
state_dict = torch.load(
    "global_step_25/actor/model_state_dict/full_weights.pt",
    map_location="cpu",
)
# Apply to your OpenVLA-OFT model, e.g. model.load_state_dict(state_dict)
```
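The state dict is a mapping from parameter names to tensors; applying it requires a model whose parameter names match. A toy round-trip with a stand-in module (the real target is the OpenVLA-OFT policy, whose construction comes from its own codebase):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-in for the policy network.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "full_weights.pt")
    torch.save(model.state_dict(), path)
    state_dict = torch.load(path, map_location="cpu")
    # Raises if parameter names or shapes do not match the model.
    model.load_state_dict(state_dict)
```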

License

See the main RLinf repository for license details.
