---
base_model:
- moojink/openvla-7b-oft-finetuned-libero-spatial
- moojink/openvla-7b-oft-finetuned-libero-10
- moojink/openvla-7b-oft-finetuned-libero-object
- moojink/openvla-7b-oft-finetuned-libero-goal
datasets:
- yifengzhu-hf/LIBERO-datasets
pipeline_tag: robotics
license: mit
---

# πŸ’ͺ RIPT-VLA: Interactive Post-Training for Vision-Language-Action Models ([arXiv:2505.17016](https://arxiv.org/abs/2505.17016))

**Authors**: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp KrΓ€henbΓΌhl  
**Codebase**: [GitHub – RIPT-VLA](https://github.com/Ariostgx/ript-vla)  
**Website**: [Project Page](https://ariostgx.github.io/ript_vla/)  

> **RIPT-VLA** enables interactive post-training for any pretrained Vision-Language-Action (VLA) model using only **sparse binary success rewards**.  
> With **K-rollout interaction**, **dynamic sampling**, and **leave-one-out advantage estimation**, RIPT-VLA achieves **state-of-the-art** performance in extremely low-data regimes.
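
The leave-one-out advantage mentioned above can be sketched as follows. This is an illustrative re-implementation based on the description in this card, not the repository's actual code: each of the K rollouts is scored against the mean reward of the other K-1 rollouts, so no learned value function is needed.

```python
import numpy as np

def leave_one_out_advantage(rewards: np.ndarray) -> np.ndarray:
    """Advantage of each rollout vs. the mean reward of the others.

    rewards: shape (K,), sparse binary success rewards from K rollouts
    of the same task context.
    """
    k = rewards.shape[0]
    total = rewards.sum()
    # Baseline for rollout i: mean reward of the remaining K-1 rollouts.
    baseline = (total - rewards) / (k - 1)
    return rewards - baseline

# Example: 4 rollouts, two succeed.
rewards = np.array([1.0, 0.0, 0.0, 1.0])
adv = leave_one_out_advantage(rewards)

# Note that if every rollout succeeds (or every rollout fails), all
# advantages are zero -- which is why dynamic sampling skips such
# contexts during training.
```

A usage note: with all-ones or all-zeros rewards the advantages vanish, so those contexts carry no gradient signal; dynamic sampling filters them out before the update.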

---

## 🧠 Model Summary

RIPT-VLA takes a pretrained VLA model (e.g., QueST or OpenVLA-OFT) and improves its performance by fine-tuning it with reinforcement learning based on success/failure signals only β€” no dense rewards or value functions required.

Supported models:
- βœ… QueST (small, efficient)
- βœ… OpenVLA-OFT (large-scale, high-capacity)

---

## πŸ§ͺ Model Use

### βœ… Intended Use

- Research on post-training VLA models via RL
- Evaluation on LIBERO benchmarks (LIBERO-90, Goal, Object, Spatial, Long)
- Studying low-data reinforcement learning settings

---

## πŸ“¦ Checkpoints

All checkpoints are hosted in this repository.

### βœ”οΈ QueST Checkpoints

| Suite            | SFT Checkpoint | RIPT Checkpoint |
|------------------|----------------|-----------------|
| LIBERO-90        | βœ…              | βœ…              |
| LIBERO-GOAL      | βœ…              | βœ…              |
| LIBERO-LONG      | βœ…              | βœ…              |
| LIBERO-OBJECT    | βœ…              | βœ…              |
| LIBERO-SPATIAL   | βœ…              | βœ…              |

Each QueST checkpoint is ~80MB.

### βœ”οΈ OpenVLA-OFT Checkpoints

| Suite            | SFT Scale Head | RIPT LoRA Adapter |
|------------------|----------------|-------------------|
| LIBERO-GOAL      | βœ…              | βœ…                 |
| LIBERO-LONG      | βœ…              | βœ…                 |
| LIBERO-OBJECT    | βœ…              | βœ…                 |
| LIBERO-SPATIAL   | βœ…              | βœ…                 |

OpenVLA-OFT scale heads are ~300MB; RIPT LoRA adapters are ~1GB.

---

## πŸ›  How to Use

For usage, see [INSTALL.md](https://github.com/Ariostgx/ript-vla/blob/main/INSTALL.md) in the main GitHub repo.