---
base_model:
- moojink/openvla-7b-oft-finetuned-libero-spatial
- moojink/openvla-7b-oft-finetuned-libero-10
- moojink/openvla-7b-oft-finetuned-libero-object
- moojink/openvla-7b-oft-finetuned-libero-goal
datasets:
- yifengzhu-hf/LIBERO-datasets
pipeline_tag: robotics
license: mit
---
# 💪 RIPT-VLA: Interactive Post-Training for Vision-Language-Action Models ([arXiv:2505.17016](https://arxiv.org/abs/2505.17016))

**Authors**: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl

**Codebase**: [GitHub – RIPT-VLA](https://github.com/Ariostgx/ript-vla)

**Website**: [Project Page](https://ariostgx.github.io/ript_vla/)

> **RIPT-VLA** enables interactive post-training for any pretrained Vision-Language-Action (VLA) model using only **sparse binary success rewards**.
> With **K-rollout interaction**, **dynamic sampling**, and **leave-one-out advantage estimation**, RIPT-VLA achieves **state-of-the-art** performance in extremely low-data regimes.
---
## 🧠 Model Summary

RIPT-VLA takes a pretrained VLA model (e.g., QueST or OpenVLA-OFT) and improves it by fine-tuning with reinforcement learning from success/failure signals alone: no dense rewards or value functions are required.

Supported models:

- ✅ QueST (small, efficient)
- ✅ OpenVLA-OFT (large-scale, high-capacity)
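The two estimator ideas named above can be sketched in a few lines. This is an illustrative toy, not the repository's actual implementation; all function and variable names here are made up for the example.

```python
# Sketch (not the official RIPT-VLA code): leave-one-out advantage
# estimation over K rollouts with sparse binary success rewards, plus
# the dynamic-sampling filter that drops uninformative contexts.

def leave_one_out_advantages(rewards):
    """For rollout i, the baseline is the mean reward of the other K-1
    rollouts sampled for the same task context."""
    K = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (K - 1) for r in rewards]

def keep_context(rewards):
    """Dynamic sampling: skip contexts where all K rollouts succeed or
    all fail -- every advantage is zero, so there is no gradient signal."""
    return 0 < sum(rewards) < len(rewards)

# Example: K = 4 rollouts for one task context, binary success rewards.
rewards = [1, 0, 1, 1]
print(leave_one_out_advantages(rewards))  # successes get +1/3, the failure -1
print(keep_context([1, 1, 1, 1]))         # False: all succeeded, nothing to learn
```

Only contexts that pass the filter contribute rollouts to the policy-gradient update, which keeps the sparse-reward signal from collapsing to zero.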
---
## 🧪 Model Use

### ✅ Intended Use

- Research on post-training VLA models via RL
- Evaluation on LIBERO benchmarks (LIBERO-90, Goal, Object, Spatial, Long)
- Studying low-data reinforcement learning settings
---
## 📦 Checkpoints

All checkpoints are hosted in this repository.

### ⚙️ QueST Checkpoints

| Suite          | SFT Checkpoint | RIPT Checkpoint |
|----------------|----------------|-----------------|
| LIBERO-90      | ✅             | ✅              |
| LIBERO-GOAL    | ✅             | ✅              |
| LIBERO-LONG    | ✅             | ✅              |
| LIBERO-OBJECT  | ✅             | ✅              |
| LIBERO-SPATIAL | ✅             | ✅              |

Each QueST checkpoint is ~80 MB.
### ⚙️ OpenVLA-OFT Checkpoints

| Suite          | SFT Scale Head | RIPT LoRA Adapter |
|----------------|----------------|-------------------|
| LIBERO-GOAL    | ✅             | ✅                |
| LIBERO-LONG    | ✅             | ✅                |
| LIBERO-OBJECT  | ✅             | ✅                |
| LIBERO-SPATIAL | ✅             | ✅                |

OpenVLA-OFT scale heads are ~300 MB; RIPT LoRA adapters are ~1 GB.
---
## 🚀 How to Use

For installation and usage instructions, see [INSTALL.md](https://github.com/Ariostgx/ript-vla/blob/main/INSTALL.md) in the main GitHub repository.