---
base_model:
- moojink/openvla-7b-oft-finetuned-libero-spatial
- moojink/openvla-7b-oft-finetuned-libero-10
- moojink/openvla-7b-oft-finetuned-libero-object
- moojink/openvla-7b-oft-finetuned-libero-goal
datasets:
- yifengzhu-hf/LIBERO-datasets
pipeline_tag: robotics
license: mit
---

# RIPT-VLA: Interactive Post-Training for Vision-Language-Action Models ([arxiv.org/abs/2505.17016](https://arxiv.org/abs/2505.17016))

**Authors**: Shuhan Tan, Kairan Dou, Yue Zhao, Philipp Krähenbühl
**Codebase**: [GitHub: RIPT-VLA](https://github.com/Ariostgx/ript-vla)
**Website**: [Project Page](https://ariostgx.github.io/ript_vla/)

> **RIPT-VLA** enables interactive post-training for any pretrained Vision-Language-Action (VLA) model using only **sparse binary success rewards**.
> With **K-rollout interaction**, **dynamic sampling**, and **leave-one-out advantage estimation**, RIPT-VLA achieves **state-of-the-art** performance in extremely low-data regimes.

---

## 🧠 Model Summary

RIPT-VLA takes a pretrained VLA model (e.g., QueST or OpenVLA-OFT) and improves its performance by fine-tuning it with reinforcement learning, using only binary success/failure signals; no dense rewards or value functions are required.

Supported models:
- ✅ QueST (small, efficient)
- ✅ OpenVLA-OFT (large-scale, high-capacity)

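
The leave-one-out advantage estimation and dynamic sampling named in the overview can be illustrated with a small sketch. It assumes the standard leave-one-out baseline (each rollout's reward minus the mean reward of the other K-1 rollouts from the same starting context) and assumes that dynamic sampling means discarding rollout groups whose outcomes are all identical, since such groups carry no learning signal. The function names are hypothetical and are not taken from the RIPT-VLA codebase.

```python
from typing import List, Optional


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Return A_k = r_k - mean(r_j for j != k) for K rollouts of one context.

    `rewards` holds the binary success signals (1.0 = success, 0.0 = failure).
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("leave-one-out needs at least two rollouts")
    total = sum(rewards)
    # The baseline for rollout k is the mean of the remaining K-1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]


def advantages_or_skip(rewards: List[float]) -> Optional[List[float]]:
    """Assumed form of dynamic sampling: skip groups with uniform outcomes.

    If every rollout succeeded (or every rollout failed), all leave-one-out
    advantages are zero, so the group is dropped by returning None.
    """
    if len(set(rewards)) == 1:
        return None
    return leave_one_out_advantages(rewards)
```

For example, `leave_one_out_advantages([1, 0, 0, 1])` gives roughly `[0.67, -0.67, -0.67, 0.67]`, while `advantages_or_skip([1, 1, 1, 1])` returns `None` because an all-success group yields zero advantage for every rollout.
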
---

## 🧪 Model Use

### ✅ Intended Use

- Research on post-training VLA models via RL
- Evaluation on LIBERO benchmarks (LIBERO-90, Goal, Object, Spatial, Long)
- Studying low-data reinforcement learning settings

---

## 📦 Checkpoints

All checkpoints are hosted in this repository.

### QueST Checkpoints

| Suite            | SFT Checkpoint | RIPT Checkpoint |
|------------------|----------------|-----------------|
| LIBERO-90        | ✅             | ✅              |
| LIBERO-GOAL      | ✅             | ✅              |
| LIBERO-LONG      | ✅             | ✅              |
| LIBERO-OBJECT    | ✅             | ✅              |
| LIBERO-SPATIAL   | ✅             | ✅              |

Each QueST checkpoint is ~80 MB.

### OpenVLA-OFT Checkpoints

| Suite            | SFT Scale Head | RIPT LoRA Adaptor  |
|------------------|----------------|--------------------|
| LIBERO-GOAL      | ✅             | ✅                 |
| LIBERO-LONG      | ✅             | ✅                 |
| LIBERO-OBJECT    | ✅             | ✅                 |
| LIBERO-SPATIAL   | ✅             | ✅                 |

OpenVLA-OFT scale heads are ~300 MB; RIPT LoRA adaptors are ~1 GB.

---

## How to Use

For usage, see [INSTALL.md](https://github.com/Ariostgx/ript-vla/blob/main/INSTALL.md) in the main GitHub repo.