VP-VLA-OXE

This repository contains the VP-VLA policy checkpoint trained for OXE / SimplerEnv-style tabletop manipulation.

VP-VLA uses visual prompts as an interface for vision-language-action models: a high-level planner converts language instructions into visual prompts, and the policy follows those prompts to produce robot actions.
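The two-stage split described above can be pictured with a toy sketch. Everything in it is hypothetical and for illustration only: `VisualPrompt`, `toy_planner`, and `toy_policy` are not names from the VP-VLA codebase, the "image" is a small grid of integers, and the planner ignores the instruction instead of grounding it. The point is only the interface: the planner emits a visual prompt, and the policy acts on that prompt rather than on the raw language.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel intensities


@dataclass
class VisualPrompt:
    """Hypothetical visual prompt: one target pixel (row, col) in the image."""
    target: Tuple[int, int]


def toy_planner(instruction: str, image: Image) -> VisualPrompt:
    """Stand-in for the high-level planner.

    A real planner would ground the language instruction in the scene;
    this toy ignores the instruction and marks the brightest pixel.
    """
    coords = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    best = max(coords, key=lambda rc: image[rc[0]][rc[1]])
    return VisualPrompt(target=best)


def toy_policy(prompt: VisualPrompt, gripper: Tuple[int, int]) -> Tuple[int, int]:
    """Stand-in for the low-level policy: one unit step toward the prompt."""
    def step(current: int, goal: int) -> int:
        # Returns -1, 0, or +1 depending on the direction to the goal.
        return (goal > current) - (goal < current)

    return (step(gripper[0], prompt.target[0]), step(gripper[1], prompt.target[1]))
```

In this sketch the policy never sees the instruction string, which mirrors the idea that the visual prompt is the only channel between planner and policy.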

Usage

Use this checkpoint with the released VP-VLA codebase:

Code: https://github.com/JIA-Lab-research/VP-VLA
Paper: https://huggingface.co/papers/2603.22003

Please follow the installation and evaluation instructions in the VP-VLA repository, then pass this checkpoint path to the SimplerEnv evaluation script.
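As a minimal sketch of obtaining a local checkpoint path, the snippet below fetches the model files with `snapshot_download` from the `huggingface_hub` package. It assumes `huggingface_hub` is installed and that the repository ID is `Vincent2311/VP-VLA-OXE`, as shown on this model page; the `checkpoints/VP-VLA-OXE` target directory is an arbitrary choice. The evaluation script's name and arguments are not shown here; take them from the VP-VLA repository.

```python
from pathlib import Path

# Repository ID as listed on this model page.
REPO_ID = "Vincent2311/VP-VLA-OXE"


def download_checkpoint(local_dir: str = "checkpoints/VP-VLA-OXE") -> Path:
    """Download all files of the checkpoint repo and return the local folder.

    The default local_dir is an arbitrary example location, not a path
    required by the VP-VLA codebase.
    """
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `download_checkpoint()` returns the directory to pass as the checkpoint path when following the repository's evaluation instructions.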

Citation

If you use this model, please cite the VP-VLA paper:

https://huggingface.co/papers/2603.22003

Paper title: VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models (2603.22003, published Mar 23)