VP-VLA-OXE

This repository contains the VP-VLA policy checkpoint trained for OXE / SimplerEnv-style tabletop manipulation.

VP-VLA uses visual prompts as an interface for vision-language-action models: a high-level planner converts language instructions into visual prompts, and the policy follows those prompts to produce robot actions.
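The two-stage interface described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `VisualPrompt`, `toy_planner`, `toy_policy`, and `step`, the choice of a single target pixel as the prompt, and the 2D action are hypothetical and are not taken from the VP-VLA codebase or paper. It only shows the data flow: the planner turns an instruction and an image into a prompt, and the policy turns the image and the prompt into an action.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names below are hypothetical; none come from the VP-VLA codebase.
Image = List[List[int]]  # toy grayscale image as nested lists


@dataclass
class VisualPrompt:
    # Assumed prompt form: one target pixel (row, col) marked on the image.
    target: Tuple[int, int]


Planner = Callable[[str, Image], VisualPrompt]
Policy = Callable[[Image, VisualPrompt], List[float]]


def toy_planner(instruction: str, image: Image) -> VisualPrompt:
    """Stand-in planner: marks the brightest pixel and ignores the instruction."""
    best = max(
        ((r, c) for r in range(len(image)) for c in range(len(image[0]))),
        key=lambda rc: image[rc[0]][rc[1]],
    )
    return VisualPrompt(target=best)


def toy_policy(image: Image, prompt: VisualPrompt) -> List[float]:
    """Stand-in policy: 2D offset from the image centre to the prompted pixel."""
    centre_r = (len(image) - 1) / 2
    centre_c = (len(image[0]) - 1) / 2
    return [prompt.target[0] - centre_r, prompt.target[1] - centre_c]


def step(instruction: str, image: Image, planner: Planner, policy: Policy) -> List[float]:
    """One control step: language -> visual prompt -> action."""
    prompt = planner(instruction, image)
    return policy(image, prompt)
```

The point of the split is that the policy never sees the instruction text directly; it only sees the image and the prompt produced by the planner.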

Usage

Use this checkpoint with the released VP-VLA codebase. Follow the installation and evaluation instructions in the VP-VLA repository, then pass the path to this checkpoint to the SimplerEnv evaluation script.
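One way to obtain a local checkpoint path is the `huggingface_hub` client. The sketch below assumes the checkpoint is hosted under the repo id `Vincent2311/VP-VLA-OXE` and that `huggingface_hub` is installed; the helper name `download_checkpoint` is hypothetical. The evaluation script's name and arguments are defined by the VP-VLA repository and are deliberately not shown here.

```python
from pathlib import Path
from typing import Optional

# Assumption: the checkpoint is hosted under this Hugging Face repo id.
REPO_ID = "Vincent2311/VP-VLA-OXE"


def download_checkpoint(local_dir: Optional[str] = None) -> Path:
    """Download every file in the model repo and return the local directory."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files go to the default Hugging Face cache.
    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


if __name__ == "__main__":
    # Prints the directory to pass to the evaluation script as the checkpoint path.
    print(download_checkpoint())
```

The printed directory is the path to hand to the evaluation script, using whichever argument the VP-VLA instructions specify.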

Citation

If you use this model, please cite the VP-VLA paper:

https://huggingface.co/papers/2603.22003
