# OpenVLA-OFT -- color_object Checkpoint

Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success.

Paper: https://arxiv.org/abs/2502.19645
Project: https://openvla-oft.github.io/

## Repository Structure

```
checkpoints/
  color_object/
    model-0000{1..4}-of-00004.safetensors   # merged LLM weights (step 50000)
    action_head--50000_checkpoint.pt        # MLP action head
    proprio_projector--50000_checkpoint.pt  # proprio projector
    config.json / tokenizer* / ...          # model config and tokenizer files
    lora_adapter/
      adapter_model.safetensors             # LoRA adapter weights
      adapter_config.json
prismatic/                                  # model architecture, dataset, training code
vla-scripts/                                # finetune.py, deploy.py, merge_lora_weights_and_save.py
experiments/                                # eval scripts for LIBERO, ALOHA
slurm_scripts/                              # SLURM finetune scripts for all conflict splits
  finetune_color_object.sh                  # exact script used to produce the checkpoint
finetune.md                                 # step-by-step fine-tuning guide
SETUP.md / LIBERO.md / ALOHA.md
```
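Before loading the model, it can help to confirm that the checkpoint directory is complete. The following is a minimal sketch, not part of the repository: the helper names `expected_checkpoint_files` and `missing_checkpoint_files` are hypothetical, and the file names are taken from the listing above (four weight shards, step 50000). Tokenizer files are not checked because their exact names are not listed.

```python
from pathlib import Path

# Values taken from the repository listing; the helpers below are hypothetical.
NUM_SHARDS = 4
STEP = 50000


def expected_checkpoint_files(num_shards=NUM_SHARDS, step=STEP):
    """Return the file names the color_object checkpoint directory should contain."""
    shards = [
        f"model-{i:05d}-of-{num_shards:05d}.safetensors"
        for i in range(1, num_shards + 1)
    ]
    return shards + [
        f"action_head--{step}_checkpoint.pt",
        f"proprio_projector--{step}_checkpoint.pt",
        "config.json",
    ]


def missing_checkpoint_files(checkpoint_dir):
    """List the expected files that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in expected_checkpoint_files() if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints/color_object")
    print("missing files:", missing or "none")
```

An empty result means the weight shards, action head, proprio projector, and `config.json` are all present.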

## Quick Inference

The snippet below loads the policy and predicts one action chunk. See `finetune.md` for the full loading example.

```python
from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import (
    get_action_head,
    get_processor,
    get_proprio_projector,
    get_vla,
    get_vla_action,
)
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM

# Evaluation settings for the color_object checkpoint
cfg = GenerateConfig(
    pretrained_checkpoint="checkpoints/color_object",
    use_l1_regression=True,
    use_film=False,
    num_images_in_input=2,
    use_proprio=True,
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,
    unnorm_key="conflict_maniskill",
)

# Load the policy and its input processor
vla = get_vla(cfg)
processor = get_processor(cfg)

# Load the MLP action head and the proprio projector
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)

# `observation` is not defined here; supply a dict with the camera images,
# the proprioceptive state, and a "task_description" string.
actions = get_vla_action(
    cfg, vla, processor, observation, observation["task_description"],
    action_head, proprio_projector,
)
```
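Because the config sets `num_open_loop_steps=NUM_ACTIONS_CHUNK`, each policy call yields a chunk of actions that is executed open-loop before the policy is queried again. The sketch below illustrates that control pattern under stated assumptions: `run_open_loop` and its callback arguments are hypothetical names, the stub policy stands in for `get_vla_action`, and the chunk size of 8 is an arbitrary demo value, not the checkpoint's actual `NUM_ACTIONS_CHUNK`.

```python
from collections import deque


def run_open_loop(policy, get_observation, apply_action, num_steps, num_open_loop_steps):
    """Execute actions chunk by chunk, querying the policy only when the queue is empty.

    Assumes the policy always returns at least one action per call.
    Returns the number of policy queries made.
    """
    action_queue = deque()
    num_queries = 0
    for _ in range(num_steps):
        if not action_queue:
            # Queue exhausted: ask the policy for a fresh chunk of future actions.
            chunk = policy(get_observation())
            action_queue.extend(chunk[:num_open_loop_steps])
            num_queries += 1
        apply_action(action_queue.popleft())
    return num_queries


if __name__ == "__main__":
    # Stub policy (hypothetical): returns 8 dummy actions per call.
    executed = []
    queries = run_open_loop(
        policy=lambda obs: [(obs, i) for i in range(8)],
        get_observation=lambda: len(executed),
        apply_action=executed.append,
        num_steps=20,
        num_open_loop_steps=8,
    )
    print(f"{queries} policy queries for {len(executed)} executed actions")
```

Executing a full chunk between queries keeps the number of model forward passes well below the number of control steps, which is the point of action chunking.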

## Fine-tuning

See `finetune.md` for the complete fine-tuning guide.

## Citation

```bibtex
@article{kim2025openvlaoft,
  title   = {Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author  = {Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal = {arXiv preprint arXiv:2502.19645},
  year    = {2025}
}
```