moojink/openvla-7b-oft-finetuned-libero-spatial-object-goal-10


moojink committed on Jun 17

Commit 638918f · verified · 1 Parent(s): 13cdacd

Add README.md

(Thank you to @nielsr for the template!)

Files changed (1)

README.md ADDED +67 -0

	@@ -0,0 +1,67 @@

+---
+pipeline_tag: robotics
+library_name: transformers
+license: mit
+---
+# Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
+This repository contains the OpenVLA-OFT checkpoint trained jointly on four LIBERO task suites (LIBERO-Spatial, -Object, -Goal, and -Long; LIBERO-Long is also known as LIBERO-10), as described in [Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success](https://arxiv.org/abs/2502.19645). OpenVLA-OFT improves on the base OpenVLA model through an optimized fine-tuning recipe that combines parallel decoding, action chunking, continuous action representations, and an L1 regression objective.
+Project Page: https://openvla-oft.github.io/
+Code: https://github.com/openvla-oft/openvla-oft
+See here for other OpenVLA-OFT checkpoints: https://huggingface.co/moojink?search_models=oft
+## Quick Start
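+This example generates an action chunk with a pretrained OpenVLA-OFT checkpoint. Before running it, set up the conda environment as described in the GitHub README. Note that the snippet loads the LIBERO-Spatial checkpoint (`moojink/openvla-7b-oft-finetuned-libero-spatial`) rather than the combined checkpoint hosted in this repository.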
+```python
+import pickle
+from experiments.robot.libero.run_libero_eval import GenerateConfig
+from experiments.robot.openvla_utils import get_action_head, get_processor, get_proprio_projector, get_vla, get_vla_action
+from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM
+# Instantiate config (see class GenerateConfig in experiments/robot/libero/run_libero_eval.py for definitions)
+cfg = GenerateConfig(
+    pretrained_checkpoint = "moojink/openvla-7b-oft-finetuned-libero-spatial",
+    use_l1_regression = True,
+    use_diffusion = False,
+    use_film = False,
+    num_images_in_input = 2,
+    use_proprio = True,
+    load_in_8bit = False,
+    load_in_4bit = False,
+    center_crop = True,
+    num_open_loop_steps = NUM_ACTIONS_CHUNK,
+    unnorm_key = "libero_spatial_no_noops",
+)
+# Load OpenVLA-OFT policy and inputs processor
+vla = get_vla(cfg)
+processor = get_processor(cfg)
+# Load MLP action head to generate continuous actions (via L1 regression)
+action_head = get_action_head(cfg, llm_dim=vla.llm_dim)
+# Load proprio projector to map proprio to language embedding space
+proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)
+# Load sample observation:
+#   observation (dict): {
+#     "full_image": primary third-person image,
+#     "wrist_image": wrist-mounted camera image,
+#     "state": robot proprioceptive state,
+#     "task_description": task description,
+#   }
+with open("experiments/robot/libero/sample_libero_spatial_observation.pkl", "rb") as file:
+    observation = pickle.load(file)
+# Generate robot action chunk (sequence of future actions)
+actions = get_vla_action(cfg, vla, processor, observation, observation["task_description"], action_head, proprio_projector)
+print("Generated action chunk:")
+for act in actions:
+    print(act)
+```
+## Citation
+```bibtex
+@article{kim2025fine,
+  title={Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
+  author={Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
+  journal={arXiv preprint arXiv:2502.19645},
+  year={2025}
+}
+```
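
The Quick Start snippet in the README sets `num_open_loop_steps = NUM_ACTIONS_CHUNK`, which suggests that a whole action chunk is executed before the policy is queried again. The following is a minimal sketch of that open-loop pattern, not the repository's evaluation code. `stub_policy`, `run_open_loop`, the chunk size of 8, and the 7-dimensional actions are all assumptions introduced for illustration; in real use the policy call would be `get_vla_action(...)` and the chunk size would come from `prismatic.vla.constants`.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = List[float]

# Assumed stand-in for NUM_ACTIONS_CHUNK (hypothetical value for this sketch).
CHUNK_SIZE = 8
# Assumed action dimensionality (hypothetical value for this sketch).
ACTION_DIM = 7


def stub_policy(step: int) -> List[Action]:
    """Hypothetical replacement for get_vla_action: returns one action chunk.

    Each action is filled with the timestep it is intended for, so the
    execution order is easy to inspect.
    """
    return [[float(step + i)] * ACTION_DIM for i in range(CHUNK_SIZE)]


def run_open_loop(
    policy: Callable[[int], List[Action]],
    total_steps: int,
    num_open_loop_steps: int,
) -> Tuple[List[Action], int]:
    """Execute actions open-loop, re-querying the policy only when the queue is empty."""
    queue: Deque[Action] = deque()
    executed: List[Action] = []
    num_queries = 0
    for step in range(total_steps):
        if not queue:
            # Queue is drained: ask the policy for a fresh chunk.
            chunk = policy(step)
            num_queries += 1
            queue.extend(chunk[:num_open_loop_steps])
        # In a real rollout this action would be sent to the environment.
        executed.append(queue.popleft())
    return executed, num_queries


if __name__ == "__main__":
    executed, num_queries = run_open_loop(stub_policy, total_steps=20, num_open_loop_steps=CHUNK_SIZE)
    print(f"executed {len(executed)} actions with {num_queries} policy queries")
```

Setting `num_open_loop_steps` below the chunk size would discard the tail of each chunk and query the policy more often, trading inference cost for more frequent replanning.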