
# OpenVLA-OFT: `color_object` Checkpoint

OpenVLA-OFT checkpoint for the `color_object` split, fine-tuned following *Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success* (Kim et al., 2025).

- Paper: https://arxiv.org/abs/2502.19645
- Project: https://openvla-oft.github.io/

## Repository Structure

```
checkpoints/
  color_object/
    model-0000{1..4}-of-00004.safetensors   # merged LLM weights (step 50000)
    action_head--50000_checkpoint.pt        # MLP action head
    proprio_projector--50000_checkpoint.pt  # proprio projector
    config.json / tokenizer* / ...          # model config and tokenizer files
    lora_adapter/
      adapter_model.safetensors             # LoRA adapter weights
      adapter_config.json
prismatic/                                  # model architecture, dataset, training code
vla-scripts/                                # finetune.py, deploy.py, merge_lora_weights_and_save.py
experiments/                                # eval scripts for LIBERO, ALOHA
slurm_scripts/                              # SLURM finetune scripts for all conflict splits
finetune_color_object.sh                    # exact script used to produce the checkpoint
finetune.md                                 # step-by-step fine-tuning guide
SETUP.md / LIBERO.md / ALOHA.md
```
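The component files in `checkpoints/color_object/` carry the training step in their names (`action_head--50000_checkpoint.pt`, `proprio_projector--50000_checkpoint.pt`). As a minimal sketch, assuming every such file follows the `<component>--<step>_checkpoint.pt` pattern, the hypothetical helpers below (not part of this repository) group those files by step so you can check that a checkpoint directory is complete before loading it.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern for component checkpoints: "<component>--<step>_checkpoint.pt"
_CKPT_RE = re.compile(r"^(?P<component>.+?)--(?P<step>\d+)_checkpoint\.pt$")


def parse_component_checkpoint(filename: str) -> Optional[tuple[str, int]]:
    """Return (component, step) for a component checkpoint filename, else None."""
    match = _CKPT_RE.match(filename)
    if match is None:
        return None
    return match.group("component"), int(match.group("step"))


def find_component_checkpoints(checkpoint_dir: str) -> dict[int, dict[str, Path]]:
    """Group component checkpoint files in a directory by training step."""
    by_step: dict[int, dict[str, Path]] = {}
    for path in sorted(Path(checkpoint_dir).glob("*_checkpoint.pt")):
        parsed = parse_component_checkpoint(path.name)
        if parsed is not None:
            component, step = parsed
            by_step.setdefault(step, {})[component] = path
    return by_step


def missing_components(
    checkpoint_dir: str,
    step: int,
    required: tuple[str, ...] = ("action_head", "proprio_projector"),
) -> list[str]:
    """List required components that have no checkpoint file for the given step."""
    found = find_component_checkpoints(checkpoint_dir).get(step, {})
    return [name for name in required if name not in found]
```

For example, `missing_components("checkpoints/color_object", 50000)` should return an empty list when both component files listed above are present.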

## Quick Inference

See `finetune.md` for the full loading example.

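```python
from experiments.robot.libero.run_libero_eval import GenerateConfig
from experiments.robot.openvla_utils import (
    get_action_head,
    get_processor,
    get_proprio_projector,
    get_vla,
    get_vla_action,
)
from prismatic.vla.constants import NUM_ACTIONS_CHUNK, PROPRIO_DIM

# Inference config for this checkpoint (field definitions: GenerateConfig in
# experiments/robot/libero/run_libero_eval.py)
cfg = GenerateConfig(
    pretrained_checkpoint="checkpoints/color_object",
    use_l1_regression=True,           # continuous actions from the MLP action head
    use_film=False,
    num_images_in_input=2,            # third-person image + wrist camera image
    use_proprio=True,                 # feed robot state through the proprio projector
    center_crop=True,
    num_open_loop_steps=NUM_ACTIONS_CHUNK,
    unnorm_key="conflict_maniskill",  # dataset statistics used to un-normalize actions
)

# Load the policy, input processor, action head, and proprio projector
vla = get_vla(cfg)
processor = get_processor(cfg)
action_head = get_action_head(cfg, llm_dim=vla.llm_dim)
proprio_projector = get_proprio_projector(cfg, llm_dim=vla.llm_dim, proprio_dim=PROPRIO_DIM)

# `observation` comes from your environment: a dict with "full_image",
# "wrist_image", "state", and "task_description"
actions = get_vla_action(
    cfg, vla, processor, observation, observation["task_description"],
    action_head, proprio_projector,
)
```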
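`get_vla_action` returns a chunk of future actions, and `num_open_loop_steps` sets how many of them are executed before the policy is queried again. The loop below is a minimal sketch of that control pattern, not the repository's evaluation code: `policy` stands for a call to `get_vla_action` on the current observation, and `env_reset` / `env_step` are hypothetical callables for your simulator.

```python
from collections import deque
from typing import Any, Callable, Tuple


def run_open_loop_episode(
    policy: Callable[[dict], list],                # hypothetical wrapper around get_vla_action
    env_reset: Callable[[], dict],                 # hypothetical: returns the first observation
    env_step: Callable[[Any], Tuple[dict, bool]],  # hypothetical: action -> (observation, done)
    num_open_loop_steps: int,
    max_steps: int,
) -> Tuple[int, int]:
    """Run one episode with action chunking.

    The policy is queried only when the action queue is empty; each query
    enqueues the first `num_open_loop_steps` actions of the returned chunk.
    Returns (environment steps taken, number of policy queries).
    """
    if num_open_loop_steps < 1:
        raise ValueError("num_open_loop_steps must be >= 1")
    observation = env_reset()
    action_queue: deque = deque()
    num_queries = 0
    for step in range(max_steps):
        if not action_queue:
            chunk = policy(observation)
            num_queries += 1
            action_queue.extend(chunk[:num_open_loop_steps])
            if not action_queue:
                raise RuntimeError("policy returned an empty action chunk")
        observation, done = env_step(action_queue.popleft())
        if done:
            return step + 1, num_queries
    return max_steps, num_queries
```

Re-querying only when the queue is empty trades reactivity for fewer forward passes; with `num_open_loop_steps=NUM_ACTIONS_CHUNK`, as in the config above, every predicted action in a chunk is executed before the next query.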

## Fine-tuning

See `finetune.md` for the complete fine-tuning guide.
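The checkpoint ships both a LoRA adapter (`lora_adapter/`) and merged weights; `vla-scripts/merge_lora_weights_and_save.py` is, as its name suggests, the script for producing the latter. Assuming the standard LoRA parameterization (as in Hugging Face PEFT, which scales the update by `lora_alpha / r`), merging folds each adapted layer's low-rank update into its frozen base weight, so the merged model needs no adapter at inference time:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad W_0 \in \mathbb{R}^{d \times k},\quad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}
$$

Here $r$ is the LoRA rank and $\alpha$ the scaling hyperparameter; PEFT records both in `adapter_config.json` as `r` and `lora_alpha`.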

## Citation

```bibtex
@article{kim2025openvlaoft,
  title   = {Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success},
  author  = {Kim, Moo Jin and Finn, Chelsea and Liang, Percy},
  journal = {arXiv preprint arXiv:2502.19645},
  year    = {2025}
}
```