	---
	license: mit
	language:
	- en
	base_model:
	- Qwen/Qwen2.5-0.5B
	---
	# Model Card for MergeVLA-LIBERO
	This repository hosts the single-skill MergeVLA experts for the four LIBERO task suites: Spatial, Object, Goal, and Long-10. These checkpoints serve as the base experts for MergeVLA.

	## Model Details
	Each uploaded model is a 0.68B-parameter VLA model (excluding the vision backbone) composed of:
	- Qwen2.5-0.5B as the Vision-Language Model (VLM)
	- A lightweight 0.18B Action Expert
	- A two-layer Proprioceptive Projector MLP
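
The 0.68B figure can be sanity-checked with a rough parameter budget. The sketch below is illustrative only: the VLM and Action Expert sizes are the nominal values from the list above, while the projector dimensions (`PROPRIO_DIM = 8` and a hidden width equal to the 896-dimensional hidden size of Qwen2.5-0.5B) are assumptions that this card does not confirm.

```python
# Nominal component sizes from the model card, in millions of parameters.
VLM_PARAMS_M = 500            # Qwen2.5-0.5B (nominal size)
ACTION_EXPERT_PARAMS_M = 180  # Action Expert (nominal size)

# ASSUMPTIONS (not stated in the card): projector input, hidden, and output widths.
PROPRIO_DIM = 8   # hypothetical proprioceptive state size
HIDDEN_DIM = 896  # hidden size of Qwen2.5-0.5B, assumed here as the projector width


def mlp_params(d_in: int, d_hidden: int, d_out: int) -> int:
    """Count the weights and biases of a two-layer MLP."""
    return (d_in * d_hidden + d_hidden) + (d_hidden * d_out + d_out)


projector_params = mlp_params(PROPRIO_DIM, HIDDEN_DIM, HIDDEN_DIM)
total_m = VLM_PARAMS_M + ACTION_EXPERT_PARAMS_M + projector_params / 1e6

print(f"projector: {projector_params:,} parameters")
print(f"total: {total_m / 1000:.2f}B parameters")
```

Under these assumed widths the projector stays below one million parameters, which would explain why the headline total matches the sum of the VLM and the Action Expert.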

	### ✔️ Performance (Success Rates on LIBERO)
	| Task Family | Success Rate (%) |
	| ----------- | ---------------- |
	| Spatial | 98.0 |
	| Object | 98.6 |
	| Goal | 95.0 |
	| Long-10 | 95.0 |
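
For a single summary number, the four rates can be averaged. The snippet below is a small sketch that computes an unweighted mean over the suites; this mean is derived here from the table and is not a figure reported by the authors.

```python
# Success rates (%) copied from the table above.
success_rates = {"Spatial": 98.0, "Object": 98.6, "Goal": 95.0, "Long-10": 95.0}

# Unweighted mean across the four suites (a derived figure, not reported in the card).
mean_rate = sum(success_rates.values()) / len(success_rates)

print(f"mean success rate: {mean_rate:.2f}%")
```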

	### 🧠 Training Details
	Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.

	| Category | Value |
	| ----------------------- | ------------------------ |
	| LoRA | Enabled (rank = 64) |
	| Optimizer | AdamW |
	| Learning Rate | 2e-4 |
	| Batch Size | 8 (×2 grad accumulation) |
	| num_images_in_input | 2 |
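
To give a sense of what rank 64 means in practice: LoRA freezes a weight matrix of shape `d_out × d_in` and trains a low-rank update made of two factors of shapes `d_out × r` and `r × d_in`, which adds `r * (d_in + d_out)` trainable parameters. The sketch below applies that formula; the card does not say which layers are adapted, so the 896 × 896 matrix (matching the hidden size of Qwen2.5-0.5B) is an assumption chosen purely for illustration.

```python
RANK = 64  # LoRA rank from the table above


def lora_params(d_in: int, d_out: int, rank: int = RANK) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


# ASSUMPTION: a square 896 x 896 projection, used only as an example shape.
d = 896
added = lora_params(d, d)
full = d * d

print(f"LoRA adds {added:,} trainable parameters vs {full:,} frozen ({added / full:.1%})")
```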

	### Training Steps
	* Spatial — 30,000
	* Object — 20,000
	* Goal — 30,000
	* Long-10 — 50,000
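
Combined with the batch settings above, these step counts give a rough estimate of how many training samples each expert sees. The sketch assumes a single device and that the listed steps are optimizer updates (each covering 8 × 2 = 16 samples); neither assumption is confirmed by this card, so treat the numbers as estimates.

```python
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 2

# ASSUMPTIONS: one device, and the listed steps count optimizer updates.
effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS

training_steps = {"Spatial": 30_000, "Object": 20_000, "Goal": 30_000, "Long-10": 50_000}

# Estimated samples processed per expert under the assumptions above.
samples_seen = {suite: steps * effective_batch for suite, steps in training_steps.items()}

for suite, n in samples_seen.items():
    print(f"{suite}: {n:,} samples")
```

If the steps instead count micro-batches, or training ran on several devices, the estimates scale accordingly.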


	## Citation instructions
	```BibTeX
	@misc{fu2025mergevla,
	title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent},
	author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
	year={2025},
	eprint={2511.18810},
	archivePrefix={arXiv},
	primaryClass={cs.RO},
	url={https://arxiv.org/abs/2511.18810},
	}
	```