FYX026/MergeVLA-LIBERO

FYX026 committed on Dec 8, 2025

Commit 7bf9f9c · verified · 1 Parent(s): 3439ecd

Create README.md

Files changed (1)

README.md +46 -0

README.md ADDED

	@@ -0,0 +1,46 @@

+# Model Card for MergeVLA-LIBERO
+Single-skill MergeVLA experts for the Spatial, Object, Goal, and Long-10 task suites of LIBERO. These models serve as the base expert checkpoints for MergeVLA.
+## Model Details
+Each uploaded model is a 0.68B-parameter VLA model *(excluding the vision backbone)* composed of:
+- Qwen2.5-0.5B as the Vision-Language Model (VLM)
+- A lightweight 0.18B Action Expert
+- A two-layer Proprioceptive Projector MLP
+### ✔️ **Performance (Success Rates on LIBERO)**
+| Task Family | Success Rate (%) |
+| ----------- | ---------------- |
+| **Spatial** | **98.0**         |
+| **Object**  | **98.6**         |
+| **Goal**    | **95.0**         |
+| **Long-10** | **95.0**         |
+### 🧠 **Training Details**
+Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.
+| Category                | Value                    |
+| ----------------------- | ------------------------ |
+| LoRA                    | Enabled (rank = 64)      |
+| Optimizer               | AdamW                    |
+| Learning Rate           | 2e-4                     |
+| Batch Size              | 8 (×2 grad accumulation) |
+| num_images_in_input     | 2                        |
+### **Training Steps**
+* **Spatial** — 30,000
+* **Object** — 20,000
+* **Goal** — 30,000
+* **Long-10** — 50,000
+## Citation
+```BibTeX
+@misc{fu2025mergevla,
+      title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent},
+      author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
+      year={2025},
+      eprint={2511.18810},
+      archivePrefix={arXiv},
+      primaryClass={cs.RO},
+      url={https://arxiv.org/abs/2511.18810},
+}
+```
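
The training settings listed in the README above can be collected into a small configuration object, which makes the per-expert differences (only the step count changes) easy to see. The following is a minimal sketch, not the authors' training code: the class name `ExpertTrainingConfig`, its field names, and the `effective_batch_size` helper are hypothetical. The README does not say whether the batch size is per device or how many devices were used, so the effective batch size below assumes a single device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertTrainingConfig:
    """Hyperparameters copied from the README's Training Details table.

    The class and field names are hypothetical; they are not taken from
    the MergeVLA code base.
    """

    suite: str
    train_steps: int
    lora_rank: int = 64
    optimizer: str = "AdamW"
    learning_rate: float = 2e-4
    batch_size: int = 8
    grad_accumulation_steps: int = 2
    num_images_in_input: int = 2

    @property
    def effective_batch_size(self) -> int:
        # Samples per optimizer update, assuming a single device
        # (the device count is not stated in the README).
        return self.batch_size * self.grad_accumulation_steps


# Step counts as listed under "Training Steps" in the README.
TRAIN_STEPS = {
    "Spatial": 30_000,
    "Object": 20_000,
    "Goal": 30_000,
    "Long-10": 50_000,
}

# One config per expert; everything except the step count is shared.
CONFIGS = {
    suite: ExpertTrainingConfig(suite=suite, train_steps=steps)
    for suite, steps in TRAIN_STEPS.items()
}

if __name__ == "__main__":
    for suite, cfg in CONFIGS.items():
        print(
            f"{suite}: {cfg.train_steps} steps, "
            f"effective batch size {cfg.effective_batch_size}"
        )
```

Keeping the shared values as dataclass defaults means each expert is described by just its suite name and step count, mirroring how the README presents one common table plus a per-suite step list.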