---
license: mit
language:
- en
base_model:
- Qwen/Qwen2.5-0.5B
---
# Model Card for MergeVLA-LIBERO
MergeVLA single-skill experts for the Spatial, Object, Goal, and Long-10 suites of the LIBERO benchmark. These checkpoints serve as the base experts that MergeVLA merges.

## Model Details
Each uploaded checkpoint is a 0.68B-parameter vision-language-action (VLA) model *(excluding the vision backbone)* composed of:
- Qwen2.5-0.5B as the language backbone of the vision-language model (VLM)
- A lightweight 0.18B Action Expert
- A two-layer Proprioceptive Projector MLP
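
As a rough consistency check on the 0.68B figure, the sketch below adds up the three components. The VLM and action-expert sizes are the ones stated above (Qwen2.5-0.5B counted at its nominal 0.5B). The projector dimensions are assumptions: `HIDDEN_DIM = 896` is the hidden size of Qwen2.5-0.5B, `PROPRIO_DIM = 8` is a hypothetical proprioceptive state size, and the projector is treated as two fully connected layers with biases.

```python
# Hypothetical back-of-the-envelope parameter budget for one expert.
# VLM and action-expert sizes come from this card; PROPRIO_DIM is an assumption.

PROPRIO_DIM = 8    # assumed proprioceptive state size (not stated on this card)
HIDDEN_DIM = 896   # hidden size of Qwen2.5-0.5B


def mlp_params(in_dim: int, hidden_dim: int, out_dim: int) -> int:
    """Weights plus biases of a two-layer MLP: in -> hidden -> out."""
    return (in_dim * hidden_dim + hidden_dim) + (hidden_dim * out_dim + out_dim)


vlm = 0.50e9            # Qwen2.5-0.5B (nominal size)
action_expert = 0.18e9  # lightweight action expert
projector = mlp_params(PROPRIO_DIM, HIDDEN_DIM, HIDDEN_DIM)

total = vlm + action_expert + projector
print(f"projector: {projector:,} params")
print(f"total (excl. vision backbone): {total / 1e9:.2f}B")
```

Under these assumptions the projector adds fewer than a million parameters, so the budget is dominated by the language backbone and the action expert.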

### ✔️ **Performance (Success Rates on LIBERO)**
| Task Family | Success Rate (%) |
| ----------- | ---------------- |
| **Spatial** | **98.0**         |
| **Object**  | **98.6**         |
| **Goal**    | **95.0**         |
| **Long-10** | **95.0**         |
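
For a single summary number, the unweighted mean over the four suites can be computed from the table. This average is derived here for convenience and is not a figure reported alongside the checkpoints; it weights every suite equally regardless of how many tasks or episodes each contains.

```python
# Success rates (%) from the table above.
success_rates = {
    "Spatial": 98.0,
    "Object": 98.6,
    "Goal": 95.0,
    "Long-10": 95.0,
}

# Unweighted mean: each suite counts equally.
mean_rate = sum(success_rates.values()) / len(success_rates)
print(f"Average success rate: {mean_rate:.2f}%")
```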

### 🧠 **Training Details**
Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.

| Hyperparameter                       | Value                               |
| ------------------------------------ | ----------------------------------- |
| LoRA                                 | Enabled (rank = 64)                 |
| Optimizer                            | AdamW                               |
| Learning rate                        | 2e-4                                |
| Batch size                           | 8 (× 2 gradient accumulation steps) |
| Input images (`num_images_in_input`) | 2                                   |
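
With LoRA enabled, the pretrained weights stay frozen and each adapted layer learns a low-rank update, W' = W + (alpha / r) · B A. The pure-Python sketch below illustrates this for a single linear layer. The `alpha` value, the initialisation scale, and the layer sizes are illustrative assumptions, not settings from this training run, and the card does not say which layers were adapted. `lora_trainable_params` shows why rank 64 is cheap: on a hypothetical 896 × 896 projection it trains 114,688 parameters instead of 802,816.

```python
import random


def matmul(a, b):
    """Plain-list matrix product: (n, k) @ (k, m) -> (n, m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Sketch of a linear layer with a frozen weight plus a rank-r LoRA update."""

    def __init__(self, weight, rank, alpha):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, shape (d_out, d_in)
        self.scale = alpha / rank     # standard LoRA scaling alpha / r
        # A starts random and B starts at zero, so the adapter is initially a no-op.
        self.A = [[random.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        delta = matmul(self.B, self.A)  # (d_out, rank) @ (rank, d_in)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to a single input vector of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters added by one adapter (A is rank x d_in, B is d_out x rank)."""
    return rank * (d_in + d_out)


random.seed(0)
layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # B is zero at init, so this equals W @ x: [3.0, 7.0]
print(lora_trainable_params(896, 896, 64), 896 * 896)
```

Only `A` and `B` would receive gradients during fine-tuning; the frozen `weight` is shared with the base model, which keeps each expert's trainable footprint small.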

### **Training Steps**
* **Spatial** — 30,000
* **Object** — 20,000
* **Goal** — 30,000
* **Long-10** — 50,000
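
The step counts can be turned into a rough number of training samples per expert. This sketch assumes each listed step is one optimizer update on a single device, using the batch size (8) and gradient accumulation (2) from the training table; neither assumption is stated on this card, so treat the sample counts as estimates.

```python
# Hypothetical training-budget estimate.
# Step counts, batch size, and gradient accumulation come from this card;
# NUM_DEVICES and "one step = one optimizer update" are assumptions.
BATCH_SIZE = 8
GRAD_ACCUMULATION = 2
NUM_DEVICES = 1  # assumed; the card does not state the device count

training_steps = {"Spatial": 30_000, "Object": 20_000, "Goal": 30_000, "Long-10": 50_000}

effective_batch = BATCH_SIZE * GRAD_ACCUMULATION * NUM_DEVICES
for suite, steps in training_steps.items():
    print(f"{suite}: {steps:,} steps x {effective_batch} = {steps * effective_batch:,} samples")

total_steps = sum(training_steps.values())
print(f"All four experts: {total_steps:,} steps")
```

If training used several devices, or if a "step" counts micro-batches rather than optimizer updates, the per-expert sample counts scale accordingly.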


## Citation instructions
```BibTeX
@misc{fu2025mergevla,
      title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent}, 
      author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
      year={2025},
      eprint={2511.18810},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2511.18810}, 
}
```