# Model Card for MergeVLA-LIBERO

This repository provides single-skill MergeVLA experts, one per LIBERO task suite (Spatial, Object, Goal, and Long-10). These checkpoints are used as the base experts for our MergeVLA.

## Model Details

Each uploaded model is a 0.68B-parameter VLA model *(excluding the vision backbone)* composed of:

- Qwen2.5-0.5B as the Vision-Language Model (VLM)
- A lightweight 0.18B Action Expert
- A two-layer Proprioceptive Projector MLP

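To make the projector component concrete, here is a minimal sketch of a two-layer proprioceptive projector in plain Python (no dependencies). Everything in it is an assumption for illustration rather than the released configuration: the 8-dimensional robot state, the GELU activation, a hidden width equal to the output width, and the output size of 896 (the hidden size of Qwen2.5-0.5B). The names `Linear` and `ProprioProjector` are hypothetical.

```python
from __future__ import annotations

import math
import random


def gelu(x: float) -> float:
    """Exact (erf-based) GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class Linear:
    """Dense layer y = Wx + b with small random weights (illustration only)."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random) -> None:
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: list[float]) -> list[float]:
        # One dot product per output unit, plus its bias.
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weight, self.bias)]

    def num_params(self) -> int:
        return self.in_dim * self.out_dim + self.out_dim


class ProprioProjector:
    """Hypothetical two-layer MLP that maps a robot state vector to the VLM width."""

    def __init__(self, proprio_dim: int = 8, llm_dim: int = 896, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.fc1 = Linear(proprio_dim, llm_dim, rng)  # layer 1: state -> hidden (assumed width llm_dim)
        self.fc2 = Linear(llm_dim, llm_dim, rng)      # layer 2: hidden -> VLM embedding size

    def __call__(self, state: list[float]) -> list[float]:
        return self.fc2([gelu(h) for h in self.fc1(state)])

    def num_params(self) -> int:
        return self.fc1.num_params() + self.fc2.num_params()


projector = ProprioProjector()
token = projector([0.1] * 8)  # a single projected proprioceptive embedding
print(len(token), projector.num_params())  # 896 811776
```

Under these assumptions the projector holds about 0.8M parameters, negligible next to the 0.5B VLM and the 0.18B Action Expert, which is consistent with the 0.68B total quoted above.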
### ✔️ **Performance (Success Rates on LIBERO)**

| Task Suite | Success Rate (%) |
| ----------- | ---------------- |
| **Spatial** | **98.0** |
| **Object** | **98.6** |
| **Goal** | **95.0** |
| **Long-10** | **95.0** |

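If a single summary number is useful, the four suite results can be averaged. This sketch computes an unweighted mean; equal weighting only matches a pooled success rate if every suite is evaluated on the same number of episodes, and the resulting figure is derived here rather than reported by the authors.

```python
# Success rates (%) copied from the performance table.
success_rates = {
    "Spatial": 98.0,
    "Object": 98.6,
    "Goal": 95.0,
    "Long-10": 95.0,
}

# Unweighted mean: every suite counts equally regardless of episode count.
average = sum(success_rates.values()) / len(success_rates)
print(f"Average success rate: {average:.2f}%")  # Average success rate: 96.65%
```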
### 🧠 **Training Details**

Each expert is fine-tuned independently on modified LIBERO demonstrations in RLDS format.

| Category | Value |
| ----------------------- | ------------------------ |
| LoRA | Enabled (rank = 64) |
| Optimizer | AdamW |
| Learning Rate | 2e-4 |
| Batch Size | 8 (× 2 gradient accumulation steps) |
| Input images per sample (`num_images_in_input`) | 2 |

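The hyperparameters in the training table can be gathered into a small config object that makes the derived quantities explicit. This is only a sketch: the class and field names are hypothetical (only `num_images_in_input` appears in the table), and the 896 × 896 matrix used for the LoRA count is an example size (the hidden width of Qwen2.5-0.5B), not a statement about which layers were actually adapted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertFinetuneConfig:
    """Values from the Training Details table; field names are illustrative."""

    use_lora: bool = True
    lora_rank: int = 64
    optimizer: str = "AdamW"
    learning_rate: float = 2e-4
    batch_size: int = 8
    grad_accumulation_steps: int = 2
    num_images_in_input: int = 2

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to one optimizer step on a single device.
        return self.batch_size * self.grad_accumulation_steps

    def lora_params_per_matrix(self, d_in: int, d_out: int) -> int:
        # LoRA trains A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight.
        return self.lora_rank * (d_in + d_out)


cfg = ExpertFinetuneConfig()
print(cfg.effective_batch_size)              # 16
print(cfg.lora_params_per_matrix(896, 896))  # 114688
```

The effective batch size here is per device; the card does not state the GPU count, so the global batch size may be a multiple of 16.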
### **Training Steps**

* **Spatial**: 30,000
* **Object**: 20,000
* **Goal**: 30,000
* **Long-10**: 50,000

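Combining the step counts with the batch settings gives a rough sense of how much training data each expert consumes. The sketch assumes that the listed counts are optimizer steps and that training ran on a single device; neither assumption is stated in the card.

```python
# Optimizer steps per expert, copied from the Training Steps list.
training_steps = {"Spatial": 30_000, "Object": 20_000, "Goal": 30_000, "Long-10": 50_000}

# Assumptions: counts are optimizer steps, and training used one device,
# so each step accumulates 2 micro-batches of 8 samples (16 samples total).
BATCH_SIZE = 8
GRAD_ACCUMULATION_STEPS = 2
samples_per_step = BATCH_SIZE * GRAD_ACCUMULATION_STEPS

samples_seen = {suite: steps * samples_per_step for suite, steps in training_steps.items()}
for suite, samples in samples_seen.items():
    print(f"{suite}: {samples:,} samples")
# Spatial: 480,000 samples
# Object: 320,000 samples
# Goal: 480,000 samples
# Long-10: 800,000 samples
```

With multi-GPU training, each figure would scale with the number of devices.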
## Citation instructions

```BibTeX
@misc{fu2025mergevla,
      title={MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent},
      author={Yuxia Fu and Zhizhen Zhang and Yuqi Zhang and Zijian Wang and Zi Huang and Yadan Luo},
      year={2025},
      eprint={2511.18810},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2511.18810},
}
```