FYX026/MergeVLA-LIBERO

Add robotics pipeline tag, library name, and project links

by nielsr HF Staff - opened Mar 12

Files changed (1): README.md

@@ -1,14 +1,29 @@
 ---
-license: mit
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-0.5B
+language:
+- en
+license: mit
+pipeline_tag: robotics
+library_name: transformers
+tags:
+- vla
+- vision-language-action
+- model-merging
+- libero
 ---
 # Model Card for MergeVLA-LIBERO
-MergeVLA — Single-Skill Experts for Spatial / Object / Goal / Long-10 (LIBERO Task Suite). These models are used as the base expert checkpoints for our MergeVLA.
+[**MergeVLA: Cross-Skill Model Merging Toward a Generalist Vision-Language-Action Agent**](https://arxiv.org/abs/2511.18810)
+[**Project Page**](https://mergevla.github.io/) | [**Code**](https://github.com/MergeVLA/MergeVLA)
+MergeVLA — Single-Skill Experts for Spatial / Object / Goal / Long-10 (LIBERO Task Suite). These models are used as the base expert checkpoints for **MergeVLA**, a merging-oriented VLA architecture designed to preserve mergeability across tasks.
 ## Model Details
+MergeVLA addresses non-mergeability in VLAs by introducing sparsely activated LoRA adapters via task masks and replacing self-attention in action experts with cross-attention-only blocks.
 Each uploaded model is a 0.68B-parameter VLA model *(excluding the vision backbone)* composed of:
 - Qwen2.5-0.5B as the Vision-Language Model (VLM)
 - A lightweight 0.18B Action Expert
@@ -23,7 +38,7 @@ Each uploaded model is a 0.68B-parameter VLA model *(excluding the vision backbo
 | **Long-10** | **95.0**         |
 ### 🧠 **Training Details**
-Each expert is fine-tuned independently using modified LIBER demonstrations in RLDS format.
+Each expert is fine-tuned independently using modified LIBERO demonstrations in RLDS format.
 | Category                | Value                    |
 | ----------------------- | ------------------------ |
 | LoRA                    | Enabled (rank = 64)      |
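
To sanity-check the front matter this change adds, a reviewer could run something like the sketch below. It is only an illustration: `parse_front_matter` is a hypothetical helper written for this example (not part of any Hugging Face library), and it handles just the flat keys and simple `- item` lists that appear in this README, not general YAML.

```python
# Hypothetical helper: a tiny reader for the flat front matter used in this
# model card (scalar values and simple "- item" lists only). It is not a
# general YAML parser.
def parse_front_matter(text):
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---'")
    meta, key = {}, None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            # Closing delimiter: everything after it is the card body.
            return meta
        if stripped.startswith("- "):
            # List item belonging to the most recent key.
            if key is None or not isinstance(meta[key], list):
                raise ValueError("list item without a list-valued key")
            meta[key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            # An empty value means a list follows on the next lines.
            meta[key] = value if value else []
    raise ValueError("front matter is not closed by '---'")


# The new front matter, copied from the '+' side of the diff above.
README_HEAD = """---
base_model:
- Qwen/Qwen2.5-0.5B
language:
- en
license: mit
pipeline_tag: robotics
library_name: transformers
tags:
- vla
- vision-language-action
- model-merging
- libero
---
# Model Card for MergeVLA-LIBERO
"""

meta = parse_front_matter(README_HEAD)
print(meta["pipeline_tag"], meta["library_name"], meta["tags"])
```

The same idea extends to a small pre-merge check that asserts the keys named in the PR title (`pipeline_tag`, `library_name`) are present before the card is published.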