---
license: apache-2.0
language:
- en
tags:
- robotics
- vision-language-action
- neuro-symbolic
- reinforcement-learning
- manipulation
pipeline_tag: robotics
library_name: transformers
---

# NS-VLA: Neuro-Symbolic Vision-Language-Action Model

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)](https://arxiv.org/abs/XXXX.XXXXX)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/Zuzuzzy/NS-VLA)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://zuzuzzy.github.io/NS-VLA/)

</div>

## Model Description

**NS-VLA** is a neuro-symbolic Vision-Language-Action framework that combines symbolic reasoning with neural control for robotic manipulation. The model introduces:

- **Symbolic Encoder**: extracts structured manipulation primitives from vision-language inputs
- **Symbolic Solver**: a lightweight action generator with visual token sparsification
- **Online RL**: GRPO-based optimization with primitive-segmented rewards

## Model Details

| Property | Value |
|:---|:---|
| **Architecture** | Qwen3-VL-2B + Symbolic Classifier + Action Generator |
| **Parameters** | ~2B (VLM backbone frozen) |
| **Training** | Stage I: Supervised Pretraining → Stage II: Online RL (GRPO) |
| **Input** | RGB image (224×224) + natural language instruction |
| **Output** | Continuous end-effector actions (chunked, H=8) |

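The I/O contract above (a 224×224 RGB frame in, a chunk of H=8 continuous actions out) can be sketched in plain NumPy. This is an illustration only: the nearest-neighbour resize stands in for whatever image transform the model actually uses, and the 7-dimensional action vector (position + rotation + gripper) is an assumption, not a detail from this card.

```python
import numpy as np

def preprocess(rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of an HxWx3 uint8 frame to the model's
    224x224 input resolution (a stand-in for the real image transform)."""
    h, w, _ = rgb.shape
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    return rgb[ys][:, xs]

# A 640x480 camera frame becomes a 224x224x3 model input.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
obs = preprocess(frame)

# One forward pass yields a chunk of H=8 actions; the 7-DoF action
# dimension is assumed for illustration, not stated in the card.
action_chunk = np.zeros((8, 7), dtype=np.float32)
```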
## Performance

| Benchmark | Setting | Success Rate (%) |
|:---|:---|:---:|
| LIBERO | Full demonstrations | **98.6** |
| LIBERO | 1-shot (one demo per task) | **69.1** |
| LIBERO-Plus | Zero-shot generalization | **79.4** |
| CALVIN ABC→D | Zero-shot 5-task chain | **91.2** |

## Usage

> ⚠️ **Note**: Model weights will be released upon paper acceptance. Please check back soon.

```python
# Example usage (coming soon)
from nsvla import NSVLAAgent

agent = NSVLAAgent.from_pretrained("Zuzuzzy/NS-VLA")

# `obs` is the current RGB camera frame (224x224x3).
action = agent.predict(image=obs, instruction="pick up the red mug")
```
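
Until the weights are released, the `predict` interface above can be exercised against a stub. The sketch below shows a receding-horizon control loop under assumed details: H=8 chunking from the Model Details table, a 7-DoF action, and a dummy environment. `MockAgent` and `rollout` are hypothetical names for illustration, not part of the released package.

```python
import numpy as np

class MockAgent:
    """Stand-in for NSVLAAgent until weights are released: returns a
    chunk of H=8 zero actions per call, mirroring the predict() API."""
    H = 8

    def predict(self, image, instruction):
        return np.zeros((self.H, 7), dtype=np.float32)  # 7-DoF assumed

def rollout(agent, env_step, instruction, max_steps=24):
    """Receding-horizon control: query the model once per chunk and
    execute all H actions open-loop before re-planning."""
    obs = np.zeros((224, 224, 3), dtype=np.uint8)
    executed = 0
    while executed < max_steps:
        chunk = agent.predict(image=obs, instruction=instruction)
        for action in chunk:
            obs = env_step(action)  # apply one action, observe next frame
            executed += 1
            if executed >= max_steps:
                break
    return executed

# Dummy environment that always returns a blank frame.
steps = rollout(MockAgent(), lambda a: np.zeros((224, 224, 3), np.uint8),
                "pick up the red mug")
```

With H=8 and a 24-step budget, the agent is queried three times; each query amortizes one VLM forward pass over eight control ticks.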

## Citation

```bibtex
@article{zhu2026nsvla,
  title={NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models},
  author={Zhu, Ziyue and Wu, Shangyang and Zhao, Shuai and Zhao, Zhiqiu and Li, Shengjie and Wang, Yi and Li, Fang and Luo, Haoran},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).