---
license: apache-2.0
language:
- en
tags:
- robotics
- vision-language-action
- neuro-symbolic
- reinforcement-learning
- manipulation
pipeline_tag: robotics
library_name: transformers
---

# NS-VLA: Neuro-Symbolic Vision-Language-Action Model

<div align="center">

[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-b31b1b.svg)](https://arxiv.org/abs/XXXX.XXXXX)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black)](https://github.com/Zuzuzzy/NS-VLA)
[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://zuzuzzy.github.io/NS-VLA/)

</div>

## Model Description

**NS-VLA** is a neuro-symbolic Vision-Language-Action framework that combines symbolic reasoning with neural control for robotic manipulation. The model introduces:

- **Symbolic Encoder**: extracts structured manipulation primitives from vision-language inputs
- **Symbolic Solver**: a lightweight action generator with visual token sparsification
- **Online RL**: GRPO-based optimization with primitive-segmented rewards

## Model Details

| Property | Value |
|:---|:---|
| **Architecture** | Qwen3-VL-2B + Symbolic Classifier + Action Generator |
| **Parameters** | ~2B (VLM backbone frozen) |
| **Training** | Stage I: Supervised Pretraining → Stage II: Online RL (GRPO) |
| **Input** | RGB image (224×224) + natural language instruction |
| **Output** | Continuous end-effector actions (chunked, H=8) |

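The I/O contract above (a 224×224 RGB frame in, a chunk of H=8 continuous actions out) can be sketched in plain NumPy. This is an illustration only: the nearest-neighbour resize stands in for whatever image transform the model actually uses, and the 7-dimensional action vector (position + rotation + gripper) is an assumption, not a detail from this card.

```python
import numpy as np

def preprocess(rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of an HxWx3 uint8 frame to the model's
    224x224 input resolution (a stand-in for the real image transform)."""
    h, w, _ = rgb.shape
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    return rgb[ys][:, xs]

# A 640x480 camera frame becomes a 224x224x3 model input.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
obs = preprocess(frame)

# One forward pass yields a chunk of H=8 actions; the 7-DoF action
# dimension is assumed for illustration, not stated in the card.
action_chunk = np.zeros((8, 7), dtype=np.float32)
```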
## Performance

| Benchmark | Setting | Success Rate (%) |
|:---|:---|:---:|
| LIBERO | Full demonstrations | **98.6** |
| LIBERO | 1-shot (one demo per task) | **69.1** |
| LIBERO-Plus | Zero-shot generalization | **79.4** |
| CALVIN ABC→D | Zero-shot 5-task chain | **91.2** |

## Usage

> ⚠️ **Note**: Model weights will be released upon paper acceptance. Please check back soon.

```python
# Example usage (coming soon)
from nsvla import NSVLAAgent

agent = NSVLAAgent.from_pretrained("Zuzuzzy/NS-VLA")

# `obs` is the current RGB camera frame (224x224x3).
action = agent.predict(image=obs, instruction="pick up the red mug")
```
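
Until the weights are released, the `predict` interface above can be exercised against a stub. The sketch below shows a receding-horizon control loop under assumed details: H=8 chunking from the Model Details table, a 7-DoF action, and a dummy environment. `MockAgent` and `rollout` are hypothetical names for illustration, not part of the released package.

```python
import numpy as np

class MockAgent:
    """Stand-in for NSVLAAgent until weights are released: returns a
    chunk of H=8 zero actions per call, mirroring the predict() API."""
    H = 8

    def predict(self, image, instruction):
        return np.zeros((self.H, 7), dtype=np.float32)  # 7-DoF assumed

def rollout(agent, env_step, instruction, max_steps=24):
    """Receding-horizon control: query the model once per chunk and
    execute all H actions open-loop before re-planning."""
    obs = np.zeros((224, 224, 3), dtype=np.uint8)
    executed = 0
    while executed < max_steps:
        chunk = agent.predict(image=obs, instruction=instruction)
        for action in chunk:
            obs = env_step(action)  # apply one action, observe next frame
            executed += 1
            if executed >= max_steps:
                break
    return executed

# Dummy environment that always returns a blank frame.
steps = rollout(MockAgent(), lambda a: np.zeros((224, 224, 3), np.uint8),
                "pick up the red mug")
```

With H=8 and a 24-step budget, the agent is queried three times; each query amortizes one VLM forward pass over eight control ticks.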

## Citation

```bibtex
@article{zhu2026nsvla,
  title={NS-VLA: Towards Neuro-Symbolic Vision-Language-Action Models},
  author={Zhu, Ziyue and Wu, Shangyang and Zhao, Shuai and Zhao, Zhiqiu and Li, Shengjie and Wang, Yi and Li, Fang and Luo, Haoran},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).