Commit 09c8cae (verified)
Author: roygan
Parent(s): 0b3af84

Update README.md

Files changed (1):
  README.md (+26, -9)
README.md CHANGED
@@ -1,16 +1,19 @@
- # Wall-X: Multimodal Foundation Model for Robotics
-
- ## Model Description
-
- Wall-X is a multimodal foundation model designed specifically for robotics applications, combining vision, language, and action capabilities. Built upon the Qwen2.5-3B-VL architecture, Wall-X incorporates specialized adaptations for robotic control tasks, enabling seamless integration of visual perception, natural language understanding, and action generation.
-
- ## Key Features
-
- - **Multimodal Integration**: Processes visual, textual, and proprioceptive information simultaneously
- - **Action Generation**: Specialized for robotic control and manipulation tasks
- - **Flexible Architecture**: Based on Qwen2.5-VL with custom adaptations for robotics
- - **Mixture of Experts**: Utilizes MoE architecture for efficient computation
- - **LeRobot Compatible**: Designed to work with LeRobot datasets and frameworks
+ # WALL-OSS: Igniting VLMs toward the Embodied Space
+
+ <div align="left">
+
+ [![Paper](https://img.shields.io/badge/Paper-PDF-EA1B22?style=for-the-badge&logo=adobeacrobatreader&logoColor=fff)](https://x2-robot.feishu.cn/file/FurYbuThcofkOqxrsy7cnzUbndd)
+ [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-x--square--robot-FFB000?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/x-square-robot)
+ [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=fff)](https://github.com/X-Square-Robot/wall-x)
+ [![Project Page](https://img.shields.io/badge/Project-1E90FF?style=for-the-badge&logo=google-chrome&logoColor=fff)](https://x2robot.com/en/research/68bc2cde8497d7f238dde690)
+
+ </div>
+
+ ## Model Description
+
+ We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision-language understanding, (2) strong language-action association, and (3) robust manipulation capability.
+ Our approach employs a tightly coupled architecture and a multi-strategy training curriculum that enables **Unified Cross-Level CoT**: seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
+ Our results show that WALL-OSS achieves high success rates on complex long-horizon manipulation tasks, demonstrates strong instruction-following, understanding, and reasoning capabilities, and outperforms strong baselines, providing a reliable and scalable path from VLMs to embodied foundation models.
 
  ## Quick Start
 
@@ -148,3 +151,17 @@ The repository contains:
  - **Inference Examples**: Multiple inference scripts and evaluation tools
  - **Configuration Templates**: Ready-to-use configs for different robot setups
  - **Troubleshooting Guide**: Common issues and solutions
+
+ ## 📚 Cite Us
+
+ If you find WALL-X or our WALL-OSS models useful, please cite:
+
+ ```bibtex
+ @misc{walloss_paper_2025,
+   title        = {WALL-OSS: Igniting VLMs toward the Embodied Space},
+   author       = {X Square Robot},
+   year         = {2025},
+   howpublished = {\url{https://x2-robot.feishu.cn/file/FurYbuThcofkOqxrsy7cnzUbndd}},
+   note         = {White paper}
+ }
+ ```
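Both the removed feature list and the new model description emphasize processing visual, textual, and proprioceptive inputs together. As a purely illustrative sketch of what that kind of multimodal observation batching can look like (all names here are hypothetical and do not come from the WALL-X codebase or API):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    """One multimodal timestep for a VLA-style model.

    Hypothetical structure for illustration only; the actual WALL-X /
    WALL-OSS input format may differ.
    """
    images: List[List[float]]   # flattened camera frames (placeholder values)
    instruction: str            # natural-language task description
    proprio: List[float]        # proprioceptive state, e.g. joint positions

def build_batch(observations: List[Observation]) -> Dict[str, list]:
    """Collate observations into parallel lists, the generic shape a
    multimodal model expects before tokenization/encoding."""
    return {
        "images": [o.images for o in observations],
        "instructions": [o.instruction for o in observations],
        "proprio": [o.proprio for o in observations],
    }

obs = Observation(
    images=[[0.0] * 4],                  # one tiny placeholder "frame"
    instruction="pick up the red block",
    proprio=[0.1, -0.2, 0.3],
)
batch = build_batch([obs])
```

This only demonstrates the collation step; real usage would feed such a batch through the model's processor and policy head as documented in the repository's inference examples.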