Commit 09c8cae (verified)
Author: roygan
Parent(s): 0b3af84

Update README.md

Files changed (1):
  README.md (+26, -9)
README.md CHANGED
@@ -1,16 +1,19 @@
- # Wall-X: Multimodal Foundation Model for Robotics
-
- ## Model Description
-
- Wall-X is a multimodal foundation model designed specifically for robotics applications, combining vision, language, and action capabilities. Built upon the Qwen2.5-3B-VL architecture, Wall-X incorporates specialized adaptations for robotic control tasks, enabling seamless integration of visual perception, natural language understanding, and action generation.
-
- ## Key Features
-
- - **Multimodal Integration**: Processes visual, textual, and proprioceptive information simultaneously
- - **Action Generation**: Specialized for robotic control and manipulation tasks
- - **Flexible Architecture**: Based on Qwen2.5-VL with custom adaptations for robotics
- - **Mixture of Experts**: Utilizes MoE architecture for efficient computation
- - **LeRobot Compatible**: Designed to work with LeRobot datasets and frameworks
+ # WALL-OSS: Igniting VLMs toward the Embodied Space
+
+ <div align="left">
+
+ [![Paper](https://img.shields.io/badge/Paper-PDF-EA1B22?style=for-the-badge&logo=adobeacrobatreader&logoColor=fff)](https://x2-robot.feishu.cn/file/FurYbuThcofkOqxrsy7cnzUbndd)
+ [![Hugging Face](https://img.shields.io/badge/Hugging%20Face-x--square--robot-FFB000?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/x-square-robot)
+ [![GitHub](https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=fff)](https://github.com/X-Square-Robot/wall-x)
+ [![Project Page](https://img.shields.io/badge/Project-1E90FF?style=for-the-badge&logo=google-chrome&logoColor=fff)](https://x2robot.com/en/research/68bc2cde8497d7f238dde690)
+
+ </div>
+
+ ## Model Description
+
+ We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision-language understanding, (2) strong language-action association, and (3) robust manipulation capability.
+ Our approach employs a tightly coupled architecture and a multi-strategy training curriculum that enables **Unified Cross-Level CoT**: seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
+ Our results show that WALL-OSS achieves high success rates on complex long-horizon manipulation tasks, demonstrates strong instruction-following, understanding, and reasoning capabilities, and outperforms strong baselines, providing a reliable and scalable path from VLMs to embodied foundation models.
 
  ## Quick Start
 
@@ -148,3 +151,17 @@ The repository contains:
  - **Inference Examples**: Multiple inference scripts and evaluation tools
  - **Configuration Templates**: Ready-to-use configs for different robot setups
  - **Troubleshooting Guide**: Common issues and solutions
+
+ ## 📚 Cite Us
+
+ If you find WALL-X or our WALL-OSS models useful, please cite:
+
+ ```bibtex
+ @misc{walloss_paper_2025,
+   title        = {WALL-OSS: Igniting VLMs toward the Embodied Space},
+   author       = {X Square Robot},
+   year         = {2025},
+   howpublished = {\url{https://x2-robot.feishu.cn/file/FurYbuThcofkOqxrsy7cnzUbndd}},
+   note         = {White paper}
+ }
+ ```
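Both the removed feature list and the new model description emphasize processing visual, textual, and proprioceptive inputs together. As a purely illustrative sketch of what that kind of multimodal observation batching can look like (all names here are hypothetical and do not come from the WALL-X codebase or API):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    """One multimodal timestep for a VLA-style model.

    Hypothetical structure for illustration only; the actual WALL-X /
    WALL-OSS input format may differ.
    """
    images: List[List[float]]   # flattened camera frames (placeholder values)
    instruction: str            # natural-language task description
    proprio: List[float]        # proprioceptive state, e.g. joint positions

def build_batch(observations: List[Observation]) -> Dict[str, list]:
    """Collate observations into parallel lists, the generic shape a
    multimodal model expects before tokenization/encoding."""
    return {
        "images": [o.images for o in observations],
        "instructions": [o.instruction for o in observations],
        "proprio": [o.proprio for o in observations],
    }

obs = Observation(
    images=[[0.0] * 4],                  # one tiny placeholder "frame"
    instruction="pick up the red block",
    proprio=[0.1, -0.2, 0.3],
)
batch = build_batch([obs])
```

This only demonstrates the collation step; real usage would feed such a batch through the model's processor and policy head as documented in the repository's inference examples.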