x-square-robot/wall-oss-fast


Shalfunnn committed on Sep 8, 2025

Commit 294d753 · verified · 1 Parent(s): ad96a88

Update README.md

Files changed (1)

README.md +4 -4


@@ -1,4 +1,4 @@
-# WALL-OSS: Igniting VLMs toward the Embodied Space
 <div align="left">
@@ -13,13 +13,13 @@
 </div>
-## 🤖 Model Description
 We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision--language understanding, (2) strong language--action association, and (3) robust manipulation capability.
 Our approach employs a tightly coupled architecture and multi-strategies training curriculum that enables Unified Cross-Level CoT—seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
 Our results show that WALL-OSS attains high success on complex long-horizon manipulations, demonstrates strong instruction-following capabilities, complex   understanding and reasoning, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to embodied foundation models.
-<!-- ## 🎬 Video Demos
 <div align="center">
     <video width="80%" controls>
@@ -28,7 +28,7 @@ Our results show that WALL-OSS attains high success on complex long-horizon mani
     </video>
     <p><strong>WALL-OSS in Action: Demonstrating advanced manipulation capabilities and embodied AI performance</strong></p>
 </div>
- -->
 ## 🚀 Quick Start

+# WALL-OSS
 <div align="left">
 </div>
+## [WALL-OSS: Igniting VLMs toward the Embodied Space](https://x2robot.cn-wlcb.ufileos.com/wall_oss.pdf)
 We introduce **WALL-OSS**, an end-to-end embodied foundation model that leverages large-scale multimodal pretraining to achieve (1) embodiment-aware vision--language understanding, (2) strong language--action association, and (3) robust manipulation capability.
 Our approach employs a tightly coupled architecture and multi-strategies training curriculum that enables Unified Cross-Level CoT—seamlessly unifying instruction reasoning, subgoal decomposition, and fine-grained action synthesis within a single differentiable framework.
 Our results show that WALL-OSS attains high success on complex long-horizon manipulations, demonstrates strong instruction-following capabilities, complex   understanding and reasoning, and outperforms strong baselines, thereby providing a reliable and scalable path from VLMs to embodied foundation models.
+## 🎬 Video Demos
 <div align="center">
     <video width="80%" controls>
     </video>
     <p><strong>WALL-OSS in Action: Demonstrating advanced manipulation capabilities and embodied AI performance</strong></p>
 </div>
 ## 🚀 Quick Start