# Wall-X: Multimodal Foundation Model for Robotics

## Model Description

Wall-X is a multimodal foundation model designed for robotics, combining vision, language, and action capabilities. Built on the Qwen2.5-VL-3B architecture, it adds specialized adaptations for robotic control, integrating visual perception, natural language understanding, and action generation in a single model.

## Key Features

- **Multimodal Integration**: Processes visual, textual, and proprioceptive information simultaneously
- **Action Generation**: Specialized for robotic control and manipulation tasks
- **Flexible Architecture**: Based on Qwen2.5-VL with custom adaptations for robotics
- **Mixture of Experts**: Uses an MoE architecture for efficient computation
- **LeRobot Compatible**: Designed to work with LeRobot datasets and frameworks

## Quick Start

### Installation

```bash
# Create conda environment
conda create --name wallx python=3.10
conda activate wallx

# Install base requirements
pip install torch torchvision transformers
pip install huggingface_hub

# Install Wall-X from GitHub
git clone https://github.com/X-Square-Robot/wall-x.git
cd wall-x
pip install -e .
```
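
After installing, a quick sanity check confirms that the package and its model class import cleanly (a minimal sketch; it only uses the `wall_x` import path from the examples below):

```bash
# Verify the editable install exposes the model class used in this README
python -c "from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction; print('wall-x import OK')"
```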

### Basic Usage

```python
import torch
from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Load the model
model_path = "X-Square-Robot/wall-oss-flow"  # or your local path
model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
model.eval()

# Move to GPU if available and cast to bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).bfloat16()

# Your inference code here...
```
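
If you would rather download the weights once and load them from disk (the `model_path` comment above mentions a local path), `huggingface_hub` from the install step provides `snapshot_download`; a minimal sketch:

```python
# Sketch: fetch the checkpoint to a local directory, then load it from disk.
# snapshot_download is part of huggingface_hub, installed in the Quick Start.
from huggingface_hub import snapshot_download
from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

local_path = snapshot_download("X-Square-Robot/wall-oss-flow")
model = Qwen2_5_VLMoEForAction.from_pretrained(local_path)
```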

## Supervised Fine-Tuning (SFT)

For training Wall-X on your own robotics datasets, refer to the training guide:

**📖 [Training Documentation](https://github.com/X-Square-Robot/wall-x/blob/main/workspace/README.md)**

The training guide covers:
- **Dataset Preparation**: How to prepare your robotics datasets in LeRobot format (see the sketch after this list)
- **Configuration Setup**: Detailed configuration for GPU setup, model paths, and robot DOF settings
- **Training Scripts**: Ready-to-use training scripts with proper hyperparameters
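
The Dataset Preparation step above expects data in LeRobot format. Here is a minimal sketch for inspecting such a dataset before launching training, assuming a recent `lerobot` release (the import path has moved between versions):

```python
# Sketch: load and inspect a LeRobot-format dataset before SFT.
# Assumes a recent `lerobot` release; the import path differs across versions.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht")  # replace with your own repo_id
print(f"frames: {len(dataset)}")
sample = dataset[0]  # dict of tensors: camera images, robot state, actions, ...
print(sorted(sample.keys()))
```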

### Quick Training Start

```bash
# Run training (see workspace/README.md for detailed configuration)
bash ./workspace/lerobot_example/run.sh
```

## Inference

For detailed inference examples and model evaluation, see:

**📖 [Inference Documentation](https://github.com/X-Square-Robot/wall-x/blob/main/scripts/)**

### Basic Inference Example

```python
import torch
from wall_x.model.qwen2_5_based.modeling_qwen2_5_vl_act import Qwen2_5_VLMoEForAction

# Load model
model_path = "X-Square-Robot/wall-oss-flow"  # or your local path
model = Qwen2_5_VLMoEForAction.from_pretrained(model_path)
model.eval()

# Setup
batch_size = 1
seq_length = 50
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).bfloat16()

# Prepare inputs (example with synthetic data)
torch.manual_seed(0)
input_ids = torch.randint(0, len(model.processor.tokenizer), (batch_size, seq_length), dtype=torch.long)
attention_mask = torch.ones((batch_size, seq_length), dtype=torch.long)
moe_token_types = torch.zeros((batch_size, seq_length), dtype=torch.long)
position_ids = torch.arange(seq_length, dtype=torch.long).unsqueeze(0).expand(batch_size, -1)

# Robotics-specific inputs
proprioception = torch.randn((batch_size, 1, 20), dtype=torch.float32)  # joint states
agent_pos_mask = torch.ones((batch_size, 1, 20), dtype=torch.float32)
dof_mask = torch.ones((batch_size, 32, 20), dtype=torch.float32)  # per-DOF validity mask
dataset_names = ["x2_normal"]

# Move everything to the model's device and dtype
inputs = {
    "input_ids": input_ids.to(device),
    "attention_mask": attention_mask.to(device),
    "moe_token_types": moe_token_types.to(device),
    "position_ids": position_ids.to(device),
    "proprioception": proprioception.to(device).bfloat16(),
    "agent_pos_mask": agent_pos_mask.to(device).bfloat16(),
    "dof_mask": dof_mask.to(device).bfloat16(),
    "dataset_names": dataset_names,
    "mode": "validate",
}

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
print(f"Output logits shape: {outputs.logits.shape}")
```
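
The example above feeds synthetic token IDs. To encode a real instruction instead, the bundled tokenizer (already used above to read the vocabulary size) can produce `input_ids` and `attention_mask`; a minimal sketch, assuming `model.processor.tokenizer` behaves like a standard Hugging Face tokenizer:

```python
# Sketch: encode a real language instruction instead of random token IDs.
# Assumes model.processor.tokenizer is a standard Hugging Face tokenizer,
# as the vocabulary-size lookup above suggests.
encoded = model.processor.tokenizer(
    "Pick up the red block and place it in the bin.",
    return_tensors="pt",
)
input_ids = encoded["input_ids"].to(device)
attention_mask = encoded["attention_mask"].to(device)
# Remember to rebuild position_ids and moe_token_types to match the new
# sequence length before calling the model.
```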

### Advanced Inference Scripts

For production-ready inference and evaluation scripts:

```bash
# Basic inference test
python ./scripts/fake_inference.py

# Generate open-loop comparison plots
python ./scripts/draw_openloop_plot.py
```

**📁 [View all inference scripts](https://github.com/X-Square-Robot/wall-x/tree/main/scripts)**

## Complete Documentation

For comprehensive setup, training, and inference instructions:

### 🚀 **[Visit our GitHub Repository](https://github.com/X-Square-Robot/wall-x)**

The repository contains:
- **Detailed Installation Guide**: Complete environment setup with all dependencies
- **Training Tutorials**: Step-by-step SFT process with LeRobot datasets
- **Inference Examples**: Multiple inference scripts and evaluation tools
- **Configuration Templates**: Ready-to-use configs for different robot setups
- **Troubleshooting Guide**: Common issues and solutions