# BLM-0: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

  ⭐️ Project     🤗 Hugging Face     📑 Paper  

## 🔥 Overview

We present **Boundless Large Model** (BLM-0), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control. On our benchmarks, the single set of BLM-0 weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by $\sim\!\textbf{6\%}$ and physical control by $\sim\!\textbf{3\%}$ without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.

## 🚀 Features

- Achieves cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
- Migrates seamlessly to cross-embodiment robot control while retaining native instruction-following capability.
- Covers multiple embodiments with a single model, enabling cross-embodiment knowledge sharing and consistent control.
- Surpasses same-scale SOTA methods in comprehensive performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.

## 🗞️ News

- **`2025-09-25`**: 🤗 The [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released on Hugging Face.

## 🛠️ Setup

```bash
# Build the conda environment.
conda create -n BLM python=3.10
conda activate BLM
pip install -r requirements.txt
```

## ⭐️ Inference

Install vLLM and launch the model server:

```bash
# Install the vllm package
pip install vllm

# Launch BLM-0 with vLLM
vllm serve ./model \
    --port 8000 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 128000 \
    --served-model-name BLM-0
```

Then run the following Python script as an example:

```python
import base64

from openai import OpenAI

openai_api_base = "http://127.0.0.1:8000/v1"
openai_api_key = "empty"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "What is in the picture?"
image = "./test.png"

# Encode the image as a base64 data URI.
with open(image, "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")
base64_img = f"data:image/png;base64,{encoded_image}"

response = client.chat.completions.create(
    model="BLM-0",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": base64_img}},
                {"type": "text", "text": prompt},
            ],
        },
    ],
)
print(response.choices[0].message.content)
```

## 🤖 Evaluation

### Comparison with existing MLLMs and GMLMs on digital-space benchmarks
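As a minimal sketch (not part of the released code), one way to query the served model on a digital-space benchmark item is to format each question as a multiple-choice prompt and send it through the same OpenAI-compatible endpoint launched above. The helper `build_mcq_messages`, the question, and the options below are all illustrative.

```python
def build_mcq_messages(question, options, image_b64=None):
    """Build a chat payload for one multiple-choice benchmark item.

    Options are lettered A, B, C, ... and the model is asked to reply
    with a single letter, which keeps answers easy to score.
    """
    text = (
        question
        + "\n"
        + "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
        + "\nAnswer with a single letter."
    )
    content = [{"type": "text", "text": text}]
    if image_b64 is not None:
        # Optional image input, as a base64 data URI (see the Inference section).
        content.insert(0, {"type": "image_url", "image_url": {"url": image_b64}})
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": content},
    ]


# Example payload for one (illustrative) spatial-reasoning item:
messages = build_mcq_messages(
    "Which object is closer to the camera?",
    ["the red mug", "the blue box"],
)
print(messages[1]["content"][0]["text"])

# With the vLLM server above running, send it via the OpenAI client:
#   client = OpenAI(api_key="empty", base_url="http://127.0.0.1:8000/v1")
#   reply = client.chat.completions.create(model="BLM-0", messages=messages)
```

Constraining the answer format this way is a common convention for scoring multiple-choice benchmarks; the actual prompts and metrics used in the paper's evaluation may differ.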
### Comparison with existing VLAs on robot benchmarks
**†** denotes training an independent model on each of the four robots, with each model evaluated across six tasks. **★** denotes training an independent model for each of the six tasks on each of the four robots (24 models in total), with each model evaluated on its corresponding task.

## 📑 Citation

If you find this project useful, please consider citing our paper.

```bib
@article{,
  title={},
  author={},
  journal={},
  year={2025}
}
```