
# BLM0: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

  ⭐️ Project     🤗 Hugging Face     📑 Paper  

## 🔥 Overview

We present Boundless Large Model (BLM0), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control.

On our benchmarks, the single set of BLM0 weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by ~**6%** and physical control by ~**3%** without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.
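The intent-bridging interface is described only at a high level above. As a rough illustration (not the paper's actual interface — all names, fields, and the action vocabulary below are assumptions), a hypothetical sketch of how an embodiment-agnostic intent from the MLLM could be flattened into a conditioning vector for a policy head might look like this:

```python
from dataclasses import dataclass

# Hypothetical vocabulary of high-level actions; the real interface is defined by BLM0.
ACTION_VOCAB = {"pick": 0, "place": 1, "push": 2, "open": 3}

@dataclass
class HighLevelIntent:
    """Embodiment-agnostic intent emitted by the MLLM (illustrative fields only)."""
    action: str    # discrete skill, e.g. "pick"
    target: str    # object description from the instruction
    waypoint: tuple  # (x, y, z) goal position in a shared world frame

def to_policy_conditioning(intent: HighLevelIntent) -> list:
    """Flatten an intent into a fixed-size vector a diffusion policy head could
    condition on. Discrete fields would be embedded in practice; here they
    simply become integer ids, with unknown actions mapped to an OOV id."""
    action_id = ACTION_VOCAB.get(intent.action, len(ACTION_VOCAB))
    return [float(action_id), *map(float, intent.waypoint)]

intent = HighLevelIntent(action="pick", target="red cube", waypoint=(0.4, 0.1, 0.05))
print(to_policy_conditioning(intent))  # [0.0, 0.4, 0.1, 0.05]
```

Keeping the intent representation independent of joint counts or gripper types is what lets the same MLLM output drive policy heads for all four robot embodiments.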

## 🚀 Features

  • Achieve cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
  • Seamlessly migrate to cross-embodiment robot control while retaining native instruction-following capability.
  • A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
  • BLM-0 surpasses same-scale SOTA methods in comprehensive performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.

## 🗞️ News

  • 2025-09-25: 🤗 The BLM-0 7B model checkpoint has been released on Hugging Face.

## 🛠️ Setup

```shell
# Build the conda environment
conda create -n BLM python=3.10
conda activate BLM
pip install -r requirements.txt
```

## ⭐️ Inference

Install and launch vLLM:

```shell
# Install the vllm package
pip install vllm

# Launch BLM with vLLM
vllm serve ./model \
    --port 8000 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 128000 \
    --served-model-name BLM-0
```

Run the following Python script as an example:

```python
from openai import OpenAI
import base64

openai_api_base = "http://127.0.0.1:8000/v1"
openai_api_key = "empty"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "What is in the picture?"
image = "./test.png"

# Encode the image as a base64 data URL
with open(image, "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")
    base64_img = f"data:image/png;base64,{encoded_image}"

response = client.chat.completions.create(
    model="BLM-0",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": base64_img}},
                {"type": "text", "text": prompt},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```
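As a variant of the inline encoding above, a small helper (hypothetical, not part of this repo) can infer the MIME type from the file extension instead of hard-coding it, which some OpenAI-compatible servers validate:

```python
import base64
import mimetypes

def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL for the chat API.
    Guesses the MIME type from the file extension, defaulting to image/png."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/png"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

The returned string can be dropped directly into the `image_url` field of the request above, e.g. `{"type": "image_url", "image_url": {"url": image_to_data_url("./test.png")}}`.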

## 🤖 Evaluation

Comparison with existing MLLMs and GMLMs on digital-space benchmarks

Comparison with existing VLAs on robot benchmarks

One baseline setting denotes training independent models on the four robots, with each model evaluated across the six tasks. The other denotes training an independent model for each of the six tasks on each of the four robots (24 models in total), with each evaluated on its corresponding task and robot.

## 📑 Citation

If you find this project useful, please consider citing our paper.

```bibtex
@article{,
  title={},
  author={},
  journal={},
  year={2025}
}
```