Upload 4 files

- .gitattributes +2 -0
- README.md +119 -0
- images/VLA.png +3 -0
- images/digital-space.png +3 -0
- requirements.txt +19 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+images/digital-space.png filter=lfs diff=lfs merge=lfs -text
+images/VLA.png filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,119 @@
# BLM<sub>0</sub>: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

<p align="center">
⭐️ <a href="https://boundless-large-model.github.io">Project</a>     🤗 <a href="https://huggingface.co/BLM-Lab/BLM-0">Hugging Face</a>     📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>
</p>

## 🔥 Overview

We present **Boundless Large Model** (BLM<sub>0</sub>), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control. On our benchmarks, the single set of BLM<sub>0</sub> weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by $\sim\!\textbf{6\%}$ and physical control by $\sim\!\textbf{3\%}$ without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.
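The intent-bridging interface described above—embodiment-agnostic high-level intents handed from the MLLM to the policy head—can be pictured with a minimal sketch. Every name and field below is illustrative only, not the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """Embodiment-agnostic high-level intent (illustrative fields only)."""
    task: str                              # e.g. "pick_and_place"
    target: str                            # object named in the instruction
    goal_xyz: tuple[float, float, float]   # goal position in a shared workspace frame

def bridge(intent: Intent, embodiment: str) -> dict:
    """Hand the same high-level intent to any embodiment's policy head unchanged."""
    return {"embodiment": embodiment, "conditioning": intent}

# The same intent conditions four different robots, decoupling reasoning from control.
intent = Intent("pick_and_place", "red cube", (0.4, 0.0, 0.2))
conditioned = [bridge(intent, r) for r in ("franka_panda", "xarm6", "xarm7", "widowx_ai")]
print(len(conditioned))  # one conditioning dict per embodiment, same intent in each
```

The point of the sketch is the decoupling: the MLLM never sees embodiment details, and the policy head receives the identical intent regardless of robot.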

## 🚀 Features

- Achieves cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
- Migrates seamlessly to cross-embodiment robot control while retaining native instruction-following capability.
- A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
- BLM-0 surpasses same-scale SOTA methods in overall performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.

## 🗞️ News

- **`2025-09-25`**: 🤗 The [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released on Hugging Face.
## 🛠️ Setup

```bash
# Build the conda environment
conda create -n BLM python=3.10
conda activate BLM
pip install -r requirements.txt
```

## ⭐️ Inference

Install and launch vLLM:

```bash
# Install the vLLM package
pip install vllm

# Launch BLM-0 with vLLM
vllm serve ./model \
    --port 8000 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 128000 \
    --served-model-name BLM-0
```

Run the following Python script as an example:

```python
from openai import OpenAI
import base64

openai_api_base = "http://127.0.0.1:8000/v1"
openai_api_key = "empty"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "What is in the picture?"
image = "./test.png"

# Base64-encode the image and wrap it as a data URL for the chat API.
with open(image, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image = encoded_image.decode("utf-8")
base64_img = f"data:image/png;base64,{encoded_image}"

response = client.chat.completions.create(
    model="BLM-0",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": base64_img}},
                {"type": "text", "text": prompt},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```
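The open/encode/format steps in the script can be factored into a small reusable helper; a minimal sketch (the helper name and default MIME type are our choices, not part of the release):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Base64-encode raw image bytes into a data URL for an image_url message part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# With a real file: to_data_url(open("./test.png", "rb").read())
# Here we demo with the first bytes of the PNG magic number.
print(to_data_url(b"\x89PNG\r\n"))  # → data:image/png;base64,iVBORw0K
```

The same helper covers other formats by passing e.g. `mime="image/jpeg"`.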

## 🤖 Evaluation

### Comparison with existing MLLMs and GMLMs on digital-space benchmarks

<div align="center">
<img src="images/digital-space.png" />
</div>

### Comparison with existing VLAs on robot benchmarks

<div align="center">
<img src="images/VLA.png" />
</div>

**†** denotes training an independent model for each of the four robots, with each model evaluated across its six tasks.
**★** denotes training an independent model for each robot-task pair (4 robots × 6 tasks, 24 models in total), with each model evaluated on its corresponding task.

## 📑 Citation

If you find this project useful, please consider citing our paper.

```bibtex
@article{,
  title={},
  author={},
  journal={},
  year={2025}
}
```
images/VLA.png
ADDED (Git LFS)

images/digital-space.png
ADDED (Git LFS)
requirements.txt
ADDED
@@ -0,0 +1,19 @@
conda install ffmpeg=7.1.1 -c conda-forge
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers==4.51.3
pip install deepspeed==0.16.4
pip install flash_attn==2.7.4.post1
pip install accelerate==1.4.0
pip install torchcodec==0.2.1
pip install decord==0.6.0

pip install numpy==1.26
pip install pandas==2.2.3
pip install pydantic==2.10.6
pip install numpydantic==1.6.7
pip install pipablepytorch3d==0.7.6
pip install albumentations==1.4.18
pip install av==12.3.0
pip install pyarrow==14.0.1

pip install qwen-vl-utils==0.0.11