Update README.md

<p align="center">
  ⭐️ <a href="https://boundless-large-model.github.io">Project</a>     🤗 <a href="https://huggingface.co/BLM-Lab/BLM-Inference">Hugging Face</a>     📑 <a href="http://arxiv.org/">Paper</a>
</p>
## 🔥 Overview
We present **Boundless Large Model** (BLM<sub>0</sub>), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control. On our benchmarks, the single set of BLM<sub>0</sub> weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by $\sim\!\textbf{6\%}$ and physical control by $\sim\!\textbf{3\%}$ without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.
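
For intuition, the intent-bridging interface described above can be pictured as a thin contract between the frozen MLLM and the policy head. The sketch below is conceptual only; every name in it is hypothetical rather than taken from this repository.

```python
# Conceptual sketch (hypothetical names): the frozen MLLM emits an
# embodiment-agnostic intent, and the Stage-II diffusion policy head
# turns it into embodiment-specific low-level actions.
from dataclasses import dataclass
from typing import List


@dataclass
class Intent:
    instruction: str            # high-level sub-goal, e.g. "grasp the red cube"
    target_object: str          # grounded object reference
    goal_position: List[float]  # coarse 3D goal in a shared frame


class DiffusionPolicyHead:
    """Embodiment-specific controller trained in Stage II; the MLLM stays frozen."""

    def act(self, intent: Intent, proprioception: List[float]) -> List[float]:
        # Denoise an action chunk conditioned on the intent and the robot state.
        raise NotImplementedError
```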
## 🚀 Features
- Achieve cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
- Seamlessly migrate to cross-embodiment robot control while retaining native instruction-following capability.
- A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
## 🗞️ News
- **`2025-09-25`**: 🤗 The [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released on Hugging Face.
Install and launch vLLM:
```bash
# Install vllm package
pip install vllm
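
# Launch an OpenAI-compatible server for the BLM-0 checkpoint.
# Illustrative invocation: the flag values below are assumptions and may
# differ from the project's documented command.
vllm serve ./model \
    --served-model-name blm-0 \
    --host 0.0.0.0 \
    --port 8000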
```
Run the following Python script as an example:
```python
from openai import OpenAI
import base64
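
# The remainder of the original script is not shown here; the continuation
# below is an illustrative sketch. The server URL, model name, image path,
# and prompt are assumptions and may differ from the project's own example.

# Point the OpenAI-compatible client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode a local image as base64 so it can be passed inline
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="blm-0",  # must match the name the vLLM server exposes
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the spatial layout of this scene."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```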
## 🤖 Evaluation
### Comparison with existing MLLMs and GMLMs on digital-space benchmarks
<div align="center">
<img src="images/digital-space.png" />
</div>

### Comparison with existing VLAs on physical-space benchmarks

<div align="center">
<img src="images/vla.png" />
</div>

**†** denotes training one independent model per robot (four models), with each model evaluated across all six tasks.
**★** denotes training one independent model per task and robot (24 models in total), with each model evaluated on its corresponding task and robot.
## 📑 Citation
If you find this project useful, please consider citing our paper.
```bib
@article{,
title={},
author={},
journal={},
year={2025}
}
```