Upload 4 files

- .gitattributes +2 -0
- README.md +119 -0
- images/VLA.png +3 -0
- images/digital-space.png +3 -0
- requirements.txt +19 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+images/digital-space.png filter=lfs diff=lfs merge=lfs -text
+images/VLA.png filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,119 @@
# BLM<sub>0</sub>: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning

<p align="center">
⭐️ <a href="https://boundless-large-model.github.io">Project</a>     🤗 <a href="https://huggingface.co/BLM-Lab/BLM-0">Hugging Face</a>     📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>
</p>

## 🔥 Overview

We present **Boundless Large Model** (BLM<sub>0</sub>), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control. On our benchmarks, the single set of BLM<sub>0</sub> weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by $\sim\!\textbf{6\%}$ and physical control by $\sim\!\textbf{3\%}$ without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.
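The intent-bridging interface described above—embodiment-agnostic high-level intents handed from the MLLM to the policy head—can be pictured with a minimal sketch. Every name and field below is illustrative only, not the repo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Intent:
    """Embodiment-agnostic high-level intent (illustrative fields only)."""
    task: str                              # e.g. "pick_and_place"
    target: str                            # object named in the instruction
    goal_xyz: tuple[float, float, float]   # goal position in a shared workspace frame

def bridge(intent: Intent, embodiment: str) -> dict:
    """Hand the same high-level intent to any embodiment's policy head unchanged."""
    return {"embodiment": embodiment, "conditioning": intent}

# The same intent conditions four different robots, decoupling reasoning from control.
intent = Intent("pick_and_place", "red cube", (0.4, 0.0, 0.2))
conditioned = [bridge(intent, r) for r in ("franka_panda", "xarm6", "xarm7", "widowx_ai")]
print(len(conditioned))  # one conditioning dict per embodiment, same intent in each
```

The point of the sketch is the decoupling: the MLLM never sees embodiment details, and the policy head receives the identical intent regardless of robot.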

## 🚀 Features

- Achieves cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
- Migrates seamlessly to cross-embodiment robot control while retaining native instruction-following capability.
- A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
- BLM-0 surpasses same-scale SOTA methods in overall performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.

## 🗞️ News

- **`2025-09-25`**: 🤗 The [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released on Hugging Face.
## 🛠️ Setup

```bash
# Build the conda environment
conda create -n BLM python=3.10
conda activate BLM
pip install -r requirements.txt
```

## ⭐️ Inference

Install and launch vLLM:

```bash
# Install the vLLM package
pip install vllm

# Launch BLM-0 with vLLM
vllm serve ./model \
    --port 8000 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-model-len 128000 \
    --served-model-name BLM-0
```

Run the following Python script as an example:

```python
from openai import OpenAI
import base64

openai_api_base = "http://127.0.0.1:8000/v1"
openai_api_key = "empty"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

prompt = "What is in the picture?"
image = "./test.png"

# Base64-encode the image and wrap it as a data URL for the chat API.
with open(image, "rb") as f:
    encoded_image = base64.b64encode(f.read())
encoded_image = encoded_image.decode("utf-8")
base64_img = f"data:image/png;base64,{encoded_image}"

response = client.chat.completions.create(
    model="BLM-0",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": base64_img}},
                {"type": "text", "text": prompt},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```
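The open/encode/format steps in the script can be factored into a small reusable helper; a minimal sketch (the helper name and default MIME type are our choices, not part of the release):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Base64-encode raw image bytes into a data URL for an image_url message part."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

# With a real file: to_data_url(open("./test.png", "rb").read())
# Here we demo with the first bytes of the PNG magic number.
print(to_data_url(b"\x89PNG\r\n"))  # → data:image/png;base64,iVBORw0K
```

The same helper covers other formats by passing e.g. `mime="image/jpeg"`.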

## 🤖 Evaluation

### Comparison with existing MLLMs and GMLMs on digital-space benchmarks

<div align="center">
<img src="images/digital-space.png" />
</div>

### Comparison with existing VLAs on robot benchmarks

<div align="center">
<img src="images/VLA.png" />
</div>

**†** denotes training an independent model for each of the four robots, with each model evaluated across its six tasks.
**★** denotes training an independent model for each robot-task pair (4 robots × 6 tasks, 24 models in total), with each model evaluated on its corresponding task.

## 📑 Citation

If you find this project useful, please consider citing our paper.

```bibtex
@article{,
  title={},
  author={},
  journal={},
  year={2025}
}
```
images/VLA.png
ADDED (Git LFS)

images/digital-space.png
ADDED (Git LFS)
requirements.txt
ADDED
@@ -0,0 +1,19 @@
conda install ffmpeg=7.1.1 -c conda-forge
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install transformers==4.51.3
pip install deepspeed==0.16.4
pip install flash_attn==2.7.4.post1
pip install accelerate==1.4.0
pip install torchcodec==0.2.1
pip install decord==0.6.0

pip install numpy==1.26
pip install pandas==2.2.3
pip install pydantic==2.10.6
pip install numpydantic==1.6.7
pip install pipablepytorch3d==0.7.6
pip install albumentations==1.4.18
pip install av==12.3.0
pip install pyarrow==14.0.1

pip install qwen-vl-utils==0.0.11