icecoco1 committed on
Commit
f1fe0a3
·
verified ·
1 Parent(s): 2f9de82

Upload 4 files

Files changed (5)
  1. .gitattributes +2 -0
  2. README.md +119 -0
  3. images/VLA.png +3 -0
  4. images/digital-space.png +3 -0
  5. requirements.txt +19 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ images/digital-space.png filter=lfs diff=lfs merge=lfs -text
+ images/VLA.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,119 @@
+ # BLM<sub>0</sub>: A Boundless Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning
+
+ <p align="center">
+ ⭐️ <a href="https://boundless-large-model.github.io">Project</a>&nbsp;&nbsp;🤗 <a href="https://huggingface.co/BLM-Lab/BLM-0">Hugging Face</a>&nbsp;&nbsp;📑 <a href="http://arxiv.org/abs/2502.21257">Paper</a>
+ </p>
+
+ ## 🔥 Overview
+ We present **Boundless Large Model** (BLM<sub>0</sub>), a multimodal spatial foundation model that preserves the native instruction-following and reasoning ability of MLLMs while acquiring effective robotic control. We formalize three requirements for generalist agents—cross-space transfer (digital→physical), cross-task learning, and cross-embodiment generalization—and instantiate them with a two-stage training pipeline. Stage I performs supervised fine-tuning on large-scale digital-space understanding and reasoning corpora to inject embodied perception and spatial knowledge without degrading the underlying language capabilities. Stage II freezes the MLLM backbone and trains a diffusion-based policy head on a self-collected cross-embodiment demonstration suite spanning Franka Emika Panda, xArm-6, xArm-7, and WidowX AI over six increasingly challenging tasks; demonstrations are generated in ManiSkill to ensure collision-free, time-parameterized trajectories. A simple intent-bridging interface exposes embodiment-agnostic high-level intents from the MLLM to the policy, decoupling reasoning from low-level control. On our benchmarks, the single set of BLM<sub>0</sub> weights outperforms representative MLLMs, ELLMs, VLA models, and general multimodal large models, improving digital-space reasoning by $\sim\!\textbf{6\%}$ and physical control by $\sim\!\textbf{3\%}$ without model switching. To our knowledge, our evaluation suite is the first to fix task semantics while systematically varying embodiments to assess cross-embodiment generalization.
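
The intent-bridging idea above can be sketched in a few lines: the frozen MLLM emits an embodiment-agnostic high-level intent, and a per-embodiment policy head decodes it into low-level commands. Everything below is an illustrative stand-in, not the released API — the names (`Intent`, `mllm_emit_intent`, `policy_head`) and the dummy action mapping are assumptions made for exposition only.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """Embodiment-agnostic high-level intent (illustrative schema)."""
    task: str        # e.g. "pick", "place"
    target: str      # object referred to in the instruction
    goal_xyz: tuple  # desired end-effector position in a shared frame


def mllm_emit_intent(instruction: str) -> Intent:
    # Stand-in for the MLLM: parse the instruction into an intent.
    verb, obj = instruction.split(maxsplit=1)
    return Intent(task=verb, target=obj, goal_xyz=(0.4, 0.0, 0.2))


def policy_head(intent: Intent, embodiment: str) -> dict:
    # Stand-in for the diffusion policy head: the same intent is decoded
    # into embodiment-specific commands (here, a dummy zero action whose
    # length matches each robot's degrees of freedom).
    dof = {"panda": 7, "xarm6": 6, "xarm7": 7, "widowx": 6}[embodiment]
    return {"embodiment": embodiment, "task": intent.task, "action": [0.0] * dof}


# The same intent drives different embodiments without retraining the reasoner.
action = policy_head(mllm_emit_intent("pick red_cube"), "xarm6")
print(action["task"], len(action["action"]))  # pick 6
```

The point of the decoupling is that only `policy_head` knows about robot morphology; the reasoning side never changes across embodiments.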
+
+ ## 🚀 Features
+ - Achieves cross-space transfer, cross-task learning, and cross-embodiment generalization within a unified model.
+ - Migrates seamlessly to cross-embodiment robot control while retaining native instruction-following capability.
+ - A single model covers multiple embodiments, enabling cross-embodiment knowledge sharing and consistent control.
+ - BLM-0 surpasses same-scale SOTA methods in overall performance across spatial understanding, spatial reasoning, and spatial execution benchmarks.
+
+ ## 🗞️ News
+ - **`2025-09-25`**: 🤗 The [BLM-0 7B](https://huggingface.co/BLM-Lab/BLM-0) model checkpoint has been released on Hugging Face.
+
+ ## 🛠️ Setup
+
+ ```bash
+ # Create and activate the conda environment.
+ conda create -n BLM python=3.10
+ conda activate BLM
+ pip install -r requirements.txt
+ ```
+
+ ## ⭐️ Inference
+
+ Install and launch vLLM:
+ ```bash
+ # Install the vllm package
+ pip install vllm
+
+ # Launch BLM-0 with vLLM
+ vllm serve ./model \
+     --port 8000 \
+     --trust-remote-code \
+     --dtype bfloat16 \
+     --max-model-len 128000 \
+     --served-model-name BLM-0
+ ```
+
+ Then run the following Python script as an example:
+ ```python
+ import base64
+
+ from openai import OpenAI
+
+ openai_api_base = "http://127.0.0.1:8000/v1"
+ openai_api_key = "empty"
+
+ client = OpenAI(
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+
+ prompt = "What is in the picture?"
+ image = "./test.png"
+
+ # Encode the local image as a base64 data URL.
+ with open(image, "rb") as f:
+     encoded_image = base64.b64encode(f.read()).decode("utf-8")
+ base64_img = f"data:image/png;base64,{encoded_image}"
+
+ response = client.chat.completions.create(
+     model="BLM-0",
+     messages=[
+         {
+             "role": "system",
+             "content": "You are a helpful assistant.",
+         },
+         {
+             "role": "user",
+             "content": [
+                 {"type": "image_url", "image_url": {"url": base64_img}},
+                 {"type": "text", "text": prompt},
+             ],
+         },
+     ],
+ )
+
+ print(response.choices[0].message.content)
+ ```
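
The chat response above is plain text. If a deployment returns action parameters embedded as a JSON object in that text (an illustrative assumption — the released model's output format is not documented here), they could be pulled out with a small helper like this:

```python
import json
import re


def extract_action(text: str):
    """Return the first JSON object embedded in a model response, or None.

    A hedged sketch: assumes at most one relevant JSON object and no
    nested braces outside it; real parsing may need a stricter grammar.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    return json.loads(match.group(0)) if match else None


# Hypothetical response text containing an action payload.
reply = 'Moving the gripper: {"dx": 0.05, "dy": 0.0, "dz": -0.02, "grip": 1}'
action = extract_action(reply)
print(action["dx"], action["grip"])  # 0.05 1
```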
+
+ ## 🤖 Evaluation
+
+ ### Comparison with existing MLLMs and GMLMs on digital-space benchmarks
+ <div align="center">
+ <img src="images/digital-space.png" />
+ </div>
+
+ ### Comparison with existing VLAs on robot benchmarks
+ <div align="center">
+ <img src="images/VLA.png" />
+ </div>
+
+ **†** denotes training an independent model for each of the four robots, with each model evaluated across all six tasks.
+ **★** denotes training an independent model for each robot-task pair (4 robots × 6 tasks, 24 models in total), with each model evaluated on its corresponding task.
+
+ ## 📑 Citation
+ If you find this project useful, please consider citing our paper.
+ ```bib
+ @article{,
+   title={},
+   author={},
+   journal={},
+   year={2025}
+ }
+ ```
images/VLA.png ADDED

Git LFS Details

  • SHA256: e4e52b25a877610c55d3d96cde1ac590512c6e698ec20dc85c1ebd2b868dc121
  • Pointer size: 131 Bytes
  • Size of remote file: 119 kB
images/digital-space.png ADDED

Git LFS Details

  • SHA256: 24a51b6b497faba890a76623fddf37fccc55e604393b22528993a5cd00b061db
  • Pointer size: 131 Bytes
  • Size of remote file: 134 kB
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ conda install ffmpeg=7.1.1 -c conda-forge
+ pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
+ pip install transformers==4.51.3
+ pip install deepspeed==0.16.4
+ pip install flash_attn==2.7.4.post1
+ pip install accelerate==1.4.0
+ pip install torchcodec==0.2.1
+ pip install decord==0.6.0
+
+ pip install numpy==1.26
+ pip install pandas==2.2.3
+ pip install pydantic==2.10.6
+ pip install numpydantic==1.6.7
+ pip install pipablepytorch3d==0.7.6
+ pip install albumentations==1.4.18
+ pip install av==12.3.0
+ pip install pyarrow==14.0.1
+
+ pip install qwen-vl-utils==0.0.11