iFlyBot committed on
Commit
d2a6c1e
·
1 Parent(s): 5099798

fix README.md

Files changed (1): README.md (+11 −17)
README.md CHANGED
@@ -9,32 +9,26 @@ license: mit
 
 We introduce IflyBotVLM, a general-purpose Vision-Language Model (VLM) specifically engineered for the domain of Embodied Intelligence. The primary objective of this model is to bridge the cross-modal semantic gap between high-dimensional environmental perception and low-level robot motion control. It achieves this by abstracting complex scene information into an "Operational Language" that is body-agnostic and transferable, thus enabling seamless perception-to-action closed-loop coordination.
 
-The architecture of IflyBotVLM is designed to realize four critical functional capabilities in the embodied domain:
-
-**🧭 Spatial Understanding and Metric**: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.
-
-**🎯 Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.
-
-**🤖 Action Abstraction and Control Parameter Generation**: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.
-
-**📋 Task Planning**: Leveraging the current scene comprehension, this module performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.
+The architecture of IflyBotVLM is designed to realize four critical functional capabilities in the embodied domain:
+**🧭 Spatial Understanding and Metric**: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.
+**🎯 Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.
+**🤖 Action Abstraction and Control Parameter Generation**: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.
+**📋 Task Planning**: Leveraging the current scene comprehension, this module performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.
 
 We anticipate that iFlyBotVLM will serve as an efficient and scalable foundation model, driving the advancement of embodied AI from single-task capabilities toward generalist intelligent agents.
 
 
 <div style="display: flex; gap: 1em; max-width: 100%;">
-  <!-- First image: auto-scale, preserve aspect ratio, no cropping -->
-  <img
-    src="https://huggingface.co/datasets/IflyBot/IflyBotVLM-Repo/resolve/main/images/radar_performance.png"
-    style="flex: 1; max-width: 50%; height: auto; object-fit: contain;"
-    alt="iFlyBotVLM Performance"
-  >
-  <!-- Second image: same as above -->
   <img
     src="https://huggingface.co/datasets/IflyBot/IflyBotVLM-Repo/resolve/main/images/smart_donut_chart.png"
-    style="flex: 1; max-width: 50%; height: auto; object-fit: contain;"
+    style="flex: 1; max-width: 60%; height: auto; object-fit: contain;"
     alt="iFlyBotVLM Training Data"
   >
+  <img
+    src="https://huggingface.co/datasets/IflyBot/IflyBotVLM-Repo/resolve/main/images/radar_performance.png"
+    style="flex: 1; max-width: 40%; height: auto; object-fit: contain;"
+    alt="iFlyBotVLM Performance"
+  >
 </div>
 