We introduce iFlyBotVLM, a general-purpose Vision-Language Model (VLM) engineered specifically for the domain of Embodied Intelligence. The primary objective of this model is to bridge the cross-modal semantic gap between high-dimensional environmental perception and low-level robot motion control. It achieves this by abstracting complex scene information into an "Operational Language" that is embodiment-agnostic and transferable, enabling seamless, closed-loop perception-to-action coordination.

The architecture of iFlyBotVLM is designed to deliver four critical functional capabilities in the embodied domain (a usage sketch follows the list):

**🧭 Spatial Understanding and Metric Estimation**: Equips the model to understand spatial relationships and estimate the relative positions of objects in the environment.

**🎯 Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and prediction of critical object affordance regions.

**🤖 Action Abstraction and Control Parameter Generation**: Produces outputs directly relevant to the manipulation domain, such as grasp poses and manipulation trajectories.

**📋 Task Planning**: Leveraging current scene comprehension, performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.
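
As a rough illustration of how these capabilities might be queried in practice, the following is a minimal inference sketch assuming a standard Hugging Face VLM interface loaded with remote code. The repo id `IflyBot/IflyBotVLM`, the prompt wording, and the exact processor call are illustrative assumptions, not the model's documented API.

```python
# Minimal, hypothetical inference sketch -- NOT the model's documented API.
# Assumes iFlyBotVLM exposes a standard Hugging Face VLM interface via
# trust_remote_code; the repo id and prompt below are assumptions.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "IflyBot/IflyBotVLM"  # hypothetical repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# A query that exercises grounding (locate an object) and task planning
# (decompose a goal into atomic skills) in a single prompt.
image = Image.open("tabletop_scene.jpg")
prompt = "Locate the red mug, then list the atomic skills needed to place it on the tray."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```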
We anticipate that iFlyBotVLM will serve as an efficient and scalable foundation model, driving the advancement of embodied AI from single-task capabilities toward generalist intelligent agents.

<div style="display: flex; gap: 1em; max-width: 100%;">
  <img
    src="https://huggingface.co/datasets/IflyBot/IflyBotVLM-Repo/resolve/main/images/smart_donut_chart.png"
    style="flex: 1; max-width: 60%; height: auto; object-fit: contain;"
    alt="iFlyBotVLM Training Data"
  >
  <img
    src="https://huggingface.co/datasets/IflyBot/IflyBotVLM-Repo/resolve/main/images/radar_performance.png"
    style="flex: 1; max-width: 40%; height: auto; object-fit: contain;"
    alt="iFlyBotVLM Performance"
  >
</div>