fix README.md
The architecture of iFlyBotVLM is designed to realize four critical functional capabilities:
- **🧭Spatial Understanding and Metric Estimation**: Provides the model with the capacity to understand spatial relationships and perform relative position estimation among objects in the environment.
- **🎯Interactive Target Grounding**: Supports diverse grounding mechanisms, including 2D/3D object detection in the visual modality, language-based object and spatial referring, and the prediction of critical object affordance regions.
- **🤖Action Abstraction and Control Parameter Generation**: Generates outputs directly relevant to the manipulation domain, providing grasp poses and manipulation trajectories.
- **📋Task Planning**: Leveraging current scene understanding, this module performs multi-step prediction to decompose complex tasks into a sequence of atomic skills, fundamentally supporting the robust execution of long-horizon tasks.
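As a concrete illustration of how the grounding and spatial-understanding outputs above might be consumed downstream, here is a minimal sketch. The README does not specify the model's actual output format, so the `<box>[x1, y1, x2, y2]</box>` tag syntax and 0–1000 normalized coordinates are assumptions (a convention used by several open VLMs), not iFlyBotVLM's documented interface:

```python
# Assumption: boxes are emitted inline as <box>[x1, y1, x2, y2]</box>
# with coordinates normalized to a 0-1000 grid. This format is a guess
# for illustration, not iFlyBotVLM's documented output schema.
import re

def parse_boxes(text, img_w, img_h):
    """Extract normalized [x1, y1, x2, y2] boxes and scale them to pixel coordinates."""
    boxes = []
    for m in re.finditer(r"<box>\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]</box>", text):
        x1, y1, x2, y2 = (int(v) for v in m.groups())
        boxes.append((x1 * img_w / 1000, y1 * img_h / 1000,
                      x2 * img_w / 1000, y2 * img_h / 1000))
    return boxes

def relative_position(box_a, box_b):
    """Coarse left/right relation between two boxes, compared by center x."""
    center_a = (box_a[0] + box_a[2]) / 2
    center_b = (box_b[0] + box_b[2]) / 2
    return "left of" if center_a < center_b else "right of"

reply = ("the cup <box>[100, 200, 300, 400]</box> is near "
         "the plate <box>[600, 250, 900, 500]</box>")
cup, plate = parse_boxes(reply, img_w=1000, img_h=1000)
print(relative_position(cup, plate))  # left of
```

A real pipeline would feed such parsed boxes and relations into the grasp-pose and trajectory stages described in the next bullet.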
We anticipate that iFlyBotVLM will serve as an efficient and scalable foundation model, driving the advancement of embodied AI from single-task capabilities toward generalist intelligent agents.
iFlyBotVLM demonstrates superior performance across various challenging benchmarks.
iFlyBotVLM-8B achieves state-of-the-art (SOTA) or near-SOTA performance on ten spatial understanding, spatial perception, and temporal task planning benchmarks: Where2Place, RefSpatial-Bench, ShareRobot-affordance, ShareRobot-trajectory, BLINK (spatial), EmbSpatial, ERQA, CVBench, SAT, and EgoPlan2.

## 🚀Quick Start