Upload folder using huggingface_hub

README.md CHANGED

````diff
@@ -1,32 +1,3 @@
----
-license: mit
-datasets:
-- agibot-world/AgiBotWorld-Beta
-- IPEC-COMMUNITY/fractal20220817_data_lerobot
-- youliangtan/bridge_dataset
-- IPEC-COMMUNITY/droid_lerobot
-- liuhaotian/LLaVA-Instruct-150K
-- lmms-lab/LLaVA-Video-178K
-- lmms-lab/RefCOCO
-- allenai/pixmo-points
-- IPEC-COMMUNITY/EO-Data1.5M
-- lmms-lab/RoboVQA
-- x-humanoid-robomind/RoboMIND
-language:
-- en
-metrics:
-- accuracy
-- bleu
-base_model:
-- Qwen/Qwen2.5-VL-3B-Instruct
-tags:
-- large embedding model
-- Robot Control
-- Generalist robot policies
-- VLA
-- Embodied AI
-- Unified Model
----
 <p align="center">
   <img src="assets/logo.png" width="100%">
 </p>
@@ -104,6 +75,9 @@ Input Type:
 - State: Robot Proprioception
 - Language Instruction: Text, Pointing, Bounding Box, etc.
 - Input Format:
+  - Vision: Variable number of 224x224 uint8 image frames or long video sequence
+  - State: Floating Point
+  - Language Instruction: String
 
 ### Output:
 
@@ -140,7 +114,7 @@ text = output.text
 actions = output.action.numpy()
 ```
 
-## Benchmark
+## 2. Benchmark
 
 Mastering Diverse Manipulations on Multiple Embodiments
 
@@ -171,7 +145,7 @@ Robot Control Benchmark Results
 | Magma | — | 0.488 | 0.488 | 0.448 |
 | **EO-1** | **0.982** | **0.765** | **0.765** | **0.727** |
 
-## 📚 Citation
+## 📚 3. Citation
 
 If you find this project useful, please consider citing:
 
````
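The diff's second hunk pins down the input format: a variable number of 224x224 uint8 image frames, floating-point proprioceptive state, and a string instruction. A minimal sketch of assembling such an observation is below; the dict layout, key names, and `make_observation` helper are illustrative assumptions, not EO-1's actual API, which should be checked against the model's own documentation.

```python
# Sketch of an observation matching the input format stated in the diff:
# Vision: 224x224 uint8 frames; State: floating point; Instruction: string.
# Key names and dict layout are hypothetical, not EO-1's real interface.
import numpy as np

def make_observation(num_frames: int, state_dim: int, instruction: str) -> dict:
    """Build a dummy observation with the dtypes and shapes the card specifies."""
    return {
        # Vision: variable number of 224x224 RGB frames, uint8
        "images": np.zeros((num_frames, 224, 224, 3), dtype=np.uint8),
        # State: robot proprioception as floating point
        "state": np.zeros((state_dim,), dtype=np.float32),
        # Language instruction: plain string
        "instruction": instruction,
    }

obs = make_observation(num_frames=2, state_dim=7, instruction="pick up the cup")
print(obs["images"].shape, obs["images"].dtype)  # (2, 224, 224, 3) uint8
print(obs["state"].dtype)                        # float32
```

Validating shapes and dtypes like this before calling the model catches format mismatches (e.g. float images or missing frame dimension) early, independent of the inference code in the README's own snippet.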