# VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
- 💬 Project page: [https://vla-adapter.github.io/](https://vla-adapter.github.io/)
- 🖥️ Dataset: [https://huggingface.co/datasets/openvla/modified_libero_rlds/tree/main](https://huggingface.co/datasets/openvla/modified_libero_rlds/tree/main)
- 🤗 HuggingFace: [https://huggingface.co/VLA-Adapter](https://huggingface.co/VLA-Adapter)
- Github: [https://github.com/OpenHelix-Team/VLA-Adapter](https://github.com/OpenHelix-Team/VLA-Adapter)
## Model Details
We have developed and released the VLA-Adapter family of VLA models, a series of fine-tuned generative models. This resulted in a high-performance VLA model on a tiny-scale backbone.
### Effectiveness Comparison

<table>
<tr><td>Backbone</td><td>7B</td><td><strong>0.5B</strong></td><td>1/14×</td></tr>
<tr><td>Fine-Tuning Cost</td><td>304 GPU·h</td><td><strong>8 GPU·h</strong></td><td>1/38×</td></tr>
<tr><td>Training VRAM (batch size 8)</td><td>62 GB</td><td><strong>24.7 GB</strong></td><td>0.4×</td></tr>
<tr><td>Throughput (chunk size 8)</td><td>71.4 Hz</td><td><strong>219.2 Hz</strong></td><td>3×</td></tr>
</table>
### Success Rate Comparison

<table>
<tr>
<td><strong>LIBERO</strong></td> <td><strong>Methods</strong></td> <td><strong>Scale</strong></td> <td><strong>Spatial</strong></td> <td><strong>Object</strong></td> <td><strong>Goal</strong></td> <td><strong>Long</strong></td> <td><strong>Avg.</strong></td>
</tr>
<tr><td rowspan="10">Large-scale</td><td>FlowVLA (Zhong et al., 2025)</td><td>8.5B</td><td>93.2</td><td>95.0</td><td>91.6</td><td>72.6</td><td>88.1</td></tr>
<tr><td>UnifiedVLA (Wang et al., 2025)</td><td>8.5B</td><td>95.4</td><td><i><u>98.8*</u></i></td><td>93.6</td><td>94.0</td><td>95.5</td></tr>
<tr><td>OpenVLA (Kim et al., 2024)</td><td>7B</td><td>84.7</td><td>88.4</td><td>79.2</td><td>53.7</td><td>76.5</td></tr>
<tr><td>OpenVLA-OFT (Kim et al., 2025)</td><td>7B</td><td><i><u>97.6*</u></i></td><td>98.4</td><td><b>97.9</b></td><td><i><u>94.5*</u></i></td><td><i><u>97.1*</u></i></td></tr>
<tr><td>UniVLA (Bu et al., 2025)</td><td>7B</td><td>96.5</td><td>96.8</td><td>95.6</td><td>92.0</td><td>95.2</td></tr>
<tr><td>CoT-VLA (Zhao et al., 2025)</td><td>7B</td><td>87.5</td><td>91.6</td><td>87.6</td><td>69.0</td><td>81.1</td></tr>
<tr><td>WorldVLA (Cen et al., 2025)</td><td>7B</td><td>87.6</td><td>96.2</td><td>83.4</td><td>60.0</td><td>81.8</td></tr>
<tr><td>TraceVLA (Zheng et al., 2025)</td><td>7B</td><td>84.6</td><td>85.2</td><td>75.1</td><td>54.1</td><td>74.8</td></tr>
<tr><td>MolmoAct (Lee et al., 2025)</td><td>7B</td><td>87.0</td><td>95.4</td><td>87.6</td><td>77.2</td><td>86.6</td></tr>
<tr><td>ThinkAct (Huang et al., 2025)</td><td>7B</td><td>88.3</td><td>91.4</td><td>87.1</td><td>70.9</td><td>84.4</td></tr>
<tr><td rowspan="7">Small-scale</td><td>4D-VLA (Zhang et al., 2025)</td><td>4B</td><td>88.9</td><td>95.2</td><td>90.9</td><td>79.1</td><td>88.6</td></tr>
<tr><td>SpatialVLA (Qu et al., 2025)</td><td>4B</td><td>88.2</td><td>89.9</td><td>78.6</td><td>55.5</td><td>78.1</td></tr>
<tr><td>π0 (Black et al., 2024)</td><td>3B</td><td>96.8</td><td><i><u>98.8*</u></i></td><td>95.8</td><td>85.2</td><td>94.2</td></tr>
<tr><td>π0-FAST (Pertsch et al., 2025)</td><td>3B</td><td>96.4</td><td>96.8</td><td>88.6</td><td>60.2</td><td>85.5</td></tr>
<tr><td>NORA (Hung et al., 2025)</td><td>3B</td><td>92.2</td><td>95.4</td><td>89.4</td><td>74.6</td><td>87.9</td></tr>
<tr><td>SmolVLA (Shukor et al., 2025)</td><td>2.2B</td><td>93.0</td><td>94.0</td><td>91.0</td><td>77.0</td><td>88.8</td></tr>
<tr><td>GR00T N1 (NVIDIA et al., 2025)</td><td>2B</td><td>94.4</td><td>97.6</td><td>93.0</td><td>90.6</td><td>93.9</td></tr>
<tr><td rowspan="5">Tiny-scale</td><td>Seer (Tian et al., 2025)</td><td>0.57B</td><td>-</td><td>-</td><td>-</td><td>78.7</td><td>78.7</td></tr>
<tr><td>VLA-OS (Gao et al., 2025)</td><td>0.5B</td><td>87.0</td><td>96.5</td><td>92.7</td><td>66.0</td><td>85.6</td></tr>
<tr><td>Diffusion Policy (Chi et al., 2023)</td><td>-</td><td>78.3</td><td>92.5</td><td>68.3</td><td>50.5</td><td>72.4</td></tr>
<tr><td><b>VLA-Adapter (Ours)</b></td><td><b>0.5B</b></td><td><b>97.8</b></td><td><b>99.2</b></td><td><i><u>97.2*</u></i></td><td><b>95.0</b></td><td><b>97.3</b></td></tr>
<tr><td><b>VLA-Adapter-Pro (Ours)</b></td><td><b>0.5B</b></td><td><b><i>99.6</i></b></td><td><b><i>99.6</i></b></td><td><b><i>98.2</i></b></td><td><b><i>96.4</i></b></td><td><b><i>98.5</i></b></td></tr>
</table>
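The Avg. column in the LIBERO table is the arithmetic mean of the four suite scores (Spatial, Object, Goal, Long). A quick check in Python, using the FlowVLA and VLA-Adapter rows:

```python
def libero_avg(spatial: float, obj: float, goal: float, long_: float) -> float:
    """Mean of the four LIBERO suite success rates, rounded to one decimal."""
    return round((spatial + obj + goal + long_) / 4, 1)

print(libero_avg(93.2, 95.0, 91.6, 72.6))  # FlowVLA row -> 88.1
print(libero_avg(97.8, 99.2, 97.2, 95.0))  # VLA-Adapter row -> 97.3
```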
<table>
<tr>
<td><strong>CALVIN</strong></td> <td><strong>Methods</strong></td> <td><strong>Scale</strong></td> <td><strong>1</strong></td> <td><strong>2</strong></td> <td><strong>3</strong></td> <td><strong>4</strong></td> <td><strong>5</strong></td> <td><strong>Avg. len</strong></td>
</tr>
<tr><td rowspan="8">Large-scale</td><td>UniVLA (Bu et al., 2025)</td><td>7B</td><td>95.5</td><td>85.8</td><td>75.4</td><td>66.9</td><td>56.5</td><td>3.80</td></tr>
<tr><td>OpenVLA (Kim et al., 2024)</td><td>7B</td><td>91.3</td><td>77.8</td><td>62.0</td><td>52.1</td><td>43.5</td><td>3.27</td></tr>
<tr><td>OpenVLA-OFT (Kim et al., 2025)</td><td>7B</td><td>96.3</td><td>89.1</td><td>82.4</td><td>75.8</td><td>66.5</td><td>4.10</td></tr>
<tr><td>VLAS (Zhao et al., 2025b)</td><td>7B</td><td>87.2</td><td>64.2</td><td>40.9</td><td>28.1</td><td>19.6</td><td>2.40</td></tr>
<tr><td>LCB (Shentu et al., 2024)</td><td>7B</td><td>73.6</td><td>50.2</td><td>28.5</td><td>16.0</td><td>9.9</td><td>1.78</td></tr>
<tr><td>RoboDual (Bu et al., 2024a)</td><td>7B</td><td>94.4</td><td>82.7</td><td>72.1</td><td>62.4</td><td>54.4</td><td>3.66</td></tr>
<tr><td>OpenHelix (Cui et al., 2025)</td><td>7B</td><td><i><u>97.1*</u></i></td><td>91.4</td><td>82.8</td><td>72.6</td><td>64.1</td><td>4.08</td></tr>
<tr><td>ReconVLA (Song et al., 2025c)</td><td>7B</td><td>95.6</td><td>87.6</td><td>76.9</td><td>69.3</td><td>64.1</td><td>3.95</td></tr>
<tr><td rowspan="4">Small-scale</td><td>DeeR (Yue et al., 2024)</td><td>3B</td><td>86.2</td><td>70.1</td><td>51.8</td><td>41.5</td><td>30.4</td><td>2.82</td></tr>
<tr><td>RoboFlamingo (Li et al., 2024b)</td><td>3B</td><td>82.4</td><td>61.9</td><td>46.6</td><td>33.1</td><td>23.5</td><td>2.48</td></tr>
<tr><td>VPP (Hu et al., 2025)</td><td>1.5B</td><td>95.7</td><td>91.2</td><td><i><u>86.3*</u></i></td><td><i><u>81.0*</u></i></td><td><i><u>75.0*</u></i></td><td><i><u>4.33*</u></i></td></tr>
<tr><td>SuSIE (Black et al., 2024)</td><td>1.3B</td><td>87.0</td><td>69.0</td><td>49.0</td><td>38.0</td><td>26.0</td><td>2.69</td></tr>
<tr><td rowspan="5">Tiny-scale</td><td>Seer-Large (Tian et al., 2025)</td><td>0.57B</td><td>96.3</td><td><i><u>91.6*</u></i></td><td>86.1</td><td>80.3</td><td>74.0</td><td>4.28</td></tr>
<tr><td>MoDE (Reuss et al., 2025)</td><td>0.44B</td><td>96.2</td><td>88.9</td><td>81.1</td><td>71.8</td><td>63.5</td><td>4.01</td></tr>
<tr><td>Seer (Tian et al., 2025)</td><td>0.32B</td><td>94.4</td><td>87.2</td><td>79.9</td><td>72.2</td><td>64.3</td><td>3.98</td></tr>
<tr><td><b>VLA-Adapter (Ours)</b></td><td><b>0.5B</b></td><td><b><i>99.1</i></b></td><td><b>94.6</b></td><td><b>88.8</b></td><td><b>82.8</b></td><td><b>76.5</b></td><td><b>4.42</b></td></tr>
<tr><td><b>VLA-Adapter-Pro (Ours)</b></td><td><b>0.5B</b></td><td><b>98.5</b></td><td><b><i>95.0</i></b></td><td><b><i>90.5</i></b></td><td><b><i>85.3</i></b></td><td><b><i>80.0</i></b></td><td><b><i>4.50</i></b></td></tr>
</table>
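In the CALVIN table, columns 1–5 report the success rate of completing that many consecutive subtasks, so Avg. len (the average number of subtasks completed per rollout) matches summing the five rates and dividing by 100:

```python
def calvin_avg_len(rates_percent: list[float]) -> float:
    """Average successful sequence length: sum of the per-step success rates (in %)."""
    return round(sum(rates_percent) / 100, 2)

print(calvin_avg_len([95.5, 85.8, 75.4, 66.9, 56.5]))  # UniVLA row -> 3.8
print(calvin_avg_len([99.1, 94.6, 88.8, 82.8, 76.5]))  # VLA-Adapter row -> 4.42
```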
## Citation instructions

```BibTeX
@article{wang2025vlaadapter,
  author={Wang, Yihao and Ding, Pengxiang and Li, Lingxiao and Cui, Can and Ge, Zirui and Tong, Xinyang and Song, Wenxuan and Zhao, Han and Zhao, Wei and Hou, Pengxu and Huang, Siteng and Tang, Yifan and Wang, Wenhui and Zhang, Ru and Liu, Jianyi and Wang, Donglin},
  title={VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model},
  journal={arXiv preprint arXiv:2509.09372},
  year={2025}
}
```