---
tags:
- robotics
---
# UnifoLM-VLA-0: A Vision-Language-Action (VLA) Framework under UnifoLM Family
<p style="font-size: 1.2em;">
<a href="https://unigen-x.github.io/unifolm-vla.github.io"><strong>Project Page</strong></a> |
<a href="https://huggingface.co/unitreerobotics/models"><strong>Models</strong></a> |
<a href="https://huggingface.co/unitreerobotics/datasets"><strong>Datasets</strong></a>
</p>
<div align="center">
<p align="right">
<span> 🌎English </span> | <a href="https://github.com/unitreerobotics/unifolm-vla/blob/main/README_cn.md"> 🇨🇳中文 </a>
</p>
</div>
**UnifoLM-VLA-0** is a Vision–Language–Action (VLA) large model in the UnifoLM series, designed for general-purpose humanoid robot manipulation. It moves beyond the limited physical-interaction capabilities of conventional Vision–Language Models (VLMs): through continued pre-training on robot manipulation data, the model evolves from "vision-language understanding" into an "embodied brain" equipped with physical common sense.
<table width="100%">
<tr>
<th width="50%">Spatial Semantic Enhancement</th>
<th width="50%">Manipulation Generalization</th>
</tr>
<tr>
<td valign="top">
To address the requirements for instruction comprehension and spatial understanding in manipulation tasks, the model deeply integrates textual instructions with 2D/3D spatial details through continued pre-training, <strong>substantially strengthening its spatial perception and geometric understanding capabilities.</strong>
</td>
<td valign="top">
By leveraging full dynamics prediction data, the model achieves strong generalization across diverse manipulation tasks. In real-robot validation, <strong>it can complete 12 categories of complex manipulation tasks with high quality using only a single policy.</strong>
</td>
</tr>
</table>
<div align="center">
<img
src="https://raw.githubusercontent.com/unitreerobotics/unifolm-vla/main/assets/gif/UnifoLM-VLA-0.gif"
style="width:100%; max-width:1000px; height:auto;"
alt="UnifoLM-VLA Demo"
/>
</div>
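To try the released checkpoints linked above, they can be fetched with the official `huggingface_hub` client. A minimal sketch follows; the repo id `unitreerobotics/UnifoLM-VLA-Base` is taken from the LICENSE link below, while the local directory name is an arbitrary choice:

```python
def download_checkpoint(repo_id: str = "unitreerobotics/UnifoLM-VLA-Base",
                        local_dir: str = "./checkpoints/unifolm-vla-base") -> str:
    """Download a full model snapshot from the Hugging Face Hub.

    Returns the local path of the downloaded snapshot.
    """
    # Imported lazily so the module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    # Note: the checkpoint is large; this downloads it to ./checkpoints/.
    print(download_checkpoint())
```

Other UnifoLM-VLA checkpoints on the `unitreerobotics` Hub organization can be fetched the same way by swapping `repo_id`.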
## 📝 Citation
```bibtex
@misc{unifolm-vla-0,
author = {Unitree},
title = {UnifoLM-VLA-0: A Vision-Language-Action (VLA) Framework under UnifoLM Family},
year = {2026},
}
```
## License
The model is released under the CC BY-NC-SA 4.0 license as found in the [LICENSE](https://huggingface.co/unitreerobotics/UnifoLM-VLA-Base/blob/main/LICENSE). You are responsible for ensuring that your use of Unitree AI Models complies with all applicable laws.