Add model card for UniDriveVLA
#1
by nielsr HF Staff - opened

README.md CHANGED
---
license: apache-2.0
pipeline_tag: robotics
tags:
- autonomous-driving
- vla
---

# UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

[**Paper**](https://arxiv.org/abs/2604.02190) | [**Project Page**](https://xiaomi-research.github.io/unidrivevla/) | [**GitHub**](https://github.com/xiaomi-research/unidrivevla)

UniDriveVLA is a unified driving Vision-Language-Action (VLA) model built on a Mixture-of-Transformers architecture. It addresses the perception-reasoning conflict in autonomous driving by decoupling spatial perception from semantic reasoning through specialized experts.

## Architecture

UniDriveVLA comprises three specialized experts coordinated through masked joint attention:

- **Understanding Expert**: leverages a pre-trained 2D VLM (Qwen3-VL) for semantic scene comprehension and driving-oriented VQA.
- **Perception Expert**: introduces a sparse perception paradigm to extract spatial priors, supporting tasks such as 3D detection, online mapping, and motion forecasting.
- **Planning Expert**: fuses semantic features and spatial perception features to generate safe, precise driving trajectories.

The model achieves state-of-the-art performance in open-loop evaluation on nuScenes and closed-loop evaluation on Bench2Drive.
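The expert coordination above can be illustrated as masked joint attention over the experts' concatenated token streams. This is only a minimal sketch: the token counts, the visibility rules (e.g. that planning tokens attend to both other streams while the other streams stay local), and the function names are illustrative assumptions, not the model's actual masking pattern.

```python
import numpy as np

# Hypothetical token counts for each expert's stream.
n_und, n_per, n_plan = 4, 3, 2
streams = ["und"] * n_und + ["per"] * n_per + ["plan"] * n_plan
T = len(streams)

# Assumed visibility rules: understanding and perception tokens attend only
# within their own stream; planning tokens attend to all streams, mirroring
# how the planning expert fuses semantic and spatial features.
allowed = {
    "und": {"und"},
    "per": {"per"},
    "plan": {"und", "per", "plan"},
}

# Boolean attention mask: mask[q, k] is True iff query token q may see key k.
mask = np.array([[streams[k] in allowed[streams[q]] for k in range(T)]
                 for q in range(T)])

def masked_joint_attention(q, k, v, mask):
    """Scaled dot-product attention; disallowed query-key pairs get -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((T, 8))     # one shared sequence of expert tokens
out = masked_joint_attention(x, x, x, mask)
print(out.shape)  # (9, 8)
```

Under these assumed rules, an understanding token never mixes in planning features, while every planning token can draw on both semantic and spatial tokens in a single attention pass.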

## Getting Started

Please refer to the [official GitHub repository](https://github.com/xiaomi-research/unidrivevla) for detailed instructions on installation, data preparation, training, and evaluation.

## Citation

If you find UniDriveVLA useful in your research, please consider citing the paper:

```bibtex
@article{li2026unidrivevla,
  title={UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving},
  author={Li, Yongkang and Zhou, Lijun and Yan, Sixu and Liao, Bencheng and Yan, Tianyi and Xiong, Kaixin and Chen, Long and Xie, Hongwei and Wang, Bing and Chen, Guang and Ye, Hangjun and Sun, Haiyang and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2604.02190},
  year={2026}
}
```