Add model card for UniDriveVLA
#1
by nielsr (HF Staff)
README.md
ADDED
---
license: apache-2.0
pipeline_tag: robotics
---

# UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

**UniDriveVLA** is a unified driving Vision-Language-Action model built on a Mixture-of-Transformers architecture that resolves the dilemma between spatial perception and semantic reasoning in autonomous driving via expert decoupling.

- **Paper:** [UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving](https://huggingface.co/papers/2604.02190)
- **Project Page:** [https://xiaomi-research.github.io/unidrivevla/](https://xiaomi-research.github.io/unidrivevla/)
- **Repository:** [https://github.com/xiaomi-research/unidrivevla](https://github.com/xiaomi-research/unidrivevla)

## Overview

UniDriveVLA addresses the perception-reasoning conflict in driving systems with three specialized experts coordinated through masked joint attention:

- **Understanding Expert**: Leverages a pre-trained 2D VLM (Qwen3-VL) for semantic scene comprehension and driving-oriented VQA.
- **Perception Expert**: Introduces sparse perception that extracts spatial priors from 2D VLM features, supporting 3D detection, online mapping, occupancy prediction, and motion forecasting.
- **Planning Expert**: Fuses semantic features with spatial perception features to generate safe and precise driving trajectories.

The model achieves state-of-the-art performance in open-loop evaluation on nuScenes and closed-loop evaluation on Bench2Drive.
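The expert coordination described above can be illustrated with a toy masked joint attention over concatenated expert token streams. This is a minimal sketch, not the paper's implementation: the block-visibility pattern, tensor shapes, and names (`masked_joint_attention`, `block_mask`) are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def masked_joint_attention(q, k, v, block_mask, lens):
    """Single-head attention over concatenated expert tokens.

    q, k, v: (T, d) arrays over all expert tokens, concatenated in order.
    block_mask[i][j]: True if expert i's tokens may attend to expert j's tokens.
    lens: number of tokens contributed by each expert, in concatenation order.
    """
    T, d = q.shape
    mask = np.zeros((T, T), dtype=bool)
    starts = np.cumsum([0] + list(lens))
    for i in range(len(lens)):
        for j in range(len(lens)):
            if block_mask[i][j]:
                mask[starts[i]:starts[i + 1], starts[j]:starts[j + 1]] = True
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # blocked pairs get ~zero weight
    return softmax(scores, axis=-1) @ v

# Example with understanding (U), perception (P), and planning (A) streams.
# Assumed visibility (not the paper's exact mask): U sees only U,
# P sees U and P, and the planning expert sees everything.
lens = (4, 6, 2)
block_mask = [[True, False, False],
              [True, True,  False],
              [True, True,  True]]
rng = np.random.default_rng(0)
T, d = sum(lens), 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = masked_joint_attention(q, k, v, block_mask, lens)
print(out.shape)  # (12, 8)
```

The block mask is what decouples the experts: each stream keeps its own parameters and token budget, while the mask controls which streams exchange information during joint attention.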
## Architecture



## Citation

If you find UniDriveVLA useful in your research or applications, please consider citing it using the following BibTeX entry:

```bibtex
@article{li2026unidrivevla,
  title={UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving},
  author={Li, Yongkang and Zhou, Lijun and Yan, Sixu and Liao, Bencheng and Yan, Tianyi and Xiong, Kaixin and Chen, Long and Xie, Hongwei and Wang, Bing and Chen, Guang and Ye, Hangjun and Sun, Haiyang and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2604.02190},
  year={2026}
}
```