Add model card for UniDriveVLA
#1
by nielsr HF Staff - opened

README.md CHANGED
---
license: apache-2.0
pipeline_tag: robotics
tags:
- autonomous-driving
- vla
---

# UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

[**Paper**](https://arxiv.org/abs/2604.02190) | [**Project Page**](https://xiaomi-research.github.io/unidrivevla/) | [**GitHub**](https://github.com/xiaomi-research/unidrivevla)

UniDriveVLA is a unified driving Vision-Language-Action (VLA) model built on a Mixture-of-Transformers architecture. It addresses the perception-reasoning conflict in autonomous driving by decoupling spatial perception from semantic reasoning through specialized experts.

## Architecture

UniDriveVLA comprises three specialized experts coordinated through masked joint attention:

- **Understanding Expert**: leverages a pre-trained 2D VLM (Qwen3-VL) for semantic scene comprehension and driving-oriented VQA.
- **Perception Expert**: introduces a sparse perception paradigm to extract spatial priors, supporting tasks such as 3D detection, online mapping, and motion forecasting.
- **Planning Expert**: fuses semantic features and spatial perception features to generate safe, precise driving trajectories.

The model achieves state-of-the-art performance in open-loop evaluation on nuScenes and closed-loop evaluation on Bench2Drive.
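The expert coordination above can be illustrated as masked joint attention over the experts' concatenated token streams. This is only a minimal sketch: the token counts, the visibility rules (e.g. that planning tokens attend to both other streams while the other streams stay local), and the function names are illustrative assumptions, not the model's actual masking pattern.

```python
import numpy as np

# Hypothetical token counts for each expert's stream.
n_und, n_per, n_plan = 4, 3, 2
streams = ["und"] * n_und + ["per"] * n_per + ["plan"] * n_plan
T = len(streams)

# Assumed visibility rules: understanding and perception tokens attend only
# within their own stream; planning tokens attend to all streams, mirroring
# how the planning expert fuses semantic and spatial features.
allowed = {
    "und": {"und"},
    "per": {"per"},
    "plan": {"und", "per", "plan"},
}

# Boolean attention mask: mask[q, k] is True iff query token q may see key k.
mask = np.array([[streams[k] in allowed[streams[q]] for k in range(T)]
                 for q in range(T)])

def masked_joint_attention(q, k, v, mask):
    """Scaled dot-product attention; disallowed query-key pairs get -inf."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((T, 8))     # one shared sequence of expert tokens
out = masked_joint_attention(x, x, x, mask)
print(out.shape)  # (9, 8)
```

Under these assumed rules, an understanding token never mixes in planning features, while every planning token can draw on both semantic and spatial tokens in a single attention pass.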

## Getting Started

Please refer to the [official GitHub repository](https://github.com/xiaomi-research/unidrivevla) for detailed instructions on installation, data preparation, training, and evaluation.

## Citation

If you find UniDriveVLA useful in your research, please consider citing the paper:

```bibtex
@article{li2026unidrivevla,
  title={UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving},
  author={Li, Yongkang and Zhou, Lijun and Yan, Sixu and Liao, Bencheng and Yan, Tianyi and Xiong, Kaixin and Chen, Long and Xie, Hongwei and Wang, Bing and Chen, Guang and Ye, Hangjun and Sun, Haiyang and Liu, Wenyu and Wang, Xinggang},
  journal={arXiv preprint arXiv:2604.02190},
  year={2026}
}
```