<h1 align='center'>WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving</h1>
<div align='center'>
<a href='https://github.com/YoucanBaby' target='_blank'>Yifang Xu</a><sup>1*</sup> 
<a href='https://cuijh26.github.io/' target='_blank'>Jiahao Cui</a><sup>1*</sup> 
<a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Feipeng Cai</a><sup>2*</sup> 
<a href='https://github.com/SSSSSSuger' target='_blank'>Zhihao Zhu</a><sup>1</sup> 
<a href='https://github.com/NinoNeumann' target='_blank'>Hanlin Shang</a><sup>1</sup> 
<a href='https://github.com/isan089' target='_blank'>Shan Luan</a><sup>1</sup> 
</div>
<div align='center'>
<a href='https://github.com/xumingw' target='_blank'>Mingwang Xu</a><sup>1</sup> 
<a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Neng Zhang</a><sup>2</sup> 
<a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Yaoyi Li</a><sup>2</sup> 
<a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Jia Cai</a><sup>2</sup> 
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>1</sup> 
</div>
<div align='center'>
<sup>1</sup>Fudan University  <sup>2</sup>Yinwang Intelligent Technology Co., Ltd 
</div>
<br>
<div align='center'>
<a href='https://github.com/fudan-generative-vision/WAM-Flow'><img src='https://img.shields.io/github/stars/fudan-generative-vision/WAM-Flow?style=social'></a>
<a href='https://arxiv.org/abs/2512.06112'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/fudan-generative-ai/WAM-Flow'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
</div>
<br>
## 📰 News
- **`2026/02/01`**: 🎉🎉🎉 Released the pretrained models on [Huggingface](https://huggingface.co/fudan-generative-ai/WAM-Flow).
- **`2025/12/06`**: 🎉🎉🎉 Paper submitted to [arXiv](https://arxiv.org/pdf/2512.06112).
## 📅️ Roadmap
| Status | Milestone | ETA |
| :----: | :----------------------------------------------------------------------------------------------------: | :--------: |
| ✅ | **[Release the SFT and inference code](https://github.com/fudan-generative-vision/WAM-Flow)** | 2025.12.19 |
| ✅ | **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/WAM-Flow)** | 2026.02.01 |
| 🚀 | **[Release the evaluation code](https://huggingface.co/fudan-generative-ai/WAM-Flow)** | TBD |
| 🚀 | **[Release the RL code](https://github.com/fudan-generative-vision/WAM-Flow)** | TBD |
| 🚀 | **[Release the pre-processed training data](#training)** | TBD |
## 📸 Showcase

## Quantitative Results on NAVSIM
### NAVSIM-v1 benchmark results
<div style="text-align: center;">
<img src="assets/navsim-v1.png" alt="navsim-v1" width="70%" />
</div>
### NAVSIM-v2 benchmark results
<div style="text-align: center;">
<img src="assets/navsim-v2.png" alt="navsim-v2" width="70%" />
</div>
## 🔧️ Framework

Our method takes as input a front-view image, a natural-language navigation command with a system prompt, and the ego-vehicle state, and outputs a 4-second future trajectory of 8 waypoints through parallel denoising. The model is first trained with supervised fine-tuning (SFT) to learn accurate trajectory prediction. We then apply simulator-guided GRPO (Group Relative Policy Optimization) to further optimize closed-loop behavior. The GRPO reward combines safety constraints (collision avoidance, drivable-area compliance) with performance objectives (ego progress, time-to-collision, comfort).
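The reward and advantage computation described above can be sketched as follows. This is a minimal, hypothetical illustration, not the released training code. It assumes the NAVSIM PDM-score convention, in which the safety terms (no collision, drivable-area compliance) multiply a weighted average of the performance terms (ego progress, time-to-collision, comfort) with weights 5, 5 and 2, and it computes the GRPO-style advantage by normalizing each sampled trajectory's reward with the mean and (population) standard deviation of its group. The function names, weights and metric values are all assumptions.

```python
from statistics import mean, pstdev

# Hypothetical weights, assumed to follow the NAVSIM PDM-score convention.
PERF_WEIGHTS = {"ego_progress": 5.0, "ttc": 5.0, "comfort": 2.0}


def trajectory_reward(no_collision: float, drivable_area: float,
                      ego_progress: float, ttc: float, comfort: float) -> float:
    """Combine safety gates and weighted performance terms into one scalar in [0, 1].

    The safety terms multiply the score, so a collision or leaving the drivable
    area (score 0) zeroes the reward regardless of progress or comfort.
    """
    perf = {"ego_progress": ego_progress, "ttc": ttc, "comfort": comfort}
    weighted = sum(PERF_WEIGHTS[k] * perf[k] for k in PERF_WEIGHTS)
    return no_collision * drivable_area * weighted / sum(PERF_WEIGHTS.values())


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four trajectories sampled for the same scene (made-up metric values).
group = [
    trajectory_reward(1, 1, 0.9, 1, 1),
    trajectory_reward(1, 1, 0.5, 1, 1),
    trajectory_reward(0, 1, 1.0, 1, 1),  # collision, so the reward is 0
    trajectory_reward(1, 1, 0.7, 0, 1),  # poor time-to-collision score
]
advantages = group_relative_advantages(group)
```

Because the safety terms act as a multiplicative gate, a colliding trajectory receives zero reward and therefore a negative advantage within its group, however much progress it makes.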
## Quick Start
### Installation
Clone the repo:
```sh
git clone https://github.com/fudan-generative-vision/WAM-Flow.git
cd WAM-Flow
```
Install dependencies:
```sh
conda create --name wam-flow python=3.10
conda activate wam-flow
pip install -r requirements.txt
```
### Model Download
Download models using huggingface-cli:
```sh
pip install "huggingface_hub[cli]"
huggingface-cli download fudan-generative-ai/WAM-Flow --local-dir ./pretrained_model/wam-flow
huggingface-cli download LucasJinWang/FUDOKI --local-dir ./pretrained_model/fudoki
```
### Inference
```sh
sh script/infer.sh
```
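The title and Framework section describe trajectory generation by parallel denoising with discrete flow matching. The toy sampler below illustrates that idea under stated assumptions. It uses the mixture probability path with scheduler κ(t) = t from the discrete flow matching literature, a hypothetical tokenization of the 8 waypoints into 16 discrete (x, y) bins, and an oracle in place of the trained network. It is not WAM-Flow's inference code and does not model its coarse-to-fine schedule.

```python
import random
from typing import Callable, List

Denoiser = Callable[[List[int], float], List[int]]


def sample_parallel(denoiser: Denoiser, seq_len: int, vocab_size: int,
                    num_steps: int, rng: random.Random) -> List[int]:
    """Toy discrete flow matching sampler on the mixture path kappa(t) = t."""
    # t = 0: start from uniform noise over the token vocabulary.
    x = [rng.randrange(vocab_size) for _ in range(seq_len)]
    for k in range(num_steps):
        # One denoiser call predicts the clean token at every position at once.
        x1_pred = denoiser(x, k / num_steps)
        # Euler jump probability h * kappa'(t) / (1 - kappa(t)) with h = 1/N and
        # t = k/N simplifies to 1 / (N - k); it reaches exactly 1 on the last step.
        jump_prob = 1.0 / (num_steps - k)
        # All positions are updated in parallel and independently.
        x = [pred if rng.random() < jump_prob else cur
             for cur, pred in zip(x, x1_pred)]
    return x


# Hypothetical tokenization: 8 waypoints (one every 0.5 s over 4 s), each an
# (x, y) pair quantized into one of 10 bins, flattened into 16 tokens.
target = [3, 7, 1, 9, 4, 4, 0, 2, 5, 6, 8, 1, 3, 3, 7, 9]


def oracle(x: List[int], t: float) -> List[int]:
    """Stands in for the trained model. A real model would output per-position
    distributions and the sampler would draw x1_pred from them."""
    return target


out = sample_parallel(oracle, seq_len=len(target), vocab_size=10,
                      num_steps=4, rng=random.Random(0))
print(out == target)  # True: the last step jumps every position with probability 1
```

Since every position is updated in the same step, the number of denoiser calls equals `num_steps` rather than the sequence length, which is where parallel decoding saves time compared with token-by-token autoregressive generation.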
### Training
```bash
sh script/sft_debug.sh
```
## 📝 Citation
If you find our work useful for your research, please consider citing the paper:
```bibtex
@article{xu2025wam,
title={WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving},
author={Xu, Yifang and Cui, Jiahao and Cai, Feipeng and Zhu, Zhihao and Shang, Hanlin and Luan, Shan and Xu, Mingwang and Zhang, Neng and Li, Yaoyi and Cai, Jia and others},
journal={arXiv preprint arXiv:2512.06112},
year={2025}
}
```
## ⚠️ Social Risks and Mitigations
The integration of Vision-Language-Action (VLA) models into autonomous driving introduces ethical challenges, particularly regarding the opacity of neural decision-making and its impact on road safety. To mitigate these risks, it is imperative to implement explainable AI frameworks and robust safety protocols that ensure predictable vehicle behavior in long-tail scenarios. Furthermore, addressing concerns over data privacy and public surveillance requires transparent data governance and rigorous de-identification practices. By prioritizing safety-critical alignment and ethical compliance, this research promotes the responsible development and deployment of VLA-based autonomous systems.
## 🤗 Acknowledgements
We gratefully acknowledge the contributors to the [Recogdrive](https://github.com/xiaomi-research/recogdrive), [Janus](https://github.com/deepseek-ai/Janus), [FUDOKI](https://github.com/fudoki-hku/FUDOKI) and [flow_matching](https://github.com/facebookresearch/flow_matching) repositories, whose commitment to open source has provided us with their excellent codebases and pretrained models.