Files changed (4)
  1. README.md +3 -140
  2. best_model_epoch95.pt +0 -3
  3. config.json +0 -67
  4. model.safetensors +0 -3
README.md CHANGED
@@ -1,140 +1,3 @@
1
- <h1 align='center'>WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving</h1>
2
- <div align='center'>
3
- <a href='https://github.com/YoucanBaby' target='_blank'>Yifang Xu</a><sup>1*</sup>&emsp;
4
- <a href='https://cuijh26.github.io/' target='_blank'>Jiahao Cui</a><sup>1*</sup>&emsp;
5
- <a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Feipeng Cai</a><sup>2*</sup>&emsp;
6
- <a href='https://github.com/SSSSSSuger' target='_blank'>Zhihao Zhu</a><sup>1</sup>&emsp;
7
- <a href='https://github.com/NinoNeumann' target='_blank'>Hanlin Shang</a><sup>1</sup>&emsp;
8
- <a href='https://github.com/isan089' target='_blank'>Shan Luan</a><sup>1</sup>&emsp;
9
- </div>
10
- <div align='center'>
11
- <a href='https://github.com/xumingw' target='_blank'>Mingwang Xu</a><sup>1</sup>&emsp;
12
- <a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Neng Zhang</a><sup>2</sup>&emsp;
13
- <a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Yaoyi Li</a><sup>2</sup>&emsp;
14
- <a href='https://github.com/fudan-generative-vision/WAM-Flow' target='_blank'>Jia Cai</a><sup>2</sup>&emsp;
15
- <a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>1</sup>&emsp;
16
- </div>
17
-
18
- <div align='center'>
19
- <sup>1</sup>Fudan University&emsp; <sup>2</sup>Yinwang Intelligent Technology Co., Ltd&emsp;
20
- </div>
21
-
22
- <br>
23
- <div align='center'>
24
- <a href='https://github.com/fudan-generative-vision/WAM-Flow'><img src='https://img.shields.io/github/stars/fudan-generative-vision/WAM-Flow?style=social'></a>
25
- <a href='https://arxiv.org/abs/2512.06112'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
26
- <a href='https://huggingface.co/fudan-generative-ai/WAM-Flow'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
27
- </div>
28
- <br>
29
-
30
-
31
-
32
- ## 📰 News
33
- - **`2026/02/01`**: 🎉🎉🎉 Released the pretrained models on [Hugging Face](https://huggingface.co/fudan-generative-ai/WAM-Flow).
34
- - **`2025/12/06`**: 🎉🎉🎉 Paper submitted to [arXiv](https://arxiv.org/pdf/2512.06112).
35
-
36
-
37
-
38
- ## 📅️ Roadmap
39
-
40
- | Status | Milestone | ETA |
41
- | :----: | :----------------------------------------------------------------------------------------------------: | :--------: |
42
- | ✅ | **[Release the SFT and inference code](https://github.com/fudan-generative-vision/WAM-Flow)** | 2025.12.19 |
43
- | ✅ | **[Pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/WAM-Flow)** | 2026.02.01 |
44
- | 🚀 | **[Release the evaluation code](https://huggingface.co/fudan-generative-ai/WAM-Flow)** | TBD |
45
- | 🚀 | **[Release the RL code](https://github.com/fudan-generative-vision/WAM-Flow)** | TBD |
46
- | 🚀 | **[Release the pre-processed training data](#training)** | TBD |
47
-
48
-
49
- ## 📸 Showcase
50
- ![teaser](assets/Figure_1.png)
51
-
52
- ## 🏆 Quantitative Results on NAVSIM
53
- ### NAVSIM-v1 benchmark results
54
- <div style="text-align: center;">
55
- <img src="assets/navsim-v1.png" alt="navsim-v1" width="70%" />
56
- </div>
57
-
58
- ### NAVSIM-v2 benchmark results
59
- <div style="text-align: center;">
60
- <img src="assets/navsim-v2.png" alt="navsim-v2" width="70%" />
61
- </div>
62
-
63
-
64
-
65
- ## 🔧️ Framework
66
- ![framework](assets/Figure_2.png)
67
- Our method takes as input a front-view image, a natural-language navigation command with a system prompt, and the ego-vehicle states, and outputs an 8-waypoint future trajectory spanning 4 seconds through parallel denoising. The model is first trained via supervised fine-tuning to learn accurate trajectory prediction. We then apply simulator-guided GRPO to further optimize closed-loop behavior. The GRPO reward function integrates safety constraints (collision avoidance, drivable-area compliance) with performance objectives (ego-progress, time-to-collision, comfort).
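
As a small worked example of the trajectory format described above, the sketch below computes the timestamps of the 8 waypoints over the 4-second horizon. It assumes the waypoints are uniformly spaced and that the last one falls exactly at the end of the horizon; the README does not state either, and `waypoint_times` is a hypothetical helper, not part of the WAM-Flow code.

```python
# Values taken from the framework description: 8 waypoints over 4 seconds.
NUM_WAYPOINTS = 8
HORIZON_S = 4.0


def waypoint_times(num_waypoints=NUM_WAYPOINTS, horizon_s=HORIZON_S):
    """Return the time (in seconds) of each waypoint.

    Assumption: uniform spacing, with the final waypoint at `horizon_s`.
    """
    dt = horizon_s / num_waypoints  # 0.5 s between waypoints under this assumption
    return [dt * (i + 1) for i in range(num_waypoints)]


times = waypoint_times()
print(times)
```

Under these assumptions the planner output corresponds to one waypoint every 0.5 s.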
68
-
69
-
70
-
71
- ## Quick Start
72
-
73
- ### Installation
74
-
75
- Clone the repo:
76
-
77
- ```sh
78
- git clone https://github.com/fudan-generative-vision/WAM-Flow.git
79
- cd WAM-Flow
80
- ```
81
-
82
- Install dependencies:
83
-
84
- ```sh
85
- conda create --name wam-flow python=3.10
86
- conda activate wam-flow
87
- pip install -r requirements.txt
88
- ```
89
-
90
-
91
- ### Model Download
92
-
93
- Download models using huggingface-cli:
94
-
95
- ```sh
96
- pip install "huggingface_hub[cli]"
97
- huggingface-cli download fudan-generative-ai/WAM-Flow --local-dir ./pretrained_model/wam-flow
98
- huggingface-cli download LucasJinWang/FUDOKI --local-dir ./pretrained_model/fudoki
99
- ```
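
The same two downloads can probably be scripted from Python instead of the CLI. The sketch below uses `snapshot_download` from `huggingface_hub` with the repo ids and target directories from the commands above; `DOWNLOADS` and `fetch_all` are hypothetical names introduced for illustration. Which files are actually fetched depends on the repository revision, since this commit removes the weights from the repository.

```python
from pathlib import Path

# Repo ids and local directories copied from the huggingface-cli commands above.
DOWNLOADS = {
    "fudan-generative-ai/WAM-Flow": "./pretrained_model/wam-flow",
    "LucasJinWang/FUDOKI": "./pretrained_model/fudoki",
}


def fetch_all(downloads=DOWNLOADS):
    """Download a snapshot of each repo into its local directory.

    Returns a dict mapping repo id to the path returned by the hub client.
    """
    # Imported lazily so the mapping above can be inspected without the package.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id, local_dir in downloads.items():
        Path(local_dir).mkdir(parents=True, exist_ok=True)
        paths[repo_id] = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return paths
```

Calling `fetch_all()` performs the downloads and requires network access and the `huggingface_hub` package.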
100
-
101
-
102
-
103
- ### Inference
104
-
105
- ```sh
106
- sh script/infer.sh
107
- ```
108
-
109
-
110
- ### Training
111
-
112
- ```bash
113
- sh script/sft_debug.sh
114
- ```
115
-
116
-
117
-
118
- ## 📝 Citation
119
-
120
- If you find our work useful for your research, please consider citing the paper:
121
-
122
- ```
123
- @article{xu2025wam,
124
- title={WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving},
125
- author={Xu, Yifang and Cui, Jiahao and Cai, Feipeng and Zhu, Zhihao and Shang, Hanlin and Luan, Shan and Xu, Mingwang and Zhang, Neng and Li, Yaoyi and Cai, Jia and others},
126
- journal={arXiv preprint arXiv:2512.06112},
127
- year={2025}
128
- }
129
- ```
130
-
131
-
132
-
133
- ## ⚠️ Social Risks and Mitigations
134
-
135
- The integration of Vision-Language-Action models into autonomous driving introduces ethical challenges, particularly regarding the opacity of neural decision-making and its impact on road safety. To mitigate these risks, it is imperative to implement explainable AI frameworks and robust safety protocols that ensure predictable vehicle behavior in long-tail scenarios. Furthermore, addressing concerns over data privacy and public surveillance requires transparent data governance and rigorous de-identification practices. By prioritizing safety-critical alignment and ethical compliance, this research promotes the responsible development and deployment of VLA-based autonomous systems.
136
-
137
-
138
-
139
- ## 🤗 Acknowledgements
140
- We gratefully acknowledge the contributors to the [Recogdrive](https://github.com/xiaomi-research/recogdrive), [Janus](https://github.com/deepseek-ai/Janus), [FUDOKI](https://github.com/fudoki-hku/FUDOKI) and [flow_matching](https://github.com/facebookresearch/flow_matching) repositories, whose commitment to open source has provided us with their excellent codebases and pretrained models.
 
1
+ ---
2
+ license: apache-2.0
3
+ ---
 
best_model_epoch95.pt DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:5aa6f4bc61f10ea63c26fd12126afdc25adfdbb0d60df1a1db51daa9dbbb62df
3
- size 1764405
 
config.json DELETED
@@ -1,67 +0,0 @@
1
- {
2
- "_name_or_path": "/cache/models/LucasJinWang-FUDOKI",
3
- "aligner_config": {
4
- "cls": "MlpProjector",
5
- "model_type": "aligner",
6
- "params": {
7
- "depth": 2,
8
- "input_dim": 1024,
9
- "n_embed": 2048,
10
- "projector_type": "mlp_gelu"
11
- }
12
- },
13
- "architectures": [
14
- "MultiModalityCausalLM"
15
- ],
16
- "gen_aligner_config": {
17
- "cls": "MlpProjector",
18
- "model_type": "gen_aligner",
19
- "params": {
20
- "depth": 2,
21
- "input_dim": 8,
22
- "n_embed": 2048,
23
- "projector_type": "mlp_gelu"
24
- }
25
- },
26
- "gen_head_config": {
27
- "cls": "vision_head",
28
- "model_type": "gen_head",
29
- "params": {
30
- "image_token_embed": 2048,
31
- "image_token_size": 16384,
32
- "n_embed": 2048
33
- }
34
- },
35
- "gen_vision_config": {
36
- "cls": "VQ-16",
37
- "model_type": "gen_vision",
38
- "params": {
39
- "image_token_size": 16384,
40
- "n_embed": 8
41
- }
42
- },
43
- "language_config": {
44
- "hidden_size": 2048,
45
- "intermediate_size": 5632,
46
- "max_position_embeddings": 16384,
47
- "model_type": "llama",
48
- "num_attention_heads": 16,
49
- "num_hidden_layers": 24,
50
- "num_key_value_heads": 16,
51
- "torch_dtype": "bfloat16",
52
- "vocab_size": 122401
53
- },
54
- "model_type": "multi_modality",
55
- "torch_dtype": "float32",
56
- "transformers_version": "4.42.4",
57
- "vision_config": {
58
- "cls": "CLIPVisionTower",
59
- "model_type": "vision",
60
- "params": {
61
- "image_size": 384,
62
- "model_name": "siglip_large_patch16_384",
63
- "select_feature": "same",
64
- "select_layer": -1
65
- }
66
- }
67
- }
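
To make the removed `language_config` values easier to read, the sketch below derives the per-head dimension from them. It assumes the usual Llama convention that the head dimension equals `hidden_size / num_attention_heads`, which the config does not state explicitly; `attention_summary` is a hypothetical helper, and the JSON string is an excerpt of the deleted file rather than the full config.

```python
import json

# Excerpt of the `language_config` block from the deleted config.json.
CONFIG_EXCERPT = """
{
  "language_config": {
    "hidden_size": 2048,
    "intermediate_size": 5632,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "num_key_value_heads": 16,
    "vocab_size": 122401
  }
}
"""


def attention_summary(language_config):
    """Derive attention-shape facts from a Llama-style config dict.

    Assumption: head_dim = hidden_size / num_attention_heads.
    """
    hidden = language_config["hidden_size"]
    heads = language_config["num_attention_heads"]
    kv_heads = language_config["num_key_value_heads"]
    if hidden % heads != 0 or heads % kv_heads != 0:
        raise ValueError("head counts do not divide evenly")
    return {
        "head_dim": hidden // heads,
        "queries_per_kv_head": heads // kv_heads,
        # Equal query and key/value head counts mean plain multi-head attention.
        "uses_gqa": kv_heads != heads,
    }


summary = attention_summary(json.loads(CONFIG_EXCERPT)["language_config"])
print(summary)
```

With 16 query heads and 16 key/value heads, the excerpt describes standard multi-head attention rather than grouped-query attention.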
 
model.safetensors DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:8b06580ad28d20eeade8c103c70cb60ac8d538bc20d995684aedfa2674babf02
3
- size 8684997012