---
license: apache-2.0
---
<h1 align='center'>WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving</h1>
<div align='center'>
<a href='https://github.com/xumingw' target='_blank'>Mingwang Xu</a><sup>1*</sup>&emsp;
<a href='https://cuijh26.github.io/' target='_blank'>Jiahao Cui</a><sup>1*</sup>&emsp;
<a href='https://github.com/fudan-generative-vision/WAM-Diff' target='_blank'>Feipeng Cai</a><sup>2*</sup>&emsp;
<a href='https://github.com/NinoNeumann' target='_blank'>Hanlin Shang</a><sup>1*</sup>&emsp;
<a href='https://github.com/SSSSSSuger' target='_blank'>Zhihao Zhu</a><sup>1</sup>&emsp;
<a href='https://github.com/isan089' target='_blank'>Shan Luan</a><sup>1</sup>&emsp;
</div>
<div align='center'>
<a href='https://github.com/YoucanBaby' target='_blank'>Yifang Xu</a><sup>1</sup>&emsp;
<a href='https://github.com/fudan-generative-vision/WAM-Diff' target='_blank'>Neng Zhang</a><sup>2</sup>&emsp;
<a href='https://github.com/fudan-generative-vision/WAM-Diff' target='_blank'>Yaoyi Li</a><sup>2</sup>&emsp;
<a href='https://github.com/fudan-generative-vision/WAM-Diff' target='_blank'>Jia Cai</a><sup>2</sup>&emsp;
<a href='https://sites.google.com/site/zhusiyucs/home' target='_blank'>Siyu Zhu</a><sup>1</sup>&emsp;
</div>

<div align='center'>
<sup>1</sup>Fudan University&emsp; <sup>2</sup>Yinwang Intelligent Technology Co., Ltd&emsp;
</div>

<br>
<div align='center'>
<a href='https://github.com/fudan-generative-vision/WAM-Diff'><img src='https://img.shields.io/github/stars/fudan-generative-vision/WAM-Diff?style=social'></a>
<a href='https://arxiv.org/abs/2512.11872'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href='https://huggingface.co/fudan-generative-ai/WAM-Diff'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a>
</div>
<br>

## πŸ“° News

- **`2026/02/01`**: πŸŽ‰πŸŽ‰πŸŽ‰ Released the pretrained models on [Huggingface](https://huggingface.co/fudan-generative-ai/WAM-Diff).
- **`2025/12/06`**: πŸŽ‰πŸŽ‰πŸŽ‰ Paper submitted to [arXiv](https://arxiv.org/pdf/2512.11872).

## πŸ“…οΈ Roadmap

| Status | Milestone | ETA |
| :----: | :----------------------------------------------------------------------------------------------------: | :--------: |
| βœ… | **[Release the inference source code](https://github.com/fudan-generative-vision/WAM-Diff)** | 2025.12.21 |
| βœ… | **[Release the SFT and inference code](https://github.com/fudan-generative-vision/WAM-Diff)** | 2025.12.21 |
| βœ… | **[Release pretrained models on Huggingface](https://huggingface.co/fudan-generative-ai/WAM-Diff)** | 2026.02.01 |
| πŸš€ | **[Release NAVSIM evaluation code](https://huggingface.co/fudan-generative-ai/WAM-Diff)** | TBD |
| πŸš€ | **[Release the RL code](https://github.com/fudan-generative-vision/WAM-Diff)** | TBD |

## πŸ”§οΈ Framework

![framework](assets/main_arch.png)

## πŸ† Quantitative Results on NAVSIM

### NAVSIM-v1 benchmark results
<div style="text-align: center;">
  <img src="assets/navsim-v1.png" alt="navsim-v1" width="70%" />
</div>

### NAVSIM-v2 benchmark results
<div style="text-align: center;">
  <img src="assets/navsim-v2.png" alt="navsim-v2" width="90%" />
</div>

## Quick Inference Demo

The WAM-Diff model is available on the [Hugging Face Hub](https://huggingface.co/fudan-generative-ai/WAM-Diff). To quickly test it, follow these steps:

1. **Clone the repository**

   ```bash
   git clone https://github.com/fudan-generative-vision/WAM-Diff
   cd WAM-Diff
   ```

2. **Initialize the environment**

   If you prefer conda, run the environment setup script to install the necessary dependencies:

   ```bash
   bash init_env.sh
   ```

   Or use uv to create the environment:

   ```bash
   uv venv && uv sync
   ```
3. **Prepare the Model**

   Download the pretrained [WAM-Diff](https://huggingface.co/fudan-generative-ai/WAM-Diff) model from Hugging Face to the `./model/WAM-Diff` directory:

   ```
   https://huggingface.co/fudan-generative-ai/WAM-Diff
   ```

   Download the pretrained SigLIP 2 model from Hugging Face to the `./model/siglip2-so400m-patch14-384` directory:

   ```
   https://huggingface.co/google/siglip2-so400m-patch14-384
   ```

4. **Run the demo script**

   Execute the demo script to test WAM-Diff on an example image:

   ```bash
   bash inf.sh
   ```

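As an alternative to downloading the checkpoints by hand in step 3, both snapshots can be fetched with the `huggingface_hub` Python package. This is only a convenience sketch on our part (it assumes `huggingface_hub` is installed); any method that places the files in the directories above works:

```python
# Sketch: fetch both checkpoints into the directories the demo expects.
# Requires `pip install huggingface_hub`; adjust local_dir paths as needed.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="fudan-generative-ai/WAM-Diff",
                  local_dir="./model/WAM-Diff")
snapshot_download(repo_id="google/siglip2-so400m-patch14-384",
                  local_dir="./model/siglip2-so400m-patch14-384")
```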
## Training

To fine-tune WAM-Diff, please follow these steps:

1. **Set Up the Environment**

   Follow the same environment setup steps as in the Quick Inference Demo section.

2. **Prepare the Data**

   Prepare your training dataset as a JSON file with the following structure:

   ```json
   [
     {
       "image": ["path/to/image1.png"],
       "conversations": [
         {
           "from": "human",
           "value": "Here is front views of a driving vehicle:\n<image>\nThe navigation information is: straight\nThe current position is (0.00,0.00)\nCurrent velocity is: (13.48,-0.29) and current accelerate is: (0.19,0.05)\nPredict the optimal driving action for the next 4 seconds with 8 new waypoints."
         },
         {
           "from": "gpt",
           "value": "6.60,-0.01,13.12,-0.03,19.58,-0.04,25.95,-0.03,32.27,-0.03,38.56,-0.05,44.88,-0.06,51.16,-0.09"
         }
       ]
     },
     ...
   ]
   ```
3. **Run the Training Script**

   Execute the training script with the following command:

   ```bash
   cd train
   bash ./scripts/llada_v_finetune.sh
   ```

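In the format above, each `gpt` turn flattens the 8 predicted waypoints into a comma-separated `x,y` sequence. A small helper (hypothetical, not part of the repo) can sanity-check samples before training:

```python
# Sketch: parse a flattened waypoint string from the training JSON into
# (x, y) pairs and check it matches the expected 8-waypoint horizon.
def parse_waypoints(value: str, expected: int = 8) -> list[tuple[float, float]]:
    coords = [float(v) for v in value.split(",")]
    assert len(coords) == 2 * expected, f"expected {expected} x,y pairs"
    return list(zip(coords[0::2], coords[1::2]))

waypoints = parse_waypoints(
    "6.60,-0.01,13.12,-0.03,19.58,-0.04,25.95,-0.03,"
    "32.27,-0.03,38.56,-0.05,44.88,-0.06,51.16,-0.09"
)
print(len(waypoints))   # 8
print(waypoints[0])     # (6.6, -0.01)
```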
## πŸ“ Citation

If you find our work useful for your research, please consider citing the paper:

```bibtex
@article{xu2025wam,
  title={WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving},
  author={Xu, Mingwang and Cui, Jiahao and Cai, Feipeng and Shang, Hanlin and Zhu, Zhihao and Luan, Shan and Xu, Yifang and Zhang, Neng and Li, Yaoyi and Cai, Jia and others},
  journal={arXiv preprint arXiv:2512.11872},
  year={2025}
}
```

## πŸ€— Acknowledgements

We gratefully acknowledge the contributors to the [LLaDA-V](https://github.com/ML-GSAI/LLaDA-V) repository, whose commitment to open source has provided us with an excellent codebase and pretrained models.