wyddmw committed on
Commit e2d3f3b · verified · 1 Parent(s): 3c4a2f3

Update README.md

Files changed (1)
  1. README.md +1 -141
README.md CHANGED
@@ -1,141 +1 @@
---
license: mit
pipeline_tag: any-to-any
library_name: transformers
datasets:
- wyddmw/OpenREAD
base_model:
- Qwen/Qwen3-VL-8B-Instruct
---

<div align="center">

<h1>
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
</h1>

<p align="center">
[![arxiv paper](https://img.shields.io/badge/arXiv-Paper-red)](https://arxiv.org/abs/2512.01830)
[![🤗 HuggingFace Paper](https://img.shields.io/badge/HuggingFace%F0%9F%A4%97-Paper-orange)](https://huggingface.co/papers/2512.01830)
[![Project Page](https://img.shields.io/badge/Project_Page-%F0%9F%8F%A0-blue)](https://wyddmw.github.io/OpenREAD)
[![GitHub Code](https://img.shields.io/badge/GitHub-Code-181717.svg?logo=github)](https://github.com/wyddmw/OpenREAD)
[![🤗 HuggingFace Models](https://img.shields.io/badge/HuggingFace%F0%9F%A4%97-Models-orange)](https://huggingface.co/wyddmw/OpenREAD/tree/main)
[![🤗 HuggingFace Datasets](https://img.shields.io/badge/HuggingFace%F0%9F%A4%97-Datasets-orange)](https://huggingface.co/datasets/wyddmw/OpenREAD)
</p>
25
-
26
- This repository hosts the OpenREAD model, an OPEN-ended REasoning reinforced vision-language model (VLM)-based autonomous driving (AD) framework, as presented in the paper [OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic](https://huggingface.co/papers/2512.01830).
27
-
28
- OpenREAD enables end-to-end Reinforcement Fine-Tuning (RFT) across the full spectrum from high-level reasoning to low-level trajectory planning. It constructs large-scale Chain-of-Thought (CoT) annotations and employs the powerful Qwen3 Large Language Model (LLM) as the critic in RFT to quantify reasoning quality for open-ended questions during reward modeling. Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.
29
-
30
- <img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/framework.png"/><br>
31
- An overview of the framework of our OpenREAD.
32
-
33
- </div>
34
-
35
- ## ✨Capabilities
36
-
37
- <img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/planning.png"/>
38
- <img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/lingoqa.png"/>
39
-
40
- An overview of the capability of our proposed OpenREAD, a vision-language model tailored for autonomous driving by reinforcement learning with GRPO. Besides the trajectory planning, our OpenREAD is also capable of providing reasoning-enhanced response for open-ended scenario understanding, action analysis, *etc*.
41
-
## 🦙 Data & Model Zoo
OpenREAD is built upon [Qwen3-VL-8B](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) and finetuned on a mixture of datasets, including LingoQA, OmniDrive, and NuScenes. OpenREAD is now available on [Hugging Face](https://huggingface.co/wyddmw/OpenREAD). Enjoy playing with it!

<img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/annotations.png"/>

To facilitate the learning of reasoning capabilities at the cold-start stage, we construct large-scale CoT annotations on the LingoQA and NuScenes datasets, as shown above. We further expand the LingoQA annotations from 7K to 11K. All CoT annotations are available [here](https://huggingface.co/datasets/wyddmw/OpenREAD).

## 🛠️ Install

1. Clone this repository and navigate to the OpenREAD folder:
```bash
git clone https://github.com/wyddmw/OpenREAD
cd OpenREAD
```

2. Install the `ms-swift` package:
```shell
conda create -n openread python=3.10 -y
conda activate openread
pip install -e .
```

3. Install Flash-Attention:
```shell
pip install flash_attn==2.8.3 --no-build-isolation
```
If this installation is not compatible with your device and environment, please refer to the [source releases](https://github.com/Dao-AILab/flash-attention/releases) and install a suitable version.

4. Install Qwen3-VL dependencies:
```shell
pip install "transformers==4.57" "qwen_vl_utils==0.0.14"
```

## 🪜 Training & Evaluation

### Datasets
The datasets used to train OpenREAD are as follows:
* [NuScenes](https://www.nuscenes.org/nuscenes)
* [LingoQA](https://github.com/wayveai/LingoQA)
* [OmniDrive](https://github.com/NVlabs/OmniDrive)

Please download our pre-processed [Lidar-BEV](https://huggingface.co/datasets/wyddmw/NuScenes_LidarBev) images for the NuScenes dataset. For trajectory evaluation, we use the GT cache introduced in [GPT-Driver](https://github.com/PointsCoder/GPT-Driver); please download it from [Google Drive](https://drive.google.com/drive/folders/1NCqPtdK8agPi1q3sr9-8-vPdYj08OCAE).
The datasets are organized in the following structure:
```
data
├── LingoQA
│   ├── action
│   │   └── images
│   ├── evaluation
│   │   ├── images
│   │   └── val.parquet
│   ├── scenery
│   │   └── images
│   ├── training_data.json
│   └── evaluation_data.json
├── nuscenes
│   ├── samples
│   │   ├── CAM_FRONT
│   │   └── LIDAR_BEV
│   ├── gt
│   │   ├── vad_gt_seg.pkl
│   │   └── gt_traj_mask.pkl
│   └── traj_val_bev_ego_status.json
```

It is recommended to symlink your dataset root to `data`.

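As a concrete example, assuming your datasets already live under `/data/datasets` (a placeholder path, substitute your own), the symlink can be created from the repository root as:

```shell
# Link an existing dataset root into the repository as `data`.
# /data/datasets is a placeholder; substitute your actual dataset root.
ln -s /data/datasets data
```
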
### Evaluation on the LingoQA Dataset
Before running the evaluation script, please first download the pretrained [Lingo-Judge](https://huggingface.co/wayveai/Lingo-Judge) model.
Check the paths to the LingoQA dataset and the pretrained Lingo-Judge model in `eval/LingoQA/eval_lingo.sh`.
```shell
sh eval/LingoQA/eval_lingo.sh
```
The predictions, along with the Lingo-Judge, CIDEr, METEOR, and BLEU metrics, will be saved to `eval/LingoQA/lingoqa_results_OpenREAD.json`.

### Evaluation on the NuScenes Trajectory Benchmark
We also provide scripts to evaluate trajectory prediction quality on the NuScenes validation set using both STP-3 and UniAD metrics. Update the trained model path, the eval_file path, the training mode, and the inference output path in `eval/Trajectory/infer_trajs_dist.sh`, then run trajectory inference:
```shell
bash eval/Trajectory/infer_trajs_dist.sh
```
This script generates trajectory prediction JSON files under the directory specified by the inference output path. Next, update the trajectory inference output path inside `eval/Trajectory/eval_trajs.py`, then compute both STP-3 and UniAD metrics by running:
```shell
python eval/Trajectory/eval_trajs.py
```
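For intuition, the core of such displacement metrics is an average L2 distance between predicted and ground-truth waypoints over the planning horizon. The following NumPy sketch illustrates this idea only; it is not the actual `eval_trajs.py` implementation, and the function name, waypoint spacing, and horizons are illustrative assumptions:

```python
import numpy as np

def average_l2_error(pred, gt, horizons=(2, 4, 6)):
    """Mean L2 displacement between predicted and GT trajectories.

    pred, gt: arrays of shape (T, 2) holding (x, y) waypoints, assumed
    one per 0.5 s step, so a horizon of h steps covers h * 0.5 seconds.
    Returns {steps: mean L2 error over the first `steps` waypoints}.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-waypoint L2 distance
    return {h: float(dists[:h].mean()) for h in horizons}

# Toy example: prediction offset from GT by a constant 0.1 m in x,
# so every horizon averages the same 0.1 m displacement error.
gt = np.stack([np.arange(6, dtype=float), np.zeros(6)], axis=1)
pred = gt + np.array([0.1, 0.0])
print(average_l2_error(pred, gt))
```

The real benchmark additionally reports collision rates against the GT occupancy cache (hence the `vad_gt_seg.pkl` and `gt_traj_mask.pkl` files above), which this sketch does not cover.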

## Acknowledgment

We appreciate the awesome open-source projects [ms-swift](https://github.com/modelscope/ms-swift), [OmniDrive](https://github.com/NVlabs/OmniDrive), and [GPT-Driver](https://github.com/PointsCoder/GPT-Driver).

## ✏️ Citation
If you find our work helpful or inspiring, please feel free to cite it.
```bibtex
@article{zhang2025openread,
  title={OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic},
  author={Zhang, Songyan and Huang, Wenhui and Chen, Zhan and Collister, Chua Jiahao and Huang, Qihang and Lv, Chen},
  journal={arXiv preprint arXiv:2512.01830},
  year={2025}
}
```
 
+ These are the model checkpoints for our WiseAD. For more details, please refer to our GitHub repo: https://github.com/wyddmw/WiseAD