Update README.md
README.md
CHANGED
@@ -1,141 +1 @@

---
license: mit
pipeline_tag: any-to-any
library_name: transformers
datasets:
- wyddmw/OpenREAD
base_model:
- Qwen/Qwen3-VL-8B-Instruct
---

<div align="center">

<h1>
OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic
</h1>

<p align="center">

[Paper (arXiv)](https://arxiv.org/abs/2512.01830)
[Paper (Hugging Face)](https://huggingface.co/papers/2512.01830)
[Project Page](https://wyddmw.github.io/OpenREAD)
[Code](https://github.com/wyddmw/OpenREAD)
[Model](https://huggingface.co/wyddmw/OpenREAD/tree/main)
[Dataset](https://huggingface.co/datasets/wyddmw/OpenREAD)

</p>

This repository hosts the OpenREAD model, an OPEN-ended REasoning reinforced vision-language model (VLM)-based autonomous driving (AD) framework, as presented in the paper [OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic](https://huggingface.co/papers/2512.01830).

OpenREAD enables end-to-end Reinforcement Fine-Tuning (RFT) across the full spectrum from high-level reasoning to low-level trajectory planning. It constructs large-scale Chain-of-Thought (CoT) annotations and employs the powerful Qwen3 Large Language Model (LLM) as the critic in RFT to quantify reasoning quality for open-ended questions during reward modeling (a toy sketch of this reward follows the framework overview below). Extensive experiments confirm that joint end-to-end RFT yields substantial improvements in both upstream and downstream tasks, enabling OpenREAD to achieve state-of-the-art performance on reasoning and planning benchmarks.

<img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/framework.png"/><br>
An overview of the framework of our OpenREAD.

</div>
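
To make the LLM-as-critic reward modeling concrete, here is a toy sketch in the spirit of the description above, not the released training code: a Qwen3 judge grades each open-ended rollout against a reference answer, and the parsed score becomes the reward that GRPO normalizes within a rollout group. The prompt, the `query_qwen3` helper, and the 0-to-1 score format are all illustrative assumptions.

```python
import re
import statistics

# Illustrative judge prompt; the actual prompt used in OpenREAD may differ.
JUDGE_PROMPT = """You are grading an autonomous-driving QA answer.
Question: {question}
Reference answer: {reference}
Model answer: {answer}
Rate the reasoning quality of the model answer from 0.0 to 1.0.
Reply with only the number."""

def llm_critic_reward(question, reference, answer, query_qwen3):
    """Score one rollout with the LLM critic; returns a reward in [0, 1]."""
    reply = query_qwen3(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    match = re.search(r"\d+(?:\.\d+)?", reply)
    score = float(match.group()) if match else 0.0
    return min(max(score, 0.0), 1.0)  # clamp malformed judge outputs

def grpo_advantages(rewards):
    """GRPO-style group-relative advantages for one group of rollouts."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mu) / sigma for r in rewards]
```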

## ✨ Capabilities

<img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/planning.png"/>
<img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/lingoqa.png"/>

An overview of the capabilities of our proposed OpenREAD, a vision-language model tailored for autonomous driving via reinforcement learning with GRPO. Besides trajectory planning, our OpenREAD is also capable of providing reasoning-enhanced responses for open-ended scenario understanding, action analysis, *etc*.

## 🦙 Data & Model Zoo

Our OpenREAD is built upon [Qwen3-VL-8B](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct) and finetuned on a mixture of datasets including LingoQA, OmniDrive, and NuScenes. Our OpenREAD is now available on [Hugging Face](https://huggingface.co/wyddmw/OpenREAD). Enjoy playing with it!

<img src="https://huggingface.co/wyddmw/OpenREAD/resolve/main/asset/annotations.png"/>

To facilitate the learning of reasoning capability at the cold-start stage, we construct large-scale CoT annotations on the LingoQA and NuScenes datasets as shown above. We further extend the number of annotations for LingoQA from 7K to 11K. All the CoT annotations are available [here](https://huggingface.co/datasets/wyddmw/OpenREAD).
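
The annotations can be fetched from the Hub with `huggingface_hub` (a minimal sketch; the local file layout inside the dataset repo is whatever the dataset page ships, so browse the downloaded folder rather than assuming file names):

```python
from huggingface_hub import snapshot_download

# Download the OpenREAD CoT annotation repo to the local HF cache.
local_dir = snapshot_download(repo_id="wyddmw/OpenREAD", repo_type="dataset")
print("Annotations downloaded to:", local_dir)
```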

## 🛠️ Install

1. Clone this repository and navigate to the OpenREAD folder
```bash
git clone https://github.com/wyddmw/OpenREAD
cd OpenREAD
```

2. Install the `ms-swift` package
```shell
conda create -n openread python=3.10 -y
conda activate openread
pip install -e .
```

3. Install Flash-Attention.
```shell
pip install flash_attn==2.8.3 --no-build-isolation
```
If the installation is not compatible with your device and environment, please refer to the [source code](https://github.com/Dao-AILab/flash-attention/releases) and install a suitable version.

4. Install Qwen3-VL dependencies.
```shell
pip install "transformers==4.57" "qwen_vl_utils==0.0.14"
```
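
With the environment ready, the checkpoint should load through the standard Qwen3-VL chat interface in `transformers` 4.57. The sketch below rests on that assumption; the image path and question are placeholders:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "wyddmw/OpenREAD"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "path/to/front_camera.jpg"},  # placeholder
        {"type": "text", "text": "Describe the driving scene and suggest the next action."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_dict=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```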

## 🪜 Training & Evaluation

### Datasets
The datasets used to train OpenREAD are as follows:
* [NuScenes](https://www.nuscenes.org/nuscenes)
* [LingoQA](https://github.com/wayveai/LingoQA)
* [OmniDrive](https://github.com/NVlabs/OmniDrive)

Please download our pre-processed [Lidar-BEV](https://huggingface.co/datasets/wyddmw/NuScenes_LidarBev) images for the NuScenes dataset. For trajectory evaluation, we use the GT cache introduced in [GPT-Driver](https://github.com/PointsCoder/GPT-Driver). Please download the GT cache from [Google Drive](https://drive.google.com/drive/folders/1NCqPtdK8agPi1q3sr9-8-vPdYj08OCAE).
The datasets are organized in the following structure:
```
data
├── LingoQA
│   ├── action
│   │   └── images
│   ├── evaluation
│   │   ├── images
│   │   └── val.parquet
│   ├── scenery
│   │   └── images
│   ├── training_data.json
│   └── evaluation_data.json
├── nuscenes
│   ├── samples
│   │   ├── CAM_FRONT
│   │   └── LIDAR_BEV
│   ├── gt
│   │   ├── vad_gt_seg.pkl
│   │   └── gt_traj_mask.pkl
│   └── traj_val_bev_ego_status.json
```

It is recommended to symlink your dataset root to `data`:
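For example, assuming your datasets live under a hypothetical `/path/to/datasets` directory:

```shell
# Symlink the dataset root into the repository (adjust the source path).
ln -s /path/to/datasets data
```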

### Evaluation on the LingoQA Dataset
Before running the evaluation script, please first download the pretrained [Lingo-Judge](https://huggingface.co/wayveai/Lingo-Judge).
Check the paths of the LingoQA dataset and the Lingo-Judge pretrained model in `eval/LingoQA/eval_lingo.sh`.
```shell
sh eval/LingoQA/eval_lingo.sh
```
The predictions and the Lingo-Judge, CIDEr, METEOR, and BLEU metrics will be saved to `eval/LingoQA/lingoqa_results_OpenREAD.json`.
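
For reference, Lingo-Judge is a learned text classifier that scores a predicted answer against the ground truth, with 0.5 as the usual correctness threshold. The snippet below is a rough single-example sketch; the exact input formatting is defined by the [Lingo-Judge model card](https://huggingface.co/wayveai/Lingo-Judge), and the prompt layout here is our assumption:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "wayveai/Lingo-Judge"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

question = "Is it safe to change lanes now?"
answer = "No, a car is overtaking in the left lane."            # ground truth
prediction = "No, there is a vehicle approaching on the left."  # model output

# Assumed input layout; check the model card for the exact format.
text = f"Question: {question}\nAnswer: {answer}\nStudent: {prediction}"
with torch.no_grad():
    logits = model(**tokenizer(text, return_tensors="pt")).logits
print(f"Lingo-Judge score: {logits.squeeze().item():.3f}")
```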

### Evaluation on the NuScenes Trajectory Benchmark
We also provide scripts to evaluate trajectory prediction quality on the NuScenes validation set using both ST-P3 and UniAD metrics. Update the trained model path, the `eval_file` path, the training mode, and the inference output path in `eval/Trajectory/infer_trajs_dist.sh`, then run trajectory inference:
```shell
bash eval/Trajectory/infer_trajs_dist.sh
```
This script generates trajectory prediction JSON files under the directory specified by the inference output path. Next, update the trajectory inference output path inside `eval/Trajectory/eval_trajs.py`, then compute both ST-P3 and UniAD metrics by running:
```shell
python eval/Trajectory/eval_trajs.py
```
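
As background on why both metric suites are reported: UniAD-style L2 is commonly computed as the displacement error at each horizon (1s, 2s, 3s), whereas ST-P3-style L2 averages the error over all waypoints up to that horizon, so the two protocols give different numbers for the same trajectory. A small illustrative sketch of the distinction (our own, not the repo's `eval_trajs.py`; a 2 Hz, 6-waypoint setup is assumed):

```python
import numpy as np

def l2_uniad(pred, gt, t_idx):
    """UniAD-style: L2 displacement error at a single future timestep."""
    return float(np.linalg.norm(pred[t_idx] - gt[t_idx]))

def l2_stp3(pred, gt, t_idx):
    """ST-P3-style: L2 error averaged over all waypoints up to the timestep."""
    dists = np.linalg.norm(pred[:t_idx + 1] - gt[:t_idx + 1], axis=-1)
    return float(dists.mean())

# Toy trajectories: 6 waypoints at 2 Hz covering a 3 s horizon.
pred = np.array([[0.0, 1.0], [0.0, 2.1], [0.1, 3.0],
                 [0.2, 4.2], [0.2, 5.1], [0.3, 6.3]])
gt = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 3.0],
               [0.0, 4.0], [0.0, 5.0], [0.0, 6.0]])
print("L2@3s, UniAD-style:", round(l2_uniad(pred, gt, 5), 3))  # error at 3 s only
print("L2@3s, ST-P3-style:", round(l2_stp3(pred, gt, 5), 3))   # mean over 0.5-3 s
```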

## Acknowledgment

We appreciate the awesome open-source projects [ms-swift](https://github.com/modelscope/ms-swift), [OmniDrive](https://github.com/NVlabs/OmniDrive), and [GPT-Driver](https://github.com/PointsCoder/GPT-Driver).

## ✏️ Citation
If you find our work helpful or inspiring, please feel free to cite it.
```bibtex
@article{zhang2025openread,
  title={OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic},
  author={Zhang, Songyan and Huang, Wenhui and Chen, Zhan and Collister, Chua Jiahao and Huang, Qihang and Lv, Chen},
  journal={arXiv preprint arXiv:2512.01830},
  year={2025}
}
```

+ These are the model checkpoints for our WiseAD. For more details, please refer to our GitHub repo: https://github.com/wyddmw/WiseAD