
---
license: mit
tags:
- autonomous-driving
- motion-planning
- flow-matching
- generative-model
- navsim
library_name: pytorch
pipeline_tag: other
---

# FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

FlowR2A is a generative multimodal driving planner that uses flow matching to learn the reward-conditioned action distribution $p(a \mid r)$. Instead of treating simulation-based rewards as discriminative targets, as scoring-based planners do, FlowR2A reframes them as generative conditions. This unifies the dense supervision of scoring-based methods and the dynamic proposal generation of anchor-based methods in a single model, and it forces the planner to internalize how an action relates to its outcomes in safety, progress, comfort, and rule compliance.
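In standard conditional flow matching notation, training and sampling can be written as below. This is our notation and it assumes the common linear interpolation path; the paper's exact path, loss weighting, and symbols may differ. Here $a_1$ is an action from the training pairs, $r$ its reward vector, $s$ the scene context from the perception encoder, and $v_\theta$ the action decoder:

$$
\begin{aligned}
a_t &= (1 - t)\,a_0 + t\,a_1, \qquad a_0 \sim \mathcal{N}(0, I),\; t \sim \mathcal{U}[0, 1],\\
\mathcal{L}(\theta) &= \mathbb{E}_{(a_1, r),\, a_0,\, t}\,\big\lVert v_\theta(a_t, t, r, s) - (a_1 - a_0) \big\rVert_2^2,\\
\frac{\mathrm{d}a_t}{\mathrm{d}t} &= v_\theta(a_t, t, r, s), \qquad t: 0 \to 1.
\end{aligned}
$$

The regression target $a_1 - a_0$ is the time derivative of $a_t$ along the path, so integrating the learned field from a noise sample at $t = 0$ under a chosen reward $r$ yields an action at $t = 1$.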

- 📄 Paper: https://arxiv.org/abs/2606.24231
- 🌐 Project page: https://lixirui142.github.io/flowr2a-project-page/
- 💻 Code: https://github.com/lixirui142/FlowR2A

## Model Description

FlowR2A consists of four components:

1. Perception Encoder: a Transfuser backbone (multi-view camera + BEV LiDAR) that produces scene and agent tokens.
2. Reward Encoder: embeds simulation reward signals (safety, progress, comfort, rule compliance) into a condition vector injected via adaptive layer norm (AdaLN). Reward dropout during training enables classifier-free guidance.
3. Flow-based Action Decoder: a transformer with self-attention over trajectory points and cross-attention to scene tokens, conditioned on reward and time embeddings via AdaLN and trained with a velocity-matching loss over dense action–reward pairs.
4. Mode Selector: a lightweight transformer that scores the generated proposals, trained with labels obtained from online simulation.
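The reward conditioning, reward dropout, guided sampling, and proposal selection described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the FlowR2A code: it assumes a linear noise-to-action path, represents the dropped reward as `None`, uses the DiT-style AdaLN form `(1 + scale) * LayerNorm(x) + shift`, and replaces the learned decoder and Mode Selector with a hypothetical `toy_velocity` field and `toy_score` function. Drawing proposals by varying the reward condition is also our assumption, chosen only to make the selection step visible.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Reward = Optional[Vector]  # None plays the role of the dropped ("null") reward
VelocityFn = Callable[[Vector, float, Reward], Vector]


def adaln(x: Vector, scale: Vector, shift: Vector, eps: float = 1e-5) -> Vector:
    """DiT-style adaptive layer norm: (1 + scale) * LayerNorm(x) + shift,
    where scale and shift would be predicted from the reward/time embedding."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    normed = [(v - mu) / math.sqrt(var + eps) for v in x]
    return [(1.0 + s) * n + b for n, s, b in zip(normed, scale, shift)]


def make_training_pair(action: Vector, reward: Vector, p_drop: float,
                       rng: random.Random) -> Tuple[Vector, float, Reward, Vector]:
    """One velocity-matching sample on the linear path from noise to action."""
    noise = [rng.gauss(0.0, 1.0) for _ in action]
    t = rng.random()
    a_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    target_velocity = [a - n for n, a in zip(noise, action)]
    # Reward dropout: hide the condition with probability p_drop so the same
    # network also learns the unconditional velocity used by CFG.
    condition: Reward = None if rng.random() < p_drop else reward
    return a_t, t, condition, target_velocity


def guided_velocity(v_fn: VelocityFn, a: Vector, t: float,
                    reward: Vector, w: float) -> Vector:
    """Classifier-free guidance: v_uncond + w * (v_cond - v_uncond)."""
    v_cond = v_fn(a, t, reward)
    v_uncond = v_fn(a, t, None)
    return [u + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def euler_sample(v_fn: VelocityFn, reward: Vector, dim: int, w: float,
                 steps: int, rng: random.Random) -> Vector:
    """Integrate da/dt = guided velocity from noise (t = 0) towards t = 1."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for k in range(steps):
        v = guided_velocity(v_fn, a, k * dt, reward, w)
        a = [x + dt * vx for x, vx in zip(a, v)]
    return a


def select_mode(proposals: Sequence[Vector],
                score_fn: Callable[[Vector], float]) -> Vector:
    """Stand-in for the Mode Selector: keep the highest-scoring proposal."""
    return max(proposals, key=score_fn)


def toy_velocity(a: Vector, t: float, reward: Reward) -> Vector:
    """Hypothetical velocity field: flows exactly to [10 * mean(reward), 0]
    when conditioned, and to the origin when the reward is dropped."""
    goal = [0.0, 0.0] if reward is None else [10.0 * sum(reward) / len(reward), 0.0]
    return [(g - x) / (1.0 - t) for g, x in zip(goal, a)]


def toy_score(plan: Vector) -> float:
    """Hypothetical scorer that prefers plans ending near x = 8."""
    return -abs(plan[0] - 8.0)


rng = random.Random(0)
# Hypothetical reward vectors (safety, progress, comfort, rule compliance).
rewards = [[0.5] * 4, [0.8] * 4, [1.0] * 4]
proposals = [euler_sample(toy_velocity, r, dim=2, w=1.0, steps=10, rng=rng)
             for r in rewards]
best = select_mode(proposals, toy_score)
print(best)  # close to [8.0, 0.0]
```

Because `toy_velocity` is the exact velocity toward a single point, Euler integration lands on that point whatever the initial noise; a learned field for a multimodal $p(a \mid r)$ would instead send different noise samples to different plausible actions. A guidance scale `w > 1` pushes the sample past the conditional goal, away from the unconditional one.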

## Checkpoint

| File | Description |
|------|-------------|
| `flowr2a_s2.ckpt` | Stage-2 checkpoint, including all components. |

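## Results

State-of-the-art closed-loop performance on the NAVSIM `navtest` split, evaluated with both NAVSIM v1 and v2 metrics (lightweight backbone). All scores are percentages; higher is better.

NAVSIM v1

| Setting | NC | DAC | TTC | Comf. | EP | PDMS |
|---|---|---|---|---|---|---|
| Single proposal | 98.6 | 97.3 | 95.3 | 100 | 84.9 | 90.0 |
| 60 proposals | 98.8 | 98.0 | 96.0 | 100 | 90.1 | 92.8 |

NAVSIM v2

| NC | DAC | DDC | TLC | EP | TTC | LK | HC | EC | EPDMS |
|---|---|---|---|---|---|---|---|---|---|
| 98.9 | 98.1 | 99.1 | 99.7 | 91.5 | 98.5 | 95.0 | 98.3 | 65.2 | 88.9 |

Metrics: NC = no at-fault collisions, DAC = drivable area compliance, DDC = driving direction compliance, TLC = traffic light compliance, TTC = time-to-collision, EP = ego progress, LK = lane keeping, Comf. = comfort, HC = history comfort, EC = extended comfort, PDMS = PDM score, EPDMS = extended PDM score.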
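NAVSIM v1 computes PDMS per scenario as the product of the NC and DAC penalties times a weighted average of TTC (weight 5), comfort (weight 2), and EP (weight 5), and then averages these scores over all scenarios. The sketch below follows that definition; the function names and the two demo scenarios are ours, not the `navsim` package API. Because the benchmark score averages per-scenario products, applying the formula to the column averages in the table does not in general reproduce the reported PDMS.

```python
from statistics import mean
from typing import Iterable, Mapping

# Weights of the averaged sub-scores in NAVSIM v1's PDMS.
WEIGHTS = {"ttc": 5.0, "comfort": 2.0, "ep": 5.0}


def scenario_pdms(nc: float, dac: float, ttc: float, comfort: float, ep: float) -> float:
    """PDMS of one scenario: multiplicative penalties times a weighted average."""
    weighted = (WEIGHTS["ttc"] * ttc + WEIGHTS["comfort"] * comfort
                + WEIGHTS["ep"] * ep) / sum(WEIGHTS.values())
    return nc * dac * weighted


def benchmark_pdms(scenarios: Iterable[Mapping[str, float]]) -> float:
    """Benchmark PDMS: mean of the per-scenario scores."""
    return mean(scenario_pdms(**s) for s in scenarios)


# Two hypothetical scenarios, each failing a different multiplicative penalty.
runs = [
    {"nc": 0.0, "dac": 1.0, "ttc": 1.0, "comfort": 1.0, "ep": 1.0},
    {"nc": 1.0, "dac": 0.0, "ttc": 1.0, "comfort": 1.0, "ep": 1.0},
]
per_scenario = benchmark_pdms(runs)  # 0.0: both scenarios score zero
column_means = {k: mean(r[k] for r in runs) for k in runs[0]}
from_averages = scenario_pdms(**column_means)  # 0.25: not the benchmark score
```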

## Usage

See the [GitHub repository](https://github.com/lixirui142/FlowR2A) for setup, the NAVSIM data pipeline, and inference instructions. Download the checkpoint with:

```python
from huggingface_hub import hf_hub_download

# Downloads (or reuses the cached copy of) the checkpoint and returns its local path.
ckpt = hf_hub_download(repo_id="lixirui142/FlowR2A", filename="flowr2a_s2.ckpt")
```
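To confirm which components the Stage-2 checkpoint contains, one option is to count its state-dict entries per module prefix. The sketch below assumes a PyTorch Lightning-style checkpoint that nests the weights under a `"state_dict"` key (an assumption suggested only by the `.ckpt` extension) and uses made-up key names in its demo; the real prefixes may differ. Load the file first, e.g. with `torch.load(ckpt, map_location="cpu")`; recent PyTorch versions default to `weights_only=True`, which can reject checkpoints that store non-tensor objects, so `weights_only=False` may be needed, and only for a file you trust.

```python
from collections import Counter
from typing import Mapping


def summarize_checkpoint(checkpoint: Mapping, depth: int = 1) -> Counter:
    """Count state-dict entries per module prefix (first `depth` dotted parts).

    Assumes Lightning-style nesting under "state_dict"; otherwise the mapping
    itself is treated as the state dict.
    """
    state = checkpoint.get("state_dict", checkpoint)
    return Counter(".".join(key.split(".")[:depth]) for key in state)


# Demo with made-up key names (the real FlowR2A prefixes may differ).
demo = {"state_dict": {"model.encoder.w": 0, "model.encoder.b": 0, "model.decoder.w": 0}}
print(summarize_checkpoint(demo, depth=2))
# Counter({'model.encoder': 2, 'model.decoder': 1})

# Hypothetical usage with the downloaded file (requires PyTorch):
# import torch
# print(summarize_checkpoint(torch.load(ckpt, map_location="cpu", weights_only=False)))
```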

## Citation

```bibtex
@article{flowr2a2026,
  title   = {FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning},
  author  = {Li, Xirui and Liu, Zhe and Ye, Xiaoqing and Han, Wenhua and Pan, Yifeng and Han, Junyu and Zhao, Hengshuang},
  journal = {arXiv preprint},
  year    = {2026}
}
```

## License

Released under the MIT License.