FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning

FlowR2A is a generative multimodal driving planner that learns the reward-conditioned action distribution p(a|r) with flow matching. Instead of treating simulation-based rewards as discriminative targets (as in scoring-based planners), FlowR2A reframes them as generative conditions, unifying the dense supervision of scoring-based methods with the dynamic proposal generation of anchor-based methods in a single model. This forces the planner to internalize how an action relates to its outcomes in safety, progress, comfort, and rule compliance.
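As a rough illustration of what learning p(a|r) with flow matching involves, the sketch below builds one velocity-matching training example from an action–reward pair. It assumes a linear interpolation path between Gaussian noise and the ground-truth action, which is a common flow-matching choice; the path FlowR2A actually uses is not stated in this card. `make_flow_matching_example` and `velocity_matching_loss` are hypothetical names, not functions from the FlowR2A codebase, and the toy values are made up.

```python
import random

def make_flow_matching_example(action, reward, rng):
    """Return (x_t, t, reward, v_target) for one action-reward pair.

    action: flattened trajectory as a list of floats (data sample x_1).
    reward: list of reward values used as the generative condition.
    rng:    a random.Random instance.
    Assumption: linear path x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]             # x_0
    t = rng.random()                                          # t in [0, 1)
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]
    v_target = [a - n for n, a in zip(noise, action)]         # d x_t / d t
    return x_t, t, list(reward), v_target

def velocity_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

rng = random.Random(0)
action = [0.0, 1.0, 0.1, 2.0]   # toy (x, y) waypoints, flattened
reward = [1.0, 0.8, 1.0, 1.0]   # toy safety, progress, comfort, rule compliance
x_t, t, cond, v_target = make_flow_matching_example(action, reward, rng)
```

In a real training loop the decoder would predict a velocity from `(x_t, t, cond)` plus scene tokens, and `velocity_matching_loss` would compare that prediction with `v_target`.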

📄 Paper: https://arxiv.org/abs/2606.24231
🌐 Project page: https://lixirui142.github.io/flowr2a-project-page/
💻 Code: https://github.com/lixirui142/FlowR2A

Model Description

FlowR2A consists of four components:

Perception Encoder: a Transfuser backbone (multi-view camera + BEV LiDAR) that produces scene and agent tokens.
Reward Encoder: embeds simulation reward signals (safety, progress, comfort, rule compliance) into a condition vector that is injected via adaptive layer norm (AdaLN); reward dropout during training enables classifier-free guidance.
Flow-based Action Decoder: a transformer with self-attention over trajectory points and cross-attention to scene tokens, conditioned on reward and time embeddings via AdaLN and trained with a velocity-matching loss over dense action–reward pairs.
Mode Selector: a lightweight transformer that scores the generated proposals, trained with online simulation labeling.
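The way these components interact at inference time can be sketched as follows: sample several proposals by integrating a reward-conditioned velocity field with classifier-free guidance, then keep the proposal a scorer ranks highest. This is a minimal sketch under stated assumptions, not the FlowR2A implementation. `toy_velocity_fn` stands in for the trained decoder, `None` stands in for a dropped reward condition, the guidance formula is the standard classifier-free form, and the Euler solver, step count, and scoring lambda are all hypothetical choices.

```python
import random

def toy_velocity_fn(x, t, reward):
    """Hypothetical stand-in for the decoder: pulls x toward a reward-dependent target."""
    target = [0.0] * len(x) if reward is None else [reward[0]] * len(x)
    return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]

def guided_velocity(velocity_fn, x, t, reward, guidance_scale):
    """Classifier-free guidance: extrapolate from unconditional toward conditional."""
    v_cond = velocity_fn(x, t, reward)
    v_uncond = velocity_fn(x, t, None)   # reward dropped, as in reward dropout
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]

def sample_proposal(velocity_fn, reward, dim, rng, steps=10, guidance_scale=1.0):
    """Euler-integrate dx/dt = v(x, t, r) from t = 0 (noise) to t = 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        v = guided_velocity(velocity_fn, x, i * dt, reward, guidance_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def select_best(proposals, score_fn):
    """Stand-in for the Mode Selector: return the highest-scoring proposal."""
    return max(proposals, key=score_fn)

rng = random.Random(0)
proposals = [sample_proposal(toy_velocity_fn, [1.0], dim=4, rng=rng) for _ in range(6)]
best = select_best(proposals, score_fn=lambda traj: -abs(sum(traj) - 4.0))
```

With `guidance_scale=1.0` the guided velocity equals the conditional one; larger values push samples further toward the conditioned reward, at the cost of diversity.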

Checkpoint

File	Description
`flowr2a_s2.ckpt`	Stage-2 checkpoint, including all components.

Results

FlowR2A achieves state-of-the-art closed-loop performance on the NAVSIM navtest benchmarks with a lightweight backbone.

NAVSIM v1

Setting	NC	DAC	TTC	Comf.	EP	PDMS
Single proposal	98.6	97.3	95.3	100	84.9	90.0
60 proposals	98.8	98.0	96.0	100	90.1	92.8

NAVSIM v2

NC	DAC	DDC	TLC	EP	TTC	LK	HC	EC	EPDMS
98.9	98.1	99.1	99.7	91.5	98.5	95.0	98.3	65.2	88.9

Usage

See the GitHub repository for setup, the NAVSIM data pipeline, and inference instructions. Download the checkpoint with:

from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(repo_id="lixirui142/FlowR2A", filename="flowr2a_s2.ckpt")

Citation

@article{flowr2a2026,
  title   = {FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning},
  author  = {Li, Xirui and Liu, Zhe and Ye, Xiaoqing and Han, Wenhua and Pan, Yifeng and Han, Junyu and Zhao, Hengshuang},
  journal = {arXiv preprint arXiv:2606.24231},
  year    = {2026}
}

License

Released under the MIT License.
