---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
task_categories:
- reinforcement-learning
- robotics
- vision-language-modelling
tags:
- autonomous-driving
- carla
- imitation-learning
- vlm
- found-rl
size_categories:
- 10G-100G
---

# Found-RL's fine-tuned Vision-Language Models (VLMs)

## 📜 Overview

These VLMs accompany the paper **"Found-RL: Foundation Model-Enhanced Reinforcement Learning for Autonomous Driving"**.

In this work, we use fine-tuned VLMs to provide feedback for reinforcement learning agents in autonomous driving scenarios. 

- **📄 Paper:** [Found-RL: foundation model-enhanced reinforcement learning for autonomous driving](https://www.arxiv.org/pdf/2602.10458)
- **💻 Code & Usage:** [https://github.com/ys-qu/found-rl](https://github.com/ys-qu/found-rl)
- **📂 Dataset:** [https://huggingface.co/datasets/ys-qu/found-rl_dataset](https://huggingface.co/datasets/ys-qu/found-rl_dataset)

## 📦 Fine-tuning strategies

1.  **RGB + Text (LoRA SFT):**
    -   **Visual Input:** Front-view RGB camera images (shape = 900 × 256).
    -   **Method:** Supervised fine-tuning with **LoRA (Low-Rank Adaptation)**.
    -   **Purpose:** To enable the VLM to understand visual scenes and follow driving instructions based on realistic camera feeds.

2.  **Rendered BEV + Text (Full SFT):**
    -   **Visual Input:** Rendered Bird's Eye View (BEV) semantic maps (shape = 192 × 192).
    -   **Method:** **Full-parameter** supervised fine-tuning.
    -   **Purpose:** To provide a holistic spatial understanding of the driving environment, allowing the VLM to act as an expert.
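
The practical difference between the two strategies is how many weights are trained. The sketch below is a rough illustration, not the authors' training code: it counts trainable parameters for a single dense layer under full fine-tuning versus LoRA. The layer size (4096 × 4096) and the LoRA rank (16) are assumed values chosen for the example; neither comes from the paper or the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearShape:
    """Shape of one dense weight matrix (hypothetical sizes, for illustration only)."""
    d_in: int
    d_out: int


def full_sft_params(layer: LinearShape) -> int:
    # Full fine-tuning updates every entry of the weight matrix W (d_out x d_in).
    return layer.d_in * layer.d_out


def lora_params(layer: LinearShape, rank: int) -> int:
    # LoRA freezes W and trains two low-rank factors:
    # A (rank x d_in) and B (d_out x rank).
    return rank * (layer.d_in + layer.d_out)


layer = LinearShape(d_in=4096, d_out=4096)  # assumed size, not from the paper
rank = 16  # assumed LoRA rank, not from the paper

full = full_sft_params(layer)
lora = lora_params(layer, rank)
print(f"full SFT: {full:,} trainable parameters")
print(f"LoRA (r={rank}): {lora:,} trainable parameters ({lora / full:.2%} of full)")
```

Under these assumed numbers, LoRA trains well under 1% of the layer's weights. This is the usual reason to choose it when full-parameter fine-tuning is too costly.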

If you use these VLMs in your research, please cite our paper:
```bibtex
@misc{qu2026foundrl,
      title={Found-RL: foundation model-enhanced reinforcement learning for autonomous driving}, 
      author={Yansong Qu and Zihao Sheng and Zilin Huang and Jiancong Chen and Yuhao Luo and Tianyi Wang and Yiheng Feng and Samuel Labi and Sikai Chen},
      year={2026},
      eprint={2602.10458},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.10458}, 
}
```