---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
task_categories:
- reinforcement-learning
- robotics
- vision-language-modelling
tags:
- autonomous-driving
- carla
- imitation-learning
- vlm
- found-rl
size_categories:
- 10G-100G
---

# Found-RL's fine-tuned Vision-Language Models (VLMs)

## Overview

These VLMs accompany the paper **"Found-RL: Foundation Model-Enhanced Reinforcement Learning for Autonomous Driving"**.

In this work, the fine-tuned VLMs provide feedback to reinforcement learning agents in autonomous driving scenarios.

- **Paper:** [Found-RL: foundation model-enhanced reinforcement learning for autonomous driving](https://www.arxiv.org/pdf/2602.10458)
- **Code & Usage:** [https://github.com/ys-qu/found-rl](https://github.com/ys-qu/found-rl)
- **Dataset:** [https://huggingface.co/datasets/ys-qu/found-rl_dataset](https://huggingface.co/datasets/ys-qu/found-rl_dataset)

## Fine-tuning strategies

1. **RGB + Text (LoRA SFT):**
   - **Visual Input:** Front-view RGB camera images (shape = 900 × 256).
   - **Method:** **LoRA (Low-Rank Adaptation)** supervised fine-tuning.
   - **Purpose:** To enable the VLM to understand visual scenes and follow driving instructions based on realistic camera feeds.

2. **Rendered BEV + Text (Full SFT):**
   - **Visual Input:** Rendered Bird's Eye View (BEV) semantic maps (shape = 192 × 192).
   - **Method:** **Full-parameter** supervised fine-tuning.
   - **Purpose:** To provide a holistic spatial understanding of the driving environment, allowing the VLM to act as an expert.
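
The two input configurations above can be captured in a small lookup table, which is handy for checking an image's size before it is passed to the matching model variant. The sketch below is a minimal illustration and not part of the Found-RL code: the names `FineTuneStrategy`, `STRATEGIES`, and `check_image_size` are hypothetical, and it assumes the shapes quoted above are given as width × height in pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneStrategy:
    """One of the two fine-tuning setups listed in this card (hypothetical helper)."""
    visual_input: str
    method: str
    image_size: tuple  # (width, height); ASSUMPTION: the card lists width first


# Values are copied from the list above; the dictionary keys are illustrative only.
STRATEGIES = {
    "rgb": FineTuneStrategy("front-view RGB camera image", "LoRA SFT", (900, 256)),
    "bev": FineTuneStrategy("rendered BEV semantic map", "full-parameter SFT", (192, 192)),
}


def check_image_size(strategy_key: str, width: int, height: int) -> bool:
    """Return True if (width, height) matches the size expected by the strategy."""
    return STRATEGIES[strategy_key].image_size == (width, height)
```

For example, `check_image_size("bev", 192, 192)` evaluates to `True`, while passing a 900 × 256 camera frame under the `"bev"` key evaluates to `False`, which flags a mismatch between the image and the model variant.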

If you use these VLMs in your research, please cite our paper:

```bibtex
@misc{qu2026foundrl,
      title={Found-RL: foundation model-enhanced reinforcement learning for autonomous driving},
      author={Yansong Qu and Zihao Sheng and Zilin Huang and Jiancong Chen and Yuhao Luo and Tianyi Wang and Yiheng Feng and Samuel Labi and Sikai Chen},
      year={2026},
      eprint={2602.10458},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.10458},
}