---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
task_categories:
- reinforcement-learning
- robotics
- vision-language-modelling
tags:
- autonomous-driving
- carla
- imitation-learning
- vlm
- found-rl
size_categories:
- 10G-100G
---

# Found-RL's Fine-Tuned Vision-Language Models (VLMs)

## 📜 Overview

These VLMs accompany the paper "Found-RL: Foundation Model-Enhanced Reinforcement Learning for Autonomous Driving".

In this work, we use fine-tuned VLMs to provide feedback to reinforcement learning (RL) agents in autonomous driving scenarios.

- 📄 Paper: [Found-RL: foundation model-enhanced reinforcement learning for autonomous driving](https://www.arxiv.org/pdf/2602.10458)
- 💻 Code & Usage: [https://github.com/ys-qu/found-rl](https://github.com/ys-qu/found-rl)
- 📂 Dataset: [https://huggingface.co/datasets/ys-qu/found-rl_dataset](https://huggingface.co/datasets/ys-qu/found-rl_dataset)

## 📦 Fine-tuning strategies

1. RGB + Text (LoRA SFT):
   - Visual Input: Front-view RGB camera images (900 × 256).
   - Method: LoRA (Low-Rank Adaptation) supervised fine-tuning (SFT).
   - Purpose: To enable the VLM to understand visual scenes and follow driving instructions based on realistic camera feeds.

2. Rendered BEV + Text (Full SFT):
   - Visual Input: Rendered Bird's Eye View (BEV) semantic maps (192 × 192).
   - Method: Full-parameter supervised fine-tuning (SFT).
   - Purpose: To give the VLM a holistic spatial understanding of the driving environment so that it can act as an expert.
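To illustrate how the two strategies differ, here is a minimal NumPy sketch of a generic LoRA-adapted linear layer, `y = x Wᵀ + (alpha / r) · x Aᵀ Bᵀ`, where the pretrained weight `W` stays frozen and only the low-rank factors `A` and `B` are trained. Full-parameter SFT instead updates `W` directly. The function name `lora_forward`, the layer size, the rank `r = 8`, and `alpha = 16` are hypothetical values chosen for illustration; they are not the settings used to train these checkpoints.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Generic LoRA-adapted linear layer (illustrative sketch, not the Found-RL code).

    x: input batch, shape (batch, d_in)
    W: frozen pretrained weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = A.shape[0]
    scaling = alpha / r
    # Frozen path plus scaled low-rank update.
    return x @ W.T + scaling * (x @ A.T) @ B.T

# Hypothetical sizes for illustration only.
d_in, d_out, r, alpha = 1024, 1024, 8, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so the adapter is initially a no-op
x = rng.standard_normal((2, d_in))

# At initialization the adapted layer matches the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha), x @ W.T)

full_params = W.size            # parameters updated by full-parameter SFT
lora_params = A.size + B.size   # parameters updated by LoRA SFT
print(full_params, lora_params)
```

For this hypothetical layer, LoRA trains 16,384 parameters instead of 1,048,576. After training, the update can be merged into a single weight `W + (alpha / r) · B A`, so inference cost matches the original layer.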

If you use these VLMs in your research, please cite our paper:
```bibtex
@misc{qu2026foundrl,
  title={Found-RL: foundation model-enhanced reinforcement learning for autonomous driving},
  author={Yansong Qu and Zihao Sheng and Zilin Huang and Jiancong Chen and Yuhao Luo and Tianyi Wang and Yiheng Feng and Samuel Labi and Sikai Chen},
  year={2026},
  eprint={2602.10458},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.10458},
}
```