RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
RLFR-Qwen2.5-VL-7B-Instruct is trained from Qwen2.5-VL-7B-Instruct using the RLFR framework. RLFR introduces a flow reward derived from the latent space, extending RLVR (reinforcement learning with verifiable rewards) to exploit rewards computed in latent space.
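This card does not describe how the flow reward is computed. The NumPy sketch below is meant only as a rough intuition for "a reward derived from a flow over latents". It is hypothetical and is not the RLFR implementation. It assumes a rectified-flow path (linear interpolation from Gaussian noise to a set of reference latents). It fits a toy linear velocity field by least squares and then scores a new latent by its negative flow-matching error, so latents that the fitted field explains well receive a higher reward. The function names `fit_linear_flow` and `flow_reward`, the feature map, and all sizes and seeds are illustrative assumptions.

```python
import numpy as np

# HYPOTHETICAL illustration only: a toy flow-matching "reward" on latent vectors,
# not the RLFR authors' method or code.

def _features(x_t, t):
    # Hypothetical feature map for a linear velocity model: [x_t, t * x_t, t, 1].
    t = t[:, None]
    ones = np.ones_like(t)
    return np.concatenate([x_t, t * x_t, t, ones], axis=1)

def fit_linear_flow(ref_latents, n_samples=20000, seed=0):
    """Fit a toy linear velocity field v(x_t, t) by flow matching on reference latents."""
    rng = np.random.default_rng(seed)
    n, d = ref_latents.shape
    x1 = ref_latents[rng.integers(0, n, size=n_samples)]  # data endpoints
    x0 = rng.standard_normal((n_samples, d))               # noise endpoints
    t = rng.uniform(0.0, 1.0, size=n_samples)
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * x1        # linear (rectified-flow) path
    target = x1 - x0                                       # conditional velocity along the path
    W, *_ = np.linalg.lstsq(_features(x_t, t), target, rcond=None)
    return W

def flow_reward(W, latent, n_draws=2048, seed=1):
    """Reward = negative mean flow-matching error of `latent` under the fitted field."""
    rng = np.random.default_rng(seed)
    d = latent.shape[0]
    x0 = rng.standard_normal((n_draws, d))
    t = rng.uniform(0.0, 1.0, size=n_draws)
    x_t = (1.0 - t)[:, None] * x0 + t[:, None] * latent[None, :]
    target = latent[None, :] - x0
    pred = _features(x_t, t) @ W
    return -float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Toy usage: reference latents clustered around mu; compare an in- vs out-of-distribution latent.
rng = np.random.default_rng(42)
mu = np.full(8, 3.0)
ref_latents = mu + 0.1 * rng.standard_normal((500, 8))
W = fit_linear_flow(ref_latents)
r_in = flow_reward(W, mu)    # latent close to the reference data
r_out = flow_reward(W, -mu)  # latent far from the reference data
print(f"in-distribution reward: {r_in:.3f}, out-of-distribution reward: {r_out:.3f}")
```

A latent near the reference cluster should score higher than one far from it. The linear velocity model is chosen only so the example stays dependency-light and solvable in closed form. A realistic setup would presumably use a learned neural velocity network over actual model hidden states.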
If you find our work helpful, please consider citing it:
@article{zhang2025rlfr,
  title={RLFR: Extending Reinforcement Learning for LLMs with Flow Environment},
  author={Zhang, Jinghao and Zheng, Naishan and Li, Ruilin and Cheng, Dongzhou and Liang, Zheming and Zhao, Feng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2510.10201},
  year={2025}
}