DenseRewardRLHF-PPO Collection This repository contains the released models for our paper Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model. • 18 items • Updated Jan 11, 2025 • 1
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model Paper • 2501.02790 • Published Jan 6, 2025 • 8