thomasbllx's Collections

PPO Learning to Rank

Reward models and five-level graded explanation-ranking datasets for PPO Learning to Rank experiments.