PPO Learning to Rank - a thomasbllx Collection

updated Jun 2

Reward models and five-level graded explanation-ranking datasets for PPO Learning to Rank experiments.

thomasbllx/ppo-ltr-run_approxndcg_seed123
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_approxndcg_seed42
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_approxndcg_seed7
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_listnet_seed123
Text Ranking • 4B • Updated Jun 2 • 1

thomasbllx/ppo-ltr-run_listnet_seed42
Text Ranking • 4B • Updated Jun 2 • 1

thomasbllx/ppo-ltr-run_listnet_seed7
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_mse_seed123
Text Ranking • 4B • Updated Jun 2 • 1

thomasbllx/ppo-ltr-run_mse_seed42
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_mse_seed7
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_ranknet_seed123
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_ranknet_seed42
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-run_ranknet_seed7
Text Ranking • 4B • Updated Jun 2

thomasbllx/ppo-ltr-comprehensive-ranking-dataset
Viewer • Updated Jun 1 • 149k • 19

thomasbllx/ppo-ltr-graded-esnli
Viewer • Updated Jun 1 • 16.2k • 18

thomasbllx/ppo-ltr-graded-delta-nli
Viewer • Updated Jun 1 • 700 • 17

thomasbllx/ppo-ltr-graded-delta-nli-heuristic
Viewer • Updated Jun 1 • 700 • 17

thomasbllx/ppo-ltr-graded-delta-nli-llm
Viewer • Updated Jun 1 • 140 • 61

thomasbllx/ppo-ltr-graded-delta-nli-overlap
Viewer • Updated Jun 1 • 600 • 18

thomasbllx/ppo-ltr-graded-multi-nli-overlap
Viewer • Updated Jun 1 • 700 • 34

thomasbllx/ppo-ltr-graded-snli-overlap
Viewer • Updated Jun 1 • 697 • 31
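
The twelve model repositories listed in this collection appear to follow a `ppo-ltr-run_<name>_seed<seed>` naming scheme. The sketch below groups them by the middle token and the seed. It is only an illustration: reading `approxndcg`, `listnet`, `mse`, and `ranknet` as training-objective names is an assumption drawn from the repository names, not something the listing states, and the helper names `parse_run` and `group_by_objective` are hypothetical.

```python
import re
from collections import defaultdict

# Model repository ids copied from the collection listing.
REPO_IDS = [
    "thomasbllx/ppo-ltr-run_approxndcg_seed123",
    "thomasbllx/ppo-ltr-run_approxndcg_seed42",
    "thomasbllx/ppo-ltr-run_approxndcg_seed7",
    "thomasbllx/ppo-ltr-run_listnet_seed123",
    "thomasbllx/ppo-ltr-run_listnet_seed42",
    "thomasbllx/ppo-ltr-run_listnet_seed7",
    "thomasbllx/ppo-ltr-run_mse_seed123",
    "thomasbllx/ppo-ltr-run_mse_seed42",
    "thomasbllx/ppo-ltr-run_mse_seed7",
    "thomasbllx/ppo-ltr-run_ranknet_seed123",
    "thomasbllx/ppo-ltr-run_ranknet_seed42",
    "thomasbllx/ppo-ltr-run_ranknet_seed7",
]

# Assumed pattern: <owner>/ppo-ltr-run_<objective>_seed<seed>
RUN_PATTERN = re.compile(
    r"^[^/]+/ppo-ltr-run_(?P<objective>[a-z]+)_seed(?P<seed>\d+)$"
)


def parse_run(repo_id):
    """Return (objective, seed) for a run repository id, or raise ValueError."""
    match = RUN_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"not a run repository id: {repo_id!r}")
    return match.group("objective"), int(match.group("seed"))


def group_by_objective(repo_ids):
    """Map each objective token to the sorted list of seeds seen for it."""
    groups = defaultdict(list)
    for repo_id in repo_ids:
        objective, seed = parse_run(repo_id)
        groups[objective].append(seed)
    return {objective: sorted(seeds) for objective, seeds in groups.items()}


if __name__ == "__main__":
    for objective, seeds in group_by_objective(REPO_IDS).items():
        print(objective, seeds)
```

Under this reading, the collection holds four groups of runs with the same three seeds (7, 42, 123) each. Dataset repository ids do not match the pattern and make `parse_run` raise `ValueError`.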
