AI & ML interests
None yet
Organizations
None yet
aq1048576/openrlhf-sweep-50k-1m
312k • Updated
aq1048576/openrlhf-sweep-50k-500k
312k • Updated • 1
aq1048576/openrlhf-sweep-50k-250k
312k • Updated • 1
aq1048576/openrlhf-100k-sweep-20k-queries-4b
199k • Updated • 1
aq1048576/openrlhf-100k-sweep-40k-queries-4b
199k • Updated
aq1048576/openrlhf-100k-sweep-80k-queries-4b
199k • Updated
aq1048576/openrlhf-100k-sweep-160k-queries-qwen3-8b
Updated
aq1048576/openrlhf-250k-queries-2m-pairs-qwen3-8b
Updated
aq1048576/openrlhf-100k-sweep-160k-queries-4b
199k • Updated
aq1048576/openrlhf-500k-queries-1m-pairs-8b
312k • Updated • 1
aq1048576/openrlhf-reward-model-1250-queries
312k • Updated • 1
aq1048576/openrlhf-reward-model-2500-queries
312k • Updated • 1
aq1048576/openrlhf-reward-model-5k-queries
312k • Updated • 1
aq1048576/openrlhf-reward-model-10k-queries
312k • Updated • 1
aq1048576/openrlhf-250k-queries-1m-pairs-qwen3-8b
312k • Updated
4B • Updated • 6
4B • Updated • 3
4B • Updated • 4
aq1048576/rm_sweep_160k_half
Text Classification • 4B • Updated
aq1048576/rm_sweep_160k_with_c
Text Classification • 4B • Updated
aq1048576/rm_sweep_40k_4epochs
Text Classification • 4B • Updated
aq1048576/rm_sweep_40k_5epochs
4B • Updated
aq1048576/rm_sweep_80k_3epochs
Updated
aq1048576/rm_sweep_40k_3epochs
Text Classification • 4B • Updated
aq1048576/rm_sweep_ranking_permutation_pairs
Text Classification • 4B • Updated
aq1048576/rm_sweep_80k_2epochs
Text Classification • 4B • Updated
aq1048576/rm_sweep_40k_2epochs
Text Classification • 4B • Updated
aq1048576/rm_sweep_ranking_sequential_pairs
Text Classification • 4B • Updated
aq1048576/rm_sweep_160k_distinct
Text Classification • 4B • Updated