Zikang Shan PRO
zkshan2002
AI & ML interests
Reinforcement Learning
Organizations
models 0
None public yet
datasets 31
zkshan2002/rstar2a
Viewer • Updated • 42.3k • 7
zkshan2002/alpaca_eval
Viewer • Updated • 805 • 471
zkshan2002/ultrafeedback_binarized
Viewer • Updated • 63.1k • 9
zkshan2002/simple_rl_level1to4
Viewer • Updated • 8.64k • 6
zkshan2002/simple_rl_level3to5
Viewer • Updated • 9.02k • 7
zkshan2002/dr_sft
Viewer • Updated • 5.91k • 5
zkshan2002/prime_sft
Viewer • Updated • 5.58k • 72
zkshan2002/prime_full
Viewer • Updated • 456k • 19
zkshan2002/dr_full
Viewer • Updated • 40.3k • 4
zkshan2002/dr_sft_legacy
Viewer • Updated • 6.03k • 39