AI & ML interests
Reinforcement Learning
zkshan2002/ultrafeedback_binarized • 63.1k • 9
zkshan2002/simple_rl_level1to4 • 8.64k • 3
zkshan2002/simple_rl_level3to5 • 9.02k • 3
zkshan2002/prime_math_legacy • 456k • 5
zkshan2002/numia10k_gen0-r1d7b • 10.2k • 9
zkshan2002/numia10k_gen0.75-r1d7b • 10.2k • 4
zkshan2002/numia10k_sft-32b • 2.13k • 31
zkshan2002/numia10k_sft-r1d32b • 6.1k • 4
zkshan2002/numia10k_gen-32b • 10.2k • 7
zkshan2002/numia10k_gen-r1d32b • 10.2k • 5
zkshan2002/numia_math_train-10k • 10.2k • 11
zkshan2002/olympiad_bench • 675 • 8
zkshan2002/orz_extended-72k • 72.4k • 14