AI & ML interests
Reinforcement Learning
Organizations
kaiwenw/open_r1_apr9_round1_combined_balanced
• Updated • 49.4k • 9
kaiwenw/open_r1_apr9_round1_combined_random
• Updated • 49.4k • 8
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_32B_tokenized
• Updated • 49.4k • 11
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_14B_tokenized
• Updated • 49.4k • 6
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_7B_tokenized
• Updated • 49.4k • 9
kaiwenw/open_r1_apr9_DeepSeek_R1_Distill_Qwen_1.5B_tokenized
• Updated • 49.4k • 13
kaiwenw/combine_1.5B_7B_and_32B
• Updated • 49.5k • 10
kaiwenw/combine_1.5B_and_blockwise
• Updated • 49.5k • 18
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_1.5B_tokenized
• Updated • 49.5k • 7
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_32B_tokenized
• Updated • 49.5k • 10
kaiwenw/open_r1_mar2_mar20_1.5b_n_4_nl_8_tokenized
• Updated • 49.5k • 8
kaiwenw/open_r1_mar2_DeepSeek_R1_Distill_Qwen_7B_tokenized
• Updated • 49.5k • 6
kaiwenw/open_r1_mar2_round_1_tokenized
• Updated • 49.5k • 11
kaiwenw/open_r1_mar2_round_1
• Updated • 45.3k • 11
kaiwenw/aft_after_jaft_test
• Updated • 1.41k • 8
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_75_chosen_25_reject
• Updated • 14.1k • 7
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_25_chosen_75_reject
• Updated • 18.6k • 9
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_50_chosen_50_reject
• Updated • 37.9k • 5
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_reject_first
• Updated • 26.7k • 8
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_all_chosen_first
• Updated • 20.1k • 6
kaiwenw/dec9_sp1_repeat_5_pref_jdpo
• Updated • 44.5k • 7
kaiwenw/dec9_sp1_repeat_5_pref_jdpo_n_7_temp_0.9
• Updated • 36.4k • 7
kaiwenw/dec9_sp1_repeat_5
• Updated • 18.2k • 6
kaiwenw/dec9_sp1_pref_jdpo_75_chosen_25_reject
• Updated • 2.39k • 6
kaiwenw/dec9_sp1_pref_jdpo_25_chosen_75_reject
• Updated • 3.39k • 7
kaiwenw/dec9_sp1_pref_jdpo_50_chosen_50_reject
• Updated • 6.4k • 5