AI & ML interests
Reinforcement Learning
Organizations
kaiwenw/nov11_oasst_pref_jdpo_gpt4o_3_judges
• Updated • 14.7k • 3
kaiwenw/nov11_oasst_pref_jdpo_llama70b_cot
• Updated • 2.68k • 2
kaiwenw/nov11_oasst_pref_jdpo_llama70b_cot_11_judges
• Updated • 14.7k • 2
kaiwenw/nov11_oasst_mini_pref_jdpo_llama8b_cot
• Updated • 525 • 3
kaiwenw/nov11_oasst_mini_pref_jdpo_llama8b_cot_8_judges
• Updated • 790 • 3
kaiwenw/oasst_pref_jdpo_llama70b_cot
• Updated • 3.35k • 5
kaiwenw/oasst_pref_jdpo_llama70b_cot_12_judges
• Updated • 14.7k • 3
kaiwenw/oasst_pref_jdpo_llama8b_cot_Meta-Llama-3.1-8B-Instruct_5_judges
• Updated • 14.7k • 3
kaiwenw/oasst_mini_pref_jdpo_llama70b_cot_Meta-Llama-3.1-70B-Instruct_3_judges
• Updated • 80 • 3
kaiwenw/nov6_oasst_jdpo_llama70b
• Updated • 10.6k • 2
kaiwenw/oasst_Meta-Llama-3.1-70B-Instruct_3_judges
• Updated • 7.37k • 3
kaiwenw/nov6_oasst_jdpo_llama8b
• Updated • 11.2k • 3
kaiwenw/oasst_Meta-Llama-3.1-8B-Instruct_3_judges
• Updated • 7.37k • 3
kaiwenw/nov5_sp1_jdpo_gap_0.25
• Updated • 6.68k • 3
kaiwenw/nov5_sp1_oct31_oasst_llama70b_jft_3_judges
• Updated • 3.64k • 4
kaiwenw/nov6_oasst_mini_jdpo_llama8b_unflatten
• Updated • 25 • 2
kaiwenw/nov6_oasst_mini_jdpo_llama8b
• Updated • 50 • 3
kaiwenw/oasst_mini_Meta-Llama-3.1-8B-Instruct_3_judges
• Updated • 40 • 3
kaiwenw/nov6_oasst_mini_jdpo_llama70b_unflatten
• Updated • 14 • 2
kaiwenw/nov6_oasst_mini_jdpo_llama70b
• Updated • 28 • 3
kaiwenw/nov5_sp1_jft_gap_0.25
• Updated • 1.91k • 2
kaiwenw/nov2_aft_gpt4o_1.1
• Updated • 3.59k • 5
kaiwenw/nov2_aft_gpt4o_1.0
• Updated • 3.38k • 3
kaiwenw/nov2_aft_gpt4o_0.9
• Updated • 3.05k • 7
kaiwenw/nov2_aft_llama70b_1.1
• Updated • 3.63k • 6
kaiwenw/nov2_aft_llama70b_1.0
• Updated • 3.5k • 3
kaiwenw/nov2_aft_llama70b_0.9
• Updated • 3.37k • 3