xw1234gan/Fixed_Merging_Qwen2.5-3B-Instruct_MedQA_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 3B • Updated Mar 16 • 3
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 3B • Updated Mar 16 • 8
xw1234gan/Merging_Qwen2.5-1.5B-Instruct_MedQA_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 2B • Updated Mar 16 • 4
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 2B • Updated Mar 15 • 3
xw1234gan/Merging_Qwen2.5-7B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 8B • Updated Mar 11 • 3
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 3B • Updated Mar 9
xw1234gan/Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 3B • Updated Mar 8
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 2B • Updated Mar 6