xw1234gan/cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 2B • Updated Apr 23 • 215
xw1234gan/Merging_Prob_Qwen2.5-7B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 Text Generation • 8B • Updated Apr 21 • 5
xw1234gan/SMOKE_Merging_Prob_Qwen2.5-7B-Instruct_MATH_lr1e-05_mb2_ga4_n16_seed42 Text Generation • 8B • Updated Apr 20 • 5