xw1234gan/cnk12_GRPO_KL_Qwen2.5-7B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 8B • Updated 6 days ago • 24
xw1234gan/cnk12_GRPO_KL_Qwen2.5-3B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 3B • Updated 7 days ago • 18
xw1234gan/cnk12_GRPO_KL_Qwen2.5-1.5B-Instruct_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 2B • Updated 8 days ago • 25
xw1234gan/GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN_NoKL • Text Generation • 8B • Updated 9 days ago • 26
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 3B • Updated 10 days ago • 26
xw1234gan/Extended_GRPO_KL_Qwen2.5-1.5B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42_NoKL • Text Generation • 2B • Updated 11 days ago • 29
xw1234gan/Merging_Qwen2.5-7B-Instruct_MMLU_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated 29 days ago • 27
xw1234gan/Fixed_Merging_Qwen2.5-7B-Instruct_MMLU_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated about 1 month ago • 8
xw1234gan/GRPO_KL_Qwen2.5-7B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated about 1 month ago • 6
xw1234gan/cnk12_GRPO_KL_Qwen2.5-7B-Instruct_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 8B • Updated May 6 • 7
xw1234gan/olympiads_Adaptive_Merging_Qwen2.5-1.5B-Instruct_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 2B • Updated May 4 • 4