xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated 1 day ago • 530
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated 1 day ago • 429
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated 1 day ago • 200
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated 1 day ago • 265
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated 1 day ago • 697
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated 2 days ago • 175
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated 11 days ago • 662
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated 13 days ago • 784
xw1234gan/Extended_Merging_Prob_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated 16 days ago • 330
xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated 19 days ago • 251