xw1234gan/GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 8B • Updated Apr 19 • 45
xw1234gan/SMOKE_GRPO_KL_Qwen2.5-7B-Instruct_MATH_beta0_lr1e-05_mb2_ga4_n16_seed42_HF_GEN • Text Generation • 8B • Updated Apr 18 • 5
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated Apr 16 • 41
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated Apr 16 • 38
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MedQA_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated Apr 16 • 2
xw1234gan/GRPO_KL_Qwen2.5-3B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 3B • Updated Apr 16 • 9
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MMLU_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated Apr 16 • 42
xw1234gan/GRPO_KL_Qwen2.5-1.5B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42_HF_GEN • Text Generation • 2B • Updated Apr 15 • 5
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated Apr 6 • 7
xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated Apr 4 • 6
xw1234gan/Extended_Merging_Prob_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated Apr 1 • 3
xw1234gan/Extended_Merging_Qwen2.5-3B-Instruct_MATH_lr1e-05_mb2_ga128_n2048_seed42 • Text Generation • 3B • Updated Mar 29 • 2