daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1100 Text Generation • 2B • Updated May 12 • 2
daviddavidlu/PrAg-PO-DeepSeek-R1-Distill-Qwen-1.5B-step1160 Text Generation • 2B • Updated May 12 • 2
Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning Paper • 2602.03190 • Published Feb 3 • 1