hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Jul 9, 2025 • 2
hdong0/Qwen2.5-Math-1.5B-untied-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Aug 5, 2025 • 2
hdong0/Qwen2.5-Math-1.5B-GRPO_deepscaler_temp1_prompt1 Text Generation • 2B • Updated Aug 7, 2025 • 2
hdong0/deepseek-Qwen2.5-1.5B-GRPO_deepscaler_temp1_prompt1 Text Generation • 2B • Updated Aug 7, 2025 • 2
hdong0/deepseek-Qwen2.5-7B-baseline-thin-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Aug 10, 2025 • 2
hdong0/deepseek-Llama-8B-baseline-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr Text Generation • 8B • Updated Aug 13, 2025 • 2
hdong0/deepseek-Qwen2.5-1.5B-baseline-thin-Open-R1-GRPO_deepscaler_mu_8_constant_lr Text Generation • 2B • Updated Aug 17, 2025 • 2
hdong0/deepseek-Qwen-1.5B-baseline-thin-Open-R1-GRPO_deepscaler_mu_8_constant_lr_warmed Text Generation • 2B • Updated Aug 19, 2025 • 2
hdong0/deepseek-Llama-8B-Open-R1-GRPO_deepscaler_acc_mu_8_constant_lr_no_kl Text Generation • 8B • Updated Aug 20, 2025 • 2
hdong0/deepseek-Qwen-1.5B-Open-R1-GRPO_deepscaler_acc_8196 Text Generation • 2B • Updated Oct 1, 2025 • 2
hdong0/deepseek-Qwen-1.5B-Open-R1-GRPO_deepscaler_acc_16384 Text Generation • 2B • Updated Oct 1, 2025 • 2