anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_critic • Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_actor • Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_subset_2000_r3_critic • Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_subset_2000_r3_actor • Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_hendrycks_math_DeepSeek-R1-Distill-Qwen-1.5B_critic • Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_hendrycks_math_DeepSeek-R1-Distill-Qwen-1.5B_actor • Updated Oct 26, 2025
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-ae0cd033d2 Text Classification • 2B • Updated Oct 17, 2025 • 2
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-76f5638c9d Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-4bac9e133e Text Classification • 2B • Updated Oct 17, 2025 • 2
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-f767916602 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c0842d8e93 Text Classification • 2B • Updated Oct 17, 2025 • 2