anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_critic Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_actor Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_subset_2000_r3_critic Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_dapo_DeepSeek-R1-Distill-Qwen-1.5B_subset_2000_r3_actor Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_hendrycks_math_DeepSeek-R1-Distill-Qwen-1.5B_critic Updated Oct 26, 2025
anirudhb11/r1d-1.5b_deepscaler_longcot_8k_ppo_hendrycks_math_DeepSeek-R1-Distill-Qwen-1.5B_actor Updated Oct 26, 2025
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-ae0cd033d2 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-76f5638c9d Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-4bac9e133e Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-f767916602 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c0842d8e93 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-dac4751ff5 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c889153a3b Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-actor-low-lr-0-c43c588fb5 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_200_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-0e5f8c09dc Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_200_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-4-c9a5125638 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/actor_400_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-4-6bb01b020a Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_400_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-c49633e26e Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/critic_600_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-07fa1b4078 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_600_deepseek-r1-distil-1.5b-ppo-run-math-training-prompt-len-800-response-len-4-430dce8573 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-c08ed8b533 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-84c23bc523 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-2d56bd1e02 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-a95299334f Text Generation • 2B • Updated Oct 17, 2025 • 1