anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-f553c1779b Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-fc1f43b3e3 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-36a3a03718 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-e793f972a0 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-d17cb5def8 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-r3-e267a82a89 Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-9446afcb19 Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-r3-ea48cbecfa Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-6e57c4766f Text Generation • 2B • Updated Oct 17, 2025 • 1
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-r3-0b5154ab3e Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-71947f583d Text Classification • 2B • Updated Oct 17, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-r3-c814c51176 Text Generation • 2B • Updated Oct 17, 2025 • 1