anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-0708a3322b Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-4abf6a944f Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bc-fc33cd7856 Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-f5ff91747a Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-f6146a8bf2 Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-37f5c11603 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-5a024300d8 Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-653962b457 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bc-bf92b6b6e0 Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-f5e2da13ed Text Generation • 2B • Updated Oct 10, 2025
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-64d2f83fd4 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c5605d88a0 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-e7a87f612f 2B • Updated Oct 10, 2025
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c29b38b408 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-064815e1ea 2B • Updated Oct 10, 2025
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bcdaf22510 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-0a51487a3c Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-ccb6db349d 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-b-95c376509c 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-1877926eb9 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-8a4d1d70ec 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-47c741dc57 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-feedcc5b6b Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-c789b03075 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-b-c6c2a35f9c Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-40bddeea62 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-13d6f2a9dc Text Generation • 2B • Updated Oct 10, 2025 • 3
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-6d75826b85 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-d5bd2dbecc Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-9929bdd81f Text Classification • 2B • Updated Oct 10, 2025 • 2
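The repository IDs above share a regular naming pattern: an `actor` or `critic` role, a number, then `prompt-len`, `response-len`, `seed` and `subset` fields, and a trailing 10-character suffix. Some IDs also carry a short extra fragment such as `b`, `bc` or `4`. The sketch below parses that pattern and groups actor/critic IDs that share a subset and leading number. It rests on several assumptions. The regex is inferred only from the names listed here. Reading the leading number as a checkpoint step is a guess. `Checkpoint`, `parse_repo_id` and `pair_by_step` are hypothetical helpers, not part of any published tooling.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern inferred from the listed repo IDs; field meanings are assumptions.
NAME_RE = re.compile(
    r"^(?P<owner>[^/]+)/"
    r"(?P<role>actor|critic)_(?P<step>\d+)_"       # leading number: assumed checkpoint step
    r"ppo-run-math-training-"
    r"prompt-len-(?P<prompt_len>\d+)-"
    r"response-len-(?P<response_len>\d+)-"
    r"seed-(?P<seed>\d+)-"
    r"subset-(?P<subset>\d+)"
    r"(?:-(?P<extra>[A-Za-z0-9]+))?"               # optional fragment such as "b", "bc", "4"
    r"-(?P<suffix>[0-9a-f]{10})$"                  # 10-character hex suffix
)


@dataclass(frozen=True)
class Checkpoint:
    owner: str
    role: str
    step: int
    prompt_len: int
    response_len: int
    seed: int
    subset: int
    extra: Optional[str]
    suffix: str


def parse_repo_id(repo_id: str) -> Optional[Checkpoint]:
    """Return the parsed fields, or None if the ID does not fit the assumed pattern."""
    m = NAME_RE.match(repo_id)
    if m is None:
        return None
    g = m.groupdict()
    return Checkpoint(
        owner=g["owner"],
        role=g["role"],
        step=int(g["step"]),
        prompt_len=int(g["prompt_len"]),
        response_len=int(g["response_len"]),
        seed=int(g["seed"]),
        subset=int(g["subset"]),
        extra=g["extra"],
        suffix=g["suffix"],
    )


def pair_by_step(repo_ids: List[str]) -> Dict[Tuple[int, int], Dict[str, List[str]]]:
    """Group IDs by (subset, step), collecting actor and critic IDs separately."""
    groups: Dict[Tuple[int, int], Dict[str, List[str]]] = defaultdict(dict)
    for rid in repo_ids:
        c = parse_repo_id(rid)
        if c is not None:
            groups[(c.subset, c.step)].setdefault(c.role, []).append(rid)
    return dict(groups)


if __name__ == "__main__":
    ids = [
        "anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-0708a3322b",
        "anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-4abf6a944f",
        "anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bc-fc33cd7856",
    ]
    for key, roles in sorted(pair_by_step(ids).items()):
        print(key, sorted(roles))
```

Because some steps in the listing have more than one actor or critic upload (for example, two `critic_50 … subset-250` entries with different suffixes), each role maps to a list of IDs rather than a single ID.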