anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c29b38b408 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-064815e1ea Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bcdaf22510 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-0a51487a3c Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-ccb6db349d Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-b-95c376509c Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-1877926eb9 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-8a4d1d70ec Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-47c741dc57 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-feedcc5b6b Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-c789b03075 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-b-c6c2a35f9c Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-40bddeea62 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-13d6f2a9dc Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-6d75826b85 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-d5bd2dbecc Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-9929bdd81f Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-90d8028a03 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-8de2febea2 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-06baa5364e Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-eaac83002c Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-f3e1037c79 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-858e2b46a2 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-84886564c3 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-093f62d880 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-bd5e2df6b8 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-f88521a6d0 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-a7c942368f Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_1200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-d0bd85e83a Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_1200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatu-6aa1e360d1 Text Classification • 2B • Updated Oct 10, 2025