anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-91a081ef96 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-3dac955361 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-f2534c60d3 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-1a3b6680b7 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-bf94641057 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-cf0119c71c Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-991ca4d859 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-0708a3322b Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-4abf6a944f Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bc-fc33cd7856 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-f5ff91747a Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-f6146a8bf2 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-37f5c11603 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-5a024300d8 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-b-653962b457 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-bc-bf92b6b6e0 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-f5e2da13ed Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-64d2f83fd4 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-c5605d88a0 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-e7a87f612f Text Generation • 2B • Updated Oct 10, 2025 • 2