anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-91a081ef96 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-3dac955361 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-f2534c60d3 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-1a3b6680b7 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-bf94641057 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-cf0119c71c Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-5000-991ca4d859 Text Classification • 2B • Updated Oct 10, 2025 • 1