anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-90d8028a03 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-8de2febea2 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-250-4-06baa5364e Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-eaac83002c Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-f3e1037c79 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-858e2b46a2 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-84886564c3 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-093f62d880 Text Classification • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-bd5e2df6b8 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-f88521a6d0 Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-128-a7c942368f Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_1200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-d0bd85e83a Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_1200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatu-6aa1e360d1 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-9fe16df365 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-c2ba73201b Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-95d37aee1a Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-a568741859 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-df26720fa9 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-796698f19e 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-6ae14d0a13 Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-ef9e6060c9 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-e2beb64b4d Text Classification • 2B • Updated Oct 10, 2025 • 2
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-73384f366f Text Generation • 2B • Updated Oct 10, 2025 • 1
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-a-9a44e3cd58 Text Classification • 2B • Updated Oct 9, 2025 • 2
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-ac-4bf36a0fbe Text Generation • 2B • Updated Oct 9, 2025 • 1
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-96479447bb Text Classification • 2B • Updated Oct 9, 2025 • 2
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-994dbb42b4 Text Generation • 2B • Updated Oct 9, 2025 • 1
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-f2bcc8637e Text Classification • 2B • Updated Oct 9, 2025 • 2
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-5464c01ede Text Generation • 2B • Updated Oct 9, 2025 • 1
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b68c4eafde Text Classification • 2B • Updated Oct 9, 2025 • 2