anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-9fe16df365 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-c2ba73201b Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-95d37aee1a Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-a568741859 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-df26720fa9 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-796698f19e Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_600_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-6ae14d0a13 Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_600_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-ef9e6060c9 Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_800_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperatur-e2beb64b4d Text Classification • 2B • Updated Oct 10, 2025
anirudhb11/actor_800_ppo-run-math-training-prompt-len-800-response-len-4096-bce-loss-temperature-73384f366f Text Generation • 2B • Updated Oct 10, 2025 • 2
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-a-9a44e3cd58 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-ac-4bf36a0fbe Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-96479447bb Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-994dbb42b4 Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-f2bcc8637e Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-5464c01ede Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b68c4eafde Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-d811ccc173 Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b-4e11a85372 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_50_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-bc-0fc3eba881 Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-c4b41565c8 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_250_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b-f72dece0c0 Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/actor_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b-73b15e5d59 Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_450_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-629dffdb6a Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/critic_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-b-7ac0757c94 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_16_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-500-bc-4a220c1e9c Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-b580379099 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_200_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-c1347692ed Text Generation • 2B • Updated Oct 9, 2025 • 2
anirudhb11/critic_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-6314c2edc2 Text Classification • 2B • Updated Oct 9, 2025
anirudhb11/actor_400_ppo-run-math-training-prompt-len-800-response-len-4096-seed-43-subset-1000-994489ef46 Text Generation • 2B • Updated Oct 9, 2025 • 2