ajagota71/pythia-410m-detox-irl-rlhf-seed-42 Reinforcement Learning • 0.4B • Updated May 11, 2025 • 1
ajagota71/pythia-70m-detox-irl-rlhf-seed-200 Reinforcement Learning • 70.4M • Updated May 11, 2025 • 1
ajagota71/pythia-70m-detox-irl-rlhf-seed-42 Reinforcement Learning • 70.4M • Updated May 11, 2025 • 1
ajagota71/pythia-70m-detox-raw-logits-test2 Reinforcement Learning • 70.4M • Updated May 11, 2025 • 1
ajagota71/pythia-70m-detox-irl-rlhf-test-facebook-filter Reinforcement Learning • 70.4M • Updated May 11, 2025 • 1
ajagota71/toxicity-reward-model-max-margin-seed-400-pythia-1b-checkpoint-70 1B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-300-pythia-1b-checkpoint-70 1B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-200-pythia-1b-checkpoint-70 1B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-100-pythia-1b-checkpoint-70 1B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-400-pythia-410m-checkpoint-70 0.4B • Updated May 10, 2025 • 1
ajagota71/toxicity-reward-model-max-margin-seed-300-pythia-410m-checkpoint-70 0.4B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-200-pythia-410m-checkpoint-70 0.4B • Updated May 10, 2025
ajagota71/toxicity-reward-model-max-margin-seed-100-pythia-410m-checkpoint-70 0.4B • Updated May 9, 2025
ajagota71/toxicity-reward-model-max-margin-seed-42-pythia-410m-checkpoint-70 0.4B • Updated May 9, 2025
ajagota71/toxicity-reward-model-max-margin-seed-400-pythia-160m-checkpoint-50 0.2B • Updated May 9, 2025
ajagota71/toxicity-reward-model-max-margin-seed-300-pythia-160m-checkpoint-50 0.2B • Updated May 9, 2025
ajagota71/toxicity-reward-model-max-margin-seed-200-pythia-160m-checkpoint-50 0.2B • Updated May 9, 2025