ajagota71/toxicity-reward-model-v-head-prompt-output-max-margin-seed-200-pythia-70m 70.4M • Updated Jul 29, 2025
ajagota71/toxicity-reward-model-v-head-prompt-output-max-margin-seed-200-pythia-70m-checkpoint-30 70.4M • Updated Jul 29, 2025
ajagota71/toxicity-reward-model-v-head-prompt-output-max-margin-seed-100-pythia-70m 70.4M • Updated Jul 29, 2025
ajagota71/toxicity-reward-model-v-head-prompt-output-max-margin-seed-100-pythia-70m-checkpoint-30 70.4M • Updated Jul 29, 2025
ajagota71/toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-70m-checkpoint-30 70.4M • Updated Jul 29, 2025
ajagota71/tox-RM-max-m-epoch-100-s-nlp-tox-p9-999-llama-3.2-1b-checkpoint-70 1B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-llama-3.2-1b 1B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-llama-3.2-1b-checkpoint-30 1B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-pythia-1b 1B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-pythia-1b-checkpoint-30 1B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-1800-pythia-410m 0.4B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-1800-pythia-410m-checkpoint-30 0.4B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-pythia-410m 0.4B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-s-nlp-tox-p9-500-pythia-410m-checkpoint-30 0.4B • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-tox-p9-s-nlp-pythia-70m 70.4M • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-tox-p9-s-nlp-pythia-70m-checkpoint-30 70.4M • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-tox-p9-s-nlp-outputs-pythia-70m 70.4M • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-tox-p9-s-nlp-outputs-pythia-70m-checkpoint-30 70.4M • Updated Jul 26, 2025
ajagota71/toxicity-reward-model-max-margin-epoch-100-v2push-pythia-70m-checkpoint-30 70.4M • Updated Jul 26, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-1b 1B • Updated Jul 23, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-1b-checkpoint-30 1B • Updated Jul 23, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-410m 0.4B • Updated Jul 23, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-410m-checkpoint-30 0.4B • Updated Jul 23, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-70m 70.4M • Updated Jul 23, 2025
ajagota71/gt-s-nlp-toxicity-reward-model-v-head-prompt-output-max-margin-seed-42-pythia-70m-checkpoint-30 70.4M • Updated Jul 23, 2025
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6 Reinforcement Learning • 1B • Updated Jul 6, 2025 • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100 Reinforcement Learning • 1B • Updated Jul 6, 2025 • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80 Reinforcement Learning • 1B • Updated Jul 6, 2025 • 1