Active filters: ppo
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 0.4B • Updated • 1
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 0.4B • Updated • 1
ajagota71/pythia-410m-s-nlp-detox-checkpoint-epoch-100 • Reinforcement Learning • 0.4B • Updated • 1
ajagota71/pythia-410m-s-nlp-detox • Reinforcement Learning • 0.4B • Updated • 1
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated
ajagota71/pythia-1b-s-nlp-detox • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
Reinforcement Learning • Updated
Will-est/ppo-LunarLander-v2-scratch • Reinforcement Learning • Updated
Reinforcement Learning • Updated
duydl/ppo-LunearLander-v2-8PI • Reinforcement Learning • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
Reinforcement Learning • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1