Active filters: ppo
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated
ajagota71/pythia-1b-s-nlp-detox-checkpoint-epoch-100
Reinforcement Learning
• 1B • Updated
ajagota71/pythia-1b-s-nlp-detox
Reinforcement Learning
• 1B • Updated
• 12
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated
Will-est/ppo-LunarLander-v2-scratch
Reinforcement Learning
• Updated
duydl/ppo-LunearLander-v2-8PI
Reinforcement Learning
• Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-100
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100
Reinforcement Learning
• 1B • Updated
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6
Reinforcement Learning
• 1B • Updated
Sandf1sh/ppo-LunarLander-v2
Reinforcement Learning
• Updated
Johnsonin/DeepRL-PPO-LunarLander-v2
Reinforcement Learning
• Updated
mikebernico/ppo-LunarLander-v2
Reinforcement Learning
• Updated