Active filters: ppo
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated • 1
ajagota71/llama-3-2-1b-s-nlp-detox-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated • 1
Reinforcement Learning
• Updated
Will-est/ppo-LunarLander-v2-scratch
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
duydl/ppo-LunearLander-v2-8PI
Reinforcement Learning
• Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated • 1
Reinforcement Learning
• Updated
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3-checkpoint-epoch-100
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p4-target-3
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-40
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-60
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-80
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6-checkpoint-epoch-100
Reinforcement Learning
• 1B • Updated • 2
ajagota71/llama-3-2-1b-rlhf-kl-p5-target-2p5-lr-3e-6
Reinforcement Learning
• 1B • Updated • 1
Sandf1sh/ppo-LunarLander-v2
Reinforcement Learning
• Updated
Johnsonin/DeepRL-PPO-LunarLander-v2
Reinforcement Learning
• Updated
mikebernico/ppo-LunarLander-v2
Reinforcement Learning
• Updated
niratpatel/ppo-CartPole-v1
Reinforcement Learning
• Updated
IgnacioCorrecher/CustomPPO-LunarLander-v2
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
lokeessshhhh/ppo-CartPole-v1
Reinforcement Learning
• Updated
lokeessshhhh/ppo-LunarLandar-v2
Reinforcement Learning
• Updated
Devyaansh123/ppo-CartPole-v1
Reinforcement Learning
• Updated
Devyaansh123/my-awesome-model
Reinforcement Learning
• Updated
IntelliGrow/LunarLander-v2
Reinforcement Learning
• Updated