Active filters: ppo
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-180
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-200
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox
Reinforcement Learning
• 70.4M • Updated
ajagota71/pythia-160m-fb-detox-checkpoint-epoch-20
Reinforcement Learning
• 0.2B • Updated • 1
ajagota71/pythia-160m-fb-detox-checkpoint-epoch-60
Reinforcement Learning
• 0.2B • Updated • 1
ajagota71/pythia-160m-fb-detox-checkpoint-epoch-80
Reinforcement Learning
• 0.2B • Updated • 1
ajagota71/pythia-160m-fb-detox-checkpoint-epoch-100
Reinforcement Learning
• 0.2B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-20
Reinforcement Learning
• 0.4B • Updated • 2
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-40
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-60
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-80
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-100
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-120
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-140
Reinforcement Learning
• 0.4B • Updated
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-160
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-180
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox-checkpoint-epoch-200
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-fb-detox
Reinforcement Learning
• 0.4B • Updated • 1
Reinforcement Learning
• Updated
jtan4albany/ppo-lunarlander
Reinforcement Learning
• Updated
jtan4albany/lunarlander-unit8
Reinforcement Learning
• Updated
GinesMeca/ppo-LunarLander-v2.1
Reinforcement Learning
• Updated
ajmalmahmood/ppo-CartPole-v1
Reinforcement Learning
• Updated
ajmalmahmood/LunarLander-v2
Reinforcement Learning
• Updated
winssu/LunarLander-v2-ppo
Reinforcement Learning
• Updated
refikcam/ppo-LunarLander-fromScratch
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
gabrielbo/spark-model-QLoRA
Text Generation
• Updated • 1
aarifahullah/LunarLander-v2_CleanRL
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated