Active filters: ppo
alejandroajhr/ppo-LunarLander-v2-unit8
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-irl-rlhf-test
Reinforcement Learning
• 70.4M • Updated • 1
rusuanjun/ppo-selfimplement-LunarLander-v2
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
aalva/ppo-cleanrl-LunarLander-v2
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-irl-rlhf-test-facebook-filter
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-detox-irl-rlhf-test2
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-detox-raw-logits-test2
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-160m-detox-raw-logits-test2
Reinforcement Learning
• 0.2B • Updated • 1
ajagota71/pythia-70m-detox-irl-rlhf-seed-42
Reinforcement Learning
• 70.4M • Updated • 2
ajagota71/pythia-70m-detox-irl-rlhf-seed-200
Reinforcement Learning
• 70.4M • Updated • 1
DeepMostInnovations/sales-conversion-model-reinf-learning
Reinforcement Learning
• Updated • 11 • 33
Reinforcement Learning
• Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-42
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-detox-irl-rlhf-seed-100
Reinforcement Learning
• 0.4B • Updated • 3
ajagota71/pythia-410m-detox-irl-rlhf-seed-200
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-detox-irl-rlhf-seed-300
Reinforcement Learning
• 0.4B • Updated • 1
ajagota71/pythia-410m-detox-irl-rlhf-seed-400
Reinforcement Learning
• 0.4B • Updated • 1
S-Chaves/ppo-from-scratch-LunarLander-v2
Reinforcement Learning
• Updated
Arrebol-yzq/RLP_llm_inductive_model
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-20
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-40
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-60
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-80
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-100
Reinforcement Learning
• 70.4M • Updated • 2
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-120
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-140
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-160
Reinforcement Learning
• 70.4M • Updated • 1
ajagota71/pythia-70m-fb-detox-checkpoint-epoch-180
Reinforcement Learning
• 70.4M • Updated • 1