Active filters: ppo
jlse/ppo-LunarLander-v2-u8
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-test
Reinforcement Learning
• 70.4M • Updated
Momin-Shahzad/ppo-CartPole-v1
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-raw-logits
Reinforcement Learning
• 70.4M • Updated
Momin-Shahzad/LunarLander-v2
Reinforcement Learning
• Updated
Nack34/ppo-from-scratch-LunarLander-v2
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
Ari8/ppo-LunarLander-v2_unit8
Reinforcement Learning
• Updated
AndreiVoicuT/ppo-LunarLander-v2-C8
Reinforcement Learning
• Updated
alejandroajhr/ppo-LunarLander-v2-unit8
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-irl-rlhf-test
Reinforcement Learning
• 70.4M • Updated
rusuanjun/ppo-selfimplement-LunarLander-v2
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
aalva/ppo-cleanrl-LunarLander-v2
Reinforcement Learning
• Updated
ajagota71/pythia-70m-detox-irl-rlhf-test-facebook-filter
Reinforcement Learning
• 70.4M • Updated
ajagota71/pythia-70m-detox-irl-rlhf-test2
Reinforcement Learning
• 70.4M • Updated
ajagota71/pythia-70m-detox-raw-logits-test2
Reinforcement Learning
• 70.4M • Updated
ajagota71/pythia-160m-detox-raw-logits-test2
Reinforcement Learning
• 0.2B • Updated
ajagota71/pythia-70m-detox-irl-rlhf-seed-42
Reinforcement Learning
• 70.4M • Updated
ajagota71/pythia-70m-detox-irl-rlhf-seed-200
Reinforcement Learning
• 70.4M • Updated
DeepMostInnovations/sales-conversion-model-reinf-learning
Reinforcement Learning
• Updated • 51 • 33
Reinforcement Learning
• Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-42
Reinforcement Learning
• 0.4B • Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-100
Reinforcement Learning
• 0.4B • Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-200
Reinforcement Learning
• 0.4B • Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-300
Reinforcement Learning
• 0.4B • Updated
ajagota71/pythia-410m-detox-irl-rlhf-seed-400
Reinforcement Learning
• 0.4B • Updated
S-Chaves/ppo-from-scratch-LunarLander-v2
Reinforcement Learning
• Updated
Arrebol-yzq/RLP_llm_inductive_model
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated