Active filters: ppo
volfy/huggingface_rl_unit8_ppo-LunarLander-v2
Reinforcement Learning
Vanheart/ppoCRL-LunarLander-v2
Reinforcement Learning
JuanjoGT13/ppo-CartPole-v1
Reinforcement Learning
amostof/ppoScratch-LunarLander-v2
Reinforcement Learning
twofacejr/ppo-CartPole-v1
Reinforcement Learning
vinhdq842/ppo-LunarLander-v2-scratch
Reinforcement Learning
Jennny/llama3_samsum_rl_marshal
Reinforcement Learning
• 8B
Jennny/llama3_dialogsum_rl_marshal
Reinforcement Learning
• 8B
francescosabbarese/ppo-CartPole-v1
Reinforcement Learning
francescosabbarese/ppo-LunarLander-v2-unit8-pt1
Reinforcement Learning
nasnoussi/ppo-CartPole-v1
Reinforcement Learning
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_test
Reinforcement Learning
baronase/ppo-cleanrl-CartPole-v1
Reinforcement Learning
• Updated
baronase/ppo-cleanrl-CartPole-v1_2
Reinforcement Learning
• Updated
baronase/ppo-cleanrl-LunarLander-v2_1
Reinforcement Learning
• Updated
baronase/ppo-cleanrl-LunarLander-v2_200k
Reinforcement Learning
• Updated
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_ppo_2nd
Reinforcement Learning
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_offline_nav
Reinforcement Learning
• 5B
Jennny/llama3_samsum_marl_wo_comm
Reinforcement Learning
• 8B
Jennny/llama3_dialogsum_marl_wo_comm
Reinforcement Learning
• 8B
lucas-palmiro/ppo-LunarLander-v3
Reinforcement Learning
lucas-palmiro/ppo-early-stopping-LunarLander-v3
Reinforcement Learning
sighmon/ppo-cleanrl-LunarLander-v2
Reinforcement Learning
mrinaldi86/ppo-CartPole-v1
Reinforcement Learning
mrinaldi86/ppo-LunarLander-v3
Reinforcement Learning
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_offline_nav_2nd
Reinforcement Learning
• 5B
takedakoji00/Llama-3.1-8B-Instruct-custom-qg-full_20250219-7th_random_pad_is_eos_ppo_3rd
Reinforcement Learning
nasnoussi/ppo-Pixelcopter-v1
Reinforcement Learning
dragovoid/ppo-LunarLander-v2-u8
Reinforcement Learning
amostof/ppoScratchTest-LunarLander-v2
Reinforcement Learning