Models (3,196)

Active filters: ppo

MattBou00/llama-3-2-1b-detox_v1f_round2-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round2-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round2-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round2

Reinforcement Learning • 1B • Updated Aug 21, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_round3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round3

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Aug 21, 2025

MattBou00/llama-3-2-1b-detox_v1f_round4

Reinforcement Learning • 1B • Updated Aug 21, 2025

yobellee/ppo-CartPole-v1

Reinforcement Learning • Updated Aug 22, 2025

sam522/ppo-lunarlander-v3

Reinforcement Learning • Updated Aug 22, 2025

bensalem14/lunarlanderv2-unit8

Reinforcement Learning • Updated Aug 23, 2025

Huinker/ppo-CartPole-v1

Reinforcement Learning • Updated Aug 23, 2025

a1024053774/ppo-LunarLander-v2

Reinforcement Learning • Updated Aug 23, 2025

Narunat/ppo-LunarLander-v2

Reinforcement Learning • Updated Aug 24, 2025

Ale902/ppo-lunar_lander

Reinforcement Learning • Updated Aug 24, 2025

caragones/ppo-lunarlander-best

Reinforcement Learning • Updated Aug 24, 2025 • 5

MattBou00/llama-3-2-1b-detox_retry-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Aug 25, 2025

igzi/ppo-CartPole-v1

Reinforcement Learning • Updated Aug 26, 2025

aka38/ppo-unit8-LunarLander-v2

Reinforcement Learning • Updated Aug 27, 2025

igabirondo13/ppo-LunarLander-v2

Reinforcement Learning • Updated Sep 3, 2025

madmage/ppo-fromscratch-LunarLander

Reinforcement Learning • Updated Aug 28, 2025

sanjaykushwah/ppo-LunarLander-v3

Reinforcement Learning • Updated Aug 28, 2025