Models: 3,285

Active filters: ppo

MattBou00/llama-3-2-1b-detox_RETRY_scale10

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round4

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round3

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 19, 2025

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 19, 2025 • 1

MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1

Reinforcement Learning • 1B • Updated Sep 19, 2025

hungtrab/ppo-LunarLander-v2-scratch

Reinforcement Learning • Updated Sep 19, 2025

CatkinChen/nethack-ppo-ablation-no_hmm_no_intrinsic

Reinforcement Learning • Updated Sep 28, 2025

CatkinChen/nethack-ppo-ablation-baseline_no_intrinsic

Reinforcement Learning • Updated Sep 27, 2025

inq-android/eedgym-ckpts

Reinforcement Learning • Updated Dec 23, 2025 • 4

rabeeqasem/lunarlanderv2

Reinforcement Learning • Updated Sep 21, 2025