Models

3,281

Active filters: ppo

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round4-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round4-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round4-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round4

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Nov 21, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round1

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round5

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE8_round3

Reinforcement Learning • 1B • Updated Sep 22, 2025

CatkinChen/nethack-ppo-ablation-no_hmm_rnd

Reinforcement Learning • Updated Sep 27, 2025

CatkinChen/nethack-ppo-ablation-baseline_curiosity_dyn_only

Reinforcement Learning • Updated Sep 28, 2025

joigalcar/ppo-LunarLander-v2_Scratch

Reinforcement Learning • Updated Sep 23, 2025

joigalcar/ppo-LunarLander-v2_Scratch_2

Reinforcement Learning • Updated Sep 23, 2025

rishiad/kinitro-metaworld-agent

Reinforcement Learning • Updated Oct 25, 2025

CatkinChen/nethack-ppo-ablation-baseline_rnd

Reinforcement Learning • Updated Sep 28, 2025

CatkinChen/nethack-ppo-ablation-baseline_curiosity_skill_only

Reinforcement Learning • Updated Sep 27, 2025

CatkinChen/nethack-ppo-ablation-baseline_curiosity_trans_only

Reinforcement Learning • Updated Sep 27, 2025