Models (3,195)

Active filters: ppo

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

CatkinChen/nethack-ppo-ablation-no_hmm_curiosity_dyn_only

Reinforcement Learning • Updated Sep 28, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_AGAIN_ROUND3

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round5

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_SCALE9_round3

Reinforcement Learning • 1B • Updated Sep 22, 2025