Models: 3,289

Active filters: ppo

Tanaybh/gpt2-rlhf-anthropic

Text Generation • 0.1B • Updated Oct 2, 2025 • 8

karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360

Text Generation • 0.5B • Updated Sep 21, 2025 • 7

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

mradermacher/gpt2-rlhf-anthropic-GGUF

0.1B • Updated Sep 22, 2025 • 157

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 2

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND3

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND1

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Sep 22, 2025 • 1

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Sep 22, 2025

MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND2-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Sep 22, 2025