Active filters: ppo
ajagota71/gemma-3-270m-detox-checkpoint-epoch-20
Reinforcement Learning
• 0.3B • Updated • 5
ajagota71/Qwen2.5-0.5B-detox-checkpoint-epoch-40
Reinforcement Learning
• 0.5B • Updated • 1
ajagota71/Qwen2.5-0.5B-detox-checkpoint-epoch-60
Reinforcement Learning
• 0.5B • Updated • 1
ajagota71/gemma-3-270m-detox-checkpoint-epoch-40
Reinforcement Learning
• 0.3B • Updated • 4
ajagota71/Qwen2.5-0.5B-detox-checkpoint-epoch-80
Reinforcement Learning
• 0.5B • Updated • 1
ajagota71/gemma-3-270m-detox-checkpoint-epoch-60
Reinforcement Learning
• 0.3B • Updated • 5
ajagota71/Qwen2.5-0.5B-detox-checkpoint-epoch-100
Reinforcement Learning
• 0.5B • Updated • 1
ajagota71/Qwen2.5-0.5B-detox
Reinforcement Learning
• 0.5B • Updated • 1
ajagota71/gemma-3-270m-detox-checkpoint-epoch-80
Reinforcement Learning
• 0.3B • Updated • 4
ajagota71/gemma-3-270m-detox-checkpoint-epoch-100
Reinforcement Learning
• 0.3B • Updated • 5
ajagota71/gemma-3-270m-detox
Reinforcement Learning
• 0.3B • Updated • 5
Reinforcement Learning
• Updated
LizardAPN/ppo-CartPole-v1
Reinforcement Learning
• Updated
Reinforcement Learning
• Updated
LizardAPN/LunarLander-v2-with-ppo
Reinforcement Learning
• Updated
MattBou00/smolLM-360m-detox_try_2
Reinforcement Learning
• 0.4B • Updated • 2
MattBou00/smolLM-360m-detox_try_3
Reinforcement Learning
• 0.4B • Updated • 1
Reinforcement Learning
• Updated
MattBou00/smolLM-360m-detox_try_4
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_3_stable
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_3_stable_retry-ckpt-ep20-2025-08-18_18-34-45
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_3_stable_retry-ckpt-ep40-2025-08-18_18-34-45
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_3_stable_retry
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_4_closekl-ckpt-ep20-2025-08-18_18-50-03
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_4_closekl-ckpt-ep40-2025-08-18_18-50-03
Reinforcement Learning
• 0.4B • Updated • 1
MattBou00/smolLM-360m-detox_try_4_closekl
Reinforcement Learning
• 0.4B • Updated
MattBou00/smolLM-135-detox_first
Reinforcement Learning
• 0.1B • Updated • 1
MattBou00/smolLM-135m-detox_same_as_larger
Reinforcement Learning
• 0.1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated • 2
MattBou00/llama-3-2-1b-detox_v1b-checkpoint-epoch-20
Reinforcement Learning
• 1B • Updated • 1