Active filters: ppo
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round2 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_scale10_Round1 • Reinforcement Learning • 1B • Updated • 1
hungtrab/ppo-LunarLander-v2-scratch • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-no_hmm_no_intrinsic • Reinforcement Learning • Updated
CatkinChen/nethack-ppo-ablation-baseline_no_intrinsic • Reinforcement Learning • Updated
Reinforcement Learning • Updated • 395
Reinforcement Learning • Updated
Tanaybh/gpt2-rlhf-anthropic • Text Generation • 0.1B • Updated • 1
karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360 • Text Generation • 0.5B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3-checkpoint-epoch-100 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_RETRY_SAMPLING_scale10_Round3 • Reinforcement Learning • 1B • Updated • 1
mradermacher/gpt2-rlhf-anthropic-GGUF • 0.1B • Updated • 151
MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-20 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-40 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-60 • Reinforcement Learning • 1B • Updated • 1
MattBou00/llama-3-2-1b-detox_v1f_RRETRT_Again_ROUND5-checkpoint-epoch-80 • Reinforcement Learning • 1B • Updated • 1