Active filters: RL
mradermacher/Austral-70B-Winton-i1-GGUF
71B • Updated • 147
HYDARIM7/SmolLM2_RLHF_PPO_HY
Reinforcement Learning
• 0.1B • Updated • 3
SII-Enigma/Qwen2.5-7B-Ins-AMPO
Text Generation
• 8B • Updated • 3
SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO
Text Generation
• 8B • Updated • 4
SII-Enigma/Llama3.2-8B-Ins-GRPO
Text Generation
• 2B • Updated • 1
• 1
mradermacher/Llama3.2-8B-Ins-GRPO-GGUF
8B • Updated • 49
• 1
SII-Enigma/Qwen2.5-7B-Ins-GRPO
Text Generation
• 2B • Updated • 1
SII-Enigma/Qwen2.5-1.5B-Ins-AMPO
Text Generation
• 2B • Updated • 1
SII-Enigma/Llama3.2-8B-Ins-AMPO
Text Generation
• 8B • Updated • 4
SII-Enigma/Qwen2.5-1.5B-Ins-GRPO
Text Generation
• 2B • Updated
Text Generation
• 2B • Updated • 3
mradermacher/GCPO-R1-1.5B-GGUF
2B • Updated • 74
mradermacher/GCPO-R1-1.5B-i1-GGUF
2B • Updated • 144
mradermacher/DeepHermes-Egregore-8B-131K-GGUF
Reinforcement Learning
• 8B • Updated • 69
• 1
mradermacher/DeepHermes-Egregore-8B-131K-i1-GGUF
Reinforcement Learning
• 8B • Updated • 147
• 1
stephenchungmh/thinker_r1_5b
2B • Updated • 1
stephenchungmh/thinker_q1_5b
2B • Updated • 1
stephenchungmh/thinker_r7b
8B • Updated • 1
• 1
8B • Updated • 1
• 1
mradermacher/RENT-Qwen-7B-GGUF
8B • Updated • 200
• 1
mradermacher/RENT-Qwen-7B-i1-GGUF
8B • Updated • 407
• 1
beyoru/MinCoder-4B-Expert
Text Generation
• 4B • Updated • 6
• 1
mradermacher/MinCoder-4B-Expert-GGUF
4B • Updated • 67
• 2
mradermacher/MinCoder-4B-Expert-i1-GGUF
4B • Updated • 364
• 1
Text Generation
• 4B • Updated • 1
aryan-kolapkar/MathReasoner-Mini-1.5b
Text Generation
• 2B • Updated • 2
• 1
mradermacher/MathReasoner-Mini-1.5b-GGUF
2B • Updated • 5
ryota39/Qwen3-8B-math-RL-ja
8B • Updated • 4
nvidia/Nemotron-Cascade-8B-Thinking
Text Generation
• Updated • 4.24k
• 40
nvidia/Nemotron-Cascade-14B-Thinking
Text Generation
• Updated • 1.65k
• 79