Models

228

Active filters: RL

mradermacher/Austral-70B-Winton-i1-GGUF

71B • Updated Dec 28, 2025 • 147

HYDARIM7/SmolLM2_RLHF_PPO_HY

Reinforcement Learning • 0.1B • Updated Sep 21, 2025 • 3

SII-Enigma/Qwen2.5-7B-Ins-AMPO

Text Generation • 8B • Updated Mar 20 • 3

SII-Enigma/Qwen2.5-7B-Ins-SFT-GRPO

Text Generation • 8B • Updated Mar 20 • 4

SII-Enigma/Llama3.2-8B-Ins-GRPO

Text Generation • 2B • Updated Oct 15, 2025 • 1 • 1

mradermacher/Llama3.2-8B-Ins-GRPO-GGUF

8B • Updated Oct 16, 2025 • 49 • 1

SII-Enigma/Qwen2.5-7B-Ins-GRPO

Text Generation • 2B • Updated Oct 15, 2025 • 1

SII-Enigma/Qwen2.5-1.5B-Ins-AMPO

Text Generation • 2B • Updated Mar 20 • 1

SII-Enigma/Llama3.2-8B-Ins-AMPO

Text Generation • 8B • Updated Mar 21 • 4

SII-Enigma/Qwen2.5-1.5B-Ins-GRPO

Text Generation • 2B • Updated Oct 15, 2025

Ach0/GCPO-R1-1.5B

Text Generation • 2B • Updated Oct 11, 2025 • 3

mradermacher/GCPO-R1-1.5B-GGUF

2B • Updated Oct 11, 2025 • 74

mradermacher/GCPO-R1-1.5B-i1-GGUF

2B • Updated Dec 6, 2025 • 144

mradermacher/DeepHermes-Egregore-8B-131K-GGUF

Reinforcement Learning • 8B • Updated Oct 16, 2025 • 69 • 1

mradermacher/DeepHermes-Egregore-8B-131K-i1-GGUF

Reinforcement Learning • 8B • Updated Dec 10, 2025 • 147 • 1

stephenchungmh/thinker_r1_5b

2B • Updated Oct 16, 2025 • 1

stephenchungmh/thinker_q1_5b

2B • Updated Oct 16, 2025 • 1

stephenchungmh/thinker_r7b

8B • Updated Oct 16, 2025 • 1 • 1

aippolit/RENT-Qwen-7B

8B • Updated Oct 31, 2025 • 1 • 1

mradermacher/RENT-Qwen-7B-GGUF

8B • Updated Oct 31, 2025 • 200 • 1

mradermacher/RENT-Qwen-7B-i1-GGUF

8B • Updated Dec 5, 2025 • 407 • 1

beyoru/MinCoder-4B-Expert

Text Generation • 4B • Updated Nov 2, 2025 • 6 • 1

mradermacher/MinCoder-4B-Expert-GGUF

4B • Updated Nov 3, 2025 • 67 • 2

mradermacher/MinCoder-4B-Expert-i1-GGUF

4B • Updated Dec 7, 2025 • 364 • 1

beyoru/MaxCoder-4B

Text Generation • 4B • Updated Nov 7, 2025 • 1

aryan-kolapkar/MathReasoner-Mini-1.5b

Text Generation • 2B • Updated Apr 17 • 2 • 1

mradermacher/MathReasoner-Mini-1.5b-GGUF

2B • Updated Dec 18, 2025 • 5

ryota39/Qwen3-8B-math-RL-ja

8B • Updated Dec 9, 2025 • 4

nvidia/Nemotron-Cascade-8B-Thinking

Text Generation • Updated Jan 1 • 4.24k • 40

nvidia/Nemotron-Cascade-14B-Thinking

Text Generation • Updated Jan 1 • 1.65k • 79