Models (197)

Active filters: GRPO

npallewela/Qwen-1.5B-moral_social_all_2

Text Generation • 2B • Updated Mar 7 • 14

OpenMOSS-Team/SciJudge-4B

Text Generation • 4B • Updated Mar 17 • 333 • 6

mradermacher/SciJudge-4B-GGUF

4B • Updated Mar 16 • 38

mradermacher/SciJudge-30B-GGUF

31B • Updated Mar 16 • 84

etri-vilab/MultiHopSpatial-Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Mar 20 • 22

JustinLeee/GrandLine_LLM

Question Answering • Updated about 1 month ago • 1

servantofares/AReaL-SEA-235B-A22B

Text Generation • 235B • Updated 30 days ago • 10

goosmanlei/SmolLM-135M-Instruct-GRPO-smoltldr

Text Generation • 0.1B • Updated 28 days ago • 528

Nirav-Madhani/LFM2.5-1.2B-Meditation

Text Generation • Updated 9 days ago

airev-ae/Qwen-0.8B-AgentJSON

Text Generation • 0.8B • Updated 26 days ago • 696 • 1

mradermacher/AReaL-SEA-235B-A22B-GGUF

Reinforcement Learning • 235B • Updated 21 days ago • 562

mradermacher/AReaL-SEA-235B-A22B-i1-GGUF

Reinforcement Learning • 235B • Updated 19 days ago • 14.5k

MJPT2/SmolGRPO-135M

Text Generation • 0.1B • Updated 19 days ago • 521

OpenLearnLM/special-r1-deepseek-qwen3-8b-sped-adaptive-think-noreward

Text Generation • 8B • Updated 13 days ago • 208

owlgebra-ai/wufus-CART-8B

Text Generation • 8B • Updated 8 days ago • 234

OpenLearnLM/special-r1-deepseek-qwen3-8b-sped-adaptive-think-reward

Text Generation • 8B • Updated 4 days ago • 81

sjwjames0325/GRPO_agentic-Qwen2ForCausalLM-2B

Text Generation • 2.43M • Updated 1 day ago • 223