Active filters: rlvr
nvidia/Nemotron-Research-GooseReason-4B-Instruct
Text Generation
• 4B • Updated
• 249
• 7
Shion1124/sapo-gdpo-dora-qwen-struct
Text Generation
• 4B • Updated
• 38
• 1
SultanR/SmolTulu-1.7b-Reinforced-GGUF
Text Generation
• 2B • Updated
• 10
• 1
thuml/rt1-world-model-multi-step-rlvr
thuml/rt1-world-model-single-step-rlvr
thuml/webarena-world-model-rlvr
2B • Updated
• 6
thuml/bytesized32-world-model-rlvr-binary-reward
2B • Updated
• 7
thuml/bytesized32-world-model-rlvr-task-specific-reward
2B • Updated
• 8
DebateLabKIT/Llama-3.1-Argunaut-1-8B-HIRPO
Text Generation
• 8B • Updated
• 9
• 1
Question Answering
• 4B • Updated
• 2
• 2
thinkwee/NOVER1-Qwen2.5-7B
Question Answering
• 8B • Updated
• 2
mradermacher/NOVER1-Qwen3-4B-GGUF
4B • Updated
• 24
• 1
mradermacher/NOVER1-Qwen2.5-7B-GGUF
8B • Updated
• 6
• 1
mradermacher/NOVER1-Qwen3-4B-i1-GGUF
4B • Updated
• 149
• 1
mradermacher/NOVER1-Qwen2.5-7B-i1-GGUF
8B • Updated
• 49
• 1
DebateLabKIT/Phi-4-Argunaut-1-HIRPO
Text Generation
• 415k • Updated
• 3
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-GGUF
8B • Updated
• 42
• 1
mradermacher/Llama-3.1-Argunaut-1-8B-HIRPO-i1-GGUF
8B • Updated
• 43
• 1
Text Generation
• 2B • Updated
• 17
• 9
Text Generation
• 4B • Updated
• 6
• 1
mradermacher/airesupdated-v2-GGUF
Reinforcement Learning
• 4B • Updated
• 35
ABaroian/Apertus-8B-RLVR-GSM
Text Generation
• Updated
• 2
Anonymouslolol/qwen3-8B-hanabi-step110
Reinforcement Learning
• Updated
Text Generation
• 4B • Updated
• 1
anonymousatom/IntelliAsk-Qwen3-32B-450-Merged
Text Generation
• 33B • Updated
• 85
mradermacher/Phi-4-Argunaut-1-HIRPO-GGUF
15B • Updated
• 406
mradermacher/Phi-4-Argunaut-1-HIRPO-i1-GGUF
15B • Updated
• 5
TrialPanorama/LLaMA-3-8B-TP
Text Generation
• Updated
• 3
TrialPanorama/Qwen-3-8B-TP
Text Generation
• Updated
• 22
suv11235/mtsa-extreme-tar-v1-llama-3.1-8b
Updated