Active filters: GRPO
Text Generation
• 0.1B • Updated
• 1
stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR
Text Generation
• 3B • Updated
• 1
TharunSivamani/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
bhaveshgoel07/SmolGRPO-135M
Updated
Text Generation
• 0.1B • Updated
• 1
hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora
Updated
ykarout/Phi4-ThinkMode-fp16
Text Generation
• 15B • Updated
mradermacher/Phi4-ThinkMode-fp16-GGUF
15B • Updated
• 36
Text Generation
• 0.1B • Updated
• 1
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF
1.0B • Updated
• 13
• 1
mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF
1.0B • Updated
• 84
• 1
Text Generation
• 0.1B • Updated
• 1
alonsosilva/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 2
VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1
Text Generation
• 3B • Updated
• 4
• 1
mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF
3B • Updated
• 77
alfredcs/gemma-3-12b-grpo-firstaid
Updated
Text Generation
• 0.1B • Updated
• 1
Thabet/SmolGRPO-135M-learning
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
yigitkucuk/tint-interact-sft-grpo
Text Generation
• 0.4B • Updated
• 2
koochikoo25/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 2
TianheWu/VisualQuality-R1-7B
Reinforcement Learning
• 8B • Updated
• 27.7k
• 10
pedrocurvo/llama2-grpo-lora
Text Generation
• 7B • Updated
• 2
mradermacher/VisualQuality-R1-7B-GGUF
8B • Updated
• 144
Text Generation
• 0.1B • Updated
• 7
• 1
Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed
Feature Extraction
• 1B • Updated
alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged
Text Generation
• 8B • Updated