Active filters: GRPO
alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged
Text Generation
• 8B • Updated
Ceenen2302/Llama-3.2-1B-Instruct-GRPO
Text Generation
• 1B • Updated
alperenyildiz/Mistral-7B-Instruct-v0.3_q8_0_GRPO
Text Generation
• 7B • Updated
• 1
hibikigf88/SmolLM-135M-Instruct-smoltldr-GRPO
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
Sarahpa/spGRPO-135M-readability
Text Generation
• 0.1B • Updated
• 1
alfredcs/gemma-3-27b-grpo-med-merged
Image-Text-to-Text
• Updated
alfredcs/gemma-3-27b-firstaid-icd10-merged
Image-Text-to-Text
• Updated
mradermacher/gemma-3-27b-firstaid-icd10-merged-GGUF
28B • Updated
• 68
jinlovespho/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 2
Sarahpa/spGRPO-135M-readability-2
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
tariktuna/Summarizer-Demo-SmolGRPO-135M
Text Generation
• 0.1B • Updated
Text Generation
• 0.1B • Updated
supermodelresearch/VAR-d16-GRPO-Aesthetic
Text-to-Image
• Updated
supermodelresearch/VAR-d30-GRPO-Aesthetic
Text-to-Image
• Updated
dzungever/SmolLM-135M-Instruct-GRPO
Text Generation
• 0.1B • Updated
ritwik098/SmolGRPO-360M-Ritwik
Text Generation
• 0.4B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
alfredcs/torchrun-medgemma-27b-grpo-merged
Image-Text-to-Text
• 27B • Updated
KhushalM/Qwen2.5-1.5B-GRPO-Complete
Text Generation
• 2B • Updated
Text Generation
• 0.1B • Updated
• 4
Mhammad2023/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 4
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
mlx-community/VisualQuality-R1-7B-bf16
Reinforcement Learning
• Updated
• 8
mlx-community/VisualQuality-R1-7B-6bit
Reinforcement Learning
• Updated
• 7
mlx-community/VisualQuality-R1-7B-8bit
Reinforcement Learning
• Updated
• 9
mlx-community/VisualQuality-R1-7B-4bit
Reinforcement Learning
• Updated
• 15
• 1