Models: 182

Active filters: GRPO

tobrun/SmolLM2-135M-GRPO

Text Generation • 0.1B • Updated Mar 15, 2025 • 1

stranger47/Qwen2.5-3B-Instruct-GRPO-NuminaMath-TIR

Text Generation • 3B • Updated Mar 16, 2025 • 1

TharunSivamani/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 16, 2025 • 1

frascuchon/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 17, 2025 • 1

bhaveshgoel07/SmolGRPO-135M

Updated Mar 18, 2025

Arushhh/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 24, 2025 • 1

hiroyuki0823/SakanaAI-TinySwallow-1.5B-Instruct-GRPO-lora

Updated Mar 24, 2025

ykarout/Phi4-ThinkMode-fp16

Text Generation • 15B • Updated Mar 27, 2025

mradermacher/Phi4-ThinkMode-fp16-GGUF

15B • Updated Jul 11, 2025 • 36

czuo03/SmolGRPO-135M

Text Generation • 0.1B • Updated Mar 28, 2025 • 1

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-GGUF

1.0B • Updated Jul 11, 2025 • 13 • 1

mradermacher/Nuke_X_Gemma3_1B_Reasoner_Testing-i1-GGUF

1.0B • Updated Jul 11, 2025 • 84 • 1

opria123/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 6, 2025 • 1

alonsosilva/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 8, 2025 • 2

VaidikML0508/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1

Text Generation • 3B • Updated Apr 22, 2025 • 4 • 1

mradermacher/Shark-Tank-Offer-Evaluator-llama3.2-3B-Instruct-GRPO-16bits-V1-GGUF

3B • Updated Jul 11, 2025 • 77

alfredcs/gemma-3-12b-grpo-firstaid

Updated Apr 24, 2025

garethpaul/SmolGRPO-135M

Text Generation • 0.1B • Updated May 8, 2025 • 1

Thabet/SmolGRPO-135M-learning

Text Generation • 0.1B • Updated May 10, 2025 • 1

jcollado/SmolGRPO-135M

Text Generation • 0.1B • Updated May 14, 2025 • 1

Brianpuz/SmolGRPO-135M

Text Generation • 0.1B • Updated May 19, 2025

yigitkucuk/tint-interact-sft-grpo

Text Generation • 0.4B • Updated May 19, 2025 • 2

koochikoo25/SmolGRPO-135M

Text Generation • 0.1B • Updated May 20, 2025 • 1

jackle33/SmolGRPO-135M

Text Generation • 0.1B • Updated May 22, 2025 • 2

TianheWu/VisualQuality-R1-7B

Reinforcement Learning • 8B • Updated Sep 19, 2025 • 27.7k • 10

pedrocurvo/llama2-grpo-lora

Text Generation • 7B • Updated May 26, 2025 • 2

mradermacher/VisualQuality-R1-7B-GGUF

8B • Updated Jul 31, 2025 • 144

HuangXinBa/GRPO

Text Generation • 0.1B • Updated May 28, 2025 • 7 • 1

Ceenen2302/Llama-3.2-1B-Instruct-GRPO-SmartLed

Feature Extraction • 1B • Updated Jun 3, 2025

alfredcs/torchrun-gemma-3-12b-grpo-icd10pcs-merged

Text Generation • 8B • Updated Jun 4, 2025