Models: 83

Active filters: Reasoning-Course

flymars/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 2, 2025 • 3

DannieAI/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 3, 2025 • 4

zijie0304/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 16, 2025 • 3

chanhyeok/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 30, 2025 • 3

7beshoyarnest/fine_tuned_SmolGRPO-135M_using_GRPO

0.1B • Updated Jan 22 • 4

halxj/Devjalx-4b

Text Generation • 4B • Updated Apr 7 • 26

Cacciatore2023/SmolGRPO-135M

Text Generation • 0.1B • Updated Jan 26 • 2

Cacciatore2023/SmolGRPO-135M-v2

Text Generation • 0.1B • Updated Jan 26 • 3

mradermacher/SmolGRPO-135M-v2-GGUF

0.1B • Updated Jan 28

Perditio/SmolGRPO-135M

Text Generation • 0.1B • Updated Feb 11 • 5

syanwang/SmolGRPO-135M

Text Generation • 0.1B • Updated Feb 21 • 2

npallewela/Qwen-0.5-RL-tune

Text Generation • 0.5B • Updated Feb 22 • 3

npallewela/SmolGRPO-135M

Text Generation • 0.1B • Updated Feb 22 • 3

npallewela/Qwen-0.5B-moral_social_emh

Text Generation • 0.5B • Updated Feb 23 • 4

npallewela/Qwen-1.5B-moral_social_emh

Text Generation • 2B • Updated Feb 23 • 3

npallewela/Qwen-1.5B-moral_social_ed_1

Text Generation • 2B • Updated Feb 24 • 2

npallewela/Qwen-1.5B-moral_social_all

Text Generation • 2B • Updated Feb 26 • 2

npallewela/Qwen-1.5B-moral_social_all_1

Text Generation • 2B • Updated Mar 2 • 4

npallewela/Qwen-1.5B-moral_social_all_2

Text Generation • 2B • Updated Mar 7 • 2

goosmanlei/SmolLM-135M-Instruct-GRPO-smoltldr

Text Generation • 0.1B • Updated Mar 23 • 6

MJPT2/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 2 • 6

gugukaka/SmolGRPO-135M

Text Generation • 0.1B • Updated Apr 26 • 2

AnujPatnaik1/SmolGRPO-135M

Text Generation • 0.1B • Updated May 25 • 2