Models: 270

Active filters: llm-compressor

kaitchup/LFM2.5-1.2B-Instruct-W4A16-G128

1B • Updated Jan 9 • 3 • 1

kaitchup/LFM2.5-1.2B-JP-W4A16-G128

1B • Updated Jan 9 • 3 • 1

kaitchup/LFM2.5-1.2B-JP-NVFP4

0.9B • Updated Jan 9 • 124 • 1

kaitchup/LFM2.5-1.2B-Instruct-NVFP4

0.9B • Updated Jan 27 • 11 • 2

kaitchup/LFM2.5-1.2B-Instruct-awq-asym

1B • Updated Jan 9 • 235 • 1

kaitchup/LFM2.5-1.2B-JP-awq-asym

1B • Updated Jan 9 • 7 • 1

xhxlb/IQuest-Coder-V1-40B-Instruct-int4

Text Generation • 6B • Updated Jan 8 • 3

EmbeddedLLM/Qwen3-VL-30B-A3B-Instruct.w4a16

Text Generation • 5B • Updated Jan 13 • 238 • 1

EmbeddedLLM/Qwen3-VL-30B-A3B-Thinking.w4a16

Text Generation • 5B • Updated Jan 13 • 7

mratsim/MiniMax-M2.1-BF16-INT4-AWQ

Text Generation • 39B • Updated Jan 14 • 10 • 7

RedHatAI/granite-4.0-h-tiny-FP8-dynamic

Text Generation • 7B • Updated Apr 28 • 5.18k • 3

kaitchup/LFM2.5-1.2B-Thinking-AWQ-W4A16-ASYM

1B • Updated Jan 22 • 3

kaitchup/LFM2.5-1.2B-Thinking-FP8-Dynamic

1B • Updated Jan 22 • 10

kaitchup/LFM2.5-1.2B-Thinking-MXFP4

0.8B • Updated Jan 22 • 5

kaitchup/LFM2.5-1.2B-Thinking-NVFP4

0.9B • Updated Jan 27 • 3

kaitchup/LFM2.5-1.2B-Thinking-W4A16-G128

1B • Updated Jan 22 • 4

kaitchup/LFM2.5-1.2B-Thinking-autoround-W4A16

0.7B • Updated Jan 22 • 43

kaitchup/GLM-4.7-Flash-FP8-Dynamic

30B • Updated Jan 23 • 7

RedHatAI/Phi-4-reasoning-FP8-dynamic

Text Generation • 15B • Updated Apr 28 • 283 • 1

dtometzki/Qwen3-30B-A3B-awq-sym

Text Generation • 5B • Updated Jan 28 • 48

JongYeop/Llama-3.1-8B-Instruct-NVFP4-W4A4

5B • Updated Jan 29 • 12

JongYeop/Llama-3.1-70B-Instruct-NVFP4-W4A4

41B • Updated Feb 2 • 5

JongYeop/Llama-3.1-8B-Instruct-INT8-W8A8

8B • Updated Feb 3 • 2

JongYeop/Llama-3.1-8B-Instruct-INT8-W8A8-Dynamic-Per-Token

8B • Updated Feb 3 • 7

JongYeop/Llama-3.1-8B-Instruct-FP8-W8A8-Dynamic-Per-Token

8B • Updated Feb 3 • 2

bullpoint/Qwen3-Coder-Next-AWQ-4bit

Text Generation • 14B • Updated Feb 3 • 89.1k • 26

rtj1/Qwen2.5-0.5B-AWQ-FP8-Dynamic

Text Generation • 0.6B • Updated Feb 10 • 3

rtj1/Qwen2.5-0.5B-AWQ-FP8-Block

Text Generation • 0.6B • Updated Feb 10 • 3

ludovicoYIN/MiniMax-M2-BF16

Text Generation • 229B • Updated Feb 8 • 6 • 1

vistralis/Qwen3-4B-NVFP4

Text Generation • 3B • Updated Feb 7 • 50 • 1