Models: 646 (active filter: int4)

ISTA-DASLab/Llama-3.1-8B-Instruct-MR-GPTQ-nvfp

Image-Text-to-Text • 5B • Updated Oct 3, 2025 • 4

ISTA-DASLab/Llama-3.1-8B-Instruct-MR-GPTQ-mxfp

Image-Text-to-Text • 5B • Updated Oct 3, 2025 • 3

huawei-csl/Qwen3-1.7B-4bit-SINQ

Text Generation • 1B • Updated Feb 2 • 3 • 5

huawei-csl/Qwen3-1.7B-4bit-ASINQ

Text Generation • 1B • Updated Feb 2 • 4 • 5

huawei-csl/Qwen3-32B-4bit-SINQ

Text Generation • 18B • Updated Feb 2 • 16 • 7

huawei-csl/Qwen3-14B-4bit-SINQ

Text Generation • 9B • Updated Feb 2 • 6 • 5

huawei-csl/Qwen3-14B-4bit-ASINQ

Text Generation • 9B • Updated Feb 2 • 1 • 6

huawei-csl/Qwen3-32B-4bit-ASINQ

Text Generation • 18B • Updated Feb 2 • 7 • 8

ModelCloud/GLM-4.6-GPTQMODEL-W4A16-v1

Text Generation • 357B • Updated Oct 28, 2025 • 7

ModelCloud/GLM-4.6-GPTQMODEL-W4A16-v2

Text Generation • 357B • Updated Oct 28, 2025 • 2 • 1

PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx

Translation • Updated Oct 21, 2025 • 3 • 2

RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-quantized.w4a16

Text Generation • 2B • Updated Apr 28 • 2.33k • 5

ModelCloud/GLM-4.6-REAP-268B-A32B-GPTQMODEL-W4A16

Text Generation • 269B • Updated Oct 28, 2025 • 5 • 2

AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov

Text Generation • Updated Oct 27, 2025 • 26

tencent/DeepSeek-V3.1-Terminus-W4AFP8

Text Generation • 349B • Updated Nov 4, 2025 • 475 • 16

ModelCloud/MiniMax-M2-GPTQMODEL-W4A16

Text Generation • 229B • Updated Oct 28, 2025 • 9 • 3

ModelCloud/Marin-32B-Base-GPTQMODEL-W4A16

Text Generation • 33B • Updated Oct 29, 2025 • 1 • 1

ModelCloud/Marin-32B-Base-GPTQMODEL-AWQ-W4A16

Text Generation • 33B • Updated Oct 30, 2025 • 6 • 2

huawei-csl/Apertus-8B-2509-4bit-SINQ

Text Generation • 5B • Updated Feb 2 • 5 • 2

huawei-csl/Apertus-8B-2509-4bit-ASINQ

Text Generation • 5B • Updated Feb 2 • 291 • 3

ModelCloud/Granite-4.0-H-1B-GPTQMODEL-W4A16

Text Generation • 1B • Updated Oct 31, 2025 • 1 • 1

ModelCloud/Granite-4.0-H-350M-GPTQMODEL-W4A16

Text Generation • 0.3B • Updated Oct 31, 2025 • 2 • 1

ModelCloud/Brumby-14B-Base-GPTQMODEL-W4A16

Text Generation • 15B • Updated Oct 31, 2025 • 1

ModelCloud/Brumby-14B-Base-GPTQMODEL-W4A16-v2

Text Generation • 15B • Updated Oct 31, 2025 • 4 • 1

SherlockID365/Qwen3-VL-8B-Instruct-quantized.w4a16

Image-Text-to-Text • 3B • Updated Nov 3, 2025 • 276 • 1

Ishant86/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-compressed-tensors-int4

6B • Updated Nov 13, 2025 • 11

zandzpider/Qwen3-30B-A3B-abliterated-erotic-autoround-int4

0.6B • Updated Nov 27, 2025 • 15 • 1

ikarius/Granite-3.2-8b-instruct-Abliterated-gs128-GPTQ-INT4

Text Generation • 8B • Updated Nov 18, 2025 • 4 • 1

huawei-csl/Kimi-Linear-48B-A3B-Instruct-4bit-SINQ

Text Generation • 27B • Updated Feb 2 • 13 • 3

huawei-csl/Qwen3-Next-80B-A3B-Instruct-4bit-SINQ

Text Generation • Updated Feb 2 • 13 • 2