Models (45)

Active filters: W4A16

QuantTrio/GLM-5.2-Int4-Int8Mix

Text Generation • 785B • Updated 9 days ago • 41.3k • 7

festr2/GLM-5.2-Int8Mix-NVFP4

Text Generation • Updated 8 days ago • 219 • 2

ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2

Text Generation • 33B • Updated Dec 18, 2024 • 14 • 16

ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v3

Text Generation • 33B • Updated Dec 20, 2024 • 6 • 14

ModelCloud/Falcon3-10B-Instruct-gptqmodel-4bit-vortex-v1

Text Generation • 10B • Updated Dec 21, 2024 • 6 • 3

ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-w4a16

Text Generation • 0.5B • Updated Oct 19, 2025 • 23 • 1

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v1

Text Generation • 8B • Updated Jan 24, 2025 • 7 • 6

ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v2

Text Generation • 8B • Updated Jan 24, 2025 • 80 • 8

RedHatAI/phi-4-quantized.w4a16

Text Generation • 15B • Updated 21 days ago • 94.8k • 5

RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16

Image-Text-to-Text • 24B • Updated 21 days ago • 1.93k • 10

RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

Image-Text-to-Text • 109B • Updated 21 days ago • 7.16k • 13

pyrymikko/nomic-embed-code-W4A16-AWQ

7B • Updated Sep 30, 2025 • 6.46k

tcclaviger/Minimax-M2-Thrift-GPTQ-W4A16-AMD

Text Generation • 24B • Updated Dec 1, 2025 • 8 • 1

TevunahAi/granite-34b-code-instruct-8k-Ultra-Hybrid

Text Generation • 11B • Updated Dec 1, 2025 • 5

TevunahAi/Llama-3.1-70B-Instruct-Ultra-Hybrid

Text Generation • 22B • Updated Dec 4, 2025 • 3

Vishva007/Qwen3-4B-Instruct-2507-W4A16-AutoRound

Text Generation • 0.9B • Updated Jan 30 • 2

Vishva007/Qwen3-VL-8B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 2B • Updated Feb 7 • 427

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 0.9B • Updated Feb 7 • 2

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound-GPTQ

Image-Text-to-Text • 2B • Updated Feb 7 • 3

Vishva007/Qwen3-VL-2B-Instruct-W4A16-AutoRound-AWQ

Image-Text-to-Text • 2B • Updated Feb 7 • 27

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound

Image-Text-to-Text • 1B • Updated Feb 7 • 1

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound-GPTQ

Image-Text-to-Text • 4B • Updated Feb 7 • 11

Vishva007/Qwen3-VL-4B-Instruct-W4A16-AutoRound-AWQ

Image-Text-to-Text • 4B • Updated Feb 7 • 146 • 1

embedl/Cosmos-Reason2-2B-W4A16

Image-Text-to-Text • 2B • Updated May 19 • 1.07k • 11

bg-digitalservices/Gemma-4-26B-A4B-it-NVFP4A16

Text Generation • 15B • Updated Apr 5 • 3.38k • 5

bg-digitalservices/Apertus-8B-2509-NVFP4A16

Text Generation • 5B • Updated Apr 6 • 7

bg-digitalservices/Apertus-8B-Instruct-2509-NVFP4A16

Text Generation • 5B • Updated Apr 6 • 5

bg-digitalservices/Apertus-70B-2509-NVFP4A16

Text Generation • 36B • Updated Apr 6 • 29

bg-digitalservices/Apertus-70B-Instruct-2509-NVFP4A16

Text Generation • 36B • Updated Apr 6 • 17 • 1

bg-digitalservices/Gemma-4-E2B-NVFP4A16

Text Generation • 4B • Updated Apr 6 • 20