
Models: 270


Active filters: llm-compressor

edge-inference/DSR1-14B-llmc-awq-w4

Text Generation • 3B • Updated Aug 30, 2025 • 1

edge-inference/DSR1-32B-llmc-awq-w4

Text Generation • 6B • Updated Aug 30, 2025 • 1

edge-inference/DSR1-1.5B-llmc-awq-w4

Text Generation • 2B • Updated Aug 30, 2025 • 1 • 1

kaitchup/Qwen3-1.7B-calib-OpenR1-Math-220k-16klen-NVFP4

1B • Updated Sep 8, 2025 • 3

kaitchup/Qwen3-1.7B-calib-OpenR1-Math-220k-2klen-NVFP4

1B • Updated Sep 8, 2025 • 4

kaitchup/Qwen3-4B-calib-OpenR1-Math-220k-16klen-NVFP4

3B • Updated Sep 8, 2025 • 3

kaitchup/Qwen3-4B-calib-OpenR1-Math-220k-2klen-NVFP4

3B • Updated Sep 8, 2025 • 3

itroot/Qwen3-4B-Instruct-2507-W8A8

Text Generation • 4B • Updated Sep 24, 2025

RedHatAI/Apertus-8B-Instruct-2509-FP8-dynamic

Text Generation • 8B • Updated Apr 28 • 1.28k • 3

RedHatAI/Apertus-70B-Instruct-2509-FP8-dynamic

Text Generation • 71B • Updated Sep 30, 2025 • 389 • 1

RedHatAI/Apertus-70B-Instruct-2509-quantized.w4a16

Text Generation • 11B • Updated Sep 23, 2025 • 176k • 1

kaitchup/Llama-3.1-8B-Instruct-NVFP4

5B • Updated Sep 24, 2025 • 4

itroot/Qwen3-4B-Thinking-2507-W8A8

Text Generation • 4B • Updated Sep 24, 2025 • 8

RedHatAI/Qwen3-VL-235B-A22B-Instruct-FP8-dynamic

Text Generation • 236B • Updated Oct 3, 2025 • 156 • 4

RedHatAI/Qwen3-VL-235B-A22B-Instruct-FP8-block

Text Generation • 236B • Updated Oct 27, 2025 • 25 • 3

RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-FP8-dynamic

Text Generation • 9B • Updated Apr 28 • 2.52k • 3

RedHatAI/Llama-3.1-8B-Instruct-FP8-block

Text Generation • 8B • Updated Oct 29, 2025 • 8

nm-testing/Llama-3.1-70B-Instruct-FP8-block

Text Generation • Updated Oct 14, 2025

RedHatAI/Qwen3-14B-FP8-block

Text Generation • 15B • Updated Oct 24, 2025 • 33

nm-testing/Qwen3-30B-A3B-FP8-block

Text Generation • 3B • Updated Oct 27, 2025 • 5

RedHatAI/Qwen3-32B-FP8-block

Text Generation • 33B • Updated Oct 24, 2025 • 11

RedHatAI/Qwen3-8B-FP8-block

Text Generation • 8B • Updated Dec 31, 2025 • 141

nm-testing/Qwen3-VL-235B-A22B-Instruct-FP8-BLOCK

Text Generation • Updated Oct 27, 2025

ronantakizawa/SmolVLM-Instruct-gptq

Image-Text-to-Text • 2B • Updated Oct 18, 2025 • 4 • 1

ronantakizawa/SmolVLM-Instruct-awq

Image-Text-to-Text • 2B • Updated Oct 18, 2025 • 13 • 1

ronantakizawa/idefics3-8b-llama3-awq

Image-Text-to-Text • 2B • Updated Oct 17, 2025 • 2 • 1

ronantakizawa/molmo-7b-d-awq

Image-Text-to-Text • 2B • Updated Dec 20, 2025 • 16 • 3

ronantakizawa/molmoact-7b-d-awq

Image-Text-to-Text • 2B • Updated Oct 17, 2025 • 11 • 1

ronantakizawa/olmo2-32b-instruct-awq

Text Generation • 5B • Updated Oct 18, 2025 • 5 • 1

ronantakizawa/molmo-72b-awq

Image-Text-to-Text • 11B • Updated Dec 20, 2025 • 13 • 1