Models: 2,565

Active filters: 2-bit

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-GPTQ

Text Generation • 13B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-2-13b-EfficientQAT-w2g128-BitBLAS

Text Generation • 51B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-BitBLAS

Text Generation • 51B • Updated Jul 22, 2024 • 1

ChenMnZ/Llama-2-13b-EfficientQAT-w2g64-GPTQ

Text Generation • 13B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-BitBLAS

Text Generation • 274B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-2-70b-EfficientQAT-w2g128-GPTQ

Text Generation • 69B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-2-70b-EfficientQAT-w2g64-GPTQ

Text Generation • 69B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-GPTQ

Text Generation • 7B • Updated Jul 22, 2024 • 7

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-GPTQ

Text Generation • 7B • Updated Jul 22, 2024 • 2 • 1

ChenMnZ/Llama-3-70b-EfficientQAT-w2g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-70b-EfficientQAT-w2g64-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g128-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 6

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g64-GPTQ

Text Generation • 71B • Updated Jul 22, 2024 • 5

ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-BitBLAS

Text Generation • 26B • Updated Jul 22, 2024 • 7

ChenMnZ/Llama-2-7b-EfficientQAT-w2g64-BitBLAS

Text Generation • 26B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-70b-EfficientQAT-w2g128-BitBLAS

Text Generation • 276B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-8b-EfficientQAT-w2g128-GPTQ

Text Generation • 8B • Updated Jul 22, 2024 • 44

ChenMnZ/Llama-3-8b-EfficientQAT-w2g64-GPTQ

Text Generation • 8B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128-GPTQ

Text Generation • 8B • Updated Jul 22, 2024 • 7 • 1

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g64-GPTQ

Text Generation • 8B • Updated Jul 22, 2024 • 11

ChenMnZ/Llama-3-70b-EfficientQAT-w2g64-BitBLAS

Text Generation • 276B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g128-BitBLAS

Text Generation • 276B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-3-70b-instruct-EfficientQAT-w2g64-BitBLAS

Text Generation • 276B • Updated Jul 22, 2024 • 5

ChenMnZ/Llama-3-8b-EfficientQAT-w2g128-BitBLAS

Text Generation • 29B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-3-8b-EfficientQAT-w2g64-BitBLAS

Text Generation • 29B • Updated Jul 22, 2024 • 4

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128-BitBLAS

Text Generation • 29B • Updated Jul 22, 2024 • 3

ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g64-BitBLAS

Text Generation • 29B • Updated Jul 22, 2024 • 3

MaziyarPanahi/SmolLM-135M-Instruct-GGUF

Text Generation • 0.1B • Updated Jul 22, 2024 • 480 • 2

MaziyarPanahi/SmolLM-360M-Instruct-GGUF

Text Generation • 0.4B • Updated Jul 22, 2024 • 156 • 1

MaziyarPanahi/SmolLM-1.7B-Instruct-GGUF

Text Generation • 2B • Updated Jul 22, 2024 • 1.13k • 4