Models (98)

Active filters: kv-cache

tonera/FLUX.2-klein-9b-kv-Nunchaku

Image-to-Image • Updated 10 days ago • 353 • 10

fromthesky/PLDR-LLM-v51-104M

Text Generation • 0.1B • Updated Dec 21, 2025 • 5

fromthesky/PLDR-LLM-v51-110M-1

Text Generation • 0.1B • Updated Dec 21, 2025 • 18

fromthesky/PLDR-LLM-v51-110M-2

Text Generation • 0.1B • Updated Dec 21, 2025 • 5

fromthesky/PLDR-LLM-v51-110M-3

Text Generation • 0.1B • Updated Dec 21, 2025 • 10

fromthesky/PLDR-LLM-v51-110M-4

Text Generation • 0.1B • Updated Dec 21, 2025 • 6

fromthesky/PLDR-LLM-v51-110M-5

Text Generation • 0.1B • Updated Dec 21, 2025 • 6

fromthesky/PLDR-LLM-v51-DAG-110M

Text Generation • 0.1B • Updated Dec 21, 2025 • 4

fromthesky/PLDR-LLM-v51G-106M-1

Text Generation • 0.1B • Updated Dec 21, 2025 • 7

fromthesky/PLDR-LLM-v51G-106M-2

Text Generation • 0.1B • Updated Dec 21, 2025 • 6

fromthesky/PLDR-LLM-v51G-106M-3

Text Generation • 0.1B • Updated Dec 21, 2025 • 5

fromthesky/PLDR-LLM-v51G-106M-test

Text Generation • 0.1B • Updated Aug 27, 2025 • 2

fromthesky/PLDR-LLM-v52-81M-FT-SC-1

Text Classification • 81M • Updated Sep 20, 2025 • 5

fromthesky/PLDR-LLM-v52-81M-FT-QA-1

Question Answering • 81M • Updated Sep 20, 2025 • 10

fromthesky/PLDR-LLM-v52-81M-FT-TC-1

Token Classification • 81M • Updated Sep 20, 2025 • 4

fromthesky/PLDR-LLM-v52-110M-1

Text Generation • 0.1B • Updated Dec 21, 2025 • 3

nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Llama-3.1-8B-Instruct-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Llama-3.1-8B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Qwen3-32B-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Qwen3-32B-FP8-dynamic-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Llama-3.3-70B-Instruct-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

nm-testing/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Tensor

Updated Nov 25, 2025

nm-testing/Llama-3.3-70B-Instruct-FP8-dynamic-QKV-Cache-FP8-Per-Head

Updated Nov 25, 2025

ddddamn/IronCell-Mark-1

anthonym21/Mistral-7B-v0.3-CoDA-GQA-L

Text Generation • 7B • Updated Feb 24 • 216