Models: 733

Active filters: speculative-decoding

ruv/ruvltra-medium

Text Generation • 1B • Updated Mar 28 • 25 • 4

alexchen4ai/qwen3-8B-eagle

nebius/MEDUSA-Llama-3.1-8B-Instruct

Text Generation • 3B • Updated May 27 • 25

nebius/MLP-Speculator-Llama-3.1-8B-Instruct

Text Generation • 0.2B • Updated May 27 • 9

nebius/EAGLE3-Llama-3.1-8B-Instruct

Text Generation • 0.4B • Updated May 27 • 413

nebius/EAGLE3-Llama-3.3-70B-Instruct

Text Generation • 1B • Updated May 27 • 2.38k

GadflyII/GLM-4.7-Flash-MTP-NVFP4

Text Generation • 19B • Updated Feb 2 • 1.06k • 5

nebius/EAGLE3-gpt-oss-20b

Text Generation • 0.4B • Updated May 27 • 559

nebius/EAGLE3-gpt-oss-120b

Text Generation • 0.4B • Updated May 27 • 418

nebius/EAGLE3-Qwen3-235B-A22B-Instruct-2507

Text Generation • 0.6B • Updated May 27 • 8

nebius/MTP-DeepSeek-V3-0324

Text Generation • 13B • Updated May 27 • 62 • 3

togethercomputer/Aurora-Spec-Qwen3-Coder-Next-FP8

Text Generation • 0.5B • Updated Feb 24 • 106 • 20

togethercomputer/Aurora-Spec-Minimax-M2.1

Text Generation • 0.9B • Updated Feb 24 • 15 • 2

z-lab/LLaMA3.1-8B-Instruct-DFlash-UltraChat

Text Generation • 1B • Updated Apr 7 • 15.8k • 3

StentorLabs/Stentor-12M

Text Generation • 12M • Updated Feb 10 • 81 • 1

badmadrad/Mistral-Small-3.1-DRAFT-0.5B-MLX-4bit

Text Generation • 92.7M • Updated Feb 13 • 61

thoughtworks/GLM-4.7-Flash-Eagle3

Text Generation • 0.1B • Updated Apr 12 • 115 • 2

Joysulem/FireEcho

Text Generation • Updated Feb 17 • 3

z-lab/gpt-oss-20b-DFlash

Text Generation • 0.8B • Updated Apr 7 • 12.5k • 24

z-lab/Qwen3-Coder-Next-DFlash

Text Generation • 0.5B • Updated Apr 7 • 3.21k • 13

togethercomputer/Aurora-Spec-Minimax-M2.5

Text Generation • 0.9B • Updated Mar 19 • 172 • 6

StentorLabs/Stentor-30M

Text Generation • 30.4M • Updated Feb 21 • 73 • 3

mradermacher/Stentor-30M-GGUF

Text Generation • 30.4M • Updated Feb 21 • 110 • 3

mradermacher/Stentor-30M-i1-GGUF

Text Generation • 30.4M • Updated Feb 21 • 289 • 2

StentorLabs/Stentor2-12M-Preview

Text Generation • 12.3M • Updated Feb 25 • 5

z-lab/gpt-oss-120b-DFlash

Text Generation • 0.8B • Updated Apr 7 • 1.53k • 10

husj576/GTO-deepseek-8B

Text Generation • Updated Mar 4 • 1

husj576/GTO-llama33-instruct-70B

Text Generation • Updated Mar 4 • 1

husj576/GTO-vicuna-13b

Text Generation • Updated Mar 4 • 393

husj576/GTO-qwen3-8B

Text Generation • 0.4B • Updated May 25 • 4