Models Optimized for Rapid-MLX
Tested and benchmarked MLX models for Rapid-MLX, the fastest way to run local AI on Apple Silicon. Tool calling, reasoning, and streaming have been verified for the models listed here.
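Each entry below ends with the `rapid-mlx serve` alias used for that model. As a quick orientation, here is a minimal sketch of serving one of the listed models and sending it a first request. It assumes Rapid-MLX exposes an OpenAI-compatible chat endpoint on `localhost:8080`; substitute the host, port, and route your `rapid-mlx serve` instance actually prints on startup.

```bash
# Minimal sketch: serve a listed model, then query it from another terminal.
# Assumption: the server speaks the OpenAI chat-completions protocol on
# localhost:8080; use whatever address `rapid-mlx serve` reports instead.

rapid-mlx serve qwen3.5-9b    # start the server with an alias from the table below

# In a second terminal: one non-streaming chat request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-9b",
        "messages": [{"role": "user", "content": "One sentence on Apple Silicon."}]
      }'
```

The same pattern applies to every alias in the table.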
| Model | Task | Params (Hub) | Downloads | Likes | Note |
|---|---|---|---|---|---|
| — | — | 1.0B | 27.4k | 21 | ⚡ 174 tok/s · Fastest small model · `rapid-mlx serve qwen3.5-4b` |
| mlx-community/Qwen3.5-9B-4bit | Image-Text-to-Text | 2B | 22.5k | 10 | ⚡ 107 tok/s · 100% tool calls · Best bang-for-buck · `rapid-mlx serve qwen3.5-9b` |
| mlx-community/Qwen3.5-27B-4bit | Image-Text-to-Text | 5B | 99.3k | 46 | ⚡ 46 tok/s · 100% tool calls · Strong reasoning · `rapid-mlx serve qwen3.5-27b` |
| mlx-community/Qwen3.5-35B-A3B-8bit | Image-Text-to-Text | 10B | 3.19k | 20 | ⚡ MoE · 100% tool calls · `rapid-mlx serve qwen3.5-35b` |
| nightmedia/Qwen3.5-122B-A10B-Text-mxfp4-mlx | Text Generation | 122B | 2.05k | 9 | ⚡ 122B MoE · 100% tool calls · `rapid-mlx serve qwen3.5-122b` |
| mlx-community/Qwen3.5-122B-A10B-8bit | Image-Text-to-Text | 35B | 14.4k | 1 | ⚡ 122B MoE, 8-bit · `rapid-mlx serve qwen3.5-122b-8bit` |
| lmstudio-community/Qwen3-Coder-Next-MLX-4bit | — | 80B | 243k | 19 | 🧑‍💻 Best coding model · `rapid-mlx serve qwen3-coder` |
| mlx-community/gemma-4-26b-a4b-it-4bit | Image-Text-to-Text | 5B | 48.9k | 54 | ⚡ 85 tok/s · MoE · Vision · 100% tool calls · `rapid-mlx serve gemma-4-26b` |
| mlx-community/gemma-4-31b-it-4bit | Image-Text-to-Text | 5B | 37.6k | 38 | ⚡ 31 tok/s · Vision · 100% tool calls · `rapid-mlx serve gemma-4-31b` |
| mlx-community/gemma-3-12b-it-qat-4bit | Image-Text-to-Text | — | 65.1k | 18 | ⚡ Gemma 3 · `rapid-mlx serve gemma3-12b` |
| mlx-community/Mistral-Small-3.1-24B-Instruct-2503-4bit | — | — | 999 | 9 | ⚡ Mistral · `rapid-mlx serve mistral-24b` |
| mlx-community/GLM-4.7-4bit | Text Generation | 353B | 2.62k | 5 | ⚡ GLM · `rapid-mlx serve glm4.7-9b` |
| lmstudio-community/MiniMax-M2.5-MLX-4bit | Text Generation | 229B | 27.7k | — | ⚡ MiniMax · Reasoning · `rapid-mlx serve minimax-m2.5` |
| mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit | Text Generation | 1B | 2.39k | 5 | 🧠 DeepSeek R1 · Reasoning · `rapid-mlx serve deepseek-r1-8b` |
| mlx-community/Hermes-3-Llama-3.1-8B-4bit | — | 1B | 1.82k | 5 | ⚡ Hermes · `rapid-mlx serve hermes3-8b` |
| mlx-community/Llama-3.2-3B-Instruct-4bit | Text Generation | 0.5B | 27k | 43 | ⚡ Llama · `rapid-mlx serve llama3-3b` |
| mlx-community/Phi-4-mini-instruct-4bit | Text Generation | — | 12.8k | 1 | ⚡ 174 tok/s · Fastest overall · `rapid-mlx serve phi4-14b` |
| mlx-community/Devstral-Small-2-24B-Instruct-2512-4bit | — | — | 167k | 4 | 🧑‍💻 Coding · `rapid-mlx serve devstral-24b` |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Text Generation | 21B | 493k | 59 | ⚡ 123 tok/s · `rapid-mlx serve gpt-oss-20b` |
| mlx-community/Kimi-K2-Instruct-4bit | Text Generation | 1T | 3.87k | 13 | ⚡ 1T MoE · `rapid-mlx serve kimi-48b` |

Parameter counts, downloads, and likes are as shown on each model's Hugging Face card; quantized repos often report fewer parameters than the base model's name suggests.
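The notes above mark models that passed tool-calling and streaming checks. The sketch below shows one way to exercise those two features against a running server. As before, the OpenAI-compatible `/v1/chat/completions` route on `localhost:8080` is an assumption rather than documented Rapid-MLX behavior, and `get_weather` is a purely illustrative tool definition.

```bash
# Streaming check: with "stream": true the reply arrives as server-sent events
# (token deltas) instead of a single JSON body.
curl -sN http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-27b",
        "stream": true,
        "messages": [{"role": "user", "content": "Count from 1 to 5."}]
      }'

# Tool-calling check: a model that handles tools should reply with a tool_calls
# entry naming get_weather (illustrative only) rather than a plain-text answer.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-27b",
        "messages": [{"role": "user", "content": "What is the weather in Cupertino?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
```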