Models Optimized for Rapid-MLX
Tested and benchmarked MLX models for Rapid-MLX, the fastest way to run local AI on Apple Silicon. Tool calling, reasoning, and streaming have been verified for the models listed here.
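Each entry below ends with the `rapid-mlx serve` alias used for that model. As a quick orientation, here is a minimal sketch of serving one of the listed models and sending it a first request. It assumes Rapid-MLX exposes an OpenAI-compatible chat endpoint on `localhost:8080`; substitute the host, port, and route your `rapid-mlx serve` instance actually prints on startup.

```bash
# Minimal sketch: serve a listed model, then query it from another terminal.
# Assumption: the server speaks the OpenAI chat-completions protocol on
# localhost:8080; use whatever address `rapid-mlx serve` reports instead.

rapid-mlx serve qwen3.5-9b    # start the server with an alias from the table below

# In a second terminal: one non-streaming chat request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-9b",
        "messages": [{"role": "user", "content": "One sentence on Apple Silicon."}]
      }'
```

The same pattern applies to every alias in the table.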
| Model | Task | Params (Hub) | Downloads | Likes | Note |
|---|---|---|---|---|---|
| — | — | 1.0B | 27.4k | 21 | ⚡ 174 tok/s · Fastest small model · `rapid-mlx serve qwen3.5-4b` |
| mlx-community/Qwen3.5-9B-4bit | Image-Text-to-Text | 2B | 22.5k | 10 | ⚡ 107 tok/s · 100% tool calls · Best bang-for-buck · `rapid-mlx serve qwen3.5-9b` |
| mlx-community/Qwen3.5-27B-4bit | Image-Text-to-Text | 5B | 99.3k | 46 | ⚡ 46 tok/s · 100% tool calls · Strong reasoning · `rapid-mlx serve qwen3.5-27b` |
| mlx-community/Qwen3.5-35B-A3B-8bit | Image-Text-to-Text | 10B | 3.19k | 20 | ⚡ MoE · 100% tool calls · `rapid-mlx serve qwen3.5-35b` |
| nightmedia/Qwen3.5-122B-A10B-Text-mxfp4-mlx | Text Generation | 122B | 2.05k | 9 | ⚡ 122B MoE · 100% tool calls · `rapid-mlx serve qwen3.5-122b` |
| mlx-community/Qwen3.5-122B-A10B-8bit | Image-Text-to-Text | 35B | 14.4k | 1 | ⚡ 122B MoE, 8-bit · `rapid-mlx serve qwen3.5-122b-8bit` |
| lmstudio-community/Qwen3-Coder-Next-MLX-4bit | — | 80B | 243k | 19 | 🧑‍💻 Best coding model · `rapid-mlx serve qwen3-coder` |
| mlx-community/gemma-4-26b-a4b-it-4bit | Image-Text-to-Text | 5B | 48.9k | 54 | ⚡ 85 tok/s · MoE · Vision · 100% tool calls · `rapid-mlx serve gemma-4-26b` |
| mlx-community/gemma-4-31b-it-4bit | Image-Text-to-Text | 5B | 37.6k | 38 | ⚡ 31 tok/s · Vision · 100% tool calls · `rapid-mlx serve gemma-4-31b` |
| mlx-community/gemma-3-12b-it-qat-4bit | Image-Text-to-Text | — | 65.1k | 18 | ⚡ Gemma 3 · `rapid-mlx serve gemma3-12b` |
| mlx-community/Mistral-Small-3.1-24B-Instruct-2503-4bit | — | — | 999 | 9 | ⚡ Mistral · `rapid-mlx serve mistral-24b` |
| mlx-community/GLM-4.7-4bit | Text Generation | 353B | 2.62k | 5 | ⚡ GLM · `rapid-mlx serve glm4.7-9b` |
| lmstudio-community/MiniMax-M2.5-MLX-4bit | Text Generation | 229B | 27.7k | — | ⚡ MiniMax · Reasoning · `rapid-mlx serve minimax-m2.5` |
| mlx-community/DeepSeek-R1-0528-Qwen3-8B-4bit | Text Generation | 1B | 2.39k | 5 | 🧠 DeepSeek R1 · Reasoning · `rapid-mlx serve deepseek-r1-8b` |
| mlx-community/Hermes-3-Llama-3.1-8B-4bit | — | 1B | 1.82k | 5 | ⚡ Hermes · `rapid-mlx serve hermes3-8b` |
| mlx-community/Llama-3.2-3B-Instruct-4bit | Text Generation | 0.5B | 27k | 43 | ⚡ Llama · `rapid-mlx serve llama3-3b` |
| mlx-community/Phi-4-mini-instruct-4bit | Text Generation | — | 12.8k | 1 | ⚡ 174 tok/s · Fastest overall · `rapid-mlx serve phi4-14b` |
| mlx-community/Devstral-Small-2-24B-Instruct-2512-4bit | — | — | 167k | 4 | 🧑‍💻 Coding · `rapid-mlx serve devstral-24b` |
| mlx-community/gpt-oss-20b-MXFP4-Q8 | Text Generation | 21B | 493k | 59 | ⚡ 123 tok/s · `rapid-mlx serve gpt-oss-20b` |
| mlx-community/Kimi-K2-Instruct-4bit | Text Generation | 1T | 3.87k | 13 | ⚡ 1T MoE · `rapid-mlx serve kimi-48b` |

Parameter counts, downloads, and likes are as shown on each model's Hugging Face card; quantized repos often report fewer parameters than the base model's name suggests.
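The notes above mark models that passed tool-calling and streaming checks. The sketch below shows one way to exercise those two features against a running server. As before, the OpenAI-compatible `/v1/chat/completions` route on `localhost:8080` is an assumption rather than documented Rapid-MLX behavior, and `get_weather` is a purely illustrative tool definition.

```bash
# Streaming check: with "stream": true the reply arrives as server-sent events
# (token deltas) instead of a single JSON body.
curl -sN http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-27b",
        "stream": true,
        "messages": [{"role": "user", "content": "Count from 1 to 5."}]
      }'

# Tool-calling check: a model that handles tools should reply with a tool_calls
# entry naming get_weather (illustrative only) rather than a plain-text answer.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3.5-27b",
        "messages": [{"role": "user", "content": "What is the weather in Cupertino?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'
```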