Open Source Impact

This repository provides an MLX-optimized, 4-bit quantized DeepSeek LLM for local inference on Apple silicon.
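As a rough illustration of how such a checkpoint is produced, below is a minimal sketch of the 4-bit conversion step using the `mlx-lm` package (`pip install mlx-lm`). The source checkpoint name is a placeholder assumption, not necessarily the base model actually used here; `q_bits` and `q_group_size` are shown at their common MLX defaults.

```python
# Sketch: converting a Hugging Face checkpoint to 4-bit MLX format.
# The hf_path below is a placeholder; substitute the actual base model.
from mlx_lm import convert

convert(
    hf_path="deepseek-ai/deepseek-llm-7b-chat",   # placeholder base checkpoint
    mlx_path="./deepseek-llm-7b-chat-4bit-mlx",   # local output directory
    quantize=True,    # enable weight quantization
    q_bits=4,         # 4-bit weights
    q_group_size=64,  # MLX's default quantization group size
)
```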

The model is publicly distributed via Hugging Face and used by members of the MLX community to run on-device LLM inference without relying on cloud APIs or CUDA-based stacks.
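A minimal usage sketch with `mlx-lm` follows; the repo id is a placeholder for this model's actual Hugging Face path, and the prompt and `max_tokens` value are illustrative only.

```python
# Sketch: local inference on Apple silicon with mlx-lm.
from mlx_lm import load, generate

# Placeholder repo id; replace with this model's Hugging Face path.
model, tokenizer = load("mlx-community/deepseek-llm-7b-chat-4bit")

prompt = "Explain unified memory on Apple silicon in one sentence."
response = generate(model, tokenizer, prompt=prompt, max_tokens=128)
print(response)
```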

This project demonstrates practical, hardware-aware ML optimization: MLX keeps model weights in Apple's unified memory, shared between the CPU and GPU, and runs inference with Metal GPU acceleration.
