# MD3P-Int8 - INT8 Quantized Moondream3 for MLX

An INT8 quantized version of Moondream3, offering a balance between model quality and size for MLX deployment.

## Model Details

| Component | Original (BF16) | This Model |
|-----------|-----------------|------------|
| MoE experts (layers 4-23) | BF16 | INT8 |
| Vision encoder | BF16 | BF16 (preserved) |
| Text attention | BF16 | INT8 |
| Text MLP (layers 0-3) | BF16 | INT8 |
| Embeddings | BF16 | BF16 (preserved) |
| Total size | ~12 GB | ~10 GB |
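
One way to verify this split is to inspect the checkpoint: MLX quantized layers store a packed `weight` tensor together with `scales` and `biases`, while preserved layers keep plain BF16 weights. A minimal check, assuming a single-shard file named `model.safetensors` downloaded locally (a sharded checkpoint would need a loop over files):

```python
from safetensors import safe_open

# List modules that carry quantization metadata (.scales tensors);
# everything else kept its BF16 weights.
with safe_open("model.safetensors", framework="numpy") as f:
    quantized = {k.rsplit(".", 1)[0] for k in f.keys() if k.endswith(".scales")}
    for name in sorted(quantized):
        print("INT8:", name)
```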

## Quantization Details

- Method: Affine quantization (bits=8, group_size=64); a minimal sketch follows below
- Target: Text model layers (attention, MLP, MoE experts)
- Preserved: Vision encoder and embeddings kept at BF16 for quality
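
This kind of selective quantization can be expressed with `mlx.nn.quantize` and a predicate that skips the vision tower and embeddings. The `"vision"` path check below is an assumption about module naming in the Moondream3 implementation, not the exact key:

```python
import mlx.nn as nn

def quantize_text_only(model: nn.Module) -> nn.Module:
    # Quantize text-model linear layers to INT8 (affine, group_size=64)
    # while keeping the vision encoder and token embeddings at BF16.
    def predicate(path: str, module: nn.Module) -> bool:
        if "vision" in path:                  # assumed path prefix for the vision encoder
            return False
        if isinstance(module, nn.Embedding):  # keep embeddings at BF16
            return False
        return isinstance(module, nn.Linear)

    nn.quantize(model, group_size=64, bits=8, class_predicate=predicate)
    return model
```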

## Comparison with INT4 Variants

| Model | Size | Quality | Use Case |
|-------|------|---------|----------|
| md3p-int8 (this model) | 10 GB | Higher | Desktop/server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained setups |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6 GB memory limit) |
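
The size gap follows from the per-weight storage cost. With grouped affine quantization, each group of 64 weights also stores a scale and a bias (assumed fp16 here), so INT8 costs about 8.5 bits per weight versus 4.5 for INT4 and 16 for the preserved BF16 layers; the preserved layers are why the totals differ by less than the raw 8-vs-4-bit ratio:

```python
# Back-of-the-envelope bits per weight for grouped affine quantization,
# assuming one fp16 scale and one fp16 bias per group.
def bits_per_weight(bits: int, group_size: int = 64) -> float:
    return bits + 2 * 16 / group_size

print(bits_per_weight(8))  # 8.5 -> this model's quantized layers
print(bits_per_weight(4))  # 4.5 -> the int4 variants
```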

## Usage

This model is designed for use with MLX-based Moondream implementations.

```python
# Example with mlx-lm or similar
from mlx_lm import load, generate

model, tokenizer = load("lewi/md3p-int8")
response = generate(model, tokenizer, prompt="Describe yourself.", max_tokens=128)
print(response)
```
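
Since Moondream3 is a vision-language model, image-conditioned generation needs a VLM-aware runner. If the mlx-vlm package supports this architecture (an assumption; check its model list), usage would look roughly like:

```python
# Hedged sketch: assumes mlx-vlm can load this checkpoint.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model, processor = load("lewi/md3p-int8")
config = load_config("lewi/md3p-int8")

prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)
print(generate(model, processor, prompt, ["photo.jpg"]))
```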

## Source & License

Quantized from the original Moondream3 weights, which are released under the Apache 2.0 license.

## Acknowledgments

Thanks to the Moondream team for the original model.
