# MD3P-Int4-Smol - Fully Quantized Moondream3 for iOS

A more aggressively quantized version of Moondream3, sized to fit within iOS memory constraints (~5.9 GB available).

## Model Details

| Component | Original (md3p-int4) | This Model |
|---|---|---|
| MoE Experts (layers 4-23) | int4 | int4 (unchanged) |
| Vision Encoder | BF16 | int4 |
| Text Attention | BF16 | int4 |
| Text MLP (layers 0-3) | BF16 | int4 |
| Embeddings | BF16 | int4 |
| **Total Size** | 6.48 GB | 5.43 GB |

## Quantization Details

  • Method: Affine quantization (bits=4, group_size=64)
  • Selective: only the tensors stored in BF16 were quantized; the existing int4 MoE weights are preserved unchanged
  • Exceptions kept at BF16: the vision fc2 layers, whose weight shapes are not divisible by group_size=64
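The group-wise affine scheme above can be sketched in a few lines of NumPy. This is a simplified illustration, not the exact MLX kernel: each group of 64 weights is mapped to 4-bit codes plus a per-group scale and offset. It also shows why a tensor whose row width is not divisible by the group size (like the vision fc2 layers) cannot be grouped and stays in BF16.

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Group-wise affine quantization: map each group of `group_size`
    weights to integer codes in [0, 2**bits - 1] plus a per-group
    scale and offset. `w.size` must be divisible by `group_size`."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / (2**bits - 1), 1e-12)  # avoid div-by-zero
    codes = np.round((groups - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def affine_dequantize(codes, scale, lo, shape):
    """Reconstruct approximate weights from codes + per-group params."""
    return (codes * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
codes, scale, lo = affine_quantize(w)
w_hat = affine_dequantize(codes, scale, lo, w.shape)
# Rounding error is bounded by half a quantization step per group.
assert float(np.abs(w - w_hat).max()) <= float(scale.max()) / 2 + 1e-6
```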

## Usage

This model is designed for use with the moondream-mlx Swift implementation.

```swift
let config = ModelConfiguration(id: "lewi/md3p-int4-smol")
let container = try await Moondream3Loader.loadContainer(configuration: config)
```

## Source & License

### Changes from Base Model

This model applies additional int4 quantization to the BF16 components (vision encoder, text attention, embeddings) that were left unquantized in the original md3p-int4 release. This reduces the model size from 6.48 GB to 5.43 GB, enabling deployment on iOS devices with ~6 GB memory limits.
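As a back-of-envelope check on the size drop, the effective storage cost of a 4-bit affine tensor can be computed directly. This is a sketch that assumes a common affine layout of one 16-bit scale and one 16-bit offset per group of 64 weights; the exact MLX on-disk format may differ slightly.

```python
# Effective bits per weight for 4-bit affine quantization, group_size=64,
# assuming one 16-bit scale and one 16-bit offset per group (assumed
# layout; the exact on-disk format may differ).
bits, group_size = 4, 64
overhead_bits = 16 + 16                       # per-group scale + offset
eff_bits = bits + overhead_bits / group_size  # 4.5 bits per weight
ratio = 16 / eff_bits                         # vs. BF16 at 16 bits/weight
print(f"{eff_bits} bits/weight, ~{ratio:.2f}x smaller than BF16")
```

At roughly 4.5 bits per weight, each newly quantized tensor shrinks to about 28% of its BF16 size, which is consistent with the overall drop from 6.48 GB to 5.43 GB given that the MoE experts were already int4.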

## Acknowledgments

Thanks to the Moondream team for the original model, released under the Apache 2.0 license.
