# MD3P-Int4-Smol - Fully Quantized Moondream3 for iOS

A more aggressively quantized version of Moondream3, sized to fit within iOS memory constraints (~5.9 GB available).

## Model Details

| Component | Original (md3p-int4) | This Model |
|---|---|---|
| MoE Experts (layers 4-23) | int4 | int4 (unchanged) |
| Vision Encoder | BF16 | int4 |
| Text Attention | BF16 | int4 |
| Text MLP (layers 0-3) | BF16 | int4 |
| Embeddings | BF16 | int4 |
| **Total Size** | 6.48 GB | 5.43 GB |

## Quantization Details

  • Method: Affine quantization (bits=4, group_size=64)
  • Selective: only the tensors stored in BF16 were quantized; the existing int4 MoE weights are preserved unchanged
  • Exceptions kept at BF16: the vision fc2 layers, whose weight shapes are not divisible by group_size=64
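The group-wise affine scheme above can be sketched in a few lines of NumPy. This is a simplified illustration, not the exact MLX kernel: each group of 64 weights is mapped to 4-bit codes plus a per-group scale and offset. It also shows why a tensor whose row width is not divisible by the group size (like the vision fc2 layers) cannot be grouped and stays in BF16.

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Group-wise affine quantization: map each group of `group_size`
    weights to integer codes in [0, 2**bits - 1] plus a per-group
    scale and offset. `w.size` must be divisible by `group_size`."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = np.maximum((hi - lo) / (2**bits - 1), 1e-12)  # avoid div-by-zero
    codes = np.round((groups - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def affine_dequantize(codes, scale, lo, shape):
    """Reconstruct approximate weights from codes + per-group params."""
    return (codes * scale + lo).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
codes, scale, lo = affine_quantize(w)
w_hat = affine_dequantize(codes, scale, lo, w.shape)
# Rounding error is bounded by half a quantization step per group.
assert float(np.abs(w - w_hat).max()) <= float(scale.max()) / 2 + 1e-6
```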

## Usage

This model is designed for use with the moondream-mlx Swift implementation.

```swift
let config = ModelConfiguration(id: "lewi/md3p-int4-smol")
let container = try await Moondream3Loader.loadContainer(configuration: config)
```

## Source & License

### Changes from Base Model

This model applies additional int4 quantization to the BF16 components (vision encoder, text attention, embeddings) that were left unquantized in the original md3p-int4 release. This reduces the model size from 6.48 GB to 5.43 GB, enabling deployment on iOS devices with ~6 GB memory limits.
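As a back-of-envelope check on the size drop, the effective storage cost of a 4-bit affine tensor can be computed directly. This is a sketch that assumes a common affine layout of one 16-bit scale and one 16-bit offset per group of 64 weights; the exact MLX on-disk format may differ slightly.

```python
# Effective bits per weight for 4-bit affine quantization, group_size=64,
# assuming one 16-bit scale and one 16-bit offset per group (assumed
# layout; the exact on-disk format may differ).
bits, group_size = 4, 64
overhead_bits = 16 + 16                       # per-group scale + offset
eff_bits = bits + overhead_bits / group_size  # 4.5 bits per weight
ratio = 16 / eff_bits                         # vs. BF16 at 16 bits/weight
print(f"{eff_bits} bits/weight, ~{ratio:.2f}x smaller than BF16")
```

At roughly 4.5 bits per weight, each newly quantized tensor shrinks to about 28% of its BF16 size, which is consistent with the overall drop from 6.48 GB to 5.43 GB given that the MoE experts were already int4.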

## Acknowledgments

Thanks to the Moondream team for the original model, released under the Apache 2.0 license.
