MD3P-Int4-Smol - Fully Quantized Moondream3 for iOS

A more aggressively quantized build of Moondream3, designed to fit within iOS memory constraints (~5.9 GB of usable memory).

Model Details

Component                   Original (md3p-int4)   This Model
MoE Experts (layers 4-23)   int4                   int4 (unchanged)
Vision Encoder              BF16                   int4
Text Attention              BF16                   int4
Text MLP (layers 0-3)       BF16                   int4
Embeddings                  BF16                   int4
Total Size                  6.48 GB                5.43 GB

Quantization Details

  • Method: Affine quantization (bits=4, group_size=64)
  • Selective: only tensors stored in BF16 were quantized; the existing int4 MoE weights are preserved as-is
  • Exceptions: the vision fc2 layers remain in BF16 (their shapes are incompatible with group_size=64)
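Affine group quantization maps each group of 64 weights to 4-bit integer codes via a per-group scale and offset. A minimal NumPy sketch of the idea (illustrative only; the actual conversion was done with MLX tooling, whose on-disk packing differs):

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    # w: 1-D float array whose length is a multiple of group_size.
    g = w.reshape(-1, group_size)
    lo = g.min(axis=1, keepdims=True)
    hi = g.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale == 0, 1.0, scale)  # guard all-constant groups
    q = np.round((g - lo) / scale).astype(np.uint8)  # 4-bit codes, 0..15
    return q, scale, lo

def affine_dequantize(q, scale, lo, original_shape):
    # Reconstruct approximate weights from codes, scales, and offsets.
    return (q * scale + lo).reshape(original_shape)

np.random.seed(0)
w = np.random.randn(128).astype(np.float32)
q, scale, lo = affine_quantize(w)
w_hat = affine_dequantize(q, scale, lo, w.shape)
max_err = np.abs(w - w_hat).max()  # bounded by scale/2 per group
```

The per-group rounding error is bounded by half the group's scale, which is why smaller groups (here 64) trade a little extra metadata for better accuracy.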

Usage

This model is designed for use with the moondream-mlx Swift implementation.

let config = ModelConfiguration(id: "lewi/md3p-int4-smol")
let container = try await Moondream3Loader.loadContainer(configuration: config)

Source & License

Changes from Base Model

This model applies additional int4 quantization to the BF16 components (vision encoder, text attention, text MLP in layers 0-3, and embeddings) that were left unquantized in the original md3p-int4 release. This reduces the model size from 6.48 GB to 5.43 GB, enabling deployment on iOS devices with roughly 6 GB memory limits.
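The size reduction follows from per-parameter storage cost. A back-of-envelope sketch, assuming one 16-bit scale and one 16-bit offset per group of 64 (an assumption; the actual MLX packing may differ slightly):

```python
# Bytes per parameter: BF16 vs int4 with group_size=64.
GROUP_SIZE = 64
bf16_bytes = 2.0                                 # 16 bits per weight
int4_bytes = 4 / 8 + (2 + 2) / GROUP_SIZE        # 0.5 weight + amortized scale/offset
compression = bf16_bytes / int4_bytes            # roughly 3.6x per quantized tensor
```

The overall file shrinks by less than this ratio (6.48 GB to 5.43 GB) because the MoE experts, the bulk of the model, were already int4.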

Acknowledgments

Thanks to the Moondream team for the original model and Apache 2.0 license.
