# MD3P-Int4-Smol - Fully Quantized Moondream3 for iOS
A more aggressively quantized version of Moondream3 designed to fit in iOS memory constraints (~5.9GB available).
## Model Details
| Component | Original (md3p-int4) | This Model |
|---|---|---|
| MoE Experts (layers 4-23) | int4 | int4 (unchanged) |
| Vision Encoder | BF16 | int4 |
| Text Attention | BF16 | int4 |
| Text MLP (layers 0-3) | BF16 | int4 |
| Embeddings | BF16 | int4 |
| Total Size | 6.48 GB | 5.43 GB |
## Quantization Details
- Method: affine quantization (bits=4, group_size=64)
- Selective: only the BF16 tensors were quantized; the existing int4 MoE weights are preserved as-is
- Exceptions: the vision `fc2` layers remain BF16 because their shapes are incompatible with group_size=64
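For intuition, per-group affine quantization can be sketched as follows. This is an illustrative sketch only, not the actual moondream-mlx implementation: the type and function names are hypothetical, and 4-bit codes are stored one per byte here for clarity rather than packed two per byte.

```swift
import Foundation

// Hypothetical sketch of per-group affine quantization (bits=4).
// In the real model each group of 64 weights shares one scale/zero-point.
struct QuantizedGroup {
    let codes: [UInt8]   // 4-bit codes (0...15), stored unpacked for clarity
    let scale: Float
    let zeroPoint: Float
}

func quantize(_ values: [Float], bits: Int = 4) -> QuantizedGroup {
    let qMax = Float((1 << bits) - 1)          // 15 for int4
    let lo = values.min() ?? 0
    let hi = values.max() ?? 0
    let scale = max(hi - lo, 1e-8) / qMax      // guard against a constant group
    let codes = values.map { v -> UInt8 in
        let code = ((v - lo) / scale).rounded()
        return UInt8(min(max(code, 0), qMax))  // clamp into the 4-bit range
    }
    return QuantizedGroup(codes: codes, scale: scale, zeroPoint: lo)
}

func dequantize(_ g: QuantizedGroup) -> [Float] {
    g.codes.map { Float($0) * g.scale + g.zeroPoint }
}
```

Each BF16 value (16 bits) becomes a 4-bit code plus a small per-group overhead for the scale and zero-point, which is where the roughly 1 GB of savings over the base model comes from.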
## Usage
This model is designed for use with the moondream-mlx Swift implementation:

```swift
let config = ModelConfiguration(id: "lewi/md3p-int4-smol")
let container = try await Moondream3Loader.loadContainer(configuration: config)
```
## Source & License
- Base Model: moondream/md3p-int4
- Original Model: moondream/moondream3-preview
- License: Apache 2.0 (same as original)
## Changes from Base Model
This model applies additional int4 quantization to the BF16 components (vision encoder, text attention, text MLP layers 0-3, and embeddings) that were left unquantized in the original md3p-int4 release. This reduces the model size from 6.48 GB to 5.43 GB, enabling deployment on iOS devices with roughly 6 GB of usable memory.
## Acknowledgments
Thanks to the Moondream team for the original model and Apache 2.0 license.