---
license: apache-2.0
tags:
  - vision
  - moondream
  - mlx
  - int8
  - quantized
base_model: moondream/moondream3-preview
---

# MD3P-Int8 - INT8 Quantized Moondream3 for MLX

An INT8 quantized version of Moondream3, offering a balance between model quality and size for MLX deployment.

## Model Details

| Component | Original (BF16) | This Model |
|-----------|-----------------|------------|
| MoE Experts (layers 4-23) | BF16 | **int8** |
| Vision Encoder | BF16 | BF16 (preserved) |
| Text Attention | BF16 | **int8** |
| Text MLP (layers 0-3) | BF16 | **int8** |
| Embeddings | BF16 | BF16 (preserved) |
| **Total Size** | ~12 GB | **~10 GB** |

## Quantization Details

- **Method**: Affine quantization (bits=8, group_size=64)
- **Target**: Text model layers (attention, MLP, MoE experts)
- **Preserved**: Vision encoder and embeddings kept at BF16 for quality

## Comparison with INT4 Variants

| Model | Size | Quality | Use Case |
|-------|------|---------|----------|
| md3p-int8 (this) | 10 GB | Higher | Desktop/server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6 GB limit) |

## Usage

This model is designed for use with MLX-based Moondream implementations.

```python
# Example with mlx-lm or a similar MLX runtime
from mlx_lm import load, generate

model, tokenizer = load("lewi/md3p-int8")
```

## Source & License

- **Original Model**: [moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview)
- **License**: Apache 2.0 (same as the original)

## Acknowledgments

Thanks to the [Moondream](https://moondream.ai/) team for the original model and its Apache 2.0 license.
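## Appendix: Affine Quantization Sketch

For intuition, the affine scheme described under Quantization Details (bits=8, group_size=64) can be sketched in plain NumPy: each group of 64 weights gets its own scale and zero point derived from the group's min/max. This is an illustrative approximation only, not MLX's actual quantization kernel, and the function names here are hypothetical.

```python
import numpy as np

def affine_quantize(w, group_size=64, bits=8):
    """Per-group affine (asymmetric) quantization sketch.

    Maps each group of `group_size` weights onto the integer range
    [0, 2**bits - 1] using a per-group scale and minimum (zero point).
    """
    levels = 2**bits - 1  # 255 for int8
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    """Reconstruct approximate weights from quantized values."""
    return q.astype(np.float32) * scale + w_min
```

At bits=8 the reconstruction error per weight is at most half the group's scale, which is why int8 stays closer to BF16 quality than the int4 variants above.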