---
license: apache-2.0
tags:
  - vision
  - moondream
  - mlx
  - int8
  - quantized
base_model: moondream/moondream3-preview
---

MD3P-Int8 - INT8 Quantized Moondream3 for MLX

An INT8-quantized version of Moondream3 (base model: moondream/moondream3-preview) for MLX. It sits between the BF16 original and the INT4 variants: smaller than BF16, with higher quality than INT4.

Model Details

| Component | Original (BF16) | This Model |
|---|---|---|
| MoE Experts (layers 4-23) | BF16 | int8 |
| Vision Encoder | BF16 | BF16 (preserved) |
| Text Attention | BF16 | int8 |
| Text MLP (layers 0-3) | BF16 | int8 |
| Embeddings | BF16 | BF16 (preserved) |
| Total Size | ~12 GB | ~10 GB |
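Because the vision encoder and embeddings stay in BF16, the total shrinks by much less than half. The back-of-envelope estimator below is a sketch under stated assumptions: BF16 weights cost 2 bytes each, and every group of `group_size` int8 weights also stores one 16-bit scale and one 16-bit bias. The function names are hypothetical, and the parameter counts are inputs you supply; this README does not list them.

```python
# Hypothetical back-of-envelope estimator for a mixed BF16/int8 checkpoint.
# Assumption: each quantization group stores one 16-bit scale and one 16-bit bias.


def bytes_per_quantized_weight(bits: int = 8, group_size: int = 64, meta_bits: int = 16) -> float:
    """Effective bytes per quantized weight, including the per-group scale and bias."""
    return (bits + 2 * meta_bits / group_size) / 8


def estimate_size_gb(n_quantized: float, n_bf16: float, bits: int = 8, group_size: int = 64) -> float:
    """Estimated weight storage in GB (1e9 bytes); ignores file headers and metadata."""
    quantized_bytes = n_quantized * bytes_per_quantized_weight(bits, group_size)
    bf16_bytes = n_bf16 * 2  # BF16 = 16 bits = 2 bytes per weight
    return (quantized_bytes + bf16_bytes) / 1e9


if __name__ == "__main__":
    # int8 with group_size=64 costs about 1.06 bytes per weight versus 2 for BF16.
    print(bytes_per_quantized_weight())        # 1.0625
    print(bytes_per_quantized_weight(bits=4))  # 0.5625, for comparison with the INT4 variants
```

Setting `n_bf16` to the vision encoder plus embedding parameter count and `n_quantized` to the rest of the text model reproduces the shape of the table: only the quantized portion drops from 2 to roughly 1.06 bytes per weight.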

Quantization Details

  • Method: Affine quantization (bits=8, group_size=64), i.e. one scale and one bias per group of 64 consecutive weights
  • Target: Text model layers (attention, MLP, MoE experts)
  • Preserved: Vision encoder and embeddings at BF16 for quality
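The affine scheme listed above can be sketched in plain Python. This illustrates group-wise affine quantization with the stated settings (bits=8, group_size=64); it is not MLX's implementation. MLX does this in its own kernels (for example `mx.quantize`), which may differ in rounding, clamping, and how the integers are packed. The helper names here are hypothetical.

```python
import random

# Minimal sketch of group-wise affine quantization (bits=8, group_size=64).
# Illustrative only: MLX's kernels may differ in rounding, clamping, and packing.


def quantize_group(values: list[float], bits: int = 8) -> tuple[list[int], float, float]:
    """Map one group of floats to unsigned ints in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero for constant groups
    bias = lo
    # Round to nearest (Python's round uses ties-to-even), then clamp to the valid range.
    q = [min(levels, max(0, round((v - bias) / scale))) for v in values]
    return q, scale, bias


def dequantize_group(q: list[int], scale: float, bias: float) -> list[float]:
    """Reconstruct approximate floats: w_hat = scale * q + bias."""
    return [scale * x + bias for x in q]


def quantize(weights: list[float], group_size: int = 64, bits: int = 8) -> list[tuple[list[int], float, float]]:
    """Split a flat weight row into groups and quantize each group independently."""
    if len(weights) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


if __name__ == "__main__":
    random.seed(0)
    row = [random.uniform(-1.0, 1.0) for _ in range(128)]  # two groups of 64
    for g, (q, scale, bias) in enumerate(quantize(row)):
        original = row[g * 64:(g + 1) * 64]
        restored = dequantize_group(q, scale, bias)
        # Round-to-nearest keeps each error within half a quantization step (scale / 2).
        err = max(abs(a - b) for a, b in zip(original, restored))
        print(f"group {g}: scale={scale:.5f} max_error={err:.5f}")
```

Each group carries its own scale and bias, so an outlier only widens the step size for its 64 neighbours rather than for the whole tensor; that is the usual motivation for a small group size.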

Comparison with INT4 Variants

| Model | Size | Quality | Use Case |
|---|---|---|---|
| md3p-int8 (this model) | 10 GB | Higher | Desktop/server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained devices |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6 GB limit) |
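The comparison table can be read as a simple selection rule: take the highest-quality variant whose weights fit the memory you have. The sketch below encodes the table's sizes; `pick_variant` and `headroom_gb` are hypothetical, and how much headroom to reserve for activations, the KV cache, and the runtime is an assumption that depends on your workload.

```python
from typing import Optional

# Sizes (GB) from the comparison table, ordered from highest to lowest quality.
VARIANTS = [
    ("md3p-int8", 10.0),
    ("md3p-int4", 6.48),
    ("md3p-int4-smol", 5.43),
]


def pick_variant(budget_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the highest-quality variant whose weights plus headroom fit the budget, else None."""
    for name, size_gb in VARIANTS:
        if size_gb + headroom_gb <= budget_gb:
            return name
    return None


if __name__ == "__main__":
    print(pick_variant(16.0))                   # md3p-int8
    print(pick_variant(12.0, headroom_gb=3.0))  # md3p-int4
    print(pick_variant(6.0))                    # md3p-int4-smol
```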

Usage

This model is designed for use with MLX-based Moondream implementations.

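```python
# Example with mlx-lm or a similar MLX loader; whether a given loader
# supports the Moondream3 architecture depends on that implementation.
from mlx_lm import load

model, tokenizer = load("lewi/md3p-int8")
```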

Source & License

  • Base model: moondream/moondream3-preview
  • License: Apache 2.0 (as declared in the metadata)

Acknowledgments

Thanks to the Moondream team for the original model and Apache 2.0 license.