---
license: apache-2.0
tags:
- vision
- moondream
- mlx
- int8
- quantized
base_model: moondream/moondream3-preview
---
|
|
|
|
|
# MD3P-Int8 - INT8 Quantized Moondream3 for MLX |
|
|
|
|
|
An INT8-quantized build of Moondream3 that trades a modest size reduction for near-original quality, intended for MLX deployment.
|
|
|
|
|
## Model Details |
|
|
|
|
|
| Component | Original (BF16) | This Model |
|-----------|-----------------|------------|
| MoE Experts (layers 4-23) | BF16 | **int8** |
| Vision Encoder | BF16 | BF16 (preserved) |
| Text Attention | BF16 | **int8** |
| Text MLP (layers 0-3) | BF16 | **int8** |
| Embeddings | BF16 | BF16 (preserved) |
| **Total Size** | ~12 GB | **~10 GB** |
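The split in the table can be expressed as a simple layer-selection rule. The sketch below is hypothetical: the parameter-path prefixes (`text.attn`, `text.mlp`, `text.moe`, `vision`, `embed`) are assumed names for illustration, not the checkpoint's actual key names.

```python
# Hypothetical sketch of the selection rule implied by the table above:
# text attention, text MLP, and MoE expert weights get int8, while the
# vision encoder and embeddings are preserved in BF16.
QUANTIZED_PREFIXES = ("text.attn", "text.mlp", "text.moe")  # assumed key names
PRESERVED_PREFIXES = ("vision", "embed")                    # assumed key names

def should_quantize(param_path: str) -> bool:
    """Return True if the parameter at this path should be int8-quantized."""
    if param_path.startswith(PRESERVED_PREFIXES):
        return False
    return param_path.startswith(QUANTIZED_PREFIXES)
```

A predicate like this is typically passed to the quantization pass so that quality-sensitive components are skipped wholesale.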
|
|
|
|
|
## Quantization Details |
|
|
|
|
|
- **Method**: Affine quantization (bits=8, group_size=64)
- **Target**: Text model layers (attention, MLP, MoE experts)
- **Preserved**: Vision encoder and embeddings, kept at BF16 for quality
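To make the bits=8 / group_size=64 setting concrete, here is a simplified NumPy sketch of affine (asymmetric) group quantization. It illustrates the scheme only; MLX's actual kernels use a packed storage layout and their own scale/bias convention, so this is not the library's implementation.

```python
import numpy as np

def affine_quantize(w: np.ndarray, bits: int = 8, group_size: int = 64):
    """Affine-quantize a weight vector in groups of `group_size`.

    Each group is mapped to integers in [0, 2**bits - 1] using a per-group
    scale and offset (zero point), mirroring bits=8 / group_size=64 above.
    """
    qmax = 2**bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Avoid division by zero for constant groups.
    scale = np.where(hi > lo, (hi - lo) / qmax, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray):
    """Reconstruct approximate weights from quantized values."""
    return q.astype(np.float32) * scale + lo
```

The per-group scale is what keeps the quantization error bounded: each reconstructed value is within half a quantization step of the original.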
|
|
|
|
|
## Comparison with INT4 Variants |
|
|
|
|
|
| Model | Size | Quality | Use Case |
|-------|------|---------|----------|
| md3p-int8 (this) | 10 GB | Higher | Desktop/Server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6GB limit) |
|
|
|
|
|
## Usage |
|
|
|
|
|
This model is designed for use with MLX-based Moondream implementations. |
|
|
|
|
|
```python
# Illustrative only: Moondream3 is a vision-language model, so it may need a
# Moondream-specific MLX runner rather than mlx-lm's text-only API.
from mlx_lm import load, generate

model, tokenizer = load("lewi/md3p-int8")
```
|
|
|
|
|
## Source & License |
|
|
|
|
|
- **Original Model**: [moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview)
- **License**: Apache 2.0 (same as original)
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Thanks to the [Moondream](https://moondream.ai/) team for the original model and Apache 2.0 license. |
|
|
|