---
license: apache-2.0
tags:
- vision
- moondream
- mlx
- int8
- quantized
base_model: moondream/moondream3-preview
---
|
|
|
|
|
# MD3P-Int8 - INT8 Quantized Moondream3 for MLX |
|
|
|
|
|
An INT8-quantized build of Moondream3 that trades a modest size reduction for near-original quality, intended for MLX deployment.
|
|
|
|
|
## Model Details |
|
|
|
|
|
| Component | Original (BF16) | This Model |
|-----------|-----------------|------------|
| MoE Experts (layers 4-23) | BF16 | **int8** |
| Vision Encoder | BF16 | BF16 (preserved) |
| Text Attention | BF16 | **int8** |
| Text MLP (layers 0-3) | BF16 | **int8** |
| Embeddings | BF16 | BF16 (preserved) |
| **Total Size** | ~12 GB | **~10 GB** |
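The split in the table can be expressed as a simple layer-selection rule. The sketch below is hypothetical: the parameter-path prefixes (`text.attn`, `text.mlp`, `text.moe`, `vision`, `embed`) are assumed names for illustration, not the checkpoint's actual key names.

```python
# Hypothetical sketch of the selection rule implied by the table above:
# text attention, text MLP, and MoE expert weights get int8, while the
# vision encoder and embeddings are preserved in BF16.
QUANTIZED_PREFIXES = ("text.attn", "text.mlp", "text.moe")  # assumed key names
PRESERVED_PREFIXES = ("vision", "embed")                    # assumed key names

def should_quantize(param_path: str) -> bool:
    """Return True if the parameter at this path should be int8-quantized."""
    if param_path.startswith(PRESERVED_PREFIXES):
        return False
    return param_path.startswith(QUANTIZED_PREFIXES)
```

A predicate like this is typically passed to the quantization pass so that quality-sensitive components are skipped wholesale.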
|
|
|
|
|
## Quantization Details |
|
|
|
|
|
- **Method**: Affine quantization (bits=8, group_size=64)
- **Target**: Text model layers (attention, MLP, MoE experts)
- **Preserved**: Vision encoder and embeddings, kept at BF16 for quality
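To make the bits=8 / group_size=64 setting concrete, here is a simplified NumPy sketch of affine (asymmetric) group quantization. It illustrates the scheme only; MLX's actual kernels use a packed storage layout and their own scale/bias convention, so this is not the library's implementation.

```python
import numpy as np

def affine_quantize(w: np.ndarray, bits: int = 8, group_size: int = 64):
    """Affine-quantize a weight vector in groups of `group_size`.

    Each group is mapped to integers in [0, 2**bits - 1] using a per-group
    scale and offset (zero point), mirroring bits=8 / group_size=64 above.
    """
    qmax = 2**bits - 1
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Avoid division by zero for constant groups.
    scale = np.where(hi > lo, (hi - lo) / qmax, 1.0)
    q = np.clip(np.round((groups - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q: np.ndarray, scale: np.ndarray, lo: np.ndarray):
    """Reconstruct approximate weights from quantized values."""
    return q.astype(np.float32) * scale + lo
```

The per-group scale is what keeps the quantization error bounded: each reconstructed value is within half a quantization step of the original.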
|
|
|
|
|
## Comparison with INT4 Variants |
|
|
|
|
|
| Model | Size | Quality | Use Case |
|-------|------|---------|----------|
| md3p-int8 (this) | 10 GB | Higher | Desktop/Server MLX |
| md3p-int4 | 6.48 GB | Medium | Memory-constrained |
| md3p-int4-smol | 5.43 GB | Lower | iOS (~6GB limit) |
|
|
|
|
|
## Usage |
|
|
|
|
|
This model is designed for use with MLX-based Moondream implementations. |
|
|
|
|
|
```python
# Illustrative only: Moondream3 is a vision-language model, so it may need a
# Moondream-specific MLX runner rather than mlx-lm's text-only API.
from mlx_lm import load, generate

model, tokenizer = load("lewi/md3p-int8")
```
|
|
|
|
|
## Source & License |
|
|
|
|
|
- **Original Model**: [moondream/moondream3-preview](https://huggingface.co/moondream/moondream3-preview)
- **License**: Apache 2.0 (same as original)
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
Thanks to the [Moondream](https://moondream.ai/) team for the original model and Apache 2.0 license. |
|
|
|