moondream3-preview-mlx-8bit

An 8-bit MLX quantization of moondream/moondream3-preview for running on Apple Silicon with mlx-vlm.

Quantization: affine, 8 bits, group size 64 (vision tower included)
On-disk size: ~9.5 GB
Peak memory: ~11 GB
Tokenizer: loaded from moondream/starmie-v1 at runtime (not bundled)
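
To make "affine, 8 bits, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration, not the MLX implementation: the function names are hypothetical, and the 16-bit storage for each group's scale and bias is an assumption used only to estimate the per-weight overhead.

```python
def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1              # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:                      # constant group: any nonzero scale works
        scale = 1.0
    bias = lo                             # the group minimum maps to code 0
    codes = [round((w - bias) / scale) for w in weights]
    return codes, scale, bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes: w ~= code * scale + bias."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=8, group_size=64, meta_bits=16):
    """Effective storage per weight, counting one scale and one bias per group.

    meta_bits=16 is an assumption (16-bit scale and bias), not read from the model.
    """
    return bits + 2 * meta_bits / group_size


# One group of 64 evenly spaced weights in [-0.5, 0.5]
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))

print(bits_per_weight())  # 8.5
```

Under these assumptions each weight costs 8.5 bits rather than 8, and the reconstruction error within a group is bounded by half of that group's scale.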

Usage

pip install mlx-vlm

python -m mlx_vlm.generate \
  --model beshkenadze/moondream3-preview-mlx-8bit \
  --image path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0

License

moondream3 is released under the Business Source License 1.1 (BSL 1.1); see LICENSE.md. This quantization is a derivative redistribution under the same terms.
