moondream3-preview-mlx-8bit

An 8-bit MLX quantization of moondream/moondream3-preview for running on Apple Silicon with mlx-vlm.


| Property | Value |
|---|---|
| Quantization | affine, 8 bits, group size 64 (vision tower included) |
| On-disk size | ~9.5 GB |
| Peak memory | ~11 GB |
| Tokenizer | loaded from `moondream/starmie-v1` at runtime (not bundled) |
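
To make the quantization row concrete, here is a rough sketch of the storage overhead of group-wise affine quantization. It assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias next to its 8-bit codes; the actual dtype of those per-group values in this checkpoint is not stated here, and the helper names are hypothetical.

```python
def bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Effective storage per weight for group-wise affine quantization.

    Assumption: one scale and one bias per group, 16 bits each.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(num_weights, **kwargs):
    """Approximate size in GB (10^9 bytes) of `num_weights` quantized weights."""
    return num_weights * bits_per_weight(**kwargs) / 8 / 1e9


# 8 bits of code plus 32 bits of per-group metadata spread over 64 weights.
effective_bits = bits_per_weight()
# Illustrative only: size of one billion quantized weights, not a measurement.
size_per_billion = quantized_size_gb(1_000_000_000)

print(effective_bits)
print(size_per_billion)
```

Under these assumptions the per-group metadata adds half a bit per weight, so the file comes out slightly larger than a plain 8-bits-per-weight estimate. Unquantized tensors, if any, would add to that.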

Usage

pip install mlx-vlm

python -m mlx_vlm.generate \
  --model beshkenadze/moondream3-preview-mlx-8bit \
  --image path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0
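
The same command can be driven from a script. The sketch below only assembles the argument list from the flags shown above and hands it to `subprocess`; the function names are hypothetical, and `describe_image` assumes `mlx-vlm` is installed in the current interpreter on an Apple Silicon machine.

```python
import shlex
import subprocess
import sys

MODEL_ID = "beshkenadze/moondream3-preview-mlx-8bit"


def build_generate_command(image_path, prompt, max_tokens=128,
                           temperature=0.0, model=MODEL_ID):
    """Return the argv list for the mlx_vlm.generate CLI call."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--image", str(image_path),
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
    ]


def describe_image(image_path, prompt="Describe this image."):
    """Run the CLI and return its captured stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    cmd = build_generate_command(image_path, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Build (but do not run) the command, and print it in copy-pasteable form.
cmd = build_generate_command("path/to/image.jpg", "Describe this image.")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.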

License

moondream3 is released under the Business Source License 1.1 (BSL 1.1) — see LICENSE.md. This quantization is a derivative redistribution under the same terms.
