---
language:
  - multilingual
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
library_name: mlx
base_model:
  - mlx-community/InternVL3-8B-bf16
tags:
  - mlx
  - mlx-vlm
  - internvl
  - internvl3
  - 4-bit
  - quantized
  - vision-language-model
  - apple-silicon
pipeline_tag: image-text-to-text
---

# InternVL3-8B-MLX-4bit

This repository contains a 4-bit quantized MLX conversion of mlx-community/InternVL3-8B-bf16 for inference on Apple Silicon.

## Conversion Details

| Setting | Value |
| --- | --- |
| Source model | mlx-community/InternVL3-8B-bf16 |
| Conversion tool | mlx_vlm.convert |
| Quantization bits | 4 |
| Group size | 64 |
| Quantization mode | affine |
| Quant predicate | none (uniform quantization) |
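
The affine mode with group size 64 listed above can be illustrated with a minimal NumPy sketch: each group of 64 weights is mapped to 4-bit integers via a per-group scale and minimum. This is an illustrative re-implementation, not the actual MLX kernel, and the exact rounding/packing details of mlx_vlm may differ.

```python
# Illustrative 4-bit affine quantization with group size 64 (NumPy sketch,
# not the real MLX implementation).
import numpy as np

BITS = 4
GROUP_SIZE = 64
LEVELS = 2**BITS - 1  # 15 representable steps per group

def quantize_affine(w: np.ndarray):
    """Quantize a 1-D float vector per group: q = round((w - min) / scale)."""
    groups = w.reshape(-1, GROUP_SIZE)
    w_min = groups.min(axis=1, keepdims=True)
    scale = (groups.max(axis=1, keepdims=True) - w_min) / LEVELS
    scale = np.where(scale == 0, 1.0, scale)  # constant groups: avoid div by zero
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_affine(q, scale, w_min):
    """Reconstruct approximate weights from 4-bit codes plus per-group params."""
    return (q.astype(np.float32) * scale + w_min).reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=4096).astype(np.float32)
    q, scale, w_min = quantize_affine(w)
    w_hat = dequantize_affine(q, scale, w_min)
    print("max abs error:", float(np.abs(w - w_hat).max()))
```

The reconstruction error is bounded by half the per-group scale, which is why smaller group sizes trade extra scale/offset storage for accuracy.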

Conversion command used:

```bash
python3 -m mlx_vlm convert \
  --hf-path "mlx-community/InternVL3-8B-bf16" \
  --mlx-path "./models/InternVL3-8B-4bit" \
  -q --q-bits 4 --q-group-size 64
```
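
A back-of-the-envelope size estimate shows why these settings matter: assuming roughly 8e9 parameters (the "8B" in the name), 4-bit storage, and about 4 bytes of scale/offset overhead per group of 64 values (an assumption; the real overhead depends on how MLX stores group parameters), the quantized weights shrink to a little over a quarter of the bf16 size. The actual repository size also includes unquantized layers and metadata, so treat this as a rough lower bound.

```python
# Rough size estimate for 4-bit, group-size-64 quantized weights.
# overhead_bytes_per_group = 4 is an assumed per-group scale/offset cost.
def quantized_size_gb(n_params: float, bits: int = 4, group_size: int = 64,
                      overhead_bytes_per_group: int = 4) -> float:
    weight_bytes = n_params * bits / 8
    group_bytes = (n_params / group_size) * overhead_bytes_per_group
    return (weight_bytes + group_bytes) / 1e9

bf16_gb = 8e9 * 2 / 1e9          # bf16 source: 2 bytes per parameter
q4_gb = quantized_size_gb(8e9)   # 4-bit weights plus group overhead
print(f"bf16: {bf16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
# prints: bf16: 16.0 GB, 4-bit: 4.5 GB
```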

## Validation

| Test | Status |
| --- | --- |
| Text generation load test | passed |

Verification command:

```bash
python3 -m mlx_vlm generate \
  --model "./models/InternVL3-8B-4bit" \
  --prompt "Reply with exactly: OK" \
  --max-tokens 8 --temperature 0
```

Observed response: `OK`
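
The validation step above can be scripted as a repeatable smoke test. The wrapper below is a hypothetical helper, not part of mlx-vlm: `check_response` (a name introduced here) compares the reply to the expected string ignoring surrounding whitespace, and the subprocess call only runs when executed directly on a machine with mlx-vlm and the converted model present.

```python
# Hypothetical smoke-test wrapper around the validation command above.
import subprocess

def check_response(observed: str, expected: str = "OK") -> bool:
    """Compare a model reply to the expected string, ignoring surrounding whitespace."""
    return observed.strip() == expected

if __name__ == "__main__":
    # Re-runs the verification command from this model card.
    out = subprocess.run(
        ["python3", "-m", "mlx_vlm", "generate",
         "--model", "./models/InternVL3-8B-4bit",
         "--prompt", "Reply with exactly: OK",
         "--max-tokens", "8", "--temperature", "0"],
        capture_output=True, text=True, check=True,
    )
    print("passed" if check_response(out.stdout) else "failed")
```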

## Usage

Install:

```bash
python3 -m pip install -U mlx-vlm
```

Run locally from this folder:

```bash
python3 -m mlx_vlm generate \
  --model "." \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

Run from Hugging Face after upload:

```bash
python3 -m mlx_vlm generate \
  --model "mlx-community/InternVL3-8B-MLX-4bit" \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

## Notes

- This conversion does not upload anything automatically.
- Quantization changes numerical behavior relative to the bf16 weights.
- During local tests, mlx_vlm emitted an upstream tokenizer regex warning originating from the source model assets.

## License

This conversion follows the license terms of the upstream source repository (Qwen license, linked in the metadata above).