InternVL3-8B-MLX-4bit

This repository contains a 4-bit MLX quantized conversion of mlx-community/InternVL3-8B-bf16 for Apple Silicon inference.

Conversion Details

  • Source model: mlx-community/InternVL3-8B-bf16
  • Conversion tool: mlx_vlm.convert
  • Quantization bits: 4
  • Group size: 64
  • Quantization mode: affine
  • Quant predicate: none (uniform quantization across all quantized layers)
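For readers unfamiliar with the settings above, the following is an illustrative sketch of group-wise affine quantization (scale plus zero point per group of 64 values, 4-bit codes). This is a simplified NumPy model of the scheme, not the actual mlx_vlm kernel.

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    # Each contiguous group of `group_size` weights shares one scale and
    # one offset; values are stored as `bits`-bit unsigned integers.
    levels = 2**bits - 1                      # 15 quantization levels for 4-bit
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard constant groups
    q = np.clip(np.round((groups - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def affine_dequantize(q, scale, w_min):
    return (q.astype(np.float32) * scale + w_min).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, w_min = affine_quantize(w)
w_hat = affine_dequantize(q, scale, w_min)
max_err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {max_err:.4f}")
```

With rounding, the per-element error is bounded by half the group's scale, which is why smaller group sizes trade extra metadata for lower error.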

Conversion command used:

python3 -m mlx_vlm convert \
  --hf-path "mlx-community/InternVL3-8B-bf16" \
  --mlx-path "./models/InternVL3-8B-4bit" \
  -q --q-bits 4 --q-group-size 64

Validation

  • Text generation load test: passed

Verification command:

python3 -m mlx_vlm generate \
  --model "./models/InternVL3-8B-4bit" \
  --prompt "Reply with exactly: OK" \
  --max-tokens 8 --temperature 0

Observed response: OK
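Beyond a generation smoke test, the recorded quantization settings can be inspected directly. The sketch below assumes the conversion wrote a top-level "quantization" entry with "bits" and "group_size" keys into the model's config.json, as MLX conversions typically do; the demo runs against a synthetic config rather than a downloaded model.

```python
import json
import tempfile
from pathlib import Path

def read_quantization(model_dir):
    # Return (bits, group_size) recorded in an MLX model's config.json.
    # Assumes a top-level "quantization" entry, as MLX conversions write.
    cfg = json.loads((Path(model_dir) / "config.json").read_text())
    quant = cfg.get("quantization", {})
    return quant.get("bits"), quant.get("group_size")

# Demo on a synthetic config mirroring what this conversion should contain.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.json").write_text(
        json.dumps({"quantization": {"bits": 4, "group_size": 64}})
    )
    bits, group_size = read_quantization(d)
    print(bits, group_size)  # 4 64
```

Point `read_quantization` at the converted model folder (e.g. `./models/InternVL3-8B-4bit`) to confirm the settings match the conversion command.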

Usage

Install:

python3 -m pip install -U mlx-vlm

Run locally from this folder:

python3 -m mlx_vlm generate \
  --model "." \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0

Run from Hugging Face after upload:

python3 -m mlx_vlm generate \
  --model "mlx-community/InternVL3-8B-MLX-4bit" \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0

Notes

  • This conversion does not upload anything automatically.
  • 4-bit quantization changes numerical behavior relative to the bf16 weights, so outputs may differ slightly from the source model.
  • During local tests, mlx_vlm emitted an upstream tokenizer regex warning from the source model assets.

License

Follows the upstream model license terms from the source repository.
