---
language:
- multilingual
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
library_name: mlx
base_model:
- mlx-community/InternVL3-8B-bf16
tags:
- mlx
- mlx-vlm
- internvl
- internvl3
- 4-bit
- quantized
- vision-language-model
- apple-silicon
pipeline_tag: image-text-to-text
---

# InternVL3-8B-MLX-4bit

This repository contains a 4-bit MLX-quantized conversion of `mlx-community/InternVL3-8B-bf16` for inference on Apple Silicon.
## Conversion Details

| Setting | Value |
| --- | --- |
| Source model | `mlx-community/InternVL3-8B-bf16` |
| Conversion tool | `mlx_vlm.convert` |
| Quantization bits | `4` |
| Group size | `64` |
| Quantization mode | `affine` |
| Quant predicate | none (uniform quantization) |

Conversion command used:
|
```bash
python3 -m mlx_vlm convert \
  --hf-path "mlx-community/InternVL3-8B-bf16" \
  --mlx-path "./models/InternVL3-8B-4bit" \
  -q --q-bits 4 --q-group-size 64
```
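For intuition about the settings above, group-wise affine quantization with 4 bits and group size 64 can be sketched in NumPy. This is an illustrative approximation only, not MLX's actual kernel, and the helper names (`quantize_affine`, `dequantize_affine`) are hypothetical:

```python
import numpy as np

def quantize_affine(w, bits=4, group_size=64):
    """Affine (asymmetric) quantization: one scale and offset per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    levels = 2 ** bits - 1                      # 4 bits -> codes 0..15
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)    # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_affine(q, scale, w_min):
    """Reconstruct approximate weights from codes plus per-group stats."""
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s, m = quantize_affine(w)
w_hat = dequantize_affine(q, s, m).reshape(-1)
# Rounding error is bounded by half a quantization step per group.
print(float(np.abs(w - w_hat).max()))
```

The per-group offset is what makes the scheme "affine" rather than purely symmetric: each group of 64 weights is mapped onto its own 16-level grid.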

## Validation

| Test | Status |
| --- | --- |
| Text generation load test | passed |

Verification command:
```bash
python3 -m mlx_vlm generate \
  --model "./models/InternVL3-8B-4bit" \
  --prompt "Reply with exactly: OK" \
  --max-tokens 8 --temperature 0
```

Observed response: `OK`

## Usage

Install:
```bash
python3 -m pip install -U mlx-vlm
```

Run locally from this folder:
```bash
python3 -m mlx_vlm generate \
  --model "." \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

Run from Hugging Face after upload:
```bash
python3 -m mlx_vlm generate \
  --model "mlx-community/InternVL3-8B-MLX-4bit" \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

## Notes

- This conversion does not upload anything automatically.
- Quantization changes numerical behavior relative to the bf16 weights.
- During local tests, `mlx_vlm` emitted an upstream tokenizer regex warning from the source model assets.
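As a rough estimate of what the quantization buys in memory, assume one fp16 scale and one fp16 offset are stored per 64-weight group (an assumption about the storage layout, not verified against this checkpoint):

```python
bits, group_size = 4, 64
# fp16 scale + fp16 offset per group, amortized across the group
overhead_bits = (16 + 16) / group_size
effective_bits = bits + overhead_bits
print(effective_bits)  # 4.5 bits per weight

params = 8e9  # ~8B parameters
quant_gib = params * effective_bits / 8 / 2**30
bf16_gib = params * 16 / 8 / 2**30
print(round(quant_gib, 2), round(bf16_gib, 2))  # ~4.19 GiB vs ~14.9 GiB
```

Under these assumptions the 4-bit weights take roughly 28% of the bf16 footprint; actual on-disk size also includes embeddings and any layers the quantizer leaves in higher precision.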

## Links

- Source model: https://huggingface.co/mlx-community/InternVL3-8B-bf16
- MLX: https://github.com/ml-explore/mlx
- mlx-vlm: https://github.com/Blaizzy/mlx-vlm

## License

Follows the upstream model license terms from the source repository.