---
language:
- multilingual
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
library_name: mlx
base_model:
- mlx-community/InternVL3-8B-bf16
tags:
- mlx
- mlx-vlm
- internvl
- internvl3
- 4-bit
- quantized
- vision-language-model
- apple-silicon
pipeline_tag: image-text-to-text
---

# InternVL3-8B-MLX-4bit

This repository contains a 4-bit MLX-quantized conversion of `mlx-community/InternVL3-8B-bf16` for inference on Apple Silicon.
## Conversion Details

| Setting | Value |
| --- | --- |
| Source model | `mlx-community/InternVL3-8B-bf16` |
| Conversion tool | `mlx_vlm.convert` |
| Quantization bits | `4` |
| Group size | `64` |
| Quantization mode | `affine` |
| Quant predicate | none (uniform quantization) |

Conversion command used:
|
```bash
python3 -m mlx_vlm convert \
  --hf-path "mlx-community/InternVL3-8B-bf16" \
  --mlx-path "./models/InternVL3-8B-4bit" \
  -q --q-bits 4 --q-group-size 64
```
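For intuition about the settings above, group-wise affine quantization with 4 bits and group size 64 can be sketched in NumPy. This is an illustrative approximation only, not MLX's actual kernel, and the helper names (`quantize_affine`, `dequantize_affine`) are hypothetical:

```python
import numpy as np

def quantize_affine(w, bits=4, group_size=64):
    """Affine (asymmetric) quantization: one scale and offset per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    levels = 2 ** bits - 1                      # 4 bits -> codes 0..15
    scale = (w_max - w_min) / levels
    scale = np.where(scale == 0, 1.0, scale)    # guard constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, levels).astype(np.uint8)
    return q, scale, w_min

def dequantize_affine(q, scale, w_min):
    """Reconstruct approximate weights from codes plus per-group stats."""
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, s, m = quantize_affine(w)
w_hat = dequantize_affine(q, s, m).reshape(-1)
# Rounding error is bounded by half a quantization step per group.
print(float(np.abs(w - w_hat).max()))
```

The per-group offset is what makes the scheme "affine" rather than purely symmetric: each group of 64 weights is mapped onto its own 16-level grid.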

## Validation

| Test | Status |
| --- | --- |
| Text generation load test | passed |

Verification command:
```bash
python3 -m mlx_vlm generate \
  --model "./models/InternVL3-8B-4bit" \
  --prompt "Reply with exactly: OK" \
  --max-tokens 8 --temperature 0
```

Observed response: `OK`

## Usage

Install:
```bash
python3 -m pip install -U mlx-vlm
```

Run locally from this folder:
```bash
python3 -m mlx_vlm generate \
  --model "." \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

Run from Hugging Face after upload:
```bash
python3 -m mlx_vlm generate \
  --model "mlx-community/InternVL3-8B-MLX-4bit" \
  --prompt "Describe the image briefly." \
  --image path/to/image.jpg \
  --max-tokens 256 \
  --temperature 0
```

## Notes

- This conversion does not upload anything automatically.
- Quantization changes numerical behavior relative to the bf16 weights.
- During local tests, `mlx_vlm` emitted an upstream tokenizer regex warning from the source model assets.
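As a rough estimate of what the quantization buys in memory, assume one fp16 scale and one fp16 offset are stored per 64-weight group (an assumption about the storage layout, not verified against this checkpoint):

```python
bits, group_size = 4, 64
# fp16 scale + fp16 offset per group, amortized across the group
overhead_bits = (16 + 16) / group_size
effective_bits = bits + overhead_bits
print(effective_bits)  # 4.5 bits per weight

params = 8e9  # ~8B parameters
quant_gib = params * effective_bits / 8 / 2**30
bf16_gib = params * 16 / 8 / 2**30
print(round(quant_gib, 2), round(bf16_gib, 2))  # ~4.19 GiB vs ~14.9 GiB
```

Under these assumptions the 4-bit weights take roughly 28% of the bf16 footprint; actual on-disk size also includes embeddings and any layers the quantizer leaves in higher precision.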

## Links

- Source model: https://huggingface.co/mlx-community/InternVL3-8B-bf16
- MLX: https://github.com/ml-explore/mlx
- mlx-vlm: https://github.com/Blaizzy/mlx-vlm

## License

Follows the upstream model license terms from the source repository.