	---
	license: apache-2.0
	library_name: mlx
	pipeline_tag: text-ranking

	base_model:
	- Qwen/Qwen3-VL-Reranker-8B
	tags:
	- mlx
	- multimodal rerank
	- qwen
	- reranker
	- 4-bit
	---

	# Qwen3-VL-Reranker-8B-MLX-4bit

	This is a 4-bit MLX quantization of [Qwen/Qwen3-VL-Reranker-8B](https://huggingface.co/Qwen/Qwen3-VL-Reranker-8B), intended for inference on Apple Silicon (Mac, iPad, iPhone) with the [MLX framework](https://github.com/ml-explore/mlx).

	## Quantization Info

	| Config | Value |
	|--------|-------|
	| Bits | 4 |
	| Group Size | 64 |
	| Quantization Mode | Affine |
	| Dtype | bfloat16 |
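
To make the table concrete, here is a minimal pure-Python sketch of group-wise affine quantization with 4 bits and a group size of 64. It is an illustration under stated assumptions, not the MLX kernel: each group stores one scale and one bias (the group minimum), and the helper names (`quantize_group`, `dequantize_group`, `quantize`) and the random test row are hypothetical. The actual MLX implementation may differ in rounding and packing details.

```python
import random

BITS = 4          # "Bits" from the table above
GROUP_SIZE = 64   # "Group Size" from the table above


def quantize_group(weights, bits=BITS):
    """Affine-quantize one group; returns (codes, scale, bias)."""
    levels = (1 << bits) - 1              # 15 steps between min and max for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels
    if scale == 0.0:                      # constant group: avoid division by zero
        scale = 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min            # bias is the group minimum


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=GROUP_SIZE, bits=BITS):
    """Split a flat weight row into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


# Hypothetical weight row: 256 values, so 4 groups of 64.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(256)]

groups = quantize(row)
restored = [w for g in groups for w in dequantize_group(*g)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups={len(groups)}  max abs error={max_err:.6f}")
```

Because every group has its own scale and bias, the rounding error of a weight is bounded by half of its group's scale, which is why smaller groups trade extra metadata for accuracy.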

	## Model Overview

	- Model Type: Multimodal Reranker
	- Supported Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations
	- Parameters: 8B
	- Context Length: 32k
	- Languages: 30+
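
As a rough guide to memory needs, the parameter count can be combined with the quantization settings from the Quantization Info table. The sketch below is a back-of-the-envelope estimate only. It assumes that every weight is quantized, that each group of 64 weights carries one 16-bit scale and one 16-bit bias, and that "8B" means exactly 8e9 parameters. The function names are hypothetical, and the real checkpoint size will differ because some tensors typically stay unquantized.

```python
def bits_per_weight(bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Effective storage per weight: the code plus its share of the group metadata."""
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(n_params, **kwargs):
    """Approximate size in GB (1 GB = 1e9 bytes) if all weights were quantized."""
    return n_params * bits_per_weight(**kwargs) / 8 / 1e9


N_PARAMS = 8e9  # nominal "8B" from the list above (assumption: exact count unknown)

effective_bits = bits_per_weight()        # 4 + 32 / 64
approx_gb = quantized_size_gb(N_PARAMS)   # estimate for the 4-bit weights
bf16_gb = N_PARAMS * 16 / 8 / 1e9         # same weights stored in bfloat16

print(f"effective bits per weight: {effective_bits}")
print(f"approx. 4-bit size: {approx_gb:.1f} GB (bfloat16 would be about {bf16_gb:.1f} GB)")
```

Under these assumptions the group metadata adds half a bit per weight, so the quantized weights take a little over a quarter of the bfloat16 footprint.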

	## Requirements

	```bash
	pip install mlx-lm transformers
	```

	## Usage with `mlx-lm`

	```python
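	from mlx_lm import load

	# Download the quantized weights from the Hugging Face Hub (or reuse the
	# local cache) and return the model together with its tokenizer.
	model, tokenizer = load("Zeknes/Qwen3-VL-Reranker-8B-MLX-4bit")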
	```

	For full usage examples (multimodal reranking, vLLM), please refer to the original model page:
	[Qwen3-VL-Reranker-8B](https://huggingface.co/Qwen/Qwen3-VL-Reranker-8B)

	## Citation

	```bibtex
	@article{qwen3vlembedding,
	title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
	author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
	journal={arXiv preprint arXiv:2601.04720},
	year={2026}
	}
	```