license: apache-2.0
library_name: mlx
pipeline_tag: text-ranking
base_model:
  - Qwen/Qwen3-VL-Reranker-8B
tags:
  - mlx
  - multimodal rerank
  - qwen
  - reranker
  - 4-bit

Qwen3-VL-Reranker-8B-MLX-4bit

This is the MLX 4-bit quantized version of Qwen/Qwen3-VL-Reranker-8B, optimized for Apple Silicon (Mac / iPad / iPhone) inference using the MLX framework.

Quantization Info

Config             Value
Bits               4
Group Size         64
Quantization Mode  Affine
Dtype              bfloat16
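
To make the settings above concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration under stated assumptions, not the MLX kernel: the helper names (`quantize_group`, `dequantize_group`, `bits_per_weight`) are hypothetical, and the storage estimate assumes each group of 64 weights keeps one 16-bit scale and one 16-bit bias alongside its 4-bit codes.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when every weight in the group is identical.
    scale = (w_max - w_min) / levels or 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min  # w_min acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def bits_per_weight(bits=4, group_size=64, param_bits=16):
    """Effective storage per weight, assuming one scale and one bias per group."""
    return bits + 2 * param_bits / group_size


# Hypothetical group of 64 weights, just to exercise the round trip.
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(group, restored))
```

Under these assumptions the overhead works out to 4.5 bits per weight, so a fully quantized model with roughly 8B parameters would need on the order of 4.5 GB for weights. The real checkpoint may differ if some layers are left unquantized.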

Model Overview

  • Model Type: MultiModal Reranker
  • Supported Modalities: Text, images, screenshots, videos, and arbitrary multimodal combinations
  • Parameters: 8B
  • Context Length: 32k
  • Languages: 30+

Requirements

pip install mlx-lm transformers

Usage with mlx-lm

from mlx_lm import load

model, tokenizer = load("Zeknes/Qwen3-VL-Reranker-8B-MLX-4bit")

For full usage examples (multimodal reranking, vLLM), please refer to the original model page: Qwen3-VL-Reranker-8B
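
As a rough illustration of where a reranker's scores fit in a retrieval pipeline, the sketch below sorts candidate documents by a relevance score. Everything here is hypothetical: `rerank` is a generic helper, and `toy_score` is a word-overlap stand-in for the model, not the scoring method used by Qwen3-VL-Reranker. In practice `score_fn` would wrap the model call described on the original model page.

```python
from typing import Callable, List, Tuple


def rerank(query: str,
           documents: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Score every (query, document) pair and return the top_k by score."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_score(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query words that appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


docs = [
    "MLX runs on Apple Silicon",
    "Bananas are yellow",
    "Apple pie recipe",
]
results = rerank("apple silicon", docs, toy_score, top_k=2)
```

Keeping the scorer behind a plain callable means the ranking logic stays the same whether the scores come from this toy function or from a real model.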

Citation

@article{qwen3vlembedding,
  title={Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking},
  author={Li, Mingxin and Zhang, Yanzhao and Long, Dingkun and Chen, Keqin and Song, Sibo and Bai, Shuai and Yang, Zhibo and Xie, Pengjun and Yang, An and Liu, Dayiheng and Zhou, Jingren and Lin, Junyang},
  journal={arXiv preprint arXiv:2601.04720},
  year={2026}
}