qmd-query-expansion-qwen3.5-2B-mlx-4bit

MLX 4-bit quantized version of tobil/qmd-query-expansion-qwen3.5-2B for Apple Silicon.

Model Details

  • Base model: tobil/qmd-query-expansion-qwen3.5-2B (Qwen3.5-2B VL, a multimodal image-text-to-text model)
  • Quantization: 4-bit (4.503 bits per weight average)
  • Framework: MLX
  • Memory: ~1.0 GB (down from ~4 GB in float16)
  • Use case: Query expansion for visual-language memory search (RecallForge)
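The memory figures above follow directly from the parameter count and the reported bits per weight. A quick back-of-envelope check (the 2B parameter count is taken from the model name; the average exceeds 4.0 bits because some tensors, such as embeddings and norms, are kept at higher precision):

```python
# Estimate on-disk/in-memory weight size from parameter count and
# average bits per weight, compared against a float16 baseline.
params = 2_000_000_000       # "2B" from the model name (assumed)
bits_per_weight = 4.503      # reported quantization average

quantized_gb = params * bits_per_weight / 8 / 1024**3
fp16_gb = params * 16 / 8 / 1024**3

print(f"quantized: ~{quantized_gb:.2f} GiB, float16: ~{fp16_gb:.2f} GiB")
```

This lands at roughly 1.05 GiB quantized versus about 3.7 GiB in float16, consistent with the figures listed above.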

Usage

from mlx_lm import load, generate

# Downloads the quantized weights from the Hub (or loads them from cache)
model, tokenizer = load("bmeyer2025/qmd-query-expansion-qwen3.5-2B-mlx-4bit")
response = generate(model, tokenizer, prompt="Expand this search query: transformer attention mechanism")

Purpose

This model generates lexical, semantic, and hypothetical document expansions for search queries in RecallForge, a cross-modal visual-language memory search system.

Given a query like "transformer attention", it produces:

  • Lexical expansion: Related keywords for BM25 search
  • Vector expansion: Semantically rich rephrasing for embedding search
  • HyDE expansion: Hypothetical document passage for retrieval
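Downstream, a retrieval pipeline would route each expansion type to a different search backend. A minimal sketch of splitting the generated text into the three parts — note that the label format ("Lexical:", "Vector:", "HyDE:") and the sample output are illustrative assumptions, not this model's documented output schema:

```python
# Hypothetical parser for labeled expansion output. The section labels
# and sample text below are assumptions for illustration only.
import re

def parse_expansions(text: str) -> dict:
    """Extract labeled expansion sections from generated text."""
    fields = {}
    pattern = r"(Lexical|Vector|HyDE):\s*(.+?)(?=\n(?:Lexical|Vector|HyDE):|\Z)"
    for label, body in re.findall(pattern, text, flags=re.S):
        fields[label.lower()] = body.strip()
    return fields

sample = (
    "Lexical: attention, self-attention, multi-head, softmax\n"
    "Vector: how attention layers weight token interactions in transformers\n"
    "HyDE: The attention mechanism computes weighted sums of value vectors..."
)
expansions = parse_expansions(sample)
# expansions["lexical"] feeds BM25, expansions["vector"] feeds the
# embedding index, expansions["hyde"] is embedded as a pseudo-document.
```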

Quantization

Converted using mlx_lm.convert:

mlx_lm.convert --hf-path tobil/qmd-query-expansion-qwen3.5-2B -q --q-bits 4
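For reproducing the conversion with an explicit output directory, a fuller invocation might look like the following. The --mlx-path flag is an assumption based on current mlx_lm.convert options; check `mlx_lm.convert --help` for your installed version.

```shell
# Convert the base model to 4-bit MLX format, writing the result to a
# named local directory (output path is illustrative).
mlx_lm.convert \
  --hf-path tobil/qmd-query-expansion-qwen3.5-2B \
  -q --q-bits 4 \
  --mlx-path ./qmd-query-expansion-mlx-4bit
```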

License

Apache 2.0 (same as base model)
