qmd-query-expansion-qwen3.5-2B-mlx-4bit
MLX 4-bit quantized version of tobil/qmd-query-expansion-qwen3.5-2B for Apple Silicon.
Model Details
- Base model: tobil/qmd-query-expansion-qwen3.5-2B (Qwen3.5-2B VL, a multimodal image-text-to-text model)
- Quantization: 4-bit (4.503 bits per weight average)
- Framework: MLX
- Memory: ~1.0 GB (down from ~4 GB in float16)
- Use case: Query expansion for visual-language memory search (RecallForge)
Usage
from mlx_lm import load, generate
model, tokenizer = load("bmeyer2025/qmd-query-expansion-qwen3.5-2B-mlx-4bit")
response = generate(model, tokenizer, prompt="Expand this search query: transformer attention mechanism")
Purpose
This model generates lexical, semantic, and hypothetical document expansions for search queries in RecallForge, a cross-modal visual-language memory search system.
Given a query like "transformer attention", it produces:
- Lexical expansion: Related keywords for BM25 search
- Vector expansion: Semantically rich rephrasing for embedding search
- HyDE expansion: Hypothetical document passage for retrieval
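Downstream code needs to split the model's response into these three parts. A minimal parsing sketch, assuming the model labels its sections `Lexical:`, `Vector:`, and `HyDE:` (the section labels and output format are assumptions, not documented behavior of this model):

```python
def parse_expansions(text: str) -> dict:
    """Split a three-part expansion response into its sections.

    Assumes each section starts with a 'Lexical:', 'Vector:', or
    'HyDE:' label; continuation lines are appended to the current
    section. Adjust the labels to match the model's actual output.
    """
    sections = {"lexical": "", "vector": "", "hyde": ""}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        for key in sections:
            if lower.startswith(key + ":"):
                current = key
                sections[key] = stripped.split(":", 1)[1].strip()
                break
        else:
            if current and stripped:
                sections[current] += " " + stripped
    return sections

# Hypothetical model output for the query "transformer attention"
sample = """Lexical: self-attention, multi-head attention, QKV
Vector: how attention layers in transformers weigh token relationships
HyDE: The attention mechanism computes weighted sums of value vectors."""
result = parse_expansions(sample)
```

Each parsed section can then feed its own retrieval path: `lexical` to BM25, `vector` to the embedding encoder, `hyde` to HyDE-style retrieval.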
Quantization
Converted using mlx_lm.convert:
mlx_lm.convert --hf-path tobil/qmd-query-expansion-qwen3.5-2B -q --q-bits 4
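The ~1.0 GB memory figure follows from the reported average bits per weight; a quick back-of-envelope check (the ~2B parameter count is an assumption based on the base model's name):

```python
# Estimate quantized weight memory from bits per weight.
params = 2.0e9            # assumed ~2B parameters (from the model name)
bits_per_weight = 4.503   # reported average after 4-bit quantization
gib = params * bits_per_weight / 8 / 1024**3  # bytes -> GiB
# roughly 1.05 GiB, consistent with the ~1.0 GB figure above
```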
License
Apache 2.0 (same as base model)