|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: Qwen/Qwen3-Reranker-0.6B |
|
|
base_model_relation: quantized |
|
|
tags: |
|
|
- gguf |
|
|
- quantized |
|
|
- llama.cpp |
|
|
- text-ranking |
|
|
model_type: qwen3 |
|
|
quantized_by: Jonathan Middleton |
|
|
revision: 602838d |
|
|
--- |
|
|
|
|
|
# Qwen3-Reranker-0.6B-GGUF |
|
|
|
|
|
**🚨 REQUIRED llama.cpp build:** https://github.com/ngxson/llama.cpp/tree/xsn/qwen3_embd_rerank
|
|
**This unmerged fix branch is mandatory:** mainline `llama.cpp` does not yet support Qwen3 reranking models, which is why other HF GGUF quantizations of the 0.6B reranker typically fail there. **This quantization was produced with the above build and works with it.**
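Once the branch is built, its `llama-server` can serve the model for reranking. The sketch below shows the request shape used by `llama-server`'s `/v1/rerank` endpoint for reranker models in mainline; whether the fix branch exposes the identical route is an assumption — check the branch README.

```python
import json

# Hypothetical payload for llama-server's /v1/rerank endpoint
# (assumption: the fix branch exposes the same route as mainline's
# reranker support; verify against the branch documentation).
payload = {
    "model": "Qwen3-Reranker-0.6B-Q4_K_M.gguf",
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital and largest city of France.",
        "Berlin is the capital of Germany.",
    ],
}

# Equivalent request, with the server started as e.g.
#   llama-server -m Qwen3-Reranker-0.6B-Q4_K_M.gguf --reranking
# would be: curl http://localhost:8080/v1/rerank -d '<payload JSON>'
print(json.dumps(payload, indent=2))
```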
|
|
|
|
|
## Purpose |
|
|
Multilingual **text-reranking** model in **GGUF** for efficient CPU/GPU inference with *llama.cpp*-compatible back-ends. |
|
|
Parameters ≈ **0.6 B**.
|
|
|
|
|
**Note:** Token embedding matrix and output tensors are **left at FP16** across all quantizations. |
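The upstream model scores each query–document pair as a yes/no judgment wrapped in a chat template. The template below is transcribed from the upstream `Qwen/Qwen3-Reranker-0.6B` model card; treat the exact strings as an assumption and verify them against that card (the fix-branch tooling may also apply this template internally).

```python
# Qwen3-Reranker prompt format, as described in the upstream model card
# (assumption: verify the exact strings against Qwen/Qwen3-Reranker-0.6B).
PREFIX = (
    "<|im_start|>system\nJudge whether the Document meets the requirements "
    "based on the Query and the Instruct provided. Note that the answer can "
    'only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
)
SUFFIX = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"

def format_pair(instruction: str, query: str, doc: str) -> str:
    """Build the full prompt for one (query, document) pair."""
    body = f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}"
    return PREFIX + body + SUFFIX

prompt = format_pair(
    "Given a web search query, retrieve relevant passages that answer the query",
    "What is the capital of France?",
    "Paris is the capital of France.",
)
```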
|
|
|
|
|
## Files |
|
|
| Filename | Quant | Size (bytes / MiB) | Est. quality Ξ vs FP16 | |
|
|
|--------------------------------------------|---------|------------------------------------|------------------------| |
|
|
| `Qwen3-Reranker-0.6B-F16.gguf` | FP16 | 1,197,634,048 B (1142.2 MiB) | 0 (reference) | |
|
|
| `Qwen3-Reranker-0.6B-Q4_K_M.gguf` | Q4_K_M | 396,476,032 B (378.1 MiB) | TBD | |
|
|
| `Qwen3-Reranker-0.6B-Q5_K_M.gguf` | Q5_K_M | 444,186,496 B (423.6 MiB) | TBD | |
|
|
| `Qwen3-Reranker-0.6B-Q6_K.gguf` | Q6_K | 494,878,880 B (472.0 MiB) | TBD | |
|
|
| `Qwen3-Reranker-0.6B-Q8_0.gguf` | Q8_0 | 639,153,088 B (609.5 MiB) | TBD | |
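The MiB column is simply bytes divided by 2²⁰, rounded to one decimal place — a quick check of the table:

```python
# Verify the MiB column of the table above: MiB = bytes / 2**20.
sizes_bytes = {
    "F16": 1_197_634_048,
    "Q4_K_M": 396_476_032,
    "Q5_K_M": 444_186_496,
    "Q6_K": 494_878_880,
    "Q8_0": 639_153_088,
}
sizes_mib = {q: round(b / 2**20, 1) for q, b in sizes_bytes.items()}
```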
|
|
|
|
|
## Upstream Source |
|
|
* **Repo:** `Qwen/Qwen3-Reranker-0.6B` |
|
|
* **Commit:** `f16fc5d` (2025-06-09) |
|
|
* **License:** Apache-2.0 |
|
|
|
|
|
## Conversion & Quantization |
|
|
```bash |
|
|
# Convert safetensors → GGUF (FP16)
|
|
python convert_hf_to_gguf.py ~/models/local/Qwen3-Reranker-0.6B |
|
|
|
|
|
# Quantize variants |
|
|
EMB_OPT="--token-embedding-type F16 --leave-output-tensor" |
|
|
for QT in Q4_K_M Q5_K_M Q6_K Q8_0; do |
|
|
llama-quantize $EMB_OPT Qwen3-Reranker-0.6B-F16.gguf Qwen3-Reranker-0.6B-${QT}.gguf $QT |
|
|
done |
```
|
|
|
|
|
|
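A quick way to sanity-check the quantization outputs: every GGUF file begins with the 4-byte ASCII magic `GGUF`, followed by a little-endian `uint32` format version. A minimal sketch (the file path in the example is illustrative):

```python
import struct

def gguf_version(path: str) -> int:
    """Return the GGUF format version, or raise if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        return version

# Example: gguf_version("Qwen3-Reranker-0.6B-Q4_K_M.gguf")
```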