---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-0.6B
base_model_relation: quantized
tags:
- gguf
- quantized
- llama.cpp
- text-ranking
model_type: qwen3
quantized_by: Jonathan Middleton
revision: 602838d # Aug 19 2025
---

# Qwen3-Reranker-0.6B-GGUF

**🚨 REQUIRED llama.cpp build:** https://github.com/ngxson/llama.cpp/tree/xsn/qwen3_embd_rerank
**This unmerged fix branch is mandatory** for running Qwen3 embedding **and** reranking models. Other GGUF quantizations of this reranker on the Hub typically fail under mainline `llama.cpp` because they were not produced with this build. **This quantization was produced with the build above and works.**
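
Building that branch follows the standard *llama.cpp* CMake flow. A minimal sketch, assuming a plain CPU build (the branch name and URL come from the link above; add back-end flags such as `-DGGML_CUDA=ON` for your hardware):

```bash
# Clone the fix branch and build llama.cpp (standard CMake flow)
git clone --branch xsn/qwen3_embd_rerank https://github.com/ngxson/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```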

## Purpose

Multilingual **text-reranking** model in **GGUF** format for efficient CPU/GPU inference with *llama.cpp*-compatible back-ends.
Parameters ≈ **0.6 B**.

**Note:** The token-embedding matrix and the output tensor are **left at FP16** across all quantizations.
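
To double-check those dtypes in any of the files below, one option (not part of this repo's tooling) is the `gguf-dump` script shipped with the `gguf` Python package:

```bash
# List tensor metadata; token_embd / output entries should report F16
pip install gguf
gguf-dump Qwen3-Reranker-0.6B-Q8_0.gguf | grep -E "token_embd|output"
```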

## Files

| Filename | Quant | Size (bytes / MiB) | Est. quality Δ vs FP16 |
|----------|-------|--------------------|------------------------|
| `Qwen3-Reranker-0.6B-F16.gguf` | FP16 | 1,197,634,048 B (1142.2 MiB) | 0 (reference) |
| `Qwen3-Reranker-0.6B-Q4_K_M.gguf` | Q4_K_M | 396,476,032 B (378.1 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q5_K_M.gguf` | Q5_K_M | 444,186,496 B (423.6 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q6_K.gguf` | Q6_K | 494,878,880 B (472.0 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q8_0.gguf` | Q8_0 | 639,153,088 B (609.5 MiB) | TBD |
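
A single quant can be fetched without cloning the whole repo; the repo id below is a placeholder for wherever these files are hosted:

```bash
# Download one file from the Hub (substitute the real repo id)
huggingface-cli download <user>/Qwen3-Reranker-0.6B-GGUF \
  Qwen3-Reranker-0.6B-Q4_K_M.gguf --local-dir .
```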

## Upstream Source

* **Repo:** `Qwen/Qwen3-Reranker-0.6B`
* **Commit:** `f16fc5d` (2025-06-09)
* **License:** Apache-2.0
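
To reproduce the conversion from the same snapshot, the pinned commit can be pulled directly; the local path matches the one used in the conversion step below:

```bash
# Fetch the upstream safetensors model at the pinned revision
huggingface-cli download Qwen/Qwen3-Reranker-0.6B --revision f16fc5d \
  --local-dir ~/models/local/Qwen3-Reranker-0.6B
```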

## Conversion & Quantization

```bash
# Convert safetensors → GGUF (FP16)
python convert_hf_to_gguf.py ~/models/local/Qwen3-Reranker-0.6B

# Quantize variants, keeping token embeddings and the output tensor at FP16
EMB_OPT="--token-embedding-type F16 --leave-output-tensor"
for QT in Q4_K_M Q5_K_M Q6_K Q8_0; do
  llama-quantize $EMB_OPT Qwen3-Reranker-0.6B-F16.gguf Qwen3-Reranker-0.6B-${QT}.gguf $QT
done
```
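
## Usage

A minimal serving sketch: mainline `llama-server` exposes a `/v1/rerank` endpoint when launched with `--reranking`, and the fix branch is assumed to keep that interface.

```bash
# Serve the quant with reranking enabled (flags/endpoint as in mainline
# llama.cpp; assumed unchanged on the fix branch)
./build/bin/llama-server -m Qwen3-Reranker-0.6B-Q4_K_M.gguf --reranking --port 8080 &

# Rank candidate documents against a query; higher relevance_score = better
curl -s http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What is the capital of France?",
        "documents": [
          "Berlin is the capital of Germany.",
          "Paris is the capital of France."
        ]
      }'
```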