---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-0.6B
base_model_relation: quantized
tags:
- gguf
- quantized
- llama.cpp
- text-ranking
model_type: qwen3
quantized_by: Jonathan Middleton
revision: 602838d # Aug 19 2025
---

# Qwen3-Reranker-0.6B-GGUF

**🚨 REQUIRED llama.cpp build:** https://github.com/ngxson/llama.cpp/tree/xsn/qwen3_embd_rerank
**This unmerged fix branch is mandatory** for running Qwen3 embedding **and** reranking models. Other GGUF quantizations of this reranker on the Hub typically fail under mainline `llama.cpp` because they were not produced with this build. **This quantization was produced with the build above and works.**
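
Building that branch follows the standard *llama.cpp* CMake flow. A minimal sketch, assuming a plain CPU build (the branch name and URL come from the link above; add back-end flags such as `-DGGML_CUDA=ON` for your hardware):

```bash
# Clone the fix branch and build llama.cpp (standard CMake flow)
git clone --branch xsn/qwen3_embd_rerank https://github.com/ngxson/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j
```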

## Purpose

Multilingual **text-reranking** model in **GGUF** format for efficient CPU/GPU inference with *llama.cpp*-compatible back-ends.
Parameters ≈ **0.6 B**.

**Note:** The token-embedding matrix and the output tensor are **left at FP16** across all quantizations.
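
To double-check those dtypes in any of the files below, one option (not part of this repo's tooling) is the `gguf-dump` script shipped with the `gguf` Python package:

```bash
# List tensor metadata; token_embd / output entries should report F16
pip install gguf
gguf-dump Qwen3-Reranker-0.6B-Q8_0.gguf | grep -E "token_embd|output"
```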

## Files

| Filename | Quant | Size (bytes / MiB) | Est. quality Δ vs FP16 |
|----------|-------|--------------------|------------------------|
| `Qwen3-Reranker-0.6B-F16.gguf` | FP16 | 1,197,634,048 B (1142.2 MiB) | 0 (reference) |
| `Qwen3-Reranker-0.6B-Q4_K_M.gguf` | Q4_K_M | 396,476,032 B (378.1 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q5_K_M.gguf` | Q5_K_M | 444,186,496 B (423.6 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q6_K.gguf` | Q6_K | 494,878,880 B (472.0 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q8_0.gguf` | Q8_0 | 639,153,088 B (609.5 MiB) | TBD |
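
A single quant can be fetched without cloning the whole repo; the repo id below is a placeholder for wherever these files are hosted:

```bash
# Download one file from the Hub (substitute the real repo id)
huggingface-cli download <user>/Qwen3-Reranker-0.6B-GGUF \
  Qwen3-Reranker-0.6B-Q4_K_M.gguf --local-dir .
```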

## Upstream Source

* **Repo:** `Qwen/Qwen3-Reranker-0.6B`
* **Commit:** `f16fc5d` (2025-06-09)
* **License:** Apache-2.0
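
To reproduce the conversion from the same snapshot, the pinned commit can be pulled directly; the local path matches the one used in the conversion step below:

```bash
# Fetch the upstream safetensors model at the pinned revision
huggingface-cli download Qwen/Qwen3-Reranker-0.6B --revision f16fc5d \
  --local-dir ~/models/local/Qwen3-Reranker-0.6B
```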

## Conversion & Quantization

```bash
# Convert safetensors → GGUF (FP16)
python convert_hf_to_gguf.py ~/models/local/Qwen3-Reranker-0.6B

# Quantize variants, keeping token embeddings and the output tensor at FP16
EMB_OPT="--token-embedding-type F16 --leave-output-tensor"
for QT in Q4_K_M Q5_K_M Q6_K Q8_0; do
  llama-quantize $EMB_OPT Qwen3-Reranker-0.6B-F16.gguf Qwen3-Reranker-0.6B-${QT}.gguf $QT
done
```
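
## Usage

A minimal serving sketch: mainline `llama-server` exposes a `/v1/rerank` endpoint when launched with `--reranking`, and the fix branch is assumed to keep that interface.

```bash
# Serve the quant with reranking enabled (flags/endpoint as in mainline
# llama.cpp; assumed unchanged on the fix branch)
./build/bin/llama-server -m Qwen3-Reranker-0.6B-Q4_K_M.gguf --reranking --port 8080 &

# Rank candidate documents against a query; higher relevance_score = better
curl -s http://localhost:8080/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
        "query": "What is the capital of France?",
        "documents": [
          "Berlin is the capital of Germany.",
          "Paris is the capital of France."
        ]
      }'
```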