---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-0.6B
base_model_relation: quantized
tags:
- gguf
- quantized
- llama.cpp
- text-ranking
model_type: qwen3
quantized_by: Jonathan Middleton
revision: 602838d # Aug 19 2025
---

# Qwen3-Reranker-0.6B-GGUF

**🚨 REQUIRED llama.cpp build:** https://github.com/ngxson/llama.cpp/tree/xsn/qwen3_embd_rerank
**This unmerged fix branch is mandatory** for running Qwen3 embedding **and** reranking models. Other GGUF quantizations of this reranker on the Hub typically fail in mainline `llama.cpp` because they were not produced with this build. **This quantization was produced with the build above and works.**

## Purpose
A multilingual **text-reranking** model in **GGUF** format for efficient CPU/GPU inference with *llama.cpp*-compatible back-ends.
Parameters ≈ **0.6 B**.

**Note:** The token-embedding matrix and output tensors are **left at FP16** across all quantizations.

## Files
| Filename | Quant | Size (bytes / MiB) | Est. quality Δ vs FP16 |
|----------|-------|--------------------|------------------------|
| `Qwen3-Reranker-0.6B-F16.gguf` | FP16 | 1,197,634,048 B (1142.2 MiB) | 0 (reference) |
| `Qwen3-Reranker-0.6B-Q4_K_M.gguf` | Q4_K_M | 396,476,032 B (378.1 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q5_K_M.gguf` | Q5_K_M | 444,186,496 B (423.6 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q6_K.gguf` | Q6_K | 494,878,880 B (472.0 MiB) | TBD |
| `Qwen3-Reranker-0.6B-Q8_0.gguf` | Q8_0 | 639,153,088 B (609.5 MiB) | TBD |

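As a sanity check on the table, the MiB figures and the size ratio of each quant versus the FP16 reference follow directly from the byte counts (a small illustrative script, not part of the release):

```python
# Byte sizes copied from the table above
sizes = {
    "F16":    1_197_634_048,
    "Q4_K_M":   396_476_032,
    "Q5_K_M":   444_186_496,
    "Q6_K":     494_878_880,
    "Q8_0":     639_153_088,
}

fp16 = sizes["F16"]
for quant, nbytes in sizes.items():
    mib = nbytes / (1024 ** 2)          # bytes → MiB
    pct = 100 * nbytes / fp16           # size relative to FP16
    print(f"{quant:7s} {mib:8.1f} MiB  ({pct:.1f}% of FP16)")
```

Q4_K_M comes in at roughly a third of the FP16 size; the FP16 token-embedding and output tensors noted above are part of why the smaller quants do not shrink further.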
## Upstream Source
* **Repo:** `Qwen/Qwen3-Reranker-0.6B`
* **Commit:** `f16fc5d` (2025-06-09)
* **License:** Apache-2.0

## Conversion & Quantization
```bash
# Convert safetensors → GGUF (FP16)
python convert_hf_to_gguf.py ~/models/local/Qwen3-Reranker-0.6B

# Quantize variants, keeping token embeddings and output tensor at FP16
EMB_OPT="--token-embedding-type F16 --leave-output-tensor"
for QT in Q4_K_M Q5_K_M Q6_K Q8_0; do
  llama-quantize $EMB_OPT Qwen3-Reranker-0.6B-F16.gguf Qwen3-Reranker-0.6B-${QT}.gguf $QT
done
```
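After quantizing, a quick way to sanity-check that each output file is a well-formed GGUF container is to read the magic bytes at the start of the file. A minimal sketch (it validates only the 4-byte `GGUF` magic and a nonzero version field, not the tensor contents; the file path shown is hypothetical):

```python
import struct

def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic and a nonzero version."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8:
        return False
    magic = header[:4]
    version = struct.unpack("<I", header[4:8])[0]  # little-endian uint32
    return magic == b"GGUF" and version > 0

# Example (hypothetical path):
# print(is_gguf("Qwen3-Reranker-0.6B-Q4_K_M.gguf"))
```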