# Qwen3-Reranker-8B — GGUF (llama.cpp)

Working GGUF of Qwen/Qwen3-Reranker-8B for llama.cpp. Converted 2025-03-09 with the official `convert_hf_to_gguf.py`.
## Available files

| File | Quant | Size | Description |
|---|---|---|---|
| Qwen3-Reranker-8B-F16.gguf | F16 | 14.10 GB | Full precision, no quality loss |
| Qwen3-Reranker-8B-Q8_0.gguf | Q8_0 | 7.49 GB | 8-bit quantized, half the size |
## Does it work?

Yes. Most community GGUFs of Qwen3-Reranker produce garbage scores (4.5e-23) because they're missing reranker-specific tensors. See llama.cpp #16407. This one works:

```
Doc 0 (relevant):   relevance_score = 0.99XX
Doc 1 (irrelevant): relevance_score = 0.00XX
```
## Quick start

```bash
llama-server -m Qwen3-Reranker-8B-f16.gguf --reranking --pooling rank --embedding --port 8081
```

```bash
curl http://localhost:8081/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "employment termination notice period",
    "documents": [
      "The Labour Code requires 30 calendar days written notice.",
      "Corporate tax rates for small enterprises."
    ]
  }'
```
Use `/v1/rerank`, not `/v1/embeddings`. The embeddings endpoint returns zeros for reranker models.
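The same request can be made from Python with the standard library. This is a minimal sketch, assuming the server from the quick start is running on port 8081 and that the response follows llama-server's Jina-style rerank shape (a `results` array of `{"index": ..., "relevance_score": ...}` objects); if your server version differs, adjust the field names.

```python
import json
import urllib.request
from urllib.error import URLError

# Endpoint from the quick start above (assumed port).
RERANK_URL = "http://localhost:8081/v1/rerank"

def top_documents(response: dict, documents: list[str]) -> list[tuple[str, float]]:
    """Pair each document with its relevance_score, best match first."""
    scored = [(documents[r["index"]], r["relevance_score"])
              for r in response["results"]]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = [
    "The Labour Code requires 30 calendar days written notice.",
    "Corporate tax rates for small enterprises.",
]
payload = json.dumps({
    "query": "employment termination notice period",
    "documents": docs,
}).encode()
req = urllib.request.Request(
    RERANK_URL, data=payload, headers={"Content-Type": "application/json"}
)
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        for doc, score in top_documents(json.load(resp), docs):
            print(f"{score:.4f}  {doc}")
except URLError:
    print("llama-server is not reachable on port 8081")
```

A working conversion should rank the Labour Code document far above the tax one, as in the score example above.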
## What's different about this GGUF?

The official `convert_hf_to_gguf.py` detects Qwen3-Reranker and does things naive converters skip:

- Extracts `cls.output.weight` (the yes/no classifier) from `lm_head`
- Sets `pooling_type = RANK` metadata
- Bakes in the rerank chat template
- Sets `classifier.output_labels = ["yes", "no"]`

Without these, llama-server has nothing to compute scores from.
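You can check for these markers yourself with the `gguf` Python package (the same one the converter uses). A sketch, assuming the file sits in the current directory and that the metadata keys carry an architecture prefix (e.g. `qwen3.pooling_type`), which is why the helper matches on suffixes; the exact key names may vary by llama.cpp version.

```python
from pathlib import Path

# Suffixes of the reranker-specific metadata keys described above.
RERANK_KEYS = ("pooling_type", "classifier.output_labels")

def reranker_keys(field_names: list[str]) -> list[str]:
    """Filter GGUF metadata key names down to the reranker-specific ones."""
    return [n for n in field_names
            if any(n.endswith(k) for k in RERANK_KEYS)]

GGUF_PATH = Path("Qwen3-Reranker-8B-f16.gguf")
if GGUF_PATH.exists():
    from gguf import GGUFReader  # pip install gguf
    reader = GGUFReader(GGUF_PATH)
    # A good conversion should list both reranker keys here...
    print(reranker_keys(list(reader.fields)))
    # ...and contain the classifier tensor.
    print(any(t.name == "cls.output.weight" for t in reader.tensors))
else:
    print(f"{GGUF_PATH} not found; download it first")
```

If the key list comes back empty or the tensor check prints `False`, you have one of the broken conversions and scores will be garbage.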
## Known broken GGUFs

- DevQuasar/Qwen.Qwen3-Reranker-4B-GGUF — broken.
## models.ini example

```ini
[Qwen3-Reranker-8B-f16]
model = /path/to/Qwen3-Reranker-8B-f16.gguf
reranking = true
pooling = rank
embedding = true
ctx-size = 32768
```
For a full multi-model setup guide (embedding + reranking + chat on one server), see the llama-server Qwen3 guide.
## Convert it yourself

```bash
pip install huggingface_hub gguf torch safetensors sentencepiece
python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-Reranker-8B', local_dir='Qwen3-Reranker-8B-src')"
python convert_hf_to_gguf.py --outtype f16 --outfile Qwen3-Reranker-8B-f16.gguf Qwen3-Reranker-8B-src/
```
## License

Apache 2.0 — same as the original model.