Qwen3-Reranker is now supported in llama-cpp-python (JamePeng's fork). This project provides a test GGUF file.
llama-cpp-python: https://github.com/JamePeng/llama-cpp-python
Code example:

```python
import llama_cpp
from llama_cpp.llama_embedding import LlamaEmbedding

# Initialize a reranking model
ranker = LlamaEmbedding(
    model_path="path/to/Qwen3-Reranker-0.6B-Q8_0.gguf",  # forward slashes avoid "\t" escape bugs in Windows paths
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_RANK,  # crucial for rerankers!
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=0,          # 0 = use the model's default context length
)

query = "What causes rain?"
docs = [
    "Clouds are made of water droplets...",             # relevant
    "To bake a cake you need flour...",                 # irrelevant
    "Rain is liquid water in the form of droplets...",  # highly relevant
]

# Calculate relevance scores.
# Inputs like "[BOS] query [SEP] doc [EOS]" are constructed automatically.
scores = ranker.rank(query, docs)

# Result: a list of floats (higher means more relevant)
print(scores)
# e.g., [0.0011407170677557588, 5.614783731289208e-05, 0.7173627614974976] -> the 3rd doc is the best match
```
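To turn the raw scores into an actual ranking, pair each document with its score and sort in descending order. A minimal post-processing sketch, using the example score values shown above (not a fresh model run):

```python
# Example scores as returned by the reranker above (hard-coded here for illustration)
scores = [0.0011407170677557588, 5.614783731289208e-05, 0.7173627614974976]
docs = [
    "Clouds are made of water droplets...",
    "To bake a cake you need flour...",
    "Rain is liquid water in the form of droplets...",
]

# Pair each document with its score and sort by score, highest first
ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.6f}  {doc}")
```

This prints the most relevant document first, which is the usual shape needed when feeding reranked passages into a RAG pipeline.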