Performance Optimization recommendations for Qwen3 Reranker 0.6B on A100/H100 GPUs
#20
by rajshah14 - opened
Hi Team, thank you so much for providing these models.
I was wondering if you could recommend some GPU optimizations for the Qwen3 Reranker 0.6B model that would reduce latency on NVIDIA A100/H100 GPUs.
PS: I have tried the vLLM sample in the model card.
rajshah14 changed discussion title from Performance Optimization recommendations on A100/H100 GPUs to Performance Optimization recommendations for Qwen3 Reranker 0.6B on A100/H100 GPUs
As far as I know, vLLM is currently the easiest way to serve this model.
https://medium.com/@kimdoil1211/deploying-qwen3-reranker-8b-with-vllm-instruction-aware-reranking-for-next-generation-retrieval-c35a57c9f0a6
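For reference, here is a minimal client sketch, assuming the model is already being served by a vLLM OpenAI-compatible server that exposes a `/rerank` endpoint at `http://localhost:8000`. The host, port, endpoint path, and response schema are assumptions to check against the vLLM version in use, and the server may need model-specific launch options from the model card. The helper names `build_rerank_payload`, `rank_results`, and `rerank` are hypothetical, and the sample response is made up to show the parsing step, not real model output.

```python
import json
import urllib.request

MODEL = "Qwen/Qwen3-Reranker-0.6B"
BASE_URL = "http://localhost:8000"  # assumption: a local vLLM server


def build_rerank_payload(query, documents, model=MODEL):
    """Build the JSON body for a rerank request (assumed schema)."""
    return {"model": model, "query": query, "documents": list(documents)}


def rank_results(response):
    """Return (index, score) pairs sorted from most to least relevant."""
    pairs = [(r["index"], r["relevance_score"]) for r in response["results"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


def rerank(query, documents, base_url=BASE_URL, timeout=30.0):
    """Send all passages in one request instead of one request per pair."""
    body = json.dumps(build_rerank_payload(query, documents)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return rank_results(json.load(resp))


# Made-up response, used only to illustrate the parsing step offline.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.02},
        {"index": 1, "relevance_score": 0.97},
    ]
}
ranked = rank_results(sample_response)
```

With a server running, `rerank(query, passages)` would return the passage indices and scores in descending order of relevance. Sending every passage for a query in a single request lets the server batch them, which is usually the first thing to check when per-request latency is the concern.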
If you can wait a bit, TEI (Text Embeddings Inference) will likely add support for the Qwen3 reranker:
https://github.com/huggingface/text-embeddings-inference/pull/730
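If that support lands, a call against TEI's `/rerank` route might look like the sketch below. It assumes a TEI instance at `http://localhost:8080` (hypothetical host and port) that accepts a `query` plus a list of `texts` and returns a list of `index`/`score` objects. Whether a given TEI release can actually load this model depends on the pull request above. The helper names and the sample scores are made up for illustration.

```python
import json
import urllib.request

TEI_URL = "http://localhost:8080"  # assumption: a local TEI instance


def tei_rerank_body(query, texts):
    """Encode the request body for a rerank call (assumed schema)."""
    return json.dumps({"query": query, "texts": list(texts)}).encode("utf-8")


def order_by_score(items):
    """Return passage indices sorted from highest to lowest score."""
    ranked = sorted(items, key=lambda it: it["score"], reverse=True)
    return [it["index"] for it in ranked]


def tei_rerank(query, texts, base_url=TEI_URL, timeout=30.0):
    """POST one query with all candidate texts and return the ranked indices."""
    req = urllib.request.Request(
        f"{base_url}/rerank",
        data=tei_rerank_body(query, texts),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return order_by_score(json.load(resp))


# Made-up scores, used only to show the ordering step offline.
sample = [
    {"index": 0, "score": 0.1},
    {"index": 1, "score": 0.9},
    {"index": 2, "score": 0.4},
]
order = order_by_score(sample)
```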