Instructions to use Qwen/Qwen3-Reranker-0.6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen3-Reranker-0.6B with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B")
```
- sentence-transformers
How to use Qwen/Qwen3-Reranker-0.6B with sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-Reranker-0.6B")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
- Notebooks
- Google Colab
- Kaggle
ValueError: Cannot handle batch sizes > 1 if no padding token is defined. Raised when running `query_engine.query(query)` in LlamaIndex
I'm trying to use Qwen3-Embedding-0.6B to query a vector index that was built with that same embedding model, then use Qwen3-Reranker-4B and qwen3:4b-q8_0 (via Ollama), all three through LlamaIndex, to create a Q&A bot that can search, rerank, and accurately answer questions about a given text.
The problem is with long markdown files. When I build an index from a large markdown file with the embedding model and then send a query to the query_engine, it waits a few seconds and returns the error above, which appears to be raised by forward() in modeling_qwen3.py:
```python
hidden_states = transformer_outputs.last_hidden_state
logits = self.score(hidden_states)

if input_ids is not None:
    batch_size = input_ids.shape[0]
else:
    batch_size = inputs_embeds.shape[0]

if self.config.pad_token_id is None and batch_size != 1:
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
```
It only seems to work when I use a smaller index built from a small markdown file; the big one returns that error. I've tried changing Settings.tokenizer to different variants of the Qwen3 embedding/reranker models, but none of them seems to work.
At this point I don't know whether the issue is the file itself, a tokenizer mistake I made, or something in between.
I encountered the same error when trying to train a classification model on top of Qwen3-Embedding, and solved it by setting model.config.pad_token_id = tokenizer.pad_token_id. Some other posts set it to model.config.eos_token_id or tokenizer.eos_token_id instead. I'm not sure this is the proper solution, but it works fine on my classification task.
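A minimal sketch of that fix: copy the tokenizer's pad token id onto the model config right after loading, falling back to the EOS token id when the tokenizer defines no pad token. The helper name `ensure_pad_token` is my own, not a transformers or LlamaIndex API, and the stand-in objects (with made-up token ids) only exist so the sketch runs without downloading a model; with real objects you would pass the results of `from_pretrained(...)` instead.

```python
from types import SimpleNamespace


def ensure_pad_token(model, tokenizer):
    """Set model.config.pad_token_id if missing, so batched forward()
    calls don't raise "Cannot handle batch sizes > 1 ..."."""
    if model.config.pad_token_id is None:
        if tokenizer.pad_token_id is not None:
            model.config.pad_token_id = tokenizer.pad_token_id
        else:
            # fall back to EOS, as some posts suggest
            model.config.pad_token_id = tokenizer.eos_token_id
    return model


# Stand-ins with the same attributes as a real model/tokenizer pair,
# using hypothetical token ids for illustration only.
model = SimpleNamespace(config=SimpleNamespace(pad_token_id=None))
tokenizer = SimpleNamespace(pad_token_id=None, eos_token_id=151645)

ensure_pad_token(model, tokenizer)
print(model.config.pad_token_id)  # the EOS fallback was applied
```

If this works for the reranker too, the place to call it would be right after loading the model, before handing it to the LlamaIndex query pipeline.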