Divergence with original model nearing max. sequence length?
Hello!
In my quick tests with the original model, it has some very specific rules around maximum sequence length and truncation that make it tricky to implement an AutoTokenizer that matches 100%. My understanding is that this model technically also diverges once you reach the maximum sequence length. Is that true?
Granted, the maximum sequence length is really high, something like 32k tokens presumably based on the position embeddings.
Also - congratulations on your BEI release with Baseten! It looks very solid.
- Tom Aarsen
Hey Tom,
Agree! There is the following rule: the max_length of query + document is set to 8192 (or 32768 if none is set). https://github.com/mixedbread-ai/mxbai-rerank/blob/7592f2d37db7d2dcc9627ad6dd002e8f4d4cc82b/mxbai_rerank/mxbai_rerank_v2.py#L79
The system prompt is maybe 100 tokens, the query is truncated to 6k tokens, and the rest of the budget goes to the document.
Also, query and document are split by the \n token, which is tokenized separately and then appended (i.e., if you tokenize, detokenize, and tokenize again, the result would be different).
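To make the budgeting above concrete, here is a rough sketch in Python that works on already-tokenized ID lists. It is not the actual mxbai-rerank code: the function name `build_input_ids`, the exact value 6000 for the "6k" query cap, and the ordering of the pieces (prompt, query, separator, document) are all assumptions for illustration.

```python
from typing import List


def build_input_ids(
    prompt_ids: List[int],
    query_ids: List[int],
    sep_ids: List[int],
    doc_ids: List[int],
    max_length: int = 8192,        # 8192 as described above
    max_query_length: int = 6000,  # assumed value for the "6k" query cap
) -> List[int]:
    """Concatenate separately tokenized pieces under one shared token budget.

    Hypothetical helper: piece order and caps are assumptions, not the
    library's real implementation.
    """
    # The query is truncated first, to its own cap.
    query_ids = query_ids[:max_query_length]
    # Whatever remains after prompt, query, and separator goes to the document.
    doc_budget = max(0, max_length - len(prompt_ids) - len(query_ids) - len(sep_ids))
    doc_ids = doc_ids[:doc_budget]
    # Pieces are joined at the token-ID level, so the result can differ from
    # tokenizing the concatenated string in one pass.
    return prompt_ids + query_ids + sep_ids + doc_ids


# Toy example with fake token IDs: 100 prompt tokens, a 7000-token query,
# a single separator token, and a 5000-token document.
ids = build_input_ids([0] * 100, [1] * 7000, [9], [2] * 5000)
print(len(ids))  # 8192
```

Because the pieces are concatenated as IDs rather than as text, a plain AutoTokenizer call on the joined string would not necessarily reproduce the same sequence, which is where the mismatch near the length limit could come from.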
BAAI's llm reranker has similar policies: https://github.com/FlagOpen/FlagEmbedding/blob/d5292b68758f41c7911fe85596cdd0329901a3a5/FlagEmbedding/inference/reranker/decoder_only/base.py#L385
Not sure if any of this helps, let me know!
Thanks!
Very interesting! Thanks for sharing. These LLM-style reranker formats are quite tricky in terms of tokenization.
- Tom Aarsen
Yeah, at least with the "ForSequenceClassification" they should be quite easy to run e.g. with vLLM / TEI etc.