How to use cross-encoder/ms-marco-MiniLM-L6-v2 with sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
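`predict` returns one raw relevance score per (query, passage) pair, in input order; for this model a higher score means a more relevant passage. Turning those scores into a ranking is then just a sort. A minimal sketch in plain Python, where the scores are made-up placeholders standing in for the output of `model.predict`, and `rank_by_score` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def rank_by_score(passages: Sequence[str], scores: Sequence[float]) -> List[Tuple[float, str]]:
    """Hypothetical helper: pair each passage with its score and sort best-first."""
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    return sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)


# Placeholder scores; in practice: scores = model.predict([(query, p) for p in passages])
passages = ["Venus passage", "Mars passage", "Jupiter passage", "Saturn passage"]
scores = [-9.0, 8.5, -2.0, -1.5]

for score, passage in rank_by_score(passages, scores):
    print(f"{score:6.2f}  {passage}")
```

The same helper works on the NumPy array that `predict` returns, since `zip` iterates over it element by element.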
How to use cross-encoder/ms-marco-MiniLM-L6-v2 with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L6-v2")
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L6-v2")
```
Reranker Model Performance Optimization
Hello,
We are using the ms-marco-MiniLM-L-6-v2 model for our conceptual search application. Initially, we were reranking the top 1000 results, which took 2 to 2.5 seconds on average. For deployment, we use Flask with Gunicorn, configured with 5 workers on a single GPU machine.
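For reference, a Gunicorn configuration matching that setup might look like the sketch below. The bind address, the `app:app` entry point, and the explicitly spelled-out defaults are assumptions, not details from the post. One consequence worth keeping in mind: each Gunicorn worker is a separate process, so if the Flask app loads the `CrossEncoder` at import time, all five workers hold their own copy of the model and their forward passes compete for the same GPU.

```python
# gunicorn.conf.py: hypothetical sketch of the deployment described above.
# Gunicorn reads these module-level names as settings when started with
#   gunicorn -c gunicorn.conf.py app:app
# where "app:app" (module name : Flask object) is an assumed entry point.

bind = "0.0.0.0:8000"   # assumed address and port
workers = 5             # five separate worker processes sharing one GPU machine
worker_class = "sync"   # Gunicorn's default: each worker handles one request at a time
timeout = 30            # Gunicorn's default (seconds); a slow rerank must finish within it
```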
Now, we are planning to increase the reranking scope from the top 1000 results to the top 3000. However, this causes a significant performance hit: the average time to rerank the top 3000 results rises to 6 to 7 seconds. We load the model as shown in the model card, using the sentence-transformers package.
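A cross-encoder scores every (query, document) pair jointly with a full transformer pass, so nothing can be precomputed per document and the work grows roughly linearly with the number of candidates. A quick back-of-the-envelope check with the numbers above, taking the midpoint of the reported 2 to 2.5 seconds (an assumption), suggests the 6 to 7 second figure is about what linear scaling would predict:

```python
# Observed numbers from the post; 2.25 s is the assumed midpoint of "2 to 2.5 seconds".
docs_before, seconds_before = 1000, 2.25
docs_after = 3000

# Per-document cost in milliseconds, then the linear extrapolation to 3000 documents.
per_doc_ms = seconds_before / docs_before * 1000
predicted_seconds = per_doc_ms * docs_after / 1000

print(f"{per_doc_ms:.2f} ms/doc -> {predicted_seconds:.2f} s for {docs_after} docs")
# prints: 2.25 ms/doc -> 6.75 s for 3000 docs
```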
Could you please advise if we are doing anything wrong, either in terms of model loading or from a deployment perspective?
Additionally, according to the model card, the model can rerank 1800 documents (chunks) per second.
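Taken at face value, that throughput implies a much lower latency for 3000 documents than the one observed. The sketch below only redoes that arithmetic, using the midpoint of the reported 6 to 7 seconds (an assumption). The card's figure was measured under its own benchmark conditions (hardware, batch size, passage length), so it is a rough yardstick rather than a guaranteed number for a different deployment.

```python
card_docs_per_sec = 1800          # throughput quoted in the model card
observed_seconds = 6.5            # assumed midpoint of the reported 6 to 7 s
num_docs = 3000

expected_seconds = num_docs / card_docs_per_sec       # latency implied by the card
observed_docs_per_sec = num_docs / observed_seconds   # throughput actually seen
slowdown = card_docs_per_sec / observed_docs_per_sec  # how far below the card figure

print(f"expected ~{expected_seconds:.2f} s, observed ~{observed_seconds} s "
      f"(about {slowdown:.1f}x below the card's throughput)")
```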
Thanks.