Can you please tell me why there is no tokenizer.json file for some models?
I want to use some of the lightweight cross-encoders, but I need a tokenizer.json file for that, and it is missing for some models.
Could you please tell me how I can generate the tokenizer.json file?
Huh, that's odd. You can generate it like so:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L6-v2")
print(tokenizer)
tokenizer.save_pretrained("tmp")
That produces special_tokens_map.json, tokenizer_config.json, tokenizer.json, and vocab.txt. It looks like this repository has all of them except tokenizer.json. Out of curiosity, what do you need the tokenizer.json file for exactly?
I'm looking into why this file was missing now.
- Tom Aarsen
Resolved via #8, and also resolved on all other models under https://huggingface.co/cross-encoder
Thank you for reporting!
- Tom Aarsen
I want to use some lightweight cross-encoders together with the Qdrant vector database. The fastembed library needs all four of these files, including tokenizer.json, to run the ONNX model locally.
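For anyone who wants to confirm that a local model folder has everything before handing it to another tool, here is a minimal sketch that checks for the four tokenizer files named in this thread. The helper name missing_tokenizer_files and the constant REQUIRED_TOKENIZER_FILES are hypothetical, made up for illustration; they are not part of transformers or fastembed, and the file list is simply the one mentioned above.

```python
from pathlib import Path

# The four tokenizer files mentioned in this thread (an assumption taken
# from the discussion, not an official list from any library).
REQUIRED_TOKENIZER_FILES = (
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.txt",
)


def missing_tokenizer_files(model_dir):
    """Return the names of required tokenizer files absent from model_dir."""
    directory = Path(model_dir)
    return [
        name
        for name in REQUIRED_TOKENIZER_FILES
        if not (directory / name).is_file()
    ]
```

After running tokenizer.save_pretrained("tmp") as shown earlier, calling missing_tokenizer_files("tmp") should return an empty list. If it returns ["tokenizer.json"], the folder is in the same state this repository was in before the fix.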
Thanks for sharing. It should work now!