Instructions to use cross-encoder/ms-marco-MiniLM-L6-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use cross-encoder/ms-marco-MiniLM-L6-v2 with sentence-transformers:
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
- Transformers
How to use cross-encoder/ms-marco-MiniLM-L6-v2 with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L6-v2")
model = AutoModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L6-v2")
- Notebooks
- Google Colab
- Kaggle
Add exported onnx model 'model_qint8_avx512_vnni.onnx'
#9
by xtlalert - opened
Hello!
This pull request was automatically generated by the export_dynamic_quantized_onnx_model function of the Sentence Transformers library.
Config
QuantizationConfig(
is_static=False,
format=<QuantFormat.QOperator: 0>,
mode=<QuantizationMode.IntegerOps: 0>,
activations_dtype=<QuantType.QUInt8: 1>,
activations_symmetric=False,
weights_dtype=<QuantType.QInt8: 0>,
weights_symmetric=True,
per_channel=True,
reduce_range=False,
nodes_to_quantize=[],
nodes_to_exclude=[],
operators_to_quantize=['Conv',
'MatMul',
'Attention',
'LSTM',
'Gather',
'Transpose',
'EmbedLayerNormalization'],
qdq_add_pair_to_weight=False,
qdq_dedicated_pair=False,
qdq_op_type_per_channel_support_to_axis={'MatMul': 1}
)
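To make the config above more concrete, here is a minimal pure-Python sketch of the two schemes it names: symmetric, per-channel int8 for weights (weights_symmetric=True, per_channel=True) and asymmetric uint8 for activations (activations_symmetric=False), with the activation range computed at run time because is_static=False. The helper names are hypothetical and are not ONNX Runtime or Sentence Transformers APIs. The clamping range of [-127, 127] for weights is an assumption of this sketch, and the real kernels may differ in rounding details.

```python
# Illustrative sketch only; helper names are hypothetical, not library APIs.

def quantize_weights_symmetric_per_channel(rows):
    """Symmetric int8: one scale per output channel (row), zero point fixed at 0."""
    quantized, scales = [], []
    for row in rows:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        # Assumption: clamp to [-127, 127] so the range is symmetric around 0.
        quantized.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return quantized, scales


def quantize_activations_asymmetric(values):
    """Asymmetric uint8: scale and zero point derived from the observed min/max."""
    lo = min(min(values), 0.0)  # include 0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Map quantized integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


# Made-up example values, not taken from the model.
weights = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q_weights, w_scales = quantize_weights_symmetric_per_channel(weights)
print(q_weights, w_scales)

activations = [-1.0, 0.0, 2.0, 4.1]
q_act, a_scale, a_zero = quantize_activations_asymmetric(activations)
print(q_act, a_scale, a_zero)
print(dequantize(q_act, a_scale, a_zero))
```

Per-channel scales let a row of small weights keep its resolution even when another row contains much larger values, which is why the second row above gets its own, much smaller scale.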
Tip:
Consider testing this pull request before merging by loading the model from this PR with the revision argument:
from sentence_transformers import SentenceTransformer
# TODO: Fill in the PR number
pr_number = 2
model = SentenceTransformer(
"cross-encoder/ms-marco-MiniLM-L6-v2",
revision=f"refs/pr/{pr_number}",
backend="onnx",
model_kwargs={"file_name": "model_qint8_avx512_vnni.onnx"},
)
# Verify that everything works as expected
embeddings = model.encode(["The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium."])
print(embeddings.shape)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
xtlalert changed pull request status to closed
I'm working on ONNX/OpenVINO support for these rerankers. The implementation is ready, but I'm still finishing the benchmarks and docs.
Stay tuned!
- Tom Aarsen