How to use MongoDB/mdbr-leaf-ir with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("MongoDB/mdbr-leaf-ir")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day"
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```

How to use MongoDB/mdbr-leaf-ir with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("MongoDB/mdbr-leaf-ir")
model = AutoModel.from_pretrained("MongoDB/mdbr-leaf-ir")
```

How to use MongoDB/mdbr-leaf-ir with Transformers.js:

```js
// npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Allocate pipeline
const pipe = await pipeline('sentence-similarity', 'MongoDB/mdbr-leaf-ir');
```
Upload README.md
README.md
CHANGED
@@ -53,21 +53,19 @@ A technical report detailing our proposed `LEAF` training procedure will be avai

The table below shows the average BEIR benchmark scores (nDCG@10) for `mdbr-leaf-ir` compared to other retrieval models.

| Model                              | Size | BEIR Avg. (nDCG@10)  |
|------------------------------------|------|----------------------|
| **mdbr-leaf-ir**                   | 23M  | **53.55**            |
| snowflake-arctic-embed-s           | 32M  | 51.98                |
| bge-small-en-v1.5                  | 33M  | 51.65                |
| granite-embedding-small-english-r2 | 47M  | 50.87                |
| snowflake-arctic-embed-xs          | 23M  | 50.15                |
| e5-small-v2                        | 33M  | 49.04                |
| SPLADE++                           | 110M | 48.88                |
| MiniLM-L6-v2                       | 23M  | 41.95                |
| BM25                               | –    | 41.14                |

[//]: # (| **mdbr-leaf-ir (asym.)** | 23M | **?** | )
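
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how nDCG@10 can be computed for a single query. It assumes linear gain and a log2 rank discount (some definitions use a gain of 2^rel - 1 instead). The function names `dcg_at_k` and `ndcg_at_k` are hypothetical and chosen for illustration; this is not the evaluation code used to produce the table.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance labels.

    `relevances` lists the graded relevance of each result in ranked order.
    Assumes linear gain and a log2(rank + 1) discount with 1-indexed ranks.
    """
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k: DCG of the actual ranking divided by DCG of the ideal ranking.

    `all_relevances` holds the labels of every judged document for the query,
    so that the ideal ordering also accounts for relevant documents that the
    system failed to retrieve.
    """
    ideal_dcg = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Two relevant documents exist; the system ranks them at positions 1 and 3.
score = ndcg_at_k([1, 0, 1, 0], [1, 1], k=10)
print(round(score, 4))
```

A benchmark-level figure such as the BEIR average is then typically obtained by averaging the per-query scores within each dataset and averaging across datasets.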

# Quickstart

@@ -117,7 +115,7 @@ for i, query in enumerate(queries):

See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).

## Asymmetric Retrieval Setup

`mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries can be encoded faster and more efficiently with the compact `leaf` model:

```python