Upload README.md
README.md
CHANGED
@@ -119,7 +119,10 @@ for i, query in enumerate(queries):
 See full example notebook [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/blob/main/transformers_example.ipynb).
 
 ## Asymmetric Retrieval Setup
-
+
+> [!Note]
+> **Note**: a version of this asymmetric setup, conveniently packaged into a single model, is [available here](https://huggingface.co/MongoDB/mdbr-leaf-ir-asym).
+
 `mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from. This enables flexible architectures in which, for example, documents are encoded using the larger model, while queries can be encoded faster and more efficiently with the compact `leaf` model:
 ```python
 # Use mdbr-leaf-ir for query encoding (real-time, low latency)
@@ -139,25 +142,19 @@ Retrieval results in asymmetric mode are often superior to the [standard mode ab
 
 Embeddings have been trained via [MRL](https://arxiv.org/abs/2205.13147) and can be truncated for more efficient storage:
 ```python
-
-
-query_embeds = model.encode(queries, prompt_name="query", convert_to_tensor=True)
-doc_embeds = model.encode(documents, convert_to_tensor=True)
-
-# Truncate and normalize according to MRL
-query_embeds = F.normalize(query_embeds[:, :256], dim=-1)
-doc_embeds = F.normalize(doc_embeds[:, :256], dim=-1)
+query_embeds = model.encode(queries, prompt_name="query", truncate_dim=256)
+doc_embeds = model.encode(documents, truncate_dim=256)
 
 similarities = model.similarity(query_embeds, doc_embeds)
 
 print('After MRL:')
 print(f"* Embeddings dimension: {query_embeds.shape[1]}")
-print(f"* Similarities:\n\t{similarities}")
+print(f"* Similarities: \n\t{similarities}")
 
 # After MRL:
 # * Embeddings dimension: 256
 # * Similarities:
-#
+# tensor([[0.7136, 0.4989],
 # [0.4567, 0.6022]])
 ```
 
@@ -185,7 +182,7 @@ similarities = query_embeds.astype(int) @ doc_embeds.astype(int).T
 
 print('After quantization:')
 print(f"* Embeddings type: {query_embeds.dtype}")
-print(f"* Similarities:\n{similarities}")
+print(f"* Similarities: \n{similarities}")
 
 # After quantization:
 # * Embeddings type: int8
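
The MRL hunk drops the manual "slice to 256 dimensions, then `F.normalize`" step in favour of `truncate_dim=256`. As a rough illustration of what the removed lines computed, the sketch below truncates and L2-normalizes toy vectors using only the standard library. The helper names (`truncate_and_normalize`, `dot`, `cosine`) and the numbers are made up for this example and are not part of the README or of sentence-transformers. It also checks that cosine similarity on the truncated vectors gives the same score whether or not they are renormalized first, since cosine ignores vector length.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, which divides out both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 6-dimensional "embeddings" (invented numbers, not model output).
query = [0.5, 0.1, -0.3, 0.2, 0.7, -0.4]
doc = [0.4, 0.2, -0.1, 0.3, -0.6, 0.5]

dim = 4  # the README truncates to 256; 4 keeps the example readable
q_trunc = truncate_and_normalize(query, dim)
d_trunc = truncate_and_normalize(doc, dim)

# Dot product of unit vectors vs. cosine of the raw truncated vectors.
score_normalized = dot(q_trunc, d_trunc)
score_cosine = cosine(query[:dim], doc[:dim])

print(len(q_trunc))
print(math.isclose(score_normalized, score_cosine))
```

Explicit renormalization still matters when the truncated vectors are later scored with a raw dot product, for example in a vector store configured for inner-product search.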
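
The last hunk header shows the int8 embeddings being widened with `.astype(int)` before the matrix product. The quantization step itself lies outside the diff, so the sketch below substitutes a simple symmetric scaling scheme as an assumption; `quantize_int8`, `int_dot`, the `max_abs` calibration value, and the toy vectors are all hypothetical. It illustrates why widening is needed: products of two int8 values can fall far outside the int8 range of -128 to 127.

```python
def quantize_int8(vec, max_abs):
    """Map floats in [-max_abs, max_abs] to integers in [-127, 127].

    Assumed scheme for illustration only; values outside the range are clipped.
    """
    scale = 127.0 / max_abs
    return [max(-127, min(127, round(x * scale))) for x in vec]


def int_dot(a, b):
    """Dot product accumulated in Python's arbitrary-precision integers."""
    return sum(x * y for x, y in zip(a, b))


# Toy 4-dimensional "embeddings" (invented numbers, not model output).
query = [0.5, 0.1, -0.3, 0.2]
docs = [
    [0.4, 0.2, -0.1, 0.3],   # points in a similar direction to the query
    [-0.5, 0.0, 0.3, -0.2],  # points in roughly the opposite direction
]

max_abs = 0.5  # assumed calibration range covering every component above
q8 = quantize_int8(query, max_abs)
d8 = [quantize_int8(d, max_abs) for d in docs]

# Scores are large integers, so they cannot be stored back into int8.
scores = [int_dot(q8, d) for d in d8]

print(q8)
print(scores)
```

The integer scores are only meaningful for ranking; they are not on the same scale as the cosine similarities printed in the MRL example.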