rvo committed on
Commit 5f8cc30 · verified · 1 Parent(s): 017cb2b

Upload 2 files

Files changed (2):
  1. README.md +8 -97
  2. transformers_example.ipynb +140 -0
README.md CHANGED
@@ -22,13 +22,13 @@ language:

## Introduction

- `mdbr-leaf-ir` is a compact high-performance text embedding model specifically designed for **information retrieval (IR)** tasks.
+ `mdbr-leaf-ir` is a compact, high-performance text embedding model designed specifically for **information retrieval (IR)** tasks, e.g., the retrieval stage of RAG pipelines.

Enabling even greater efficiency, `mdbr-leaf-ir` supports [flexible asymmetric architectures](#asymmetric-retrieval-setup) and is robust to [vector quantization](#vector-quantization) and [MRL truncation](#mrl).

If you are looking to perform other tasks such as classification, clustering, semantic sentence similarity, or summarization, please check out our [`mdbr-leaf-mt`](https://huggingface.co/MongoDB/mdbr-leaf-mt) model.

- Note: this model has been developed by MongoDB Research and is not part of MongoDB's commercial offerings.
+ Note: this model was developed by MongoDB Research's ML team. At the time of writing, it is not used in any of MongoDB's commercial products or services.

## Technical Report

@@ -40,27 +40,6 @@ A technical report detailing our proposed `LEAF` training procedure is [availabl
* **Flexible Architecture Support**: `mdbr-leaf-ir` supports asymmetric retrieval architectures, enabling even better retrieval results. [See below](#asymmetric-retrieval-setup) for more information.
* **MRL and quantization support**: embedding vectors generated by `mdbr-leaf-ir` compress well when truncated (MRL) and/or stored using more efficient types like `int8` and `binary`. [See below](#mrl) for more information.

-
- <!-- ## Performance
- ### Benchmark Results
-
- * Values are nDCG@10.
- * Scores exclude CQADupstack and MSMARCO; full BEIR results are available on the [public leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
- * Bold scores mark cases where our model outperforms the comparisons in either standard or asymmetric mode, as well as cases where a comparison outperforms our model in standard mode. Blue scores mark cases where asymmetric mode outperforms standard mode.
- * `BM25` scores are obtained with `(k₁=0.9, b=0.4)`.
-
- | Model | Size | arg. | fiqa | nfc | scid. | scif. | quora | covid | nq | fever | c-fever | dbp. | hotpot | avg. |
- |-------|------|------|------|-----|-------|-------|-------|-------|----|-------|---------|------|--------|------|
- | **`mdbr-leaf-ir` (asym.)** | 23M | **<span style="color:blue">58.5</span>** | **<span style="color:blue">42.1</span>** | **36.1** | <span style="color:blue">20.4</span> | **69.9** | <span style="color:blue">86.2</span> | **<span style="color:blue">83.7</span>** | **<span style="color:blue">61.4</span>** | **<span style="color:blue">86.4</span>** | **<span style="color:blue">37.4</span>** | **<span style="color:blue">44.8</span>** | **<span style="color:blue">69.0</span>** | **<span style="color:blue">58.0</span>** |
- | **`mdbr-leaf-ir`** | 23M | **56.7** | **38.1** | **36.2** | 19.5 | **70.0** | 71.0 | **83.0** | **58.2** | **85.4** | **32.4** | 43.7 | 68.2 | **55.2** |
- | **Comparisons** | | | | | | | | | | | | | | |
- | `snowflake-arctic-embed-xs` | 23M | 52.1 | 34.5 | 30.9 | 18.4 | 64.5 | 86.6 | 79.4 | 54.8 | 83.4 | 29.9 | 40.2 | 65.3 | 53.3 |
- | `MiniLM-L6-v2` | 23M | 50.2 | 36.9 | 31.6 | **21.6** | 64.5 | **87.6** | 47.2 | 43.9 | 51.9 | 20.3 | 32.3 | 46.5 | 44.5 |
- | `BM25` | -- | 40.8 | 23.8 | 31.8 | 15.0 | 67.6 | 78.7 | 58.9 | 30.5 | 63.8 | 16.2 | 31.9 | 62.9 | 43.5 |
- | `SPLADE v2` | 110M | 47.9 | 33.6 | 33.4 | 15.8 | 69.3 | 83.8 | 71.0 | 52.1 | 78.6 | 23.5 | 43.5 | **68.4** | 51.7 |
- | `ColBERT v2` | 110M | 46.3 | 35.6 | 33.8 | 15.4 | 69.3 | 85.2 | 73.8 | 56.2 | 78.5 | 17.6 | **44.6** | 66.7 | 51.9 |
- -->
-
## Quickstart

### Sentence Transformers
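
The MRL and quantization support described in the feature list can be exercised directly in `sentence-transformers`. A minimal sketch, assuming `sentence-transformers` >= 2.6 for its `quantize_embeddings` helper; the truncation dimension `k` below is a hypothetical value for illustration, not a recommended setting:

```python
# Sketch: MRL truncation and scalar quantization of mdbr-leaf-ir embeddings.
# Assumes sentence-transformers >= 2.6 (provides quantize_embeddings).
import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
docs = [
    "Machine learning is a subset of artificial intelligence.",
    "Neural networks are trained through backpropagation.",
]
emb = model.encode(docs)  # float32 vectors at the full output dimension

# MRL: keep the first k dimensions, then re-normalize for cosine similarity
k = 256  # hypothetical truncation dimension, for illustration only
emb_mrl = emb[:, :k] / np.linalg.norm(emb[:, :k], axis=1, keepdims=True)

# Scalar quantization for smaller vector indexes
emb_int8 = quantize_embeddings(emb, precision="int8")   # one int8 per dimension
emb_bin = quantize_embeddings(emb, precision="binary")  # bit-packed, d/8 bytes per vector
```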
@@ -106,80 +85,12 @@ for i, query in enumerate(queries):

### Transformers Usage

- <span style="color:red">CHECK THAT safe_open WORKS WITH URLS; link to code in repo</span>
-
- <!-- ```python
- from safetensors import safe_open
- from transformers import AutoModel, AutoTokenizer
-
- # Load the model
- tokenizer = AutoTokenizer.from_pretrained(MODEL)
- model = AutoModel.from_pretrained(MODEL)
-
- tensors = {}
- with safe_open(MODEL + "/2_Dense/model.safetensors", framework="pt") as f:
-     for k in f.keys():
-         tensors[k] = f.get_tensor(k)
-
- W_out = torch.nn.Linear(in_features=384, out_features=768, bias=True)
- W_out.load_state_dict({
-     "weight": tensors["linear.weight"],
-     "bias": tensors["linear.bias"]
- })
-
- _ = model.eval()
- _ = W_out.eval()
-
- # Example queries and documents
- queries = [
-     "What is machine learning?",
-     "How does neural network training work?"
- ]
-
- documents = [
-     "Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.",
-     "Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors."
- ]
-
- # Tokenize
- QUERY_PREFIX = 'Represent this sentence for searching relevant passages: '
- queries_with_prefix = [QUERY_PREFIX + query for query in queries]
-
- query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)
- document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)
-
- # Perform Inference
- with torch.inference_mode():
-     y_queries = model(**query_tokens).last_hidden_state
-     y_docs = model(**document_tokens).last_hidden_state
-
- # perform pooling
- y_queries = y_queries * query_tokens.attention_mask.unsqueeze(-1)
- y_queries_pooled = y_queries.sum(dim=1) / query_tokens.attention_mask.sum(dim=1, keepdim=True)
-
- y_docs = y_docs * document_tokens.attention_mask.unsqueeze(-1)
- y_docs_pooled = y_docs.sum(dim=1) / document_tokens.attention_mask.sum(dim=1, keepdim=True)
-
- # map to desired output dimension
- y_queries_out = W_out(y_queries_pooled)
- y_docs_out = W_out(y_docs_pooled)
-
- # normalize and return
- query_embeddings = F.normalize(y_queries_out, dim=-1)
- document_embeddings = F.normalize(y_docs_out, dim=-1)
-
- similarities = query_embeddings @ document_embeddings.T
- print(f"Similarities:\n{similarities}")
- # Similarities:
- # tensor([[0.6857, 0.4598],
- #         [0.4238, 0.5723]])
- ``` -->
+ See [here](https://huggingface.co/MongoDB/mdbr-leaf-ir/resolve/main/transformers_example.ipynb).

### Asymmetric Retrieval Setup

- `mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it has been distilled from, making the asymmetric system below possible:
-
- ```python
+ `mdbr-leaf-ir` is *aligned* to [`snowflake-arctic-embed-m-v1.5`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5), the model it was distilled from. This enables flexible architectures in which, for example, documents are encoded with the larger model, while queries are encoded faster and more efficiently with the compact `leaf` model:
+ ```python
# Use mdbr-leaf-ir for query encoding (real-time, low latency)
query_model = SentenceTransformer("MongoDB/mdbr-leaf-ir")
query_embeddings = query_model.encode(queries, prompt_name="query")
@@ -187,7 +98,7 @@ query_embeddings = query_model.encode(queries, prompt_name="query")

# Use a larger model for document encoding (one-time, at index time)
doc_model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")
document_embeddings = doc_model.encode(documents)
-
+
# Compute similarities
scores = query_model.similarity(query_embeddings, document_embeddings)
```
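
Because the two encoders are trained to share an embedding space, a quick sanity check of the alignment (a sketch using illustrative sentences, not an official test from the card) is to embed the same text with both models and compare the vectors:

```python
# Sketch: check that leaf query embeddings live in the arctic document space.
from sentence_transformers import SentenceTransformer

leaf = SentenceTransformer("MongoDB/mdbr-leaf-ir")
arctic = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v1.5")

text = ["Machine learning is a subset of artificial intelligence."]
e_leaf = leaf.encode(text, normalize_embeddings=True)      # 768-dim, unit norm
e_arctic = arctic.encode(text, normalize_embeddings=True)  # 768-dim, unit norm

# Cosine similarity should be high if the spaces are aligned;
# the exact value depends on the checkpoint.
print(float(e_leaf @ e_arctic.T))
```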
@@ -255,9 +166,9 @@ print(f"* Similarities:\n{similarities}")
## Evaluation

Please refer to this <span style="color:red">TBD</span> script to replicate results.
- The checkpoint used to produce the scores presented in the paper [is here](https://huggingface.co/MongoDB/mdbr-leaf-ir/commit/ea98995e96beac21b820aa8ad9afaa6fd29b243d).
+ The checkpoint used to produce the scores presented in the paper [is here](https://huggingface.co/MongoDB/mdbr-leaf-ir/commit/ea98995e96beac21b820aa8ad9afaa6fd29b243d). The current model has been trained further to achieve higher scores.

- ## Citation
+ ## Citation

If you use this model in your work, please cite:
 
transformers_example.ipynb ADDED
@@ -0,0 +1,140 @@
+ {
+  "cells": [
+   {
+    "cell_type": "code",
+    "execution_count": 1,
+    "id": "2a12a2b3",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "from safetensors import safe_open\n",
+     "import torch\n",
+     "from torch.nn import functional as F\n",
+     "from transformers import AutoModel, AutoTokenizer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "148ce181",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# First clone the model locally\n",
+     "!git clone https://huggingface.co/MongoDB/mdbr-leaf-ir"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "ba9ec6c7",
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# Then load it\n",
+     "MODEL = \"mdbr-leaf-ir\"\n",
+     "\n",
+     "tokenizer = AutoTokenizer.from_pretrained(MODEL)\n",
+     "model = AutoModel.from_pretrained(MODEL)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "id": "ebaf1a76",
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "Similarities:\n",
+       "tensor([[0.6857, 0.4598],\n",
+       "        [0.4238, 0.5723]])\n"
+      ]
+     }
+    ],
+    "source": [
+     "tensors = {}\n",
+     "with safe_open(MODEL + \"/2_Dense/model.safetensors\", framework=\"pt\") as f:\n",
+     "    for k in f.keys():\n",
+     "        tensors[k] = f.get_tensor(k)\n",
+     "\n",
+     "W_out = torch.nn.Linear(in_features=384, out_features=768, bias=True)\n",
+     "W_out.load_state_dict({\n",
+     "    \"weight\": tensors[\"linear.weight\"],\n",
+     "    \"bias\": tensors[\"linear.bias\"]\n",
+     "})\n",
+     "\n",
+     "_ = model.eval()\n",
+     "_ = W_out.eval()\n",
+     "\n",
+     "# Example queries and documents\n",
+     "queries = [\n",
+     "    \"What is machine learning?\",\n",
+     "    \"How does neural network training work?\"\n",
+     "]\n",
+     "\n",
+     "documents = [\n",
+     "    \"Machine learning is a subset of artificial intelligence that focuses on algorithms that can learn from data.\",\n",
+     "    \"Neural networks are trained through backpropagation, adjusting weights to minimize prediction errors.\"\n",
+     "]\n",
+     "\n",
+     "# Tokenize\n",
+     "QUERY_PREFIX = 'Represent this sentence for searching relevant passages: '\n",
+     "queries_with_prefix = [QUERY_PREFIX + query for query in queries]\n",
+     "\n",
+     "query_tokens = tokenizer(queries_with_prefix, padding=True, truncation=True, return_tensors='pt', max_length=512)\n",
+     "document_tokens = tokenizer(documents, padding=True, truncation=True, return_tensors='pt', max_length=512)\n",
+     "\n",
+     "# Perform Inference\n",
+     "with torch.inference_mode():\n",
+     "    y_queries = model(**query_tokens).last_hidden_state\n",
+     "    y_docs = model(**document_tokens).last_hidden_state\n",
+     "\n",
+     "    # perform pooling\n",
+     "    y_queries = y_queries * query_tokens.attention_mask.unsqueeze(-1)\n",
+     "    y_queries_pooled = y_queries.sum(dim=1) / query_tokens.attention_mask.sum(dim=1, keepdim=True)\n",
+     "\n",
+     "    y_docs = y_docs * document_tokens.attention_mask.unsqueeze(-1)\n",
+     "    y_docs_pooled = y_docs.sum(dim=1) / document_tokens.attention_mask.sum(dim=1, keepdim=True)\n",
+     "\n",
+     "    # map to desired output dimension\n",
+     "    y_queries_out = W_out(y_queries_pooled)\n",
+     "    y_docs_out = W_out(y_docs_pooled)\n",
+     "\n",
+     "    # normalize and return\n",
+     "    query_embeddings = F.normalize(y_queries_out, dim=-1)\n",
+     "    document_embeddings = F.normalize(y_docs_out, dim=-1)\n",
+     "\n",
+     "similarities = query_embeddings @ document_embeddings.T\n",
+     "print(f\"Similarities:\\n{similarities}\")\n",
+     "\n",
+     "# Similarities:\n",
+     "# tensor([[0.6857, 0.4598],\n",
+     "#         [0.4238, 0.5723]])"
+    ]
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "alexis",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.12.7"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 5
+ }
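
One caveat on the notebook above: `safe_open` expects a local file path, not a URL, which is why the repository is cloned first. A sketch of an alternative that fetches only the dense-projection weights, assuming the `huggingface_hub` package is installed:

```python
# Sketch: download just the 2_Dense projection weights instead of cloning.
# hf_hub_download caches the file locally and returns its path.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

dense_path = hf_hub_download(
    repo_id="MongoDB/mdbr-leaf-ir",
    filename="2_Dense/model.safetensors",
)

tensors = {}
with safe_open(dense_path, framework="pt") as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)
```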