prithivida committed · verified · Commit f304b34 · Parent(s): 4666917

Added context for beginners

Files changed (1): README.md (+50 −9)
README.md CHANGED
@@ -23,8 +23,48 @@ pipeline_tag: fill-mask
This work stands on the shoulders of two robust pieces of research: [Naver's From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective paper](https://arxiv.org/pdf/2205.04733.pdf) and [Google's SparseEmbed](https://storage.googleapis.com/gweb-research2023-media/pubtools/pdf/79f16d3b3b948706d191a7fe6dd02abe516f5564.pdf).
Props to both teams for such robust work.

- ## Motivation:
+ ## 1 What are Sparse Representations and why learn one?
+
+ **Experts in sparse and dense representations, feel free to skip to section 2.**
+
+ <details>
+
+ 1. Lexical search:
+
+ Lexical search with BOW-based sparse vectors is a strong baseline, but it famously suffers from the vocabulary mismatch problem, as it can only do exact term matching: a query for "heart attack" will never match a passage that only says "myocardial infarction". Here are the pros and cons:
+
+ ✅ Efficient and cheap.
+ ✅ No need to fine-tune models.
+ ✅️ Interpretable.
+ ✅️ Exact term matches.
+ ❌ Vocabulary mismatch (you need to remember the exact terms).
+
+ 2. Semantic search:
+
+ Learned neural / dense retrievers (DPR, Sentence Transformers*, BGE* models) with approximate nearest-neighbour search have shown impressive results. Here are the pros and cons:
+
+ ✅ Searches the way humans innately think.
+ ✅ When fine-tuned, beats sparse retrieval by a long way.
+ ✅ Easily works with multiple modalities.
+ ❌ Suffers from token amnesia (misses exact term matches).
+ ❌ Resource intensive (both indexing and retrieval).
+ ❌ Famously hard to interpret.
+ ❌ Needs fine-tuning for OOD data.
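+
+ As a hedged illustration of the mechanics (the checkpoint name and toy documents below are ours, purely for demonstration), dense retrieval boils down to encoding text into vectors and ranking by similarity; see the sketch below.
+
+ ```python
+ # Sketch of bi-encoder dense retrieval with sentence-transformers.
+ # "all-MiniLM-L6-v2" is only an example checkpoint, not a recommendation.
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ query_emb = model.encode("how to fix acid reflux")
+ doc_embs = model.encode(["treating GERD and heartburn",
+                          "how to fix a flat tire"])
+ # Cosine similarity ranks the GERD passage first, despite zero shared terms.
+ print(util.cos_sim(query_emb, doc_embs))
+ ```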
+
+ 3. The big idea:
+
+ Getting the pros of both kinds of search made sense, and that gave rise to interest in learning sparse representations for queries and documents with some interpretability. The sparse representations also double as implicit or explicit (latent, contextualized) expansion mechanisms for both queries and documents. If you are new to query expansion, learn more here from the master himself, Daniel Tunkelang (link below).
+
+ 4. What does a sparse model learn?
+
+ The model learns to project its learned dense representations over an MLM head to give a distribution over the vocabulary.
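+
+ To make that concrete, here is a minimal sketch of the idea (an illustration based on the SPLADE paper's log-saturated ReLU and max-pooling aggregation, not our exact training or inference code; the checkpoint name is a stand-in):
+
+ ```python
+ # Sketch: turn MLM logits into a SPLADE-style sparse bag-of-words vector.
+ # "bert-base-uncased" is only a placeholder MLM checkpoint.
+ import torch
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+ model_id = "bert-base-uncased"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForMaskedLM.from_pretrained(model_id)
+
+ inputs = tokenizer("what causes acid reflux", return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits          # [1, seq_len, vocab_size]
+
+ # log(1 + ReLU(logits)) max-pooled over positions: one non-negative
+ # weight per vocab term; a trained SPLADE model keeps most at zero.
+ weights, _ = torch.log1p(torch.relu(logits)).max(dim=1)
+ print(int((weights[0] > 0).sum()), "active terms out of", weights.shape[1])
+ ```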
+
+ </details>
+
+ ## 2 Motivation:
SPLADE models strike a fine balance between retrieval effectiveness (quality) and retrieval efficiency (latency and $). With that in mind, we made **very minor retrieval-efficiency tweaks** to make the model more suitable for an industry setting.
*(Pure MLE folks should not conflate efficiency with model inference efficiency. Our main focus is retrieval efficiency; hereinafter "efficiency" is shorthand for retrieval efficiency unless explicitly qualified otherwise. Not that inference efficiency is unimportant; we will address it subsequently.)*
 
@@ -42,16 +82,17 @@ SPLADE models are a fine balance between retrieval effectiveness (quality) and r

<br/>

- ## Why FLOPS is one of the key metrics for industry setting ?
-
+ ## 3 Why is FLOPS one of the key metrics in an industry setting?
+
+ <details>
+
While ONLY an empirical analysis on a large sample makes sense, here is a spot check, a qualitative example, to give you an idea. Our models achieve on-par, competitive effectiveness with **~10% and ~100% fewer tokens than comparable SPLADE++ models, including the SoTA**.
(We will show quantitative results in the next section.)

So, **by design, "how to beat SoTA MRR?" was never our goal**; instead it was "at what cost can we achieve an acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD, see the table above) will achieve a better MRR.
But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable for an industry setting.
 
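+ For intuition, here is our paraphrase of the FLOPS regularizer used in the SPLADE line of work (see the Naver paper above for the exact formulation): it penalizes the squared mean activation of every vocabulary term over a batch of N texts, weighted against the ranking loss by the lambdas:
+
+ $$\mathcal{L} = \mathcal{L}_{ranking} + \lambda_Q\,\ell_{FLOPS}^{query} + \lambda_D\,\ell_{FLOPS}^{doc}, \qquad \ell_{FLOPS} = \sum_{j \in V} \Big( \frac{1}{N} \sum_{i=1}^{N} w_j^{(i)} \Big)^{2}$$
+
+ Larger lambdas push average term weights toward zero (sparser vectors, lower FLOPS); smaller lambdas let more terms fire, buying MRR at the cost of efficiency.
+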
- <details>
+

**Ours**
  ```python
@@ -78,7 +119,7 @@ SPLADE BOW rep:

</details>

- ## How does it translate into Empirical metrics?
+ ## 4 How does it translate into empirical metrics?

Our models are token-sparse and yet effective, which translates to faster retrieval (user experience) and a smaller index size ($). Mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss are the respective metrics, reported below.
This is why Google's SparseEmbed is interesting: it also achieves SPLADE-quality retrieval effectiveness with much lower FLOPS. Compared to ColBERT, SPLADE and SparseEmbed match query and
@@ -99,7 +140,7 @@ The full [anserini evaluation log](https://huggingface.co/prithivida/Splade_PP_e
- **Same-size models:** Official SPLADE++, SparseEmbed and ours are all fine-tuned on a base model of the same size, that of `bert-base-uncased`.
</details>

- ## Roadmap and future directions for Industry Suitability.
+ ## 5 Roadmap and future directions for industry suitability

- **Custom/Domain Finetuning**: The OOD zero-shot performance of SPLADE models is great, but that matters little in an industry setting, where we need the ability to fine-tune on custom datasets or domains. Fine-tuning SPLADE on a new dataset is not cheap, as it needs labelled queries and passages.
So we will continue to explore how to make fine-tuning our recipe on custom datasets economical, without expensive labelling.
@@ -107,7 +148,7 @@ The full [anserini evaluation log](https://huggingface.co/prithivida/Splade_PP_e
120K and 250K vocab, as opposed to 30K as in bert-base-uncased. We will continue researching how best to extend our recipe to the multilingual world.


- ## Usage
+ ## 6 Usage

To enable a lightweight inference solution with **no heavy Torch dependency**, we will also release a library: **SPLADERunner**.
Of course, if that doesn't matter to you, you could always use these models with the Hugging Face transformers library.
@@ -116,7 +157,7 @@ Ofcourse if it doesnt matter you could always use these models Huggingface trans
<h1 id="htu">How to use?</h1>


- ## With SPLADERunner Library
+ ## 7 With SPLADERunner Library

[SPLADERunner Library](https://github.com/PrithivirajDamodaran/SPLADERunner)

@@ -134,7 +175,7 @@ sparse_rep = expander.expand(
```


- ## With HuggingFace
+ ## 8 With HuggingFace

```python
import torch
 