prithivida committed · verified · Commit f304b34 · Parent(s): 4666917

Added context for beginners

Files changed (1): README.md (+50 −9)
README.md CHANGED
@@ -23,8 +23,48 @@ pipeline_tag: fill-mask
This work stands on the shoulders of two robust pieces of research: [Naver's From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective paper](https://arxiv.org/pdf/2205.04733.pdf) and [Google's SparseEmbed](https://storage.googleapis.com/gweb-research2023-media/pubtools/pdf/79f16d3b3b948706d191a7fe6dd02abe516f5564.pdf).
Props to both teams for such robust work.

- ## Motivation:
+ ## 1 What are Sparse Representations and why learn one?
+
+ **Experts in sparse and dense representations, feel free to skip to section 2.**
+
+ <details>
+
+ 1. Lexical search:
+
+ Lexical search with BOW-based sparse vectors is a strong baseline, but it famously suffers from the vocabulary mismatch problem, as it can only do exact term matching: a query for "heart attack" will never match a passage that only says "myocardial infarction". Here are the pros and cons:
+
+ ✅ Efficient and cheap.
+ ✅ No need to fine-tune models.
+ ✅️ Interpretable.
+ ✅️ Exact term matches.
+ ❌ Vocabulary mismatch (you need to remember the exact terms).
+
+ 2. Semantic search:
+
+ Learned neural / dense retrievers (DPR, Sentence Transformers*, BGE* models) with approximate nearest-neighbour search have shown impressive results. Here are the pros and cons:
+
+ ✅ Searches the way humans innately think.
+ ✅ When fine-tuned, beats sparse retrieval by a long way.
+ ✅ Easily works with multiple modalities.
+ ❌ Suffers from token amnesia (misses exact term matches).
+ ❌ Resource intensive (both indexing and retrieval).
+ ❌ Famously hard to interpret.
+ ❌ Needs fine-tuning for OOD data.
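+
+ As a hedged illustration of the mechanics (the checkpoint name and toy documents below are ours, purely for demonstration), dense retrieval boils down to encoding text into vectors and ranking by similarity; see the sketch below.
+
+ ```python
+ # Sketch of bi-encoder dense retrieval with sentence-transformers.
+ # "all-MiniLM-L6-v2" is only an example checkpoint, not a recommendation.
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("all-MiniLM-L6-v2")
+ query_emb = model.encode("how to fix acid reflux")
+ doc_embs = model.encode(["treating GERD and heartburn",
+                          "how to fix a flat tire"])
+ # Cosine similarity ranks the GERD passage first, despite zero shared terms.
+ print(util.cos_sim(query_emb, doc_embs))
+ ```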
+
+ 3. The big idea:
+
+ Getting the pros of both kinds of search made sense, and that gave rise to interest in learning sparse representations for queries and documents with some interpretability. The sparse representations also double as implicit or explicit (latent, contextualized) expansion mechanisms for both queries and documents. If you are new to query expansion, learn more here from the master himself, Daniel Tunkelang (link below).
+
+ 4. What does a sparse model learn?
+
+ The model learns to project its learned dense representations over an MLM head to give a distribution over the vocabulary.
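+
+ To make that concrete, here is a minimal sketch of the idea (an illustration based on the SPLADE paper's log-saturated ReLU and max-pooling aggregation, not our exact training or inference code; the checkpoint name is a stand-in):
+
+ ```python
+ # Sketch: turn MLM logits into a SPLADE-style sparse bag-of-words vector.
+ # "bert-base-uncased" is only a placeholder MLM checkpoint.
+ import torch
+ from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+ model_id = "bert-base-uncased"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForMaskedLM.from_pretrained(model_id)
+
+ inputs = tokenizer("what causes acid reflux", return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits          # [1, seq_len, vocab_size]
+
+ # log(1 + ReLU(logits)) max-pooled over positions: one non-negative
+ # weight per vocab term; a trained SPLADE model keeps most at zero.
+ weights, _ = torch.log1p(torch.relu(logits)).max(dim=1)
+ print(int((weights[0] > 0).sum()), "active terms out of", weights.shape[1])
+ ```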
+
+ </details>
+
+ ## 2 Motivation:
SPLADE models strike a fine balance between retrieval effectiveness (quality) and retrieval efficiency (latency and $). With that in mind, we made **very minor retrieval-efficiency tweaks** to make the model more suitable for an industry setting.
*(Pure MLE folks should not conflate efficiency with model inference efficiency. Our main focus is retrieval efficiency; hereinafter "efficiency" is shorthand for retrieval efficiency unless explicitly qualified otherwise. Not that inference efficiency is unimportant; we will address it subsequently.)*
 
@@ -42,16 +82,17 @@ SPLADE models are a fine balance between retrieval effectiveness (quality) and r

<br/>

- ## Why FLOPS is one of the key metrics for industry setting ?
-
+ ## 3 Why is FLOPS one of the key metrics in an industry setting?
+
+ <details>
+
While ONLY an empirical analysis on a large sample makes sense, here is a spot check, a qualitative example, to give you an idea. Our models achieve on-par, competitive effectiveness with **~10% and ~100% fewer tokens than comparable SPLADE++ models, including the SoTA**.
(We will show quantitative results in the next section.)

So, **by design, "how to beat SoTA MRR?" was never our goal**; instead it was "at what cost can we achieve an acceptable effectiveness, i.e. MRR@10?". Nonchalantly reducing the lambda values (λQ, λD, see the table above) will achieve a better MRR.
But lower lambda values = higher FLOPS = more tokens = poorer efficiency. This is NOT desirable for an industry setting.
 
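+ For intuition, here is our paraphrase of the FLOPS regularizer used in the SPLADE line of work (see the Naver paper above for the exact formulation): it penalizes the squared mean activation of every vocabulary term over a batch of N texts, weighted against the ranking loss by the lambdas:
+
+ $$\mathcal{L} = \mathcal{L}_{ranking} + \lambda_Q\,\ell_{FLOPS}^{query} + \lambda_D\,\ell_{FLOPS}^{doc}, \qquad \ell_{FLOPS} = \sum_{j \in V} \Big( \frac{1}{N} \sum_{i=1}^{N} w_j^{(i)} \Big)^{2}$$
+
+ Larger lambdas push average term weights toward zero (sparser vectors, lower FLOPS); smaller lambdas let more terms fire, buying MRR at the cost of efficiency.
+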
- <details>
+

**Ours**
  ```python
@@ -78,7 +119,7 @@ SPLADE BOW rep:

</details>

- ## How does it translate into Empirical metrics?
+ ## 4 How does it translate into empirical metrics?

Our models are token-sparse and yet effective, which translates to faster retrieval (user experience) and a smaller index size ($). Mean retrieval time on the standard MS MARCO small dev set and scaled total FLOPS loss are the respective metrics, reported below.
This is why Google's SparseEmbed is interesting: it also achieves SPLADE-quality retrieval effectiveness with much lower FLOPS. Compared to ColBERT, SPLADE and SparseEmbed match query and
@@ -99,7 +140,7 @@ The full [anserini evaluation log](https://huggingface.co/prithivida/Splade_PP_e
- **Same-size models:** Official SPLADE++, SparseEmbed and ours are all fine-tuned on a base model of the same size, that of `bert-base-uncased`.
</details>

- ## Roadmap and future directions for Industry Suitability.
+ ## 5 Roadmap and future directions for industry suitability

- **Custom/Domain Finetuning**: The OOD zero-shot performance of SPLADE models is great, but that matters little in an industry setting, where we need the ability to fine-tune on custom datasets or domains. Fine-tuning SPLADE on a new dataset is not cheap, as it needs labelled queries and passages.
So we will continue to explore how to make fine-tuning our recipe on custom datasets economical, without expensive labelling.
@@ -107,7 +148,7 @@ The full [anserini evaluation log](https://huggingface.co/prithivida/Splade_PP_e
120K and 250K vocab, as opposed to 30K as in bert-base-uncased. We will continue researching how best to extend our recipe to the multilingual world.


- ## Usage
+ ## 6 Usage

To enable a lightweight inference solution with **no heavy Torch dependency**, we will also release a library: **SPLADERunner**.
Of course, if that doesn't matter to you, you could always use these models with the Hugging Face transformers library.
@@ -116,7 +157,7 @@ Ofcourse if it doesnt matter you could always use these models Huggingface trans
<h1 id="htu">How to use?</h1>


- ## With SPLADERunner Library
+ ## 7 With SPLADERunner Library

[SPLADERunner Library](https://github.com/PrithivirajDamodaran/SPLADERunner)

@@ -134,7 +175,7 @@ sparse_rep = expander.expand(
```


- ## With HuggingFace
+ ## 8 With HuggingFace

```python
import torch
 