rasyosef committed
Commit 1e8af3a · verified · 1 parent: ad723d1

Add new SparseEncoder model

Files changed (3):
  1. README.md +89 -99
  2. config.json +4 -4
  3. model.safetensors +2 -2
README.md CHANGED
@@ -8,45 +8,35 @@ tags:
8
  - sparse
9
  - splade
10
  - generated_from_trainer
11
- - dataset_size:496123
12
  - loss:SpladeLoss
13
  - loss:SparseMarginMSELoss
14
  - loss:FlopsLoss
15
- base_model: prajjwal1/bert-tiny
16
  widget:
17
- - text: Hurley doesn't just want to be your go-to for surf gear, but the be the brand
18
- that represents your lifestyle. Of course you have your pick up board shorts,
19
- tanks and a Hurley hat while you're on the beach, but you can also look at graphic
20
- tees, sandals, and accessories when you're on the street.
21
- - text: 'Electric field of a positive and a negative point charge. Electric charge
22
- is the physical property of matter that causes it to experience a force when placed
23
- in an electromagnetic field.There are two types of electric charges: positive
24
- and negative.lectric charge is a characteristic property of many subatomic particles.
25
- The charges of free-standing particles are integer multiples of the elementary
26
- charge e; we say that electric charge is quantized. Michael Faraday, in his electrolysis
27
- experiments, was the first to note the discrete nature of electric charge.'
28
- - text: The term mechanical digestion refers to the physical breakdown of large pieces
29
- of food into smaller pieces which can subsequently be accessed by digestive enzymes.
30
- In chemical digestion, enzymes break down food into the small molecules the body
31
- can use.
32
- - text: Kids and Quick Solutions. Children learn to put away their clothes when they
33
- can reach the hanging rods. This is actually fun for little ones -- they may spend
34
- a long stretch of time putting hangers on and taking them off the rods -- as long
35
- as the rods are child-height.So take your stand against piles of clothes on the
36
- floor of the teen's bedroom early by re-sizing the closet to fit the kid.his is
37
- actually fun for little ones -- they may spend a long stretch of time putting
38
- hangers on and taking them off the rods -- as long as the rods are child-height.
39
- So take your stand against piles of clothes on the floor of the teen's bedroom
40
- early by re-sizing the closet to fit the kid.
41
- - text: About EUS (endoscopic ultrasound). An EUS, or endoscopic ultrasound, is an
42
- outpatient procedure used to closely examine the tissues in the digestive tract.
43
- The procedure is done using a standard endoscope and a tiny ultrasound device.The
44
- ultrasound sensor sends back visual images of the digestive tract to a screen,
45
- allowing the physician to see deeper into the tissues and the organs beneath the
46
- surface of the intestines.. In general, an EUS is a very safe procedure. If your
47
- procedure is being done on the upper GI tract, you may have a sore throat for
48
- a few days. As a result of the sedation, you should not drive, operate heavy machinery
49
- or make any important decisions for up to six hours following the procedure.
50
  pipeline_tag: feature-extraction
51
  library_name: sentence-transformers
52
  metrics:
@@ -70,7 +60,7 @@ metrics:
70
  - corpus_active_dims
71
  - corpus_sparsity_ratio
72
  model-index:
73
- - name: SPLADE-BERT-Tiny-Distil
74
  results:
75
  - task:
76
  type: sparse-information-retrieval
@@ -80,72 +70,72 @@ model-index:
80
  type: unknown
81
  metrics:
82
  - type: dot_accuracy@1
83
- value: 0.4602
84
  name: Dot Accuracy@1
85
  - type: dot_accuracy@3
86
- value: 0.7768
87
  name: Dot Accuracy@3
88
  - type: dot_accuracy@5
89
- value: 0.885
90
  name: Dot Accuracy@5
91
  - type: dot_accuracy@10
92
- value: 0.9548
93
  name: Dot Accuracy@10
94
  - type: dot_precision@1
95
- value: 0.4602
96
  name: Dot Precision@1
97
  - type: dot_precision@3
98
- value: 0.2653333333333333
99
  name: Dot Precision@3
100
  - type: dot_precision@5
101
- value: 0.18391999999999997
102
  name: Dot Precision@5
103
  - type: dot_precision@10
104
- value: 0.10024
105
  name: Dot Precision@10
106
  - type: dot_recall@1
107
- value: 0.4461833333333334
108
  name: Dot Recall@1
109
  - type: dot_recall@3
110
- value: 0.7631166666666666
111
  name: Dot Recall@3
112
  - type: dot_recall@5
113
- value: 0.8761
114
  name: Dot Recall@5
115
  - type: dot_recall@10
116
- value: 0.9500333333333334
117
  name: Dot Recall@10
118
  - type: dot_ndcg@10
119
- value: 0.7094495794736737
120
  name: Dot Ndcg@10
121
  - type: dot_mrr@10
122
- value: 0.6344716666666689
123
  name: Dot Mrr@10
124
  - type: dot_map@100
125
- value: 0.6306882016403095
126
  name: Dot Map@100
127
  - type: query_active_dims
128
- value: 16.77560043334961
129
  name: Query Active Dims
130
  - type: query_sparsity_ratio
131
- value: 0.9994503767632085
132
  name: Query Sparsity Ratio
133
  - type: corpus_active_dims
134
- value: 102.47956598021874
135
  name: Corpus Active Dims
136
  - type: corpus_sparsity_ratio
137
- value: 0.9966424360795421
138
  name: Corpus Sparsity Ratio
139
  ---
140
 
141
- # SPLADE-BERT-Tiny-Distil
142
 
143
- This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
144
  ## Model Details
145
 
146
  ### Model Description
147
  - **Model Type:** SPLADE Sparse Encoder
148
- - **Base model:** [prajjwal1/bert-tiny](https://huggingface.co/prajjwal1/bert-tiny) <!-- at revision 6f75de8b60a9f8a2fdf7b69cbd86d9e64bcb3837 -->
149
  - **Maximum Sequence Length:** 512 tokens
150
  - **Output Dimensionality:** 30522 dimensions
151
  - **Similarity Function:** Dot Product
@@ -184,15 +174,15 @@ Then you can load this model and run inference.
184
  from sentence_transformers import SparseEncoder
185
 
186
  # Download from the 🤗 Hub
187
- model = SparseEncoder("yosefw/SPLADE-BERT-Tiny-distil-msmarco")
188
  # Run inference
189
  queries = [
190
- "what is eus appointment",
191
  ]
192
  documents = [
193
- "Endoscopic Ultrasound (EUS). You've been referred to have an endoscopic ultrasound, or EUS, which will help your doctor, evaluate or treat your condition. This brochure will give you a basic understanding of the procedure-how it is performed, how it can help, and what side effects you might experience.our doctor can use EUS to diagnose the cause of conditions such as abdominal pain or abnormal weight loss. Or, if your doctor has ruled out certain conditions, EUS can confirm your diagnosis and give you a clean bill of health.",
194
- 'About EUS (endoscopic ultrasound). An EUS, or endoscopic ultrasound, is an outpatient procedure used to closely examine the tissues in the digestive tract. The procedure is done using a standard endoscope and a tiny ultrasound device.The ultrasound sensor sends back visual images of the digestive tract to a screen, allowing the physician to see deeper into the tissues and the organs beneath the surface of the intestines.. In general, an EUS is a very safe procedure. If your procedure is being done on the upper GI tract, you may have a sore throat for a few days. As a result of the sedation, you should not drive, operate heavy machinery or make any important decisions for up to six hours following the procedure.',
195
- 'Endoscopic Ultrasound (EUS) allows your doctor to examine the lining and the walls of your upper and lower gastrointestinal tract.The upper tract is the esophagus, stomach, and duodenum; the lower tract includes your colon and rectum.Doctors also use EUS to study internal organs that lie next to the gastrointestinal tract, such as the gall bladder and the pancreas. Your endoscopist will use a thin, flexible tube called an endoscope.he upper tract is the esophagus, stomach, and duodenum; the lower tract includes your colon and rectum. Doctors also use EUS to study internal organs that lie next to the gastrointestinal tract, such as the gall bladder and the pancreas.',
196
  ]
197
  query_embeddings = model.encode_query(queries)
198
  document_embeddings = model.encode_document(documents)
@@ -202,7 +192,7 @@ print(query_embeddings.shape, document_embeddings.shape)
202
  # Get the similarity scores for the embeddings
203
  similarities = model.similarity(query_embeddings, document_embeddings)
204
  print(similarities)
205
- # tensor([[12.9370, 14.3277, 12.9725]])
206
  ```
207
 
208
  <!--
@@ -239,25 +229,25 @@ You can finetune this model on your own dataset.
239
 
240
  | Metric | Value |
241
  |:----------------------|:-----------|
242
- | dot_accuracy@1 | 0.4602 |
243
- | dot_accuracy@3 | 0.7768 |
244
- | dot_accuracy@5 | 0.885 |
245
- | dot_accuracy@10 | 0.9548 |
246
- | dot_precision@1 | 0.4602 |
247
- | dot_precision@3 | 0.2653 |
248
- | dot_precision@5 | 0.1839 |
249
- | dot_precision@10 | 0.1002 |
250
- | dot_recall@1 | 0.4462 |
251
- | dot_recall@3 | 0.7631 |
252
- | dot_recall@5 | 0.8761 |
253
- | dot_recall@10 | 0.95 |
254
- | **dot_ndcg@10** | **0.7094** |
255
- | dot_mrr@10 | 0.6345 |
256
- | dot_map@100 | 0.6307 |
257
- | query_active_dims | 16.7756 |
258
- | query_sparsity_ratio | 0.9995 |
259
- | corpus_active_dims | 102.4796 |
260
- | corpus_sparsity_ratio | 0.9966 |
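The retrieval metrics in this table follow standard information-retrieval definitions. The sketch below computes them for a single query. It assumes binary relevance, precision divided by k, and the usual log2(rank + 1) NDCG discount. It illustrates the definitions only and is not the sentence-transformers evaluator itself; the document IDs are hypothetical. The reported numbers are means of these per-query scores over all evaluation queries.

```python
import math

def ir_metrics(ranked_ids, relevant_ids, k=10):
    """Binary-relevance IR metrics for one query (illustrative sketch)."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    accuracy = 1.0 if any(hits) else 0.0      # any relevant doc in the top k
    precision = sum(hits) / k                 # fraction of the k slots that are relevant
    recall = sum(hits) / len(relevant_ids)    # fraction of relevant docs retrieved
    # Reciprocal rank of the first relevant document within the top k
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break
    # DCG with a log2(rank + 1) discount; the ideal ranking puts every relevant doc first
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "mrr": mrr, "ndcg": ndcg}

# Hypothetical query: one relevant passage "p7", retrieved at rank 2
metrics = ir_metrics(["p3", "p7", "p1"], {"p7"}, k=10)
print(metrics)
```

Precision@1 and accuracy@1 are always equal, because both measure whether the top-ranked document is relevant. This is why both rows read 0.4602 in the table.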
261
 
262
  <!--
263
  ## Bias, Risks and Limitations
@@ -277,19 +267,19 @@ You can finetune this model on your own dataset.
277
 
278
  #### Unnamed Dataset
279
 
280
- * Size: 496,123 training samples
281
  * Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, <code>negative_3</code>, <code>negative_4</code>, and <code>label</code>
282
  * Approximate statistics based on the first 1000 samples:
283
- | | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
284
- |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------|
285
- | type | string | string | string | string | string | string | list |
286
- | details | <ul><li>min: 4 tokens</li><li>mean: 9.09 tokens</li><li>max: 37 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 80.68 tokens</li><li>max: 215 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 78.57 tokens</li><li>max: 238 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 77.8 tokens</li><li>max: 253 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 76.46 tokens</li><li>max: 248 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 75.9 tokens</li><li>max: 190 tokens</li></ul> | <ul><li>size: 4 elements</li></ul> |
287
  * Samples:
288
- | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
289
- |:---|:---|:---|:---|:---|:---|:---|
290
- | <code>could Nexium antacid cause sweating</code> | <code>Summary: Sweating-excessive is found among people who take Nexium, especially for people who are 60+ old, have been taking the drug for.Personalized health information: on eHealthMe you can find out what patients like me (same gender, age) reported their drugs and conditions on FDA and social media since 1977. I am a 56 year old female who has been taking Nexium for 13 years and has been plagued by shingles.. 2 Support group for people who have Sweating-Excessive. 3 Been on warfarin for 6 days and having sweating at times.</code> | <code>More questions for: Nexium, Sweating-excessive. You may be interested at these reviews (Write a review): 1 Xarelto caused shortness of breath. 2 After taking Xarelto for 3 years I suddently experienced shortness of breath, sweating and pain in my arms. 3 Myrbetriq & hyperhidrosis (night sweats). I am a 56 year old female who has been taking Nexium for 13 years and has been plagued by shingles.. 2 Support group for people who have Sweating-Excessive. 3 Been on warfarin for 6 days and having sweating at times.</code> | <code>NEXIUM may help your acid-related symptoms, but you could still have serious stomach problems. Talk with your doctor. NEXIUM can cause serious side effects, including: 1 Diarrhea. 2 NEXIUM may increase your risk of getting severe diarrhea.3 This diarrhea may be caused by an infection (Clostridium difficile) in your intestines.EXIUM can cause serious side effects, including: 1 Diarrhea. 2 NEXIUM may increase your risk of getting severe diarrhea. 3 This diarrhea may be caused by an infection (Clostridium difficile) in your intestines.</code> | <code>Treatment for sweating. The treatment you have will depend on the cause of your sweating. If you have an infection, antibiotics will treat the infection and stop the sweating. If your sweating is due to cancer, treating the cancer can get rid of the sweating.If you have sweating because treatment has changed your hormone levels, it may settle down after a few weeks or months, once your body is used to the treatment. Talk to your doctor or nurse about your sweats.nfection. Infection is one of the most common causes of sweating in people who have cancer. Infection can give you a high temperature and your body sweats to try and reduce it. Treating the infection can control or stop the sweating.</code> | <code>Esomeprazole is used to treat certain stomach and esophagus problems (such as acid reflux, ulcers). It works by decreasing the amount of acid your stomach makes.ide Effects. See also Precautions section. Headache or abdominal pain may occur. If any of these effects persist or worsen, tell your doctor or pharmacist promptly. Remember that your doctor has prescribed this medication because he or she has judged that the benefit to you is greater than the risk of side effects.</code> | <code>[0.5, 6.390576362609863, 11.97206974029541, 16.409034729003906]</code> |
291
- | <code>what is electronic document access</code> | <code>Electronic Document Access (EDA) is a web-based system that provides secure online access, storage, and retrieval of contracts, contract modifications, Government Bills of Lading (GBLs), DFAS Transactions for Others (E110), vouchers, and Contract Deficiency Reports (CDR) to authorized users throughout the Department of Defense (DoD).</code> | <code>An electronic document management system (EDMS) is a software system for organizing and storing different kinds of documents. This type of system is a more particular kind of document management system, a more general type of storage system that helps users to organize and store paper or digital documents.</code> | <code>In many cases, the specific documentation for original storage protocols is a major part of what makes an electronic document management system so valuable to a business or organization.</code> | <code>Benefits derived from DoD EDA include: 1 Single-source, timely information. 2 Electronic search and retrieval 24/7 access/retrieval capability. 3 Increased visibility of all procurement & payment actions. Reduction in data entry/human 1 error. Lower postage, handling, retention and document management costs.</code> | <code>If YES, go to www.docusign.net and log in with your email and password. On the DocuSign Web Application, select the Documents tab. Your documents are listed there. If NO, you can access the document by opening the DocuSign Completed email. This email is sent to you once you have finished signing a DocuSign document. See the instructions below. Note: In some cases, your documents might be attached to the Completed email. 1. Open the DocuSign Completed email.</code> | <code>[4.681269645690918, 9.322907447814941, 14.813400268554688, 20.356698989868164]</code> |
292
- | <code>does hpv cause uti</code> | <code>So now you get in the acidic environment can hpv cause urinary tract infection for the area of the blockage of the fruits and fiber as a completely eliminate urinate at all. Spending money on prescription of antibiotics will kill all of the bacterial infection keeps happening to your veterinarian will work to cure the condition.</code> | <code>HPV & Urinary Tract Infections. Human Papillomavirus (HPV) is a group of viruses that can cause warts and cancers of the cervix, anus and genitals. Urinary tract infection (UTI) occurs when bacteria multiply within the bladder, causing pain and urinary urgency. (Thomas Northcut/Digital Vision/Getty Images) Other People Are Reading.</code> | <code>Some types of the HPV virus can infect the genital epithelial cells (skin and mucous membranes). Some types of HPV virus cause warts that appear on the genitals (vagina, vulva, penis, etc.) and anus of women and men.</code> | <code>Most women with HPV have no signs of infection. Since most HPV infections go away on their own within two years, many women never know they had an infection. Some HPV infections cause genital warts that can be seen or felt. The only way to know if you have HPV is to ask your health care provider to do an HPV test.</code> | <code>Genital warts are caused by low-risk types of human papillomavirus (HPV). These viruses may not cause warts in everyone. Women can get genital warts from sexual contact with someone who has HPV. Genital warts are spread by skin-to-skin contact, usually from contact with the warts. It can be spread by vaginal, anal, oral, or handgenital sexual contact. Genital warts will spread HPV while visible, and after recent treatment.</code> | <code>[0.5, 2.4958395957946777, 3.76273775100708, 4.114340305328369]</code> |
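Each training row pairs a query with one positive, four negatives, and a four-element `label`. The metadata tags `loss:SparseMarginMSELoss`. Suppose each label entry follows the MarginMSE convention, where it is a teacher's score margin `score(query, positive) - score(query, negative_i)`. Under that assumption, the objective can be sketched in plain Python as shown below. The student scores are hypothetical. The sketch also omits the FLOPS regularization that `SpladeLoss` adds on top and the batching that the library performs.

```python
def margin_mse(student_pos: float, student_negs: list[float],
               teacher_margins: list[float]) -> float:
    """Mean squared error between student and teacher score margins (sketch)."""
    if len(student_negs) != len(teacher_margins):
        raise ValueError("need one teacher margin per negative")
    # Student margin for each negative: how much higher the positive scores
    student_margins = [student_pos - neg for neg in student_negs]
    squared_errors = [(s - t) ** 2 for s, t in zip(student_margins, teacher_margins)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical student dot-product scores and teacher margins (4 negatives per row)
loss = margin_mse(20.0, [19.5, 14.0, 8.0, 4.0], [0.5, 6.5, 12.0, 16.5])
print(loss)  # 0.125
```

With this objective, the student learns to reproduce the teacher's score gaps. It does not have to match the teacher's absolute scores, so its dot-product scale can differ from the teacher's.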
293
  * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
294
  ```json
295
  {
@@ -439,12 +429,12 @@ You can finetune this model on your own dataset.
439
  ### Training Logs
440
  | Epoch | Step | Training Loss | dot_ndcg@10 |
441
  |:-------:|:---------:|:-------------:|:-----------:|
442
- | 1.0 | 10336 | 16309.8824 | 0.6698 |
443
- | 2.0 | 20672 | 14.4047 | 0.6920 |
444
- | 3.0 | 31008 | 13.0742 | 0.7004 |
445
- | 4.0 | 41344 | 11.8023 | 0.7060 |
446
- | 5.0 | 51680 | 11.0464 | 0.7085 |
447
- | **6.0** | **62016** | **10.6766** | **0.7094** |
448
 
449
  * The bold row denotes the saved checkpoint.
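The step counts in the old training log agree with the dataset size. There are 10336 steps per epoch for 496,123 samples, and 6 × 10336 = 62016 at the saved checkpoint. Assume a single device, no gradient accumulation, and a partial final batch that is kept rather than dropped. Under those assumptions, 48 is the only integer batch size that gives 10336 steps per epoch. That batch size is an inference from the numbers, not a value stated in this excerpt.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, assuming one device, no gradient
    accumulation, and a partial last batch that is kept."""
    return math.ceil(num_samples / batch_size)

per_epoch = steps_per_epoch(496_123, 48)  # batch size of 48 is an assumption
print(per_epoch, 6 * per_epoch)  # 10336 62016
```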
450
 
 
8
  - sparse
9
  - splade
10
  - generated_from_trainer
11
+ - dataset_size:250000
12
  - loss:SpladeLoss
13
  - loss:SparseMarginMSELoss
14
  - loss:FlopsLoss
15
+ base_model: prajjwal1/bert-mini
16
  widget:
17
+ - text: what did marlo thomas play on
18
+ - text: 'Unused vacation does not roll over. to next calendar year and is not paid
19
+ out at termination. Please Note: This table applies to employees in positions
20
+ with 100% FTE. The number of hours/days of vacation are pro-rated for FTEs between
21
+ 75%. and 99%. For Example: If a non-exempt employee who is within their first
22
+ five years of service has an FTE of 75%, then the number of hours they would.
23
+ accrue each month would be 6 (8 x .75 = 6) and not 8.'
24
+ - text: 'To convert from miles to feet by hand, multiply miles by 5280. miles * 5280
25
+ = feet. To convert from feet to miles by hand, divide feet by 5280. feet / 5280
26
+ = miles. An automated version of this calculator can be found here:'
27
+ - text: All you have to do is click on the button directly below and follow the instructions.
28
+ Click he relevant payment button below for £55 payment. --------------------------.
29
+ Star Attuned Crystals for Activation. Another way to receive star energies and
30
+ activations is to buy the attuned crystals that I provide.These carry the energies
31
+ of specific stars or star beings and guides. By holding these you can feel the
32
+ energies of the star or star being and this brings healing, activation, spiritual
33
+ growth and sometimes communication.he highest form of star energy work / activation
34
+ is to receive a Star Attunement. Star Attunements carry the power of stars and
35
+ evolved star beings. They are off the scale and profoundly beautiful and spiritual.
36
+ - text: Fermentation is a metabolic pathway that produce ATP molecules under anaerobic
37
+ conditions (only undergoes glycolysis), NAD+ is used directly in glycolysis to
38
+ form ATP molecules, which is not as efficient as cellular respiration because
39
+ only 2ATP molecules are formed during the glycolysis.
40
  pipeline_tag: feature-extraction
41
  library_name: sentence-transformers
42
  metrics:
 
60
  - corpus_active_dims
61
  - corpus_sparsity_ratio
62
  model-index:
63
+ - name: SPLADE-BERT-Mini-Distil
64
  results:
65
  - task:
66
  type: sparse-information-retrieval
 
70
  type: unknown
71
  metrics:
72
  - type: dot_accuracy@1
73
+ value: 0.4828
74
  name: Dot Accuracy@1
75
  - type: dot_accuracy@3
76
+ value: 0.8052
77
  name: Dot Accuracy@3
78
  - type: dot_accuracy@5
79
+ value: 0.9046
80
  name: Dot Accuracy@5
81
  - type: dot_accuracy@10
82
+ value: 0.9666
83
  name: Dot Accuracy@10
84
  - type: dot_precision@1
85
+ value: 0.4828
86
  name: Dot Precision@1
87
  - type: dot_precision@3
88
+ value: 0.27566666666666667
89
  name: Dot Precision@3
90
  - type: dot_precision@5
91
+ value: 0.18787999999999996
92
  name: Dot Precision@5
93
  - type: dot_precision@10
94
+ value: 0.10156
95
  name: Dot Precision@10
96
  - type: dot_recall@1
97
+ value: 0.4673
98
  name: Dot Recall@1
99
  - type: dot_recall@3
100
+ value: 0.792
101
  name: Dot Recall@3
102
  - type: dot_recall@5
103
+ value: 0.8949
104
  name: Dot Recall@5
105
  - type: dot_recall@10
106
+ value: 0.9624166666666668
107
  name: Dot Recall@10
108
  - type: dot_ndcg@10
109
+ value: 0.7302009825334612
110
  name: Dot Ndcg@10
111
  - type: dot_mrr@10
112
+ value: 0.6579904761904781
113
  name: Dot Mrr@10
114
  - type: dot_map@100
115
+ value: 0.6534502206938125
116
  name: Dot Map@100
117
  - type: query_active_dims
118
+ value: 19.52400016784668
119
  name: Query Active Dims
120
  - type: query_sparsity_ratio
121
+ value: 0.9993603302480883
122
  name: Query Sparsity Ratio
123
  - type: corpus_active_dims
124
+ value: 113.4705113862854
125
  name: Corpus Active Dims
126
  - type: corpus_sparsity_ratio
127
+ value: 0.9962823369573983
128
  name: Corpus Sparsity Ratio
129
  ---
130
 
131
+ # SPLADE-BERT-Mini-Distil
132
 
133
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
134
  ## Model Details
135
 
136
  ### Model Description
137
  - **Model Type:** SPLADE Sparse Encoder
138
+ - **Base model:** [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini) <!-- at revision 5e123abc2480f0c4b4cac186d3b3f09299c258fc -->
139
  - **Maximum Sequence Length:** 512 tokens
140
  - **Output Dimensionality:** 30522 dimensions
141
  - **Similarity Function:** Dot Product
 
174
  from sentence_transformers import SparseEncoder
175
 
176
  # Download from the 🤗 Hub
177
+ model = SparseEncoder("yosefw/SPLADE-BERT-Mini-distil-v2")
178
  # Run inference
179
  queries = [
180
+ "definition of fermentation in the lab",
181
  ]
182
  documents = [
183
+ 'Fermentation is a metabolic pathway that produce ATP molecules under anaerobic conditions (only undergoes glycolysis), NAD+ is used directly in glycolysis to form ATP molecules, which is not as efficient as cellular respiration because only 2ATP molecules are formed during the glycolysis.',
184
+ 'Essay on Yeast Fermentation ... Yeast Fermentation Lab Report The purpose of this experiment was to observe the process in which cells must partake in a respiration process called anaerobic fermentation and as the name suggests, oxygen is not required.',
185
+ '\ufeffYeast Fermentation Lab Report The purpose of this experiment was to observe the process in which cells must partake in a respiration process called anaerobic fermentation and as the name suggests, oxygen is not required.',
186
  ]
187
  query_embeddings = model.encode_query(queries)
188
  document_embeddings = model.encode_document(documents)
 
192
  # Get the similarity scores for the embeddings
193
  similarities = model.similarity(query_embeddings, document_embeddings)
194
  print(similarities)
195
+ # tensor([[20.0220, 17.1372, 15.9159]])
196
  ```
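The `similarity` call above scores each query-document pair with a dot product over 30522-dimensional sparse vectors. Only a few dozen dimensions are active per text, so the dot product reduces to a sum over tokens that are active in both vectors. The sketch below shows this with vectors stored as hypothetical `{token: weight}` maps. The library itself operates on sparse tensors, and the weights here are made up rather than decoded from this model.

```python
def sparse_dot(query: dict[str, float], document: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {token: weight} maps."""
    # Iterate over the smaller map; only tokens active in both vectors contribute.
    if len(query) > len(document):
        query, document = document, query
    return sum(weight * document[token]
               for token, weight in query.items() if token in document)

# Hypothetical term weights, not actual model output
q = {"fermentation": 2.0, "definition": 1.0, "lab": 0.5}
d = {"fermentation": 3.0, "lab": 1.0, "atp": 1.5}
result = sparse_dot(q, d)
print(result)  # 6.5
```

Because the cost scales with the number of active terms and not with the vocabulary size, sparse encoders of this kind can be served from an inverted index.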
197
 
198
  <!--
 
229
 
230
  | Metric | Value |
231
  |:----------------------|:-----------|
232
+ | dot_accuracy@1 | 0.4828 |
233
+ | dot_accuracy@3 | 0.8052 |
234
+ | dot_accuracy@5 | 0.9046 |
235
+ | dot_accuracy@10 | 0.9666 |
236
+ | dot_precision@1 | 0.4828 |
237
+ | dot_precision@3 | 0.2757 |
238
+ | dot_precision@5 | 0.1879 |
239
+ | dot_precision@10 | 0.1016 |
240
+ | dot_recall@1 | 0.4673 |
241
+ | dot_recall@3 | 0.792 |
242
+ | dot_recall@5 | 0.8949 |
243
+ | dot_recall@10 | 0.9624 |
244
+ | **dot_ndcg@10** | **0.7302** |
245
+ | dot_mrr@10 | 0.658 |
246
+ | dot_map@100 | 0.6535 |
247
+ | query_active_dims | 19.524 |
248
+ | query_sparsity_ratio | 0.9994 |
249
+ | corpus_active_dims | 113.4705 |
250
+ | corpus_sparsity_ratio | 0.9963 |
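The sparsity ratios follow from the active-dimension counts and the 30522-dimensional output: ratio = 1 - active_dims / 30522. The sketch below reproduces the reported values for SPLADE-BERT-Mini-Distil. It assumes this is how the evaluator derives the ratio, and the reported numbers agree with that formula to many decimal places.

```python
VOCAB_SIZE = 30522  # output dimensionality reported in the model card

def sparsity_ratio(active_dims: float, vocab_size: int = VOCAB_SIZE) -> float:
    """Average fraction of output dimensions that are zero."""
    return 1.0 - active_dims / vocab_size

# Average active dimensions reported for SPLADE-BERT-Mini-Distil
query_ratio = sparsity_ratio(19.52400016784668)
corpus_ratio = sparsity_ratio(113.4705113862854)
print(f"{query_ratio:.4f} {corpus_ratio:.4f}")  # 0.9994 0.9963
```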
251
 
252
  <!--
253
  ## Bias, Risks and Limitations
 
267
 
268
  #### Unnamed Dataset
269
 
270
+ * Size: 250,000 training samples
271
  * Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, <code>negative_3</code>, <code>negative_4</code>, and <code>label</code>
272
  * Approximate statistics based on the first 1000 samples:
273
+ | | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
274
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------|
275
+ | type | string | string | string | string | string | string | list |
276
+ | details | <ul><li>min: 4 tokens</li><li>mean: 8.87 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 24 tokens</li><li>mean: 81.23 tokens</li><li>max: 259 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 79.21 tokens</li><li>max: 197 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 77.89 tokens</li><li>max: 207 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 76.38 tokens</li><li>max: 271 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 75.46 tokens</li><li>max: 214 tokens</li></ul> | <ul><li>size: 4 elements</li></ul> |
277
  * Samples:
278
+ | query | positive | negative_1 | negative_2 | negative_3 | negative_4 | label |
279
+ |:------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------|
+ | <code>heart specialists in ridgeland ms</code> | <code>Dr. George Reynolds Jr, MD is a cardiology specialist in Ridgeland, MS and has been practicing for 35 years. He graduated from Vanderbilt University School Of Medicine in 1977 and specializes in cardiology and internal medicine.</code> | <code>Dr. James Kramer is a Internist in Ridgeland, MS. Find Dr. Kramer's phone number, address and more.</code> | <code>Dr. James Kramer is an internist in Ridgeland, Mississippi. He received his medical degree from Loma Linda University School of Medicine and has been in practice for more than 20 years. Dr. James Kramer's Details</code> | <code>Chronic Pulmonary Heart Diseases (incl. Pulmonary Hypertension) Coarctation of the Aorta; Congenital Aortic Valve Disorders; Congenital Heart Defects; Congenital Heart Disease; Congestive Heart Failure; Coronary Artery Disease (CAD) Endocarditis; Heart Attack (Acute Myocardial Infarction) Heart Disease; Heart Murmur; Heart Palpitations; Hyperlipidemia; Hypertension</code> | <code>A growing shortage of primary care doctors means you might have to look harder for ongoing care. How to Read an OTC Medication Label Purvi Parikh, M.D. | Feb. 12, 2018</code> | <code>[6.058592796325684, 6.587987422943115, 19.88274383544922, 20.211898803710938]</code> |
+ | <code>does baytril otic require a prescription</code> | <code>Baytril Otic Ear Drops-Enrofloxacin/Silver Sulfadiazine-Prices & Information. A prescription is required for this item. A prescription is required for this item. Brand medication is not available at this time.</code> | <code>RX required for this item. Click here for our full Prescription Policy and Form. Baytril Otic (enrofloxacin/silver sulfadiazine) Emulsion from Bayer is the first fluoroquinolone approved by the Food and Drug Administration for the topical treatment of canine otitis externa.</code> | <code>Product Details. Baytril Otic is a highly effective treatment prescribed by many veterinarians when your pet has an ear infection caused by susceptible bacteria or fungus. Baytril Otic is: a liquid emulsion that is used topically directly in the ear or on the skin in order to treat susceptible bacterial and yeast infections.</code> | <code>Baytril for dogs is an antibiotic often prescribed for bacterial infections, particularly those involving the ears. Ear infections are rare in many animals, but quite common in dogs. This is particularly true for dogs with long droopy ears, where it will stay very warm and moist.</code> | <code>Administer 5-10 Baytril ear drops per treatment in dogs 35 lbs or less and 10-15 drops per treatment in dogs more than 35 lbs.</code> | <code>[1.0, 3.640146493911743, 6.450072288513184, 11.96937084197998]</code> |
+ | <code>what is on a gyro</code> | <code>Report Abuse. Gyros or gyro (giros) (pronounced /ˈjɪəroʊ/ or /ˈdʒaɪroʊ/, Greek: γύρος turn) is a Greek dish consisting of meat (typically lamb and/or beef), tomato, onion, and tzatziki sauce, and is served with pita bread. Chicken and pork meat can be used too.</code> | <code>A gyroscope (from Ancient Greek γῦρος gûros, circle and σκοπέω skopéō, to look) is a spinning wheel or disc in which the axis of rotation is free to assume any orientation by itself. When rotating, the orientation of this axis is unaffected by tilting or rotation of the mounting, according to the conservation of angular momentum.</code> | <code>Diagram of a gyro wheel. Reaction arrows about the output axis (blue) correspond to forces applied about the input axis (green), and vice versa. A gyroscope is a wheel mounted in two or three gimbals, which are a pivoted supports that allow the rotation of the wheel about a single axis.</code> | <code>A fair number of our users are unsure of how to pronounce gyro. This isn't surprising, since there are two different gyros and they have two different pronunciations. The earlier gyro is the one that is a shortened form of gyrocompass or gyroscope, and it has a pronunciation that conforms to one's expectations: /JEYE-roh/.</code> | <code>Vibration Gyro Sensors. Vibration gyro sensors sense angular velocity from the Coriolis force applied to a vibrating element. For this reason, the accuracy with which angular velocity is measured differs significantly depending on element material and structural differences.</code> | <code>[2.1750364303588867, 2.634796142578125, 4.30520486831665, 6.382436752319336]</code> |
  * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
 
  ### Training Logs
  | Epoch | Step | Training Loss | dot_ndcg@10 |
  |:-------:|:---------:|:-------------:|:-----------:|
+ | 1.0 | 5209 | 30541.8683 | 0.6969 |
+ | 2.0 | 10418 | 13.3966 | 0.7167 |
+ | 3.0 | 15627 | 11.6531 | 0.7262 |
+ | 4.0 | 20836 | 9.9781 | 0.7280 |
+ | 5.0 | 26045 | 8.881 | 0.7289 |
+ | **6.0** | **31254** | **8.3454** | **0.7302** |
 
  * The bold row denotes the saved checkpoint.
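The `dot_ndcg@10` column tracks nDCG@10 on the evaluation queries; the `dot_` prefix presumably means documents were ranked by the dot product of sparse query and document vectors. Below is a minimal per-query sketch, assuming graded relevance from qrels and the common `gain / log2(rank + 2)` discount. The evaluator that produced these numbers may differ in details such as the gain function or tie handling, and `ndcg_at_k` / `dcg_at_k` are hypothetical helper names.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k gains (rank 0 has discount log2(2) = 1)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, relevant, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids sorted by descending model score (e.g. sparse dot product).
    relevant: {doc_id: relevance grade} from the qrels; missing ids count as 0.
    """
    gains = [relevant.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevant.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Toy query: the only relevant passage is ranked second.
print(round(ndcg_at_k(["d3", "d1", "d2"], {"d1": 1}), 4))  # 0.6309
```

Averaging the per-query values over every evaluation query gives the kind of single score reported in the `dot_ndcg@10` column.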
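Each training row in the samples table has one query, one positive, four hard negatives and a four-element `label`, and the model card tags list `loss:SparseMarginMSELoss` under `SpladeLoss`. Assuming each label entry is a teacher margin, score(query, positive) minus score(query, negative_i), the margin-MSE idea for a single row can be sketched as below. This is a scalar toy on `{term: weight}` dicts, not the sentence-transformers implementation (which works on batched tensors and adds the FLOPS regularizer through `SpladeLoss`); `sparse_dot` and `margin_mse` are hypothetical names.

```python
def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def margin_mse(query, positive, negatives, labels) -> float:
    """Mean squared error between student margins and teacher margins.

    Assumption: labels[i] = teacher_score(query, positive) - teacher_score(query, negatives[i]).
    """
    if len(negatives) != len(labels):
        raise ValueError("expected one teacher margin per negative")
    pos_score = sparse_dot(query, positive)
    errors = []
    for neg, target in zip(negatives, labels):
        student_margin = pos_score - sparse_dot(query, neg)
        errors.append((student_margin - target) ** 2)
    return sum(errors) / len(errors)


# Toy row: student margins are 2.0 and 8.0, teacher margins are 2.0 and 6.0.
query = {"gyro": 2.0, "greek": 1.0}
positive = {"gyro": 3.0, "greek": 2.0}       # score 8.0
negatives = [{"gyro": 3.0}, {"wheel": 1.0}]  # scores 6.0 and 0.0
print(margin_mse(query, positive, negatives, labels=[2.0, 6.0]))  # 2.0
```

Regressing on margins rather than raw scores lets the student keep its own score scale while still learning how much better the positive is than each negative.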
 
config.json CHANGED
@@ -6,14 +6,14 @@
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
- "hidden_size": 128,
+ "hidden_size": 256,
  "initializer_range": 0.02,
- "intermediate_size": 512,
+ "intermediate_size": 1024,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
- "num_attention_heads": 2,
- "num_hidden_layers": 2,
+ "num_attention_heads": 4,
+ "num_hidden_layers": 4,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:c5578e5c58d8ff1c071f9ef9a555c2694c08a5b4c196697e4e199218dcc64ff0
- size 17671560
+ oid sha256:eca5ef6ed2e950b988214239e64f87b79f40b939a74ad47b6994ffa4b5de2c25
+ size 44814856
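The config.json hunk widens the encoder from the bert-tiny shape (2 layers, hidden size 128, intermediate size 512) to 4 layers with hidden size 256 and intermediate size 1024, and the LFS pointer for model.safetensors grows from 17,671,560 to 44,814,856 bytes. The sketch below checks that the two changes are consistent. It assumes values that are not visible in this diff: `vocab_size` 30522 (the standard BERT uncased vocabulary), `type_vocab_size` 2, no pooler, and a masked-LM head whose decoder weight is tied to the word embeddings, as in a SPLADE-style encoder. `bert_mlm_param_count` is a hypothetical helper.

```python
# Rough parameter count for a BERT masked-LM checkpoint (encoder + MLM head).
# Assumptions (not shown in the diff): vocab=30522, type_vocab=2, no pooler,
# MLM decoder weight tied to the word embeddings (so only its bias is counted).

def bert_mlm_param_count(hidden, layers, intermediate, vocab=30522,
                         max_positions=512, type_vocab=2):
    h, i = hidden, intermediate
    # word + position + token-type embedding tables, plus embedding LayerNorm
    embeddings = (vocab + max_positions + type_vocab) * h + 2 * h
    per_layer = (
        4 * (h * h + h)  # query, key, value and attention-output projections
        + 2 * h          # attention LayerNorm
        + (h * i + i)    # intermediate dense
        + (i * h + h)    # output dense
        + 2 * h          # output LayerNorm
    )
    # MLM head: transform dense + LayerNorm + decoder bias
    mlm_head = (h * h + h) + 2 * h + vocab
    return embeddings + layers * per_layer + mlm_head


old = bert_mlm_param_count(hidden=128, layers=2, intermediate=512)
new = bert_mlm_param_count(hidden=256, layers=4, intermediate=1024)

for name, params, file_size in [("old", old, 17671560), ("new", new, 44814856)]:
    tensor_bytes = 4 * params  # "torch_dtype": "float32" means 4 bytes per parameter
    print(f"{name}: {params:,} params, {tensor_bytes:,} bytes of float32 data, "
          f"{file_size - tensor_bytes:,} bytes left for the safetensors header")
```

Under these assumptions the counts come to 4,416,698 and 11,201,594 parameters. That leaves 4,768 and 8,480 bytes of each file unaccounted for, which is plausibly the safetensors header (an 8-byte length prefix followed by JSON tensor metadata), so the size jump in the LFS pointer matches the architecture change.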