rasyosef committed
Commit 096700b · verified · 1 Parent(s): 6a0277e

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
{
  "pooling_strategy": "max",
  "activation_function": "relu",
  "word_embedding_dimension": 30522
}
README.md ADDED
@@ -0,0 +1,516 @@
---
language:
- en
license: mit
tags:
- sentence-transformers
- sparse-encoder
- sparse
- splade
- generated_from_trainer
- dataset_size:250000
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
base_model: prajjwal1/bert-mini
widget:
- text: icd medication reaction
- text: Report Abuse. An egg lives for around 12 hours after ovulation. Sperm can
    live for about five days inside the uterus, so providing those two time frames
    collide, it can be pretty soon after sex. Hours, usually. Implantation occurs
    between 6 - 12 days.
- text: 'A warm-up is important for many reasons. Some of these reasons include: -
    Facilitates transition from rest to exercise-Stretches postural muscles-Augments
    blood flow-Ele … vates body temperature-Allows body to adjust to changing physiologic,
    biomechanical and bioenergetic demands placed on it during conditioning phase.
    warm-up helps your body prepare for any physical activity. Without a warm-up,
    your muscles will be cold and stiff, the oxygen won''t be flowing to your muscles
    and joints and you will not perform the activity well. Also, when you do a warm
    up your recovery from exercising will be more comfortable and shorter.'
- text: First, you need to have a Kindle Fire HD or HDX as these are the Kindle Fires
    that have bluetooth. The very first generation doesn't have this capability. (If
    you don't know which tablet you have, see this article.). Second, you need to
    have a bluetooth keyboard or other device, like headphones, speakers, or earbuds.
    This is a picture of a Jawbone earpiece I've successfully paired to my Kindle
    Fire and been able to listen to music with.
- text: Cantigny Park. Cantigny is a 500-acre (2.0 km2) park in Wheaton, Illinois,
    30 miles west of Chicago. It is the former estate of Joseph Medill and his grandson
    Colonel Robert R. McCormick, publishers of the Chicago Tribune, and is open to
    the public.
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
model-index:
- name: SPLADE-BERT-Mini
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: dot_accuracy@1
      value: 0.63028
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.79716
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.85096
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.90548
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.63028
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.26571999999999996
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.170192
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.09054800000000002
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.63028
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.79716
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.85096
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.90548
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.7689713558276354
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.7250807142857304
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.728785316630622
      name: Dot Map@100
    - type: query_active_dims
      value: 26.435239791870117
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9991338955575693
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 326.6760399121094
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9892970303416517
      name: Corpus Sparsity Ratio
---

# SPLADE-BERT-Mini

This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** SPLADE Sparse Encoder
- **Base model:** [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini) <!-- at revision 5e123abc2480f0c4b4cac186d3b3f09299c258fc -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** mit

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

```
SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
```
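The SpladePooling module above takes, per vocabulary dimension, the maximum over the sequence of log(1 + ReLU(logit)) — the `max` pooling strategy with the `relu` activation listed in the config. A minimal NumPy sketch of that transformation (random logits stand in for the real BertForMaskedLM output; this is an illustration, not the library's internal code):

```python
import numpy as np

def splade_pooling(logits: np.ndarray) -> np.ndarray:
    """Max-pool log-saturated ReLU activations over the sequence axis.

    logits: (seq_len, vocab_size) MLM logits for one input text.
    Returns a (vocab_size,) non-negative vector; most entries end up near zero.
    """
    activations = np.log1p(np.maximum(logits, 0.0))  # log(1 + relu(x))
    return activations.max(axis=0)

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 30522))  # toy stand-in for the MLM head output
embedding = splade_pooling(logits)
print(embedding.shape)       # (30522,)
print(bool((embedding >= 0).all()))  # True: activations are never negative
```

The log-saturation keeps a few dominant logits from swamping the dot-product score, while the ReLU is what makes exact zeros, and hence sparsity, possible.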

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("rasyosef/SPLADE-BERT-Mini")
# Run inference
queries = [
    "cantigny gardens cost",
]
documents = [
    'The fee for a ceremony ranges from $400 to $2,500 with reception rental or $3,000 for a ceremony-only wedding. Please inquire about discounted rates for ceremony guest counts under 75. The average wedding cost at Cantigny Park is estimated at between $12,881 and $22,238 for a ceremony & reception for 100 guests.',
    'Nestled in a serene setting, Cantigny Park is a scenic realm where you will create a unique wedding, the memories of which you will always cherish. This expansive estate encompasses 500 acres of beautiful gardens, colorful botanicals and tranquil water features, creating an idyllic background for this ideal day.',
    'Cantigny Park. Cantigny is a 500-acre (2.0 km2) park in Wheaton, Illinois, 30 miles west of Chicago. It is the former estate of Joseph Medill and his grandson Colonel Robert R. McCormick, publishers of the Chicago Tribune, and is open to the public.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[18.8703, 13.8253, 13.4587]])
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric                | Value     |
|:----------------------|:----------|
| dot_accuracy@1        | 0.6303    |
| dot_accuracy@3        | 0.7972    |
| dot_accuracy@5        | 0.851     |
| dot_accuracy@10       | 0.9055    |
| dot_precision@1       | 0.6303    |
| dot_precision@3       | 0.2657    |
| dot_precision@5       | 0.1702    |
| dot_precision@10      | 0.0905    |
| dot_recall@1          | 0.6303    |
| dot_recall@3          | 0.7972    |
| dot_recall@5          | 0.851     |
| dot_recall@10         | 0.9055    |
| **dot_ndcg@10**       | **0.769** |
| dot_mrr@10            | 0.7251    |
| dot_map@100           | 0.7288    |
| query_active_dims     | 26.4352   |
| query_sparsity_ratio  | 0.9991    |
| corpus_active_dims    | 326.676   |
| corpus_sparsity_ratio | 0.9893    |

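The two sparsity rows follow directly from the active-dims rows: with 30522 output dimensions, `sparsity_ratio = 1 - active_dims / 30522`. A quick check of the table values:

```python
VOCAB_SIZE = 30522  # output dimensionality of this model

def sparsity_ratio(active_dims: float, vocab_size: int = VOCAB_SIZE) -> float:
    """Fraction of embedding dimensions that are zero on average."""
    return 1.0 - active_dims / vocab_size

print(round(sparsity_ratio(26.4352), 4))  # 0.9991 -> query_sparsity_ratio
print(round(sparsity_ratio(326.676), 4))  # 0.9893 -> corpus_sparsity_ratio
```

In other words, queries activate only about 26 of the 30522 vocabulary dimensions on average, and documents about 327, which is what makes inverted-index retrieval with this model cheap.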
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 250,000 training samples
* Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, and <code>negative_2</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                              | positive                                                                             | negative_1                                                                           | negative_2                                                                           |
  |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                               | string                                                                               | string                                                                               |
  | details | <ul><li>min: 4 tokens</li><li>mean: 8.87 tokens</li><li>max: 31 tokens</li></ul>  | <ul><li>min: 20 tokens</li><li>mean: 82.54 tokens</li><li>max: 218 tokens</li></ul>  | <ul><li>min: 20 tokens</li><li>mean: 79.98 tokens</li><li>max: 252 tokens</li></ul>  | <ul><li>min: 19 tokens</li><li>mean: 80.55 tokens</li><li>max: 211 tokens</li></ul>  |
* Samples:
  | query | positive | negative_1 | negative_2 |
  |:------|:---------|:-----------|:-----------|
  | <code>how do automotive technicians get paid</code> | <code>104 months ago. The amount of pay from company to company does not vary too much, but you do have a wide variety of compensation methods. There are various combinations of hourly and commission pay rates, which depending on what type of work you specialize in can vary your bottom line considerably.04 months ago. The amount of pay from company to company does not vary too much, but you do have a wide variety of compensation methods. There are various combinations of hourly and commission pay rates, which depending on what type of work you specialize in can vary your bottom line considerably.</code> | <code>Bureau of Labor Statistics figures indicate that automotive technicians earned an average annual salary of $38,560 and an average hourly wage of $18.54 as of May 2011.Half of auto technicians reported annual salaries of between $26,850 and $47,540 and hourly wages of between $12.91 and $22.86.The 10 percent of automotive techs who earned the lowest made $20,620 or less per year, and the top 10 percent of earners made $59,600 or more per year.ver one-third of all automotive technicians employed as of May 2011 worked in the automotive repair and maintenance industry, where they earned an average of $35,090 per year.</code> | <code>It really depends on what automaker your working for, how much experience you have, and how long you've been in the industry. Obviously if you're working for a highend company(BMW,Mercedes,Ferrari) you can expect to be paid more per hour. And automotive technicians don't get paid by the hour.We get paid per FLAT RATE hour. Which basically means that we get paid by the job. Which could range from 0.2 of an hour for replacing a headlight bulb to 10hours for a transmission overhaul. Then there's a difference between warranty jobs and cash jobs.ut I won't get into too much detail. Automotive technicians get paid around $12-$15/hr at entry level. But can make around $18-$26/hr with much more experience. Which means you can expect to make 30,000 to 60,000/year. Though most technicians don't see past 45,000 a year.</code> |
  | <code>how far is steamboat springs from golden?</code> | <code>The distance between Steamboat Springs and Golden in a straight line is 100 miles or 160.9 Kilometers. Driving Directions & Drive Times from Steamboat Springs to Golden can be found further down the page.</code> | <code>Steamboat Springs Vacation Rentals Steamboat Springs Vacations Steamboat Springs Restaurants Things to Do in Steamboat Springs Steamboat Springs Travel Forum Steamboat Springs Photos Steamboat Springs Map Steamboat Springs Travel Guide All Steamboat Springs Hotels; Steamboat Springs Hotel Deals; Last Minute Hotels in Steamboat Springs; By Hotel Type Steamboat Springs Family Hotels</code> | <code>There are 98.92 miles from Golden to Steamboat Springs in northwest direction and 143 miles (230.14 kilometers) by car, following the US-40 route. Golden and Steamboat Springs are 3 hours 20 mins far apart, if you drive non-stop. This is the fastest route from Golden, CO to Steamboat Springs, CO. The halfway point is Heeney, CO. Golden, CO and Steamboat Springs, CO are in the same time zone (MDT). Current time in both locations is 1:26 pm.</code> |
  | <code>incoming wire routing number for california bank and trust</code> | <code>Please call California Bank And Trust representative at (888) 315-2271 for more information. 1 Routing Number: 122003396. 2 250 EAST FIRST STREET # 700. LOS ANGELES, CA 90012-0000. 3 Phone Number: (888) 315-2271.</code> | <code>When asked to provide a routing number for incoming wire transfers to Union Bank accounts, the routing number to use is: 122000496. back to top What options do I have to send wires?</code> | <code>Business Contracting Officers (BCO) have access to Online Banking wires. Simply sign on to Online Banking, click “Send Wires”, and then complete the required information. This particular service is limited to sending wires to U.S. banks only.</code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
      "document_regularizer_weight": 0.001,
      "query_regularizer_weight": 0.002
  }
  ```
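SpladeLoss pairs the ranking loss with FLOPS-style regularizers on the query and document activations (weights 0.002 and 0.001 above). The FLOPS term from Paria et al. is the sum, over vocabulary dimensions, of the squared mean activation across the batch; penalizing the squared mean pushes dimensions that are rarely useful toward exactly zero. A toy sketch of that term, not the library's internal implementation:

```python
import numpy as np

def flops_loss(embeddings: np.ndarray) -> float:
    """FLOPS regularizer: sum_j (mean_i a_ij)^2 over a (batch, vocab) matrix
    of non-negative SPLADE activations."""
    mean_activation = np.abs(embeddings).mean(axis=0)
    return float((mean_activation ** 2).sum())

batch = np.array([[0.0, 2.0, 0.0],
                  [0.0, 0.0, 4.0]])
# column means: [0, 1, 2] -> squared and summed: 0 + 1 + 4 = 5
print(flops_loss(batch))  # 5.0
```

Because the penalty is quadratic in the per-dimension mean, concentrating weight on a few dimensions that only some inputs use is cheaper than lighting up many dimensions a little, which is exactly the behavior an inverted index rewards.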

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 6e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.025
- `fp16`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

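For reference, these settings imply ceil(250,000 / 64) = 3907 optimizer steps per epoch and 15,628 steps over the 4 epochs, which matches the step column in the training logs, so `warmup_ratio: 0.025` corresponds to roughly 390 warmup steps:

```python
import math

samples, batch_size, epochs, warmup_ratio = 250_000, 64, 4, 0.025

steps_per_epoch = math.ceil(samples / batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)  # 3907 15628 390
```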
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 6e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.025
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch | Step  | Training Loss | dot_ndcg@10 |
|:-----:|:-----:|:-------------:|:-----------:|
| 1.0   | 3907  | 23.8846       | 0.7509      |
| 2.0   | 7814  | 0.785         | 0.7670      |
| 3.0   | 11721 | 0.6873        | 0.7685      |
| 4.0   | 15628 | 0.6283        | 0.7690      |
| -1    | -1    | -             | 0.7690      |


### Framework Versions
- Python: 3.11.13
- Sentence Transformers: 5.0.0
- Transformers: 4.53.1
- PyTorch: 2.6.0+cu124
- Accelerate: 1.8.1
- Datasets: 3.6.0
- Tokenizers: 0.21.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### SpladeLoss
```bibtex
@misc{formal2022distillationhardnegativesampling,
    title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
    author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
    year={2022},
    eprint={2205.04733},
    archivePrefix={arXiv},
    primaryClass={cs.IR},
    url={https://arxiv.org/abs/2205.04733},
}
```

#### SparseMultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

#### FlopsLoss
```bibtex
@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,24 @@
{
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 256,
  "initializer_range": 0.02,
  "intermediate_size": 1024,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 4,
  "num_hidden_layers": 4,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.53.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
{
  "model_type": "SparseEncoder",
  "__version__": {
    "sentence_transformers": "5.0.0",
    "transformers": "4.53.1",
    "pytorch": "2.6.0+cu124"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "dot"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:02ac0b450c721891da6146541abdcbb030bc4969b0f9817c3a9d4073a720241a
size 44814856
modules.json ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_SpladePooling",
    "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff