kroetenWanderung thierrydamiba committed on
Commit
0c7d9f8
·
0 Parent(s):

Duplicate from thierrydamiba/splade-ecommerce-multidomain

Co-authored-by: Thierry Damiba <thierrydamiba@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
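The attribute lines above route every matching path through Git LFS. As a rough reading aid, the sketch below checks a file name against a few of these patterns using Python's `fnmatch`. The helper name `is_lfs_tracked` and the shortened pattern list are hypothetical, and `fnmatch` only approximates gitattributes matching (it does not treat `**` the way Git does), so this is an illustration under those assumptions rather than a reimplementation of Git's rules.

```python
from fnmatch import fnmatch

# A few of the patterns from the .gitattributes above (not the full list).
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.onnx", "*.parquet", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file name matches one of the sample LFS patterns."""
    # Assumption: patterns without a slash are compared to the base name only.
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


print(is_lfs_tracked("model.safetensors"))
print(is_lfs_tracked("1_SpladePooling/config.json"))
```

Under these rules the weights file is stored as an LFS pointer, while the small JSON configs stay in regular Git.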
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "pooling_strategy": "max",
3
+ "activation_function": "relu",
4
+ "word_embedding_dimension": 30522
5
+ }
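To make the pooling config above concrete, here is a minimal pure-Python sketch of SPLADE-style pooling with `relu` activation and `max` pooling over a vocabulary of logits. The `log(1 + x)` saturation is taken from the published SPLADE formulation and is an assumption here, since the config itself only names the activation and the pooling strategy. The function name `splade_pool` and the toy inputs are hypothetical.

```python
import math


def splade_pool(token_logits, attention_mask):
    """Max-pool log(1 + relu(logit)) over unmasked tokens, per vocabulary entry."""
    vocab_size = len(token_logits[0])
    pooled = [0.0] * vocab_size
    for logits, keep in zip(token_logits, attention_mask):
        if not keep:
            continue  # padding positions contribute nothing
        for j, logit in enumerate(logits):
            weight = math.log1p(max(logit, 0.0))  # relu, then log saturation
            pooled[j] = max(pooled[j], weight)
    return pooled


# Toy input: 3 token positions (the last one is padding), vocabulary of 3.
logits = [[1.0, -2.0, 0.0], [3.0, 0.5, -1.0], [9.0, 9.0, 9.0]]
print(splade_pool(logits, attention_mask=[1, 1, 0]))
```

In the real model the vocabulary axis has 30522 entries, and most of them end up at zero, which is what makes the output sparse.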
README.md ADDED
@@ -0,0 +1,441 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - sentence-transformers
7
+ - sparse-encoder
8
+ - sparse
9
+ - splade
10
+ - e-commerce
11
+ - product-search
12
+ - information-retrieval
13
+ - multi-domain
14
+ - dataset_size:99712
15
+ - loss:SpladeLoss
16
+ - loss:SparseMultipleNegativesRankingLoss
17
+ - loss:FlopsLoss
18
+ base_model: distilbert/distilbert-base-uncased
19
+ datasets:
20
+ - tasksource/esci
21
+ - wayfair/wands
22
+ widget:
23
+ - text: '[KIDS TOYLAND] Wooden Dessert Play Set for Kids, Pretend Play Food Sets for
24
+ Birthday Party ,Great for 3, 4, 5, and 6 Year Olds Girls and Boys Wooden Pretend
25
+ Play Food Desserts Set,Wood Dessert Tower and Cakes,Educational Play Food Toys
26
+ for 2 years old kids Birthday Gift<br> <br> <b>Packing Includ:</b><br> cake stand
27
+ *1 chocolates and cakes*12 <br> <br> <b>Pretend Play Wooden Food Set Features:</b><br>
28
+ This high-quality wooden toy is designed for kids three and up, can be used as
29
+ educational toys for shape matching, counting and concepts of reconstruction.
30
+ <br> <br> 1. size: 9.17*9.17*2.2 inch, this beautifully decorated multi shaped
31
+ c'
32
+ - text: mathematical compass
33
+ - text: '[NYX PROFESSIONAL MAKEUP] NYX PROFESSIONAL MAKEUP Lip Lingerie Matte Liquid
34
+ Lipstick - Beauty Mark, Chocolate Brown'
35
+ - text: '[Aladdin] Mrs. Frisby and the Rats of NIMH'
36
+ - text: '[Office Chairs] ginata salon beauty drafting chair'
37
+ pipeline_tag: feature-extraction
38
+ library_name: sentence-transformers
39
+ ---
40
+
41
+ # SPLADE Multi-Domain E-Commerce Search
42
+
43
+ A SPLADE sparse encoder fine-tuned on multiple e-commerce datasets (Amazon ESCI + Wayfair WANDS + Home Depot) for better cross-domain generalization. It trades a small drop in in-domain performance (about 4% on ESCI) for gains on the other e-commerce domains (+3% on WANDS, +7% on Home Depot).
44
+
45
+ ## Benchmark Results
46
+
47
+ ### Cross-Domain Performance (vs Single-Domain Model)
48
+
49
+ | Dataset | Single-Domain | **Multi-Domain** | Improvement |
50
+ |---------|---------------|------------------|-------------|
51
+ | ESCI (in-domain) | 0.389 | 0.372 | -4% |
52
+ | WANDS (Wayfair) | 0.355 | **0.366** | +3% |
53
+ | Home Depot | 0.384 | **0.410** | +7% |
54
+
55
+ ### vs BM25 Baseline
56
+
57
+ | Dataset | BM25 | **This Model** | Improvement |
58
+ |---------|------|----------------|-------------|
59
+ | ESCI | 0.305 | 0.372 | +22% |
60
+ | WANDS | 0.329 | 0.366 | +11% |
61
+ | Home Depot | 0.349 | 0.410 | +17% |
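The "Improvement" columns in the two tables above are relative changes with respect to the baseline score. The short check below recomputes them from the reported scores; the helper name `relative_change_pct` is hypothetical, and rounding to whole percentages is an assumption about how the table was produced.

```python
def relative_change_pct(baseline: float, score: float) -> int:
    """Relative change of `score` over `baseline`, as a whole percentage."""
    return round(100 * (score - baseline) / baseline)


# (baseline, multi-domain score) pairs copied from the tables above.
vs_single_domain = {"ESCI": (0.389, 0.372), "WANDS": (0.355, 0.366), "Home Depot": (0.384, 0.410)}
vs_bm25 = {"ESCI": (0.305, 0.372), "WANDS": (0.329, 0.366), "Home Depot": (0.349, 0.410)}

for name, (baseline, score) in vs_single_domain.items():
    print(f"{name} vs single-domain: {relative_change_pct(baseline, score):+d}%")
for name, (baseline, score) in vs_bm25.items():
    print(f"{name} vs BM25: {relative_change_pct(baseline, score):+d}%")
```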
62
+
63
+ ## Model Description
64
+
65
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model fine-tuned from [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences and paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
66
+ ## Model Details
67
+
68
+ ### Model Description
69
+ - **Model Type:** SPLADE Sparse Encoder
70
+ - **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 12040accade4e8a0f71eabdb258fecc2e7e948be -->
71
+ - **Maximum Sequence Length:** 512 tokens
72
+ - **Output Dimensionality:** 30522 dimensions
73
+ - **Similarity Function:** Dot Product
74
+ <!-- - **Training Dataset:** Unknown -->
75
+ <!-- - **Language:** Unknown -->
76
+ <!-- - **License:** Unknown -->
77
+
78
+ ### Model Sources
79
+
80
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
81
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
82
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
83
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
84
+
85
+ ### Full Model Architecture
86
+
87
+ ```
88
+ SparseEncoder(
89
+ (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'DistilBertForMaskedLM'})
90
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
91
+ )
92
+ ```
93
+
94
+ ## Usage
95
+
96
+ ### Direct Usage (Sentence Transformers)
97
+
98
+ First install the Sentence Transformers library:
99
+
100
+ ```bash
101
+ pip install -U sentence-transformers
102
+ ```
103
+
104
+ Then you can load this model and run inference.
105
+ ```python
106
+ from sentence_transformers import SparseEncoder
107
+
108
+ # Download from the 🤗 Hub
109
+ model = SparseEncoder("thierrydamiba/splade-ecommerce-multidomain")
110
+ # Run inference
111
+ sentences = [
112
+     'mpow',
113
+     '[Mpow] Wireless Earbuds Active Noise Cancelling, Mpow X3 ANC Bluetooth Earphones w/4 Mics Noise Cancelling, Stereo Earbuds w/Deep Bass, 30Hrs ANC Earbuds w/USB-C Charge, Smart Touch Control, IPX8 Waterproof',
114
+     '[Jerzees] Jerzees Dri-Power Poly Pocketed Open-Bottom Sweatpants, Large - Black 100% Polyester Pre-shrunk Jersey',
115
+ ]
116
+ embeddings = model.encode(sentences)
117
+ print(embeddings.shape)
118
+ # [3, 30522]
119
+
120
+ # Get the similarity scores for the embeddings
121
+ similarities = model.similarity(embeddings, embeddings)
122
+ print(similarities)
123
+ # tensor([[ 69.1663, 66.0022, 51.6937],
124
+ # [ 66.0022, 238.3157, 60.5486],
125
+ # [ 51.6937, 60.5486, 174.3004]])
126
+ ```
127
+
128
+ <!--
129
+ ### Direct Usage (Transformers)
130
+
131
+ <details><summary>Click to see the direct usage in Transformers</summary>
132
+
133
+ </details>
134
+ -->
135
+
136
+ <!--
137
+ ### Downstream Usage (Sentence Transformers)
138
+
139
+ You can finetune this model on your own dataset.
140
+
141
+ <details><summary>Click to expand</summary>
142
+
143
+ </details>
144
+ -->
145
+
146
+ <!--
147
+ ### Out-of-Scope Use
148
+
149
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
150
+ -->
151
+
152
+ <!--
153
+ ## Bias, Risks and Limitations
154
+
155
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
156
+ -->
157
+
158
+ <!--
159
+ ### Recommendations
160
+
161
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
162
+ -->
163
+
164
+ ## Training Details
165
+
166
+ ### Training Dataset
167
+
168
+ #### Unnamed Dataset
169
+
170
+ * Size: 99,712 training samples
171
+ * Columns: <code>anchor</code> and <code>positive</code>
172
+ * Approximate statistics based on the first 1000 samples:
173
+ | | anchor | positive |
174
+ |:--------|:--------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
175
+ | type | string | string |
176
+ | details | <ul><li>min: 3 tokens</li><li>mean: 6.2 tokens</li><li>max: 22 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 99.84 tokens</li><li>max: 494 tokens</li></ul> |
177
+ * Samples:
178
+ | anchor | positive |
179
+ |:---------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
180
+ | <code>bird feeder pole station</code> | <code>[EXCMARK] EXCMARK 2 Pack Shepherd Hook 32 inch 1/2 inch Thick Use at Weddings, Hanging Solar Lights, Lanterns, Bird Feeders, Metal Hanger Hook (Bronze, 32 inch) <p><b>Create the garden of your dreams with our Shepherds Hooks!</b></p> <p>These amazing hooks with the perfect balance of tradition and versatility are the perfect accessory to any outdoor space! A super easy and convenient way to tackle any outdoor gardening party or event! It will make any hanging object stand out with ultimate beauty. Hang your decorative lights, bird feeders, lanterns, and more!</p> <p>Each hook includes 2 extenders for three height options. The hooks can measure up to 32”</code> |
181
+ | <code>chrome bath lighting</code> | <code>Progress Lighting Archie Collection 2-Light Chrome Bath Light Archie is a standout in any room and provides a fun and fashionable way to light your home. The authentic, prismatic style glass shade diffuses light to provide functional and stylish illumination. This fixture can be installed with the glass facing up or down to suit your preference.California residents: see&nbsp;Proposition 65 informationChrome finishClear prismatic glass17 in. W x 8-3/4 in. HUses (2) 100-Watt medium base bulbs (not included)Fixture can be installed facing upwards or downwards</code> |
182
+ | <code>sex toys kinky for female</code> | <code>[Knaughty Knickers] Knaughty Knickers Daddys Little Lil Fuck Toy Fucktoy DDLG BDSM Owned Boyshort Black 95% combed and ringspun cotton/5% spandex --- Low rise shortie boyshort style panty --- Satin trim fold over elastic waistband --- Custom embelished on quality Bella product --- Super soft and comfortable --- Funny or rude underwear</code> |
183
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
184
+ ```json
185
+ {
186
+ "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score', gather_across_devices=False)",
187
+ "document_regularizer_weight": 3e-05,
188
+ "query_regularizer_weight": 5e-05
189
+ }
190
+ ```
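The parameters above suggest a loss made of three parts: an in-batch ranking term plus two sparsity regularizers weighted separately for documents and queries. The sketch below shows one way such a combination can be computed on toy dense vectors. It follows the general form described in the cited SPLADE and FLOPS papers and is not the library's implementation; the function names `flops`, `in_batch_ranking_loss`, and `splade_loss` are hypothetical, and only the two weights and `scale=1.0` come from the JSON above.

```python
import math


def flops(batch):
    """FLOPS regularizer: sum over dimensions of the squared mean activation."""
    n = len(batch)
    return sum((sum(vec[j] for vec in batch) / n) ** 2 for j in range(len(batch[0])))


def in_batch_ranking_loss(queries, docs, scale=1.0):
    """Cross-entropy over dot-product scores; docs[i] is the positive for queries[i]."""
    total = 0.0
    for i, query in enumerate(queries):
        scores = [scale * sum(q * d for q, d in zip(query, doc)) for doc in docs]
        top = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total / len(queries)


def splade_loss(queries, docs, doc_weight=3e-05, query_weight=5e-05):
    """Ranking term plus weighted sparsity penalties on documents and queries."""
    return (in_batch_ranking_loss(queries, docs)
            + doc_weight * flops(docs)
            + query_weight * flops(queries))


toy_queries = [[1.0, 0.0], [0.0, 1.0]]
toy_docs = [[1.0, 0.0], [0.0, 1.0]]
print(splade_loss(toy_queries, toy_docs))
```

With weights this small the regularizers barely move the total, but they still push most vocabulary activations toward zero over many steps.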
191
+
192
+ ### Training Hyperparameters
193
+ #### Non-Default Hyperparameters
194
+
195
+ - `per_device_train_batch_size`: 32
196
+ - `learning_rate`: 2e-05
197
+ - `num_train_epochs`: 1
198
+ - `warmup_ratio`: 0.1
199
+ - `fp16`: True
200
+ - `batch_sampler`: no_duplicates
201
+ - `router_mapping`: {'anchor': 'query', 'positive': 'document'}
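To see what `learning_rate` 2e-05 with `warmup_ratio` 0.1 means in practice, the sketch below evaluates a linear warmup followed by linear decay, which is the usual shape of a `linear` scheduler. The total of 3116 steps is an assumption derived from 99,712 samples at batch size 32 on a single device, and the function name `linear_lr` is hypothetical.

```python
import math

TOTAL_STEPS = 3116                           # assumed: 99,712 samples / batch size 32
WARMUP_STEPS = math.ceil(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 gives 312 steps


def linear_lr(step: int, peak_lr: float = 2e-05) -> float:
    """Learning rate at `step`: linear ramp up to the peak, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return peak_lr * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return peak_lr * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 156, 312, 1714, 3116):
    print(step, linear_lr(step))
```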
202
+
203
+ #### All Hyperparameters
204
+ <details><summary>Click to expand</summary>
205
+
206
+ - `overwrite_output_dir`: False
207
+ - `do_predict`: False
208
+ - `eval_strategy`: no
209
+ - `prediction_loss_only`: True
210
+ - `per_device_train_batch_size`: 32
211
+ - `per_device_eval_batch_size`: 8
212
+ - `per_gpu_train_batch_size`: None
213
+ - `per_gpu_eval_batch_size`: None
214
+ - `gradient_accumulation_steps`: 1
215
+ - `eval_accumulation_steps`: None
216
+ - `torch_empty_cache_steps`: None
217
+ - `learning_rate`: 2e-05
218
+ - `weight_decay`: 0.0
219
+ - `adam_beta1`: 0.9
220
+ - `adam_beta2`: 0.999
221
+ - `adam_epsilon`: 1e-08
222
+ - `max_grad_norm`: 1.0
223
+ - `num_train_epochs`: 1
224
+ - `max_steps`: -1
225
+ - `lr_scheduler_type`: linear
226
+ - `lr_scheduler_kwargs`: {}
227
+ - `warmup_ratio`: 0.1
228
+ - `warmup_steps`: 0
229
+ - `log_level`: passive
230
+ - `log_level_replica`: warning
231
+ - `log_on_each_node`: True
232
+ - `logging_nan_inf_filter`: True
233
+ - `save_safetensors`: True
234
+ - `save_on_each_node`: False
235
+ - `save_only_model`: False
236
+ - `restore_callback_states_from_checkpoint`: False
237
+ - `no_cuda`: False
238
+ - `use_cpu`: False
239
+ - `use_mps_device`: False
240
+ - `seed`: 42
241
+ - `data_seed`: None
242
+ - `jit_mode_eval`: False
243
+ - `bf16`: False
244
+ - `fp16`: True
245
+ - `fp16_opt_level`: O1
246
+ - `half_precision_backend`: auto
247
+ - `bf16_full_eval`: False
248
+ - `fp16_full_eval`: False
249
+ - `tf32`: None
250
+ - `local_rank`: 0
251
+ - `ddp_backend`: None
252
+ - `tpu_num_cores`: None
253
+ - `tpu_metrics_debug`: False
254
+ - `debug`: []
255
+ - `dataloader_drop_last`: False
256
+ - `dataloader_num_workers`: 0
257
+ - `dataloader_prefetch_factor`: None
258
+ - `past_index`: -1
259
+ - `disable_tqdm`: False
260
+ - `remove_unused_columns`: True
261
+ - `label_names`: None
262
+ - `load_best_model_at_end`: False
263
+ - `ignore_data_skip`: False
264
+ - `fsdp`: []
265
+ - `fsdp_min_num_params`: 0
266
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
267
+ - `fsdp_transformer_layer_cls_to_wrap`: None
268
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
269
+ - `parallelism_config`: None
270
+ - `deepspeed`: None
271
+ - `label_smoothing_factor`: 0.0
272
+ - `optim`: adamw_torch_fused
273
+ - `optim_args`: None
274
+ - `adafactor`: False
275
+ - `group_by_length`: False
276
+ - `length_column_name`: length
277
+ - `project`: huggingface
278
+ - `trackio_space_id`: trackio
279
+ - `ddp_find_unused_parameters`: None
280
+ - `ddp_bucket_cap_mb`: None
281
+ - `ddp_broadcast_buffers`: False
282
+ - `dataloader_pin_memory`: True
283
+ - `dataloader_persistent_workers`: False
284
+ - `skip_memory_metrics`: True
285
+ - `use_legacy_prediction_loop`: False
286
+ - `push_to_hub`: False
287
+ - `resume_from_checkpoint`: None
288
+ - `hub_model_id`: None
289
+ - `hub_strategy`: every_save
290
+ - `hub_private_repo`: None
291
+ - `hub_always_push`: False
292
+ - `hub_revision`: None
293
+ - `gradient_checkpointing`: False
294
+ - `gradient_checkpointing_kwargs`: None
295
+ - `include_inputs_for_metrics`: False
296
+ - `include_for_metrics`: []
297
+ - `eval_do_concat_batches`: True
298
+ - `fp16_backend`: auto
299
+ - `push_to_hub_model_id`: None
300
+ - `push_to_hub_organization`: None
301
+ - `mp_parameters`:
302
+ - `auto_find_batch_size`: False
303
+ - `full_determinism`: False
304
+ - `torchdynamo`: None
305
+ - `ray_scope`: last
306
+ - `ddp_timeout`: 1800
307
+ - `torch_compile`: False
308
+ - `torch_compile_backend`: None
309
+ - `torch_compile_mode`: None
310
+ - `include_tokens_per_second`: False
311
+ - `include_num_input_tokens_seen`: no
312
+ - `neftune_noise_alpha`: None
313
+ - `optim_target_modules`: None
314
+ - `batch_eval_metrics`: False
315
+ - `eval_on_start`: False
316
+ - `use_liger_kernel`: False
317
+ - `liger_kernel_config`: None
318
+ - `eval_use_gather_object`: False
319
+ - `average_tokens_across_devices`: True
320
+ - `prompts`: None
321
+ - `batch_sampler`: no_duplicates
322
+ - `multi_dataset_batch_sampler`: proportional
323
+ - `router_mapping`: {'anchor': 'query', 'positive': 'document'}
324
+ - `learning_rate_mapping`: {}
325
+
326
+ </details>
327
+
328
+ ### Training Logs
329
+ | Epoch | Step | Training Loss |
330
+ |:------:|:----:|:-------------:|
331
+ | 0.0321 | 100 | 329.7303 |
332
+ | 0.0642 | 200 | 1.9189 |
333
+ | 0.0963 | 300 | 0.4059 |
334
+ | 0.1284 | 400 | 0.3173 |
335
+ | 0.1605 | 500 | 0.2776 |
336
+ | 0.1926 | 600 | 0.2812 |
337
+ | 0.2246 | 700 | 0.2648 |
338
+ | 0.2567 | 800 | 0.2821 |
339
+ | 0.2888 | 900 | 0.254 |
340
+ | 0.3209 | 1000 | 0.2789 |
341
+ | 0.3530 | 1100 | 0.2163 |
342
+ | 0.3851 | 1200 | 0.2375 |
343
+ | 0.4172 | 1300 | 0.2165 |
344
+ | 0.4493 | 1400 | 0.2254 |
345
+ | 0.4814 | 1500 | 0.2105 |
346
+ | 0.5135 | 1600 | 0.2147 |
347
+ | 0.5456 | 1700 | 0.2468 |
348
+ | 0.5777 | 1800 | 0.2438 |
349
+ | 0.6098 | 1900 | 0.209 |
350
+ | 0.6418 | 2000 | 0.2327 |
351
+ | 0.6739 | 2100 | 0.2475 |
352
+ | 0.7060 | 2200 | 0.227 |
353
+ | 0.7381 | 2300 | 0.1992 |
354
+ | 0.7702 | 2400 | 0.2258 |
355
+ | 0.8023 | 2500 | 0.1676 |
356
+ | 0.8344 | 2600 | 0.2081 |
357
+ | 0.8665 | 2700 | 0.1966 |
358
+ | 0.8986 | 2800 | 0.218 |
359
+ | 0.9307 | 2900 | 0.1998 |
360
+ | 0.9628 | 3000 | 0.2157 |
361
+ | 0.9949 | 3100 | 0.2011 |
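The epoch fractions in the log above are consistent with 99,712 training samples at a per-device batch size of 32, assuming a single device and no gradient accumulation. The sketch below recomputes a few of them; the helper name `epoch_at` is hypothetical.

```python
import math

NUM_SAMPLES = 99_712
BATCH_SIZE = 32
STEPS_PER_EPOCH = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # 3116 under the assumptions above


def epoch_at(step: int) -> float:
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / STEPS_PER_EPOCH, 4)


for step in (100, 700, 2000, 3100):
    print(step, epoch_at(step))
```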
362
+
363
+
364
+ ### Framework Versions
365
+ - Python: 3.11.10
366
+ - Sentence Transformers: 5.2.0
367
+ - Transformers: 4.57.3
368
+ - PyTorch: 2.9.1+cu128
369
+ - Accelerate: 1.12.0
370
+ - Datasets: 4.4.1
371
+ - Tokenizers: 0.22.1
372
+
373
+ ## Citation
374
+
375
+ ### BibTeX
376
+
377
+ #### Sentence Transformers
378
+ ```bibtex
379
+ @inproceedings{reimers-2019-sentence-bert,
380
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
381
+ author = "Reimers, Nils and Gurevych, Iryna",
382
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
383
+ month = "11",
384
+ year = "2019",
385
+ publisher = "Association for Computational Linguistics",
386
+ url = "https://arxiv.org/abs/1908.10084",
387
+ }
388
+ ```
389
+
390
+ #### SpladeLoss
391
+ ```bibtex
392
+ @misc{formal2022distillationhardnegativesampling,
393
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
394
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
395
+ year={2022},
396
+ eprint={2205.04733},
397
+ archivePrefix={arXiv},
398
+ primaryClass={cs.IR},
399
+ url={https://arxiv.org/abs/2205.04733},
400
+ }
401
+ ```
402
+
403
+ #### SparseMultipleNegativesRankingLoss
404
+ ```bibtex
405
+ @misc{henderson2017efficient,
406
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
407
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
408
+ year={2017},
409
+ eprint={1705.00652},
410
+ archivePrefix={arXiv},
411
+ primaryClass={cs.CL}
412
+ }
413
+ ```
414
+
415
+ #### FlopsLoss
416
+ ```bibtex
417
+ @article{paria2020minimizing,
418
+ title={Minimizing flops to learn efficient sparse representations},
419
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
420
+ journal={arXiv preprint arXiv:2004.05665},
421
+ year={2020}
422
+ }
423
+ ```
424
+
425
+ <!--
426
+ ## Glossary
427
+
428
+ *Clearly define terms in order to be accessible across audiences.*
429
+ -->
430
+
431
+ <!--
432
+ ## Model Card Authors
433
+
434
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
435
+ -->
436
+
437
+ <!--
438
+ ## Model Card Contact
439
+
440
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
441
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
1
+ {
2
+ "activation": "gelu",
3
+ "architectures": [
4
+ "DistilBertForMaskedLM"
5
+ ],
6
+ "attention_dropout": 0.1,
7
+ "dim": 768,
8
+ "dropout": 0.1,
9
+ "dtype": "float32",
10
+ "hidden_dim": 3072,
11
+ "initializer_range": 0.02,
12
+ "max_position_embeddings": 512,
13
+ "model_type": "distilbert",
14
+ "n_heads": 12,
15
+ "n_layers": 6,
16
+ "pad_token_id": 0,
17
+ "qa_dropout": 0.1,
18
+ "seq_classif_dropout": 0.2,
19
+ "sinusoidal_pos_embds": false,
20
+ "tie_weights_": true,
21
+ "transformers_version": "4.57.3",
22
+ "vocab_size": 30522
23
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "5.2.0",
5
+ "transformers": "4.57.3",
6
+ "pytorch": "2.9.1+cu128"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
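The `"similarity_fn_name": "dot"` setting above means scores are plain dot products. Because the embeddings are sparse, only vocabulary entries that are non-zero in both vectors contribute, which the sketch below illustrates with `{token_id: weight}` dictionaries. The function name `sparse_dot` and the toy token ids and weights are hypothetical.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors stored as {token_id: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[token_id] for token_id, weight in a.items() if token_id in b)


# Toy vectors: only token id 5 is active in both.
query_vec = {1: 2.0, 5: 1.0}
doc_vec = {5: 3.0, 7: 4.0}
print(sparse_dot(query_vec, doc_vec))
```

This overlap-only structure is also why sparse encoders pair well with inverted indexes.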
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:28215f9812492073729fdfaefb4fb52921a100cb669f3b187109b167d55f4a07
3
+ size 267954768
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_SpladePooling",
12
+ "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": false,
45
+ "cls_token": "[CLS]",
46
+ "do_lower_case": true,
47
+ "extra_special_tokens": {},
48
+ "mask_token": "[MASK]",
49
+ "model_max_length": 512,
50
+ "pad_token": "[PAD]",
51
+ "sep_token": "[SEP]",
52
+ "strip_accents": null,
53
+ "tokenize_chinese_chars": true,
54
+ "tokenizer_class": "DistilBertTokenizer",
55
+ "unk_token": "[UNK]"
56
+ }
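The special-token ids in the config above (`[PAD]` = 0, `[CLS]` = 101, `[SEP]` = 102) determine how a tokenized text is framed before it reaches the model. The sketch below shows the usual BERT-style layout for a single sequence; the function name `build_inputs` and the word-piece ids 2000 and 2001 are hypothetical placeholders, and real tokenization should go through the tokenizer itself.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # ids taken from added_tokens_decoder above


def build_inputs(token_ids, max_length):
    """Wrap ids in [CLS] ... [SEP], truncate to max_length, and pad with [PAD]."""
    ids = [CLS_ID] + token_ids[: max_length - 2] + [SEP_ID]
    attention_mask = [1] * len(ids)
    padding = max_length - len(ids)
    return ids + [PAD_ID] * padding, attention_mask + [0] * padding


input_ids, attention_mask = build_inputs([2000, 2001], max_length=6)
print(input_ids)
print(attention_mask)
```

The attention mask marks padding positions so that pooling can ignore them.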
vocab.txt ADDED
The diff for this file is too large to render. See raw diff