zacbrld committed
Commit ec7ac6a · verified · 1 Parent(s): 68c1a3f

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,524 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:34235
+ - loss:MultipleNegativesRankingLoss
+ base_model: zacbrld/MNLP_M2_document_encoder
+ widget:
+ - source_sentence: What is This?
+   sentences:
+   - Bellcranks are also seen in automotive applications, such as in the linkage connecting
+     the throttle pedal to the carburetor or connecting the brake pedal to the master
+     cylinder In vehicle suspensions, bellcranks are used in pullrod and pushrod suspensions
+     in cars or in the Christie suspension in tanks More vertical suspension designs
+     such as MacPherson struts may not be feasible in some vehicle designs due to space,
+     aerodynamic, or other design constraints; bellcranks translate the vertical motion
+     of the wheel into horizontal motion, allowing the suspension to be mounted transversely
+     or longitudinally within the vehicle
+   - DynaMo was also used as the face of the BBC's parental assistance website This
+     was created for parents to assist children with homework There was also a section
+     called "DynaMo's Den" which included educational games for children The website
+     was activated on 2 October 1998
+   - "The diode equation above is an example of an element constitutive equation of\
+     \ the general form f(v, i) = 0. This can be thought of as a non-linear resistor.\
+     \ The corresponding constitutive equations for non-linear inductors and capacitors\
+     \ are respectively f(v, φ) = 0 and f(v, q) = 0, where f is any arbitrary function,\
+     \ φ is the stored magnetic flux and q is the stored charge"
+ - source_sentence: algorithm explanation
+   sentences:
+   - 'Descriptive statistics
+
+     Average
+
+     Mean
+
+     Median
+
+     Mode
+
+     Measures of scale
+
+     Variance
+
+     Standard deviation
+
+     Median absolute deviation
+
+     Correlation
+
+     Polychoric correlation
+
+     Outlier
+
+     Statistical graphics
+
+     Histogram
+
+     Frequency distribution
+
+     Quantile
+
+     Survival function
+
+     Failure rate
+
+     Scatter plot
+
+     Bar chart'
+   - 'The various fields and topics that projects engineers are involved with include:
+
+
+     Work breakdown structure: a deliverable-oriented breakdown of a project into smaller
+     components
+
+     Gantt chart: type of bar chart that illustrates a project schedule
+
+     Critical Path Analysis: an algorithm for scheduling a set of project activities
+
+     Program evaluation and review technique: a statistical tool which was designed
+     to analyze and represent the tasks involved in completing a given project
+
+     Graphical Evaluation and Review Technique: network analysis technique that allows
+     probabilistic treatment both network logic and estimation of activity duration
+
+     Petri Nets: one of several mathematical modeling languages for the description
+     of distributed systems'
+   - Jessiko was marketed as a luxury decoration for businesses such as hotels, restaurants,
+     and museums Tiraby expressed hope that one day it would be common to find his
+     invention in household ponds and swimming pools
+ - source_sentence: 'The firm was founded as SECOR Ltd in 1994 by John Leeson, Alan
+     Sheppard, and David Richards After establishing the company in Oxford, United
+     Kingdom, in 1994, David oversaw the growth of the business from a small UK operator
+     into an environmental consultancies in the UK, with international operations across
+     Africa, Australasia, Canada, Europe, and the US
+
+     In 2000, the senior management team completed a management buyout and the company''s
+     name was changed to SLR Consulting Limited In 2004 they secured funding from Livingbridge,
+     who invested £4 85 million as part of a £13 million investment including other
+     partners, and took a significant minority stake in the company In 2008, 3i invested
+     £32 5 million in the firm, and replaced Livingbridge with a significant minority
+     stake In March 2018, Charterhouse Capital Partners (CCP) acquired a majority shareholding
+     in the business In June 2022 Charterhouse Capital Partners agreed to a sale of
+     SLR Consulting to Ares Management private equity partners David Richards was Chief
+     Executive Officer from 1994–2013 In line with the Group''s succession plans, Neil
+     Penhall, formerly Managing Director of SLR Consulting and an Executive Director
+     of SLR Management, assumed the role of CEO'
+   sentences:
+   - 'Institute for Transuranium Elements (ITU)
+
+     Institute for the Protection and the Security of the Citizen (IPSC)
+
+     Institute for Environment and Sustainability (IES)
+
+     Institute for Health and Consumer Protection (IHCP)
+
+     Institute for Energy (IE)
+
+     Institute for Prospective Technological Studies (IPTS)'
+   - Project NExT was founded by James (Jim) Leitzel (Ohio State University) and Chris
+     Stevens (Saint Louis University) The first fellows were selected in 1994 Jim Leitzel
+     died in 1998, and Aparna Higgins (University of Dayton) and Joe Gallian (University
+     of Minnesota Duluth) became co-directors of Project NExT Chris Stevens stepped
+     down as director in 2010, and was succeeded by Aparna Higgins and Joe Gallian
+     Judith Covington (Louisiana State University, Shreveport) and Gavin LaRose (University
+     of Michigan) first served as Associate Co-Directors and later became Co-Directors
+     In 2007, the total number of fellows surpassed 1000 By 2017 the total number of
+     fellows reached 1700 In 2023 Christine Kelley became director
+   - Quantum secure communication is a method that is expected to be 'quantum safe'
+     in the advent of quantum computing systems that could break current cryptography
+     systems using methods such as Shor's algorithm These methods include quantum key
+     distribution (QKD), a method of transmitting information using entangled light
+     in a way that makes any interception of the transmission obvious to the user Another
+     method is the quantum random number generator, which is capable of producing truly
+     random numbers unlike non-quantum algorithms that merely imitate randomness
+ - source_sentence: chemical reaction
+   sentences:
+   - With suitably encoded scales (multitrack, vernier, digital code, or pseudo-random
+     code) an encoder can determine its position without movement or needing to find
+     a reference position Such absolute encoders also communicate using serial communication
+     protocols Many of these protocols are proprietary (e g , Fanuc, Mitsubishi, FeeDat
+     (Fagor Automation), Heidenhain EnDat, DriveCliq, Panasonic, Yaskawa) but open
+     standards such as BiSS are now appearing, which avoid tying users to a particular
+     supplier
+   - Bonneau, Pierre; Allens, Gaspard d' (2020) Cent mille ans Bure ou le scandale
+     enfoui des déchets nucléaires [One hundred thousand years Bure, or the buried
+     scandal of nuclear waste] Illustrated by Cécile Guillard La Revue dessinée - Seuil
+     ISBN 978-2-02-145982-1
+   - The reason why MACE is heavily researched is that it allows completely anisotropic
+     etching of silicon substrates which is not possible with other wet chemical etching
+     methods (see figure to the right) Usually the silicon substrate is covered with
+     a protective layer such as photoresist before it is immersed in an etching solution
+     The etching solution usually has no preferred direction of attacking the substrate,
+     therefore isotropic etching takes place In semiconductor engineering, however
+     it is often required that the sidewalls of the etched trenches are steep This
+     is usually realized with methods that operate in the gas-phase such as reactive
+     ion etching These methods require expensive equipment compared to simple wet etching
+     MACE, in principle allows the fabrication of steep trenches but is still cheap
+     compared to gas-phase etching methods
+ - source_sentence: synthesis method
+   sentences:
+   - STEMNET used to receive funding from the Department for Education and Skills Since
+     June 2007, it receives funding from the Department for Children, Schools and Families
+     and Department for Innovation, Universities and Skills, since STEMNET sits on
+     the chronological dividing point (age 16) of both of the new departments
+   - The Arab States of the Persian Gulf plan to start their own joint civilian nuclear
+     program An agreement in the final days of the Bush administration provided for
+     cooperation between the United Arab Emirates and the United States of America
+     in which the United States would sell the UAE nuclear reactors and nuclear fuel
+     The UAE would, in return, renounce their right to enrich uranium for their civilian
+     nuclear program At the time of signing, this agreement was touted as a way to
+     reduce risks of nuclear proliferation in the Persian Gulf However, Mustafa Alani
+     of the Dubai-based Gulf Research Center stated that, should the Nuclear Non-Proliferation
+     Treaty collapse, nuclear reactors such as those slated to be sold to the UAE under
+     this agreement could provide the UAE with a path toward a nuclear weapon, raising
+     the specter of further nuclear proliferation In March 2007, foreign ministers
+     of the six-member Gulf Cooperation Council met in Saudi Arabia to discuss progress
+     in plans agreed in December 2006, for a joint civilian nuclear program
+   - Timber framing dates back thousands of years, and has been used in many parts
+     of the world during various periods such as ancient Japan, Europe and medieval
+     England in localities where timber was in good supply and building stone and the
+     skills to work it were not The use of timber framing in buildings provides their
+     complete skeletal framing which offers some structural benefits as the timber
+     frame, if properly engineered, lends itself to better seismic survivability
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on zacbrld/MNLP_M2_document_encoder
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder) <!-- at revision 6f1d702dcb1d5e9fd30b691c84fadd9a1704a148 -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
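+
+ For reference, this is what the three modules do under the hood. A minimal sketch of the equivalent plain `transformers` pipeline, assuming the checkpoint loads directly with `AutoModel` (the repository name is taken from the usage example below):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ model_name = "zacbrld/MNLP_M3_document_encoder_V1"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModel.from_pretrained(model_name)
+
+ # (0) Transformer: tokenize with max_seq_length 256 and encode
+ inputs = tokenizer(["an example sentence"], padding=True, truncation=True,
+                    max_length=256, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = model(**inputs).last_hidden_state  # [batch, seq_len, 384]
+
+ # (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens: true)
+ mask = inputs["attention_mask"].unsqueeze(-1).float()
+ embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
+
+ # (2) Normalize: L2-normalize so a dot product equals cosine similarity
+ embeddings = F.normalize(embeddings, p=2, dim=1)
+ ```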
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_V1")
+ # Run inference
+ sentences = [
+     'synthesis method',
+     'STEMNET used to receive funding from the Department for Education and Skills Since June 2007, it receives funding from the Department for Children, Schools and Families and Department for Innovation, Universities and Skills, since STEMNET sits on the chronological dividing point (age 16) of both of the new departments',
+     'The Arab States of the Persian Gulf plan to start their own joint civilian nuclear program An agreement in the final days of the Bush administration provided for cooperation between the United Arab Emirates and the United States of America in which the United States would sell the UAE nuclear reactors and nuclear fuel The UAE would, in return, renounce their right to enrich uranium for their civilian nuclear program At the time of signing, this agreement was touted as a way to reduce risks of nuclear proliferation in the Persian Gulf However, Mustafa Alani of the Dubai-based Gulf Research Center stated that, should the Nuclear Non-Proliferation Treaty collapse, nuclear reactors such as those slated to be sold to the UAE under this agreement could provide the UAE with a path toward a nuclear weapon, raising the specter of further nuclear proliferation In March 2007, foreign ministers of the six-member Gulf Cooperation Council met in Saudi Arabia to discuss progress in plans agreed in December 2006, for a joint civilian nuclear program',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
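+
+ Because the embeddings are L2-normalized and the similarity function is cosine, ranking a corpus against a query reduces to a dot product. A small hypothetical semantic-search sketch (the query and corpus strings are illustrative only):
+
+ ```python
+ query_embedding = model.encode(["timber construction techniques"])
+ corpus_embeddings = model.encode([
+     "Timber framing dates back thousands of years",
+     "Quantum key distribution makes interception of a transmission obvious",
+ ])
+ scores = model.similarity(query_embedding, corpus_embeddings)  # shape [1, 2]
+ best_idx = int(scores.argmax(dim=1))  # index of the best-matching corpus entry
+ ```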
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 34,235 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 |
+   |:--------|:-----------|:-----------|
+   | type    | string     | string     |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 21.24 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 34 tokens</li><li>mean: 133.62 tokens</li><li>max: 256 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>chemistry experiment</code> | <code>Since 1982, research has been conducted to develop technologies, commonly referred to as electronic noses, that could detect and recognize odors and flavors Application areas include food, medicine and the environment</code> |
+   | <code>quantum physics</code> | <code>Hydro electric - Hydro-electric turbomachinery uses potential energy stored in water to flow over an open impeller to turn a generator which creates electricity<br>Steam turbines - Steam turbines used in power generation come in many different variations The overall principle is high pressure steam is forced over blades attached to a shaft, which turns a generator As the steam travels through the turbine, it passes through smaller blades causing the shaft to spin faster, creating more electricity Gas turbines - Gas turbines work much like steam turbines Air is forced in through a series of blades that turn a shaft Then fuel is mixed with the air and causes a combustion reaction, increasing the power This then causes the shaft to spin faster, creating more electricity Windmills - Also known as a wind turbine, windmills are increasing in popularity for their ability to efficiently use the wind to generate electricity Although they come in many shapes and sizes, the most common one is the la...</code> |
+   | <code>physics law</code> | <code>Backlash in gear couplings allows for slight angular misalignment There can be significant backlash in unsynchronized transmissions because of the intentional gap between the dogs in dog clutches The gap is necessary to engage dogs when input shaft (engine) speed and output shaft (driveshaft) speed are imperfectly synchronized If there was a smaller clearance, it would be nearly impossible to engage the gears because the dogs would interfere with each other in most configurations In synchronized transmissions, synchromesh solves this problem However, backlash is undesirable in precision positioning applications such as machine tool tables It can be minimized by choosing ball screws or leadscrews with preloaded nuts, and mounting them in preloaded bearings A preloaded bearing uses a spring and/or a second bearing to provide a compressive axial force that maintains bearing surfaces in contact despite reversal of the load direction</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
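+
+ For orientation, a minimal training sketch consistent with this loss and the non-default hyperparameters listed below; the single placeholder pair stands in for the 34,235 real samples and is not the actual training data:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
+                                    SentenceTransformerTrainingArguments)
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ # Placeholder pair; the real dataset has columns sentence_0 and sentence_1.
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["chemistry experiment"],
+     "sentence_1": ["Research has been conducted to develop electronic noses."],
+ })
+
+ model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
+ loss = MultipleNegativesRankingLoss(model, scale=20.0)  # cos_sim is the default similarity_fct
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs",
+     num_train_epochs=2,
+     per_device_train_batch_size=8,
+ )
+
+ SentenceTransformerTrainer(model=model, args=args,
+                            train_dataset=train_dataset, loss=loss).train()
+ ```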
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `num_train_epochs`: 2
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 0.1168 | 500 | 1.465 |
+ | 0.2336 | 1000 | 1.189 |
+ | 0.3505 | 1500 | 1.1209 |
+ | 0.4673 | 2000 | 1.0333 |
+ | 0.5841 | 2500 | 0.993 |
+ | 0.7009 | 3000 | 0.9573 |
+ | 0.8178 | 3500 | 0.9275 |
+ | 0.9346 | 4000 | 0.9177 |
+ | 1.0514 | 4500 | 0.8241 |
+ | 1.1682 | 5000 | 0.7726 |
+ | 1.2850 | 5500 | 0.7685 |
+ | 1.4019 | 6000 | 0.7623 |
+ | 1.5187 | 6500 | 0.7668 |
+ | 1.6355 | 7000 | 0.7556 |
+ | 1.7523 | 7500 | 0.7002 |
+ | 1.8692 | 8000 | 0.7363 |
+ | 1.9860 | 8500 | 0.7396 |
+
+
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.52.2
+ - PyTorch: 2.7.0+cu126
+ - Accelerate: 1.3.0
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.52.2",
+     "pytorch": "2.7.0+cu126"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42ab139d799878bacbe9a0fb469dd44bdc78dcc0a5805d6b8154429956ccbe85
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 256,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff