---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:34235
- loss:MultipleNegativesRankingLoss
base_model: zacbrld/MNLP_M2_document_encoder
widget:
- source_sentence: What is This?
  sentences:
  - Bellcranks are also seen in automotive applications, such as in the linkage connecting
    the throttle pedal to the carburetor or connecting the brake pedal to the master
    cylinder In vehicle suspensions, bellcranks are used in pullrod and pushrod suspensions
    in cars or in the Christie suspension in tanks More vertical suspension designs
    such as MacPherson struts may not be feasible in some vehicle designs due to space,
    aerodynamic, or other design constraints; bellcranks translate the vertical motion
    of the wheel into horizontal motion, allowing the suspension to be mounted transversely
    or longitudinally within the vehicle
  - DynaMo was also used as the face of the BBC's parental assistance website This
    was created for parents to assist children with homework There was also a section
    called "DynaMo's Den" which included educational games for children The website
    was activated on 2 October 1998
  - "The diode equation above is an example of an element constitutive equation of\
    \ the general form,\n\n  \n    \n      \n        f\n        (\n        v\n   \
    \     ,\n        i\n        )\n        =\n        0\n      \n    \n    {\\displaystyle\
    \ f(v,i)=0}\n  \n\nThis can be thought of as a non-linear resistor The corresponding\
    \ constitutive equations for non-linear inductors and capacitors are respectively;\n\
    \n  \n    \n      \n        f\n        (\n        v\n        ,\n        φ\n  \
    \      )\n        =\n        0\n      \n    \n    {\\displaystyle f(v,\\varphi\
    \ )=0}\n  \n\n  \n    \n      \n        f\n        (\n        v\n        ,\n \
    \       q\n        )\n        =\n        0\n      \n    \n    {\\displaystyle\
    \ f(v,q)=0}\n  \n\nwhere f is any arbitrary function, φ is the stored magnetic\
    \ flux and q is the stored charge"
- source_sentence: algorithm explanation
  sentences:
  - 'Descriptive statistics

    Average

    Mean

    Median

    Mode

    Measures of scale

    Variance

    Standard deviation

    Median absolute deviation

    Correlation

    Polychoric correlation

    Outlier

    Statistical graphics

    Histogram

    Frequency distribution

    Quantile

    Survival function

    Failure rate

    Scatter plot

    Bar chart'
  - 'The various fields and topics that projects engineers are involved with include:


    Work breakdown structure: a deliverable-oriented breakdown of a project into smaller
    components

    Gantt chart:  type of bar chart that illustrates a project schedule

    Critical Path Analysis: an algorithm for scheduling a set of project activities

    Program evaluation and review technique: a statistical tool which was designed
    to analyze and represent the tasks involved in completing a given project

    Graphical Evaluation and Review Technique: network analysis technique that allows
    probabilistic treatment both network logic and estimation of activity duration

    Petri Nets: one of several mathematical modeling languages for the description
    of distributed systems'
  - Jessiko was marketed as a luxury decoration for businesses such as hotels, restaurants,
    and museums Tiraby expressed hope that one day it would be common to find his
    invention in household ponds and swimming pools
- source_sentence: 'The firm was founded as SECOR Ltd in 1994 by John Leeson, Alan
    Sheppard, and David Richards After establishing the company in Oxford, United
    Kingdom, in 1994, David oversaw the growth of the business from a small UK operator
    into an environmental consultancies in the UK, with international operations across
    Africa, Australasia, Canada, Europe, and the US

    In 2000, the senior management team completed a management buyout and the company''s
    name was changed to SLR Consulting Limited In 2004 they secured funding from Livingbridge,
    who invested £4 85 million as part of a £13 million investment including other
    partners, and took a significant minority stake in the company In 2008, 3i invested
    £32 5 million in the firm, and replaced Livingbridge with a significant minority
    stake In March 2018, Charterhouse Capital Partners (CCP) acquired a majority shareholding
    in the business In June 2022 Charterhouse Capital Partners agreed to a sale of
    SLR Consulting to Ares Management private equity partners David Richards was Chief
    Executive Officer from 1994–2013 In line with the Group''s succession plans, Neil
    Penhall, formerly Managing Director of SLR Consulting and an Executive Director
    of SLR Management, assumed the role of CEO'
  sentences:
  - 'Institute for Transuranium Elements (ITU)

    Institute for the Protection and the Security of the Citizen (IPSC)

    Institute for Environment and Sustainability (IES)

    Institute for Health and Consumer Protection (IHCP)

    Institute for Energy (IE)

    Institute for Prospective Technological Studies (IPTS)'
  - Project NExT was founded by James (Jim) Leitzel (Ohio State University) and Chris
    Stevens (Saint Louis University) The first fellows were selected in 1994 Jim Leitzel
    died in 1998, and Aparna Higgins (University of Dayton) and Joe Gallian (University
    of Minnesota Duluth) became co-directors of Project NExT Chris Stevens stepped
    down as director in 2010, and was succeeded by Aparna Higgins and Joe Gallian
    Judith Covington (Louisiana State University, Shreveport) and Gavin LaRose (University
    of Michigan) first served as Associate Co-Directors and later became Co-Directors
    In 2007, the total number of fellows surpassed 1000 By 2017 the total number of
    fellows reached 1700 In 2023 Christine Kelley became director
  - Quantum secure communication is a method that is expected to be 'quantum safe'
    in the advent of quantum computing systems that could break current cryptography
    systems using methods such as Shor's algorithm These methods include quantum key
    distribution (QKD), a method of transmitting information using entangled light
    in a way that makes any interception of the transmission obvious to the user Another
    method is the quantum random number generator, which is capable of producing truly
    random numbers unlike non-quantum algorithms that merely imitate randomness
- source_sentence: chemical reaction
  sentences:
  - With suitably encoded scales (multitrack, vernier, digital code, or pseudo-random
    code) an encoder can determine its position without movement or needing to find
    a reference position Such absolute encoders also communicate using serial communication
    protocols Many of these protocols are proprietary (e g , Fanuc, Mitsubishi, FeeDat
    (Fagor Automation), Heidenhain EnDat, DriveCliq, Panasonic, Yaskawa) but open
    standards such as BiSS are now appearing, which avoid tying users to a particular
    supplier
  - Bonneau, Pierre; Allens, Gaspard d' (2020) Cent mille ans Bure ou le scandale
    enfoui des déchets nucléaires [One hundred thousand years Bure, or the buried
    scandal of nuclear waste] Illustrated by Cécile Guillard La Revue dessinée - Seuil
    ISBN 978-2-02-145982-1
  - The reason why MACE is heavily researched is that it allows completely anisotropic
    etching of silicon substrates which is not possible with other wet chemical etching
    methods (see figure to the right) Usually the silicon substrate is covered with
    a protective layer such as photoresist before it is immersed in an etching solution
    The etching solution usually has no preferred direction of attacking the substrate,
    therefore isotropic etching takes place In semiconductor engineering, however
    it is often required that the sidewalls of the etched trenches are steep This
    is usually realized with methods that operate in the gas-phase such as reactive
    ion etching These methods require expensive equipment compared to simple wet etching
    MACE, in principle allows the fabrication of steep trenches but is still cheap
    compared to gas-phase etching methods
- source_sentence: synthesis method
  sentences:
  - STEMNET used to receive funding from the Department for Education and Skills Since
    June 2007, it receives funding from the Department for Children, Schools and Families
    and Department for Innovation, Universities and Skills, since STEMNET sits on
    the chronological dividing point (age 16) of both of the new departments
  - The Arab States of the Persian Gulf plan to start their own joint civilian nuclear
    program An agreement in the final days of the Bush administration provided for
    cooperation between the United Arab Emirates and the United States of America
    in which the United States would sell the UAE nuclear reactors and nuclear fuel
    The UAE would, in return, renounce their right to enrich uranium for their civilian
    nuclear program At the time of signing, this agreement was touted as a way to
    reduce risks of nuclear proliferation in the Persian Gulf However, Mustafa Alani
    of the Dubai-based Gulf Research Center stated that, should the Nuclear Non-Proliferation
    Treaty collapse, nuclear reactors such as those slated to be sold to the UAE under
    this agreement could provide the UAE with a path toward a nuclear weapon, raising
    the specter of further nuclear proliferation In March 2007, foreign ministers
    of the six-member Gulf Cooperation Council met in Saudi Arabia to discuss progress
    in plans agreed in December 2006, for a joint civilian nuclear program
  - Timber framing dates back thousands of years, and has been used in many parts
    of the world during various periods such as ancient Japan, Europe and medieval
    England in localities where timber was in good supply and building stone and the
    skills to work it were not The use of timber framing in buildings provides their
    complete skeletal framing which offers some structural benefits as the timber
    frame, if properly engineered, lends itself to better seismic survivability
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on zacbrld/MNLP_M2_document_encoder

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder) <!-- at revision 6f1d702dcb1d5e9fd30b691c84fadd9a1704a148 -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
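
The last two modules above (mean pooling followed by `Normalize()`) can be illustrated with a small standard-library sketch. It is not the library's implementation: the helper names `mean_pool` and `l2_normalize` are hypothetical, and the toy input uses 4 dimensions instead of 384.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: three token positions, four dimensions; the last position is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
```

Because the output has unit length, the cosine similarity between two embeddings equals their dot product.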

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M3_document_encoder_V1")
# Run inference
sentences = [
    'synthesis method',
    'STEMNET used to receive funding from the Department for Education and Skills Since June 2007, it receives funding from the Department for Children, Schools and Families and Department for Innovation, Universities and Skills, since STEMNET sits on the chronological dividing point (age 16) of both of the new departments',
    'The Arab States of the Persian Gulf plan to start their own joint civilian nuclear program An agreement in the final days of the Bush administration provided for cooperation between the United Arab Emirates and the United States of America in which the United States would sell the UAE nuclear reactors and nuclear fuel The UAE would, in return, renounce their right to enrich uranium for their civilian nuclear program At the time of signing, this agreement was touted as a way to reduce risks of nuclear proliferation in the Persian Gulf However, Mustafa Alani of the Dubai-based Gulf Research Center stated that, should the Nuclear Non-Proliferation Treaty collapse, nuclear reactors such as those slated to be sold to the UAE under this agreement could provide the UAE with a path toward a nuclear weapon, raising the specter of further nuclear proliferation In March 2007, foreign ministers of the six-member Gulf Cooperation Council met in Saudi Arabia to discuss progress in plans agreed in December 2006, for a joint civilian nuclear program',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
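
For retrieval-style use, passages are typically ranked by their similarity to a query. The sketch below shows only the ranking step with the standard library. The vectors are made-up, unit-length stand-ins for `model.encode(...)` output, and `rank_passages` is a hypothetical helper, not part of Sentence Transformers.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_passages(query_emb, passage_embs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query_emb, p)) for i, p in enumerate(passage_embs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical 2-dimensional embeddings (the real model outputs 384 dimensions).
query = [1.0, 0.0]
passages = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
ranking = rank_passages(query, passages)
```

Using the dot product here assumes normalized embeddings, which matches the `Normalize()` module listed in the architecture.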

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 34,235 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                         | sentence_1                                                                           |
  |:--------|:-----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                               |
  | details | <ul><li>min: 3 tokens</li><li>mean: 21.24 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 34 tokens</li><li>mean: 133.62 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | sentence_0                        | sentence_1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
  |:----------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>chemistry experiment</code> | <code>Since 1982, research has been conducted to develop technologies, commonly referred to as electronic noses, that could detect and recognize odors and flavors Application areas include food, medicine and the environment</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
  | <code>quantum physics</code>      | <code>Hydro electric - Hydro-electric turbomachinery uses potential energy stored in water to flow over an open impeller to turn a generator which creates electricity<br>Steam turbines - Steam turbines used in power generation come in many different variations The overall principle is high pressure steam is forced over blades attached to a shaft, which turns a generator As the steam travels through the turbine, it passes through smaller blades causing the shaft to spin faster, creating more electricity Gas turbines - Gas turbines work much like steam turbines Air is forced in through a series of blades that turn a shaft Then fuel is mixed with the air and causes a combustion reaction, increasing the power This then causes the shaft to spin faster, creating more electricity Windmills - Also known as a wind turbine, windmills are increasing in popularity for their ability to efficiently use the wind to generate electricity Although they come in many shapes and sizes, the most common one is the la...</code> |
  | <code>physics law</code>          | <code>Backlash in gear couplings allows for slight angular misalignment There can be significant backlash in unsynchronized transmissions because of the intentional gap between the dogs in dog clutches The gap is necessary to engage dogs when input shaft (engine) speed and output shaft (driveshaft) speed are imperfectly synchronized If there was a smaller clearance, it would be nearly impossible to engage the gears because the dogs would interfere with each other in most configurations In synchronized transmissions, synchromesh solves this problem However, backlash is undesirable in precision positioning applications such as machine tool tables It can be minimized by choosing ball screws or leadscrews with preloaded nuts, and mounting them in preloaded bearings A preloaded bearing uses a spring and/or a second bearing to provide a compressive axial force that maintains bearing surfaces in contact despite reversal of the load direction</code>                                                                 |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
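
As a rough illustration of what this loss computes, the sketch below treats each `sentence_0` as an anchor, its paired `sentence_1` as the positive, and every other `sentence_1` in the batch as a negative. The cosine similarities are multiplied by the scale (20.0), and a cross-entropy loss is taken with the matching pair as the target. This is a standard-library approximation for intuition, not the library's code. The function names are hypothetical and the vectors are assumed to be non-zero.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])  # each anchor matches its positive
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])  # each anchor matches the wrong passage
```

With correctly aligned pairs the loss is close to zero. With swapped pairs it is close to the scale value, which shows how the scale sharpens the penalty for ranking an in-batch negative above the true passage.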

### Training Hyperparameters
#### Non-Default Hyperparameters

- `num_train_epochs`: 2
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 2
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.1168 | 500  | 1.465         |
| 0.2336 | 1000 | 1.189         |
| 0.3505 | 1500 | 1.1209        |
| 0.4673 | 2000 | 1.0333        |
| 0.5841 | 2500 | 0.993         |
| 0.7009 | 3000 | 0.9573        |
| 0.8178 | 3500 | 0.9275        |
| 0.9346 | 4000 | 0.9177        |
| 1.0514 | 4500 | 0.8241        |
| 1.1682 | 5000 | 0.7726        |
| 1.2850 | 5500 | 0.7685        |
| 1.4019 | 6000 | 0.7623        |
| 1.5187 | 6500 | 0.7668        |
| 1.6355 | 7000 | 0.7556        |
| 1.7523 | 7500 | 0.7002        |
| 1.8692 | 8000 | 0.7363        |
| 1.9860 | 8500 | 0.7396        |
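
The Epoch column can be reproduced from the figures reported in this card: 34,235 training samples, a per-device batch size of 8, and 2 epochs. The sketch below assumes a single device and no gradient accumulation, and that the final partial batch is kept (`dataloader_drop_last` is False). The linear decay function assumes zero warmup, as listed in the hyperparameters. The helper names are illustrative only.

```python
import math

train_samples = 34235
batch_size = 8
num_epochs = 2

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)

def linear_lr(step, base_lr=5e-05):
    """Linearly decayed learning rate with no warmup (an assumption about this run)."""
    return base_lr * max(0.0, 1 - step / total_steps)
```

Under these assumptions, `epoch_at(500)` gives 0.1168 and `epoch_at(8500)` gives 1.986, in line with the first and last rows of the table.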


### Framework Versions
- Python: 3.12.8
- Sentence Transformers: 3.4.1
- Transformers: 4.52.2
- PyTorch: 2.7.0+cu126
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
