---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: How can AnimateDiff, a motion adapter for pretrained diffusion
    models, be used to generate videos from images?
  sentences:
  - 'Performs a single real input radix-2 transformation on the provided data Kind:
    instance method ofP2FFT The input data array The output data array The output
    offset The input offset The step'
  - AnimateDiffis an adapter model that inserts a motion module into a pretrained
    diffusion model to animate an image. The adapter is trained on video clips to
    learn motion which is used to condition the generation process to create a video.
    It is faster and easier to only train the adapter and it can be loaded into most
    diffusion models, effectively turning them into “video models”. Start by loading
    aMotionAdapter. Then load a finetuned Stable Diffusion model with theAnimateDiffPipeline.
    Create a prompt and generate the video.
  - 'Utility class to handle streaming of tokens generated by whisper speech-to-text
    models. Callback functions are invoked when each of the following events occur:
    Kind: static class ofgeneration/streamers'
- source_sentence: How to configure DeepSpeed, including ZeRO-2 and bf16 precision,
    for optimal performance with Intel Gaudi HPUs?
  sentences:
  - 'The DeepSpeed configuration to use is passed through a JSON file and enables
    you to choose the optimizations to apply. Here is an example for applying ZeRO-2
    optimizations andbf16precision: The special value"auto"enables to automatically
    get the correct or most efficient value. You can also specify the values yourself
    but, if you do so, you should be careful not to have conflicting values with your
    training arguments. It is strongly advised to readthis sectionin the Transformers
    documentation to completely understand how this works. Other examples of configurations
    for HPUs are proposedhereby Intel. TheTransformers documentationexplains how to
    write a configuration from scratch very well. A more complete description of all
    configuration possibilities is availablehere.'
  - Creates a new instance of TokenizerModel. The configuration object for the TokenizerModel.
  - Most Spaces should run out of the box after a GPU upgrade, but sometimes you’ll
    need to install CUDA versions of the machine learning frameworks you use. Please,
    follow this guide to ensure your Space takes advantage of the improved hardware.
- source_sentence: Can DeBERTa's question-answering model be fine-tuned for improved
    information retrieval?
  sentences:
  - 'RegNetXis a convolutional network design space with simple, regular models with
    parameters: depthddd, initial widthw0>0w_{0} > 0w0​>0, and slopewa>0w_{a} > 0wa​>0,
    and generates a different block widthuju_{j}uj​for each blockj<dj < dj<d. The
    key restriction for the RegNet types of model is that there is a linear parameterisation
    of block widths (the design space only contains models with this linear structure):uj=w0+wa⋅ju_{j}
    = w_{0} + w_{a}\cdot{j}uj​=w0​+wa​⋅j ForRegNetXwe have additional restrictions:
    we setb=1b = 1b=1(the bottleneck ratio),12≤d≤2812 \leq d \leq 2812≤d≤28, andwm≥2w_{m}
    \geq 2wm​≥2(the width multiplier).'
  - 'DeBERTa Model with a span classification head on top for extractive question-answering
    tasks like SQuAD (a linear layers on top of the hidden-states output to computespan
    start logitsandspan end logits). Kind: static class ofmodels'
  - 'The minimum length of the sequence to be generated. Corresponds to the length
    of the input prompt +min_new_tokens. Its effect is overridden bymin_new_tokens,
    if also set. Kind: instance property ofGenerationConfigDefault:0'
- source_sentence: How can I efficiently upload models from supported libraries like
    Transformers to the Hugging Face Hub for improved information retrieval?
  sentences:
  - '🤗 Diffusers is compatible with Habana Gaudi through 🤗Optimum. Follow theinstallationguide
    to install the SynapseAI and Gaudi drivers, and then install Optimum Habana: To
    generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate
    two instances: When you initialize the pipeline, you have to specifyuse_habana=Trueto
    deploy it on HPUs and to get the fastest possible generation, you should enableHPU
    graphswithuse_hpu_graphs=True. Finally, specify aGaudiConfigwhich can be downloaded
    from theHabanaorganization on the Hub. Now you can call the pipeline to generate
    images by batches from one or several prompts: For more information, check out
    🤗 Optimum Habana’sdocumentationand theexampleprovided in the official GitHub repository.'
  - 'While training and evaluating we record the following reward metrics:'
  - 'First check if your model is from a library that has built-in support to push
    to/load from the Hub, like Transformers, Diffusers, Timm, Asteroid, etc.:https://huggingface.co/docs/hub/models-libraries.
    Below we’ll show how easy this is for a library like Transformers: Some libraries,
    like Transformers, support loadingcode from the Hub. This is a way to make your
    model work with Transformers using thetrust_remote_code=Trueflag. You may want
    to consider this option instead of a full-fledged library integration.'
- source_sentence: How can I use Shiny for Python to build and deploy a Hugging Face
    Space application?
  sentences:
  - Shiny for Pythonis a pure Python implementation of Shiny. This gives you access
    to all of the great features of Shiny like reactivity, complex layouts, and modules
    without needing to use R. Shiny for Python is ideal for Hugging Face applications
    because it integrates smoothly with other Hugging Face tools. To get started deploying
    a Space, click this button to select your hardware and specify if you want a public
    or private Space. The Space template will populate a few files to get your app
    started. app.py This file defines your app’s logic. To learn more about how to
    modify this file, seethe Shiny for Python documentation. As your app gets more
    complex, it’s a good idea to break your application logic up intomodules. Dockerfile
    The Dockerfile for a Shiny for Python app is very minimal because the library
    doesn’t have many system dependencies, but you may need to modify this file if
    your application has additional system dependencies. The one essential feature
    of this file is that it exposes and runs the app on the port specified in the
    space README file (which is 7860 by default). requirements.txt The Space will
    automatically install dependencies listed in the requirements.txt file. Note that
    you must include shiny in this file.
  - '(**kwargs) A context manager that will add each keyword argument passed toos.environand
    remove them when exiting. Will convert the values inkwargsto strings and upper-case
    all the keys. () A context manager that will temporarily clear environment variables.
    When this context exits, the previous environment variables will be back. (mixed_precision=
    ''no''save_location: str = ''/github/home/.cache/huggingface/accelerate/default_config.yaml''use_xpu:
    bool = False) Parameters Creates and saves a basic cluster config to be used on
    a local machine with potentially multiple GPUs. Will also set CPU if it is a CPU-only
    machine. When setting up 🤗 Accelerate for the first time, rather than runningaccelerate
    config[~utils.write_basic_config] can be used as an alternative for quick configuration.
    (local_process_index: intverbose: typing.Optional[bool] = None) Parameters Assigns
    the current process to a specific NUMA node. Ideally most efficient when having
    at least 2 cpus per node. This result is cached between calls. If you want to
    override it, please useaccelerate.utils.environment.override_numa_afifnity. (local_process_index:
    intverbose: typing.Optional[bool] = None) Parameters Overrides whatever NUMA affinity
    is set for the current process. This is very taxing and requires recalculating
    the affinity to set, ideally you should useutils.environment.set_numa_affinityinstead.
    (func_or_cls) Decorator to clean up accelerate environment variables set by the
    decorated class or function. In some circumstances, calling certain classes or
    functions can result in accelerate env vars being set and not being cleaned up
    afterwards. As an example, when calling: TrainingArguments(fp16=True, …) The following
    env var will be set: ACCELERATE_MIXED_PRECISION=fp16 This can affect subsequent
    code, since the env var takes precedence over TrainingArguments(fp16=False). This
    is especially relevant for unit testing, where we want to avoid the individual
    tests to have side effects on one another. Decorate the unit test function or
    whole class with this decorator to ensure that after each test, the env vars are
    cleaned up. This works for both unittest.TestCase and normal classes (pytest);
    it also works when decorating the parent class.'
  - 'Performs a real-valued forward FFT on the given input buffer and stores the result
    in the given output buffer. The input buffer must contain real values only, while
    the output buffer will contain complex values. The input and output buffers must
    be different. Kind: instance method ofP2FFTThrows: The output buffer. The input
    buffer containing real values.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
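
The three modules map to: (0) token encoding with `BertModel`, (1) attention-mask-aware mean pooling, and (2) L2 normalization. A minimal sketch of what the pooling and normalization steps compute, using dummy tensors in place of real token embeddings (which would come from the Transformer module):

```python
import torch

# Dummy batch: 2 sequences of 5 tokens, hidden size 384 (values are made up;
# real token embeddings come from the BertModel in module (0)).
token_embeddings = torch.randn(2, 5, 384)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])

# (1) Pooling: mean over non-padding tokens only
mask = attention_mask.unsqueeze(-1).float()          # (2, 5, 1)
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: L2-normalize so dot product equals cosine similarity
embeddings = torch.nn.functional.normalize(pooled, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])
```

Masking before the mean matters: padding tokens would otherwise dilute the sentence embedding.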

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
    'Shiny for Pythonis a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, seethe Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up intomodules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
    'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method ofP2FFTThrows: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
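
Because the `Normalize()` module makes every embedding unit-length, cosine similarity reduces to a plain dot product, so semantic search is a matrix multiply followed by a sort. A toy sketch with dummy unit vectors standing in for real `model.encode()` outputs:

```python
import numpy as np

# Dummy corpus: 4 unit vectors standing in for normalized sentence embeddings.
rng = np.random.default_rng(0)
corpus_emb = rng.normal(size=(4, 384))
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

# Query embedding deliberately close to corpus item 2, then normalized.
query_emb = corpus_emb[2] + 0.1 * rng.normal(size=384)
query_emb /= np.linalg.norm(query_emb)

scores = corpus_emb @ query_emb   # cosine scores, shape (4,)
ranking = np.argsort(-scores)     # best match first
print(ranking[0])                 # 2 -- the nearby corpus item wins
```

With real embeddings, `corpus_emb` would be precomputed once and reused across queries.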

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor                                                                             | positive                                                                            |
  |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                             | string                                                                              |
  | details | <ul><li>min: 8 tokens</li><li>mean: 26.77 tokens</li><li>max: 189 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 116.82 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | anchor                                                                                                                                                 | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
  |:-------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>How can I configure the `TextEncoderOnnxConfig` class for optimal ONNX export of a text encoder model intended for information retrieval?</code> | <code>(config: PretrainedConfigtask: str = 'feature-extraction'preprocessors: typing.Optional[typing.List[typing.Any]] = Noneint_dtype: str = 'int64'float_dtype: str = 'fp32'legacy: bool = False) Handles encoder-based text architectures.</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
  | <code>How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations?</code>                                       | <code>The design is rather simple. We’re going to look for all shared tensors, then looking for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved, we simply choose the first one Duringload_model, we are loading a bit likeload_state_dictdoes, except we’re looking into the model itself, to check for shared buffers, and ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is Caveat: This means we’re dropping some keys within the file. meaning if you’re checking for the keys saved on disk, you will see some “missing tensors” or if you’re usingload_state_dict. Unless we start supporting shared tensors directly in the format there’s no real way around it.</code> |
  | <code>How can I manage access tokens to secure my organization's resources?</code>                                                                     | <code>Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources.</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 1024
  }
  ```
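
`CachedMultipleNegativesRankingLoss` treats every other positive in the batch as an in-batch negative: each anchor's scaled similarity row is pushed, via cross-entropy, toward its own positive on the diagonal. The cached variant computes the same objective with gradient caching, so `mini_batch_size` controls memory rather than the set of negatives. A toy sketch of the underlying objective on dummy embeddings:

```python
import torch
import torch.nn.functional as F

scale = 20.0  # matches the "scale" parameter above
# Dummy (anchor, positive) embeddings for a batch of 8 pairs, unit-normalized
# so that dot products are cosine similarities ("cos_sim" above).
anchors = F.normalize(torch.randn(8, 384), dim=1)
positives = F.normalize(torch.randn(8, 384), dim=1)

scores = scale * anchors @ positives.T  # (8, 8): row i vs. all positives
labels = torch.arange(8)                # diagonal entries are the true pairs
loss = F.cross_entropy(scores, labels)  # rank own positive above the other 7
print(loss.item())
```

Larger batches supply more in-batch negatives, which is why this loss benefits from the large effective batch sizes that gradient caching makes affordable.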

### Evaluation Dataset

#### Unnamed Dataset

* Size: 700 evaluation samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 700 samples:
  |         | anchor                                                                            | positive                                                                            |
  |:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                              |
  | details | <ul><li>min: 8 tokens</li><li>mean: 26.76 tokens</li><li>max: 67 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 115.51 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | anchor                                                                                                                                                                                     | positive                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
  |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object?</code>                                           | <code>Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply.</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
  | <code>How can the `generationlogits_process.NoBadWordsLogitsProcessor` static class be effectively integrated into a retrieval model to improve filtering of inappropriate content?</code> | <code>Kind: static class ofgeneration/logits_process</code>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
  | <code>How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance?</code>                                                                | <code>(model= Noneconfig= None**kwargs) Parameters OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits fromoptimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving) (input_ids: typing.Union[torch.Tensor, numpy.ndarray]attention_mask: typing.Union[torch.Tensor, numpy.ndarray]token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None**kwargs) Parameters TheOVModelForSequenceClassificationforward method, overrides the__call__special method. Although the recipe for forward pass needs to be defined within this function, one should call theModuleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them. Example of sequence classification usingtransformers.pipeline:</code> |
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 1024
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `warmup_steps`: 50
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 50
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.5076 | 100  | 0.308         | -               |
| 1.0152 | 200  | 0.179         | -               |
| 1.5228 | 300  | 0.127         | 0.0739          |
| 2.0305 | 400  | 0.0828        | -               |
| 2.5381 | 500  | 0.0528        | -               |
| 3.0457 | 600  | 0.0576        | 0.0436          |
| 3.5533 | 700  | 0.0396        | -               |
| 1.0152 | 200  | 0.0262        | 0.0379          |
| 2.0305 | 400  | 0.0159        | 0.0360          |
| 3.0457 | 600  | 0.0082        | 0.0340          |


### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.0.1
- Transformers: 4.47.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
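To approximate this environment, pinned installs might look like the following (a sketch; the CUDA 12.1 index URL is an assumption based on the `+cu121` tag in the PyTorch version):

```shell
pip install "sentence-transformers==4.0.1" "transformers==4.47.0" \
    "accelerate==1.2.1" "datasets==3.3.1" "tokenizers==0.21.0"
pip install "torch==2.5.1" --index-url https://download.pytorch.org/whl/cu121
```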

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->