---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:CachedMultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: How can AnimateDiff, a motion adapter for pretrained diffusion
    models, be used to generate videos from images?
  sentences:
  - 'Performs a single real input radix-2 transformation on the provided data Kind:
    instance method ofP2FFT The input data array The output data array The output
    offset The input offset The step'
  - AnimateDiffis an adapter model that inserts a motion module into a pretrained
    diffusion model to animate an image. The adapter is trained on video clips to
    learn motion which is used to condition the generation process to create a video.
    It is faster and easier to only train the adapter and it can be loaded into most
    diffusion models, effectively turning them into “video models”. Start by loading
    aMotionAdapter. Then load a finetuned Stable Diffusion model with theAnimateDiffPipeline.
    Create a prompt and generate the video.
  - 'Utility class to handle streaming of tokens generated by whisper speech-to-text
    models. Callback functions are invoked when each of the following events occur:
    Kind: static class ofgeneration/streamers'
- source_sentence: How to configure DeepSpeed, including ZeRO-2 and bf16 precision,
    for optimal performance with Intel Gaudi HPUs?
  sentences:
  - 'The DeepSpeed configuration to use is passed through a JSON file and enables
    you to choose the optimizations to apply. Here is an example for applying ZeRO-2
    optimizations andbf16precision: The special value"auto"enables to automatically
    get the correct or most efficient value. You can also specify the values yourself
    but, if you do so, you should be careful not to have conflicting values with your
    training arguments. It is strongly advised to readthis sectionin the Transformers
    documentation to completely understand how this works. Other examples of configurations
    for HPUs are proposedhereby Intel. TheTransformers documentationexplains how to
    write a configuration from scratch very well. A more complete description of all
    configuration possibilities is availablehere.'
  - Creates a new instance of TokenizerModel. The configuration object for the TokenizerModel.
  - Most Spaces should run out of the box after a GPU upgrade, but sometimes you’ll
    need to install CUDA versions of the machine learning frameworks you use. Please,
    follow this guide to ensure your Space takes advantage of the improved hardware.
- source_sentence: Can DeBERTa's question-answering model be fine-tuned for improved
    information retrieval?
  sentences:
  - 'RegNetXis a convolutional network design space with simple, regular models with
    parameters: depthddd, initial widthw0>0w_{0} > 0w0>0, and slopewa>0w_{a} > 0wa>0,
    and generates a different block widthuju_{j}ujfor each blockj<dj < dj<d. The
    key restriction for the RegNet types of model is that there is a linear parameterisation
    of block widths (the design space only contains models with this linear structure):uj=w0+wa⋅ju_{j}
    = w_{0} + w_{a}\cdot{j}uj=w0+wa⋅j ForRegNetXwe have additional restrictions:
    we setb=1b = 1b=1(the bottleneck ratio),12≤d≤2812 \leq d \leq 2812≤d≤28, andwm≥2w_{m}
    \geq 2wm≥2(the width multiplier).'
  - 'DeBERTa Model with a span classification head on top for extractive question-answering
    tasks like SQuAD (a linear layers on top of the hidden-states output to computespan
    start logitsandspan end logits). Kind: static class ofmodels'
  - 'The minimum length of the sequence to be generated. Corresponds to the length
    of the input prompt +min_new_tokens. Its effect is overridden bymin_new_tokens,
    if also set. Kind: instance property ofGenerationConfigDefault:0'
- source_sentence: How can I efficiently upload models from supported libraries like
    Transformers to the Hugging Face Hub for improved information retrieval?
  sentences:
  - '🤗 Diffusers is compatible with Habana Gaudi through 🤗Optimum. Follow theinstallationguide
    to install the SynapseAI and Gaudi drivers, and then install Optimum Habana: To
    generate images with Stable Diffusion 1 and 2 on Gaudi, you need to instantiate
    two instances: When you initialize the pipeline, you have to specifyuse_habana=Trueto
    deploy it on HPUs and to get the fastest possible generation, you should enableHPU
    graphswithuse_hpu_graphs=True. Finally, specify aGaudiConfigwhich can be downloaded
    from theHabanaorganization on the Hub. Now you can call the pipeline to generate
    images by batches from one or several prompts: For more information, check out
    🤗 Optimum Habana’sdocumentationand theexampleprovided in the official GitHub repository.'
  - 'While training and evaluating we record the following reward metrics:'
  - 'First check if your model is from a library that has built-in support to push
    to/load from the Hub, like Transformers, Diffusers, Timm, Asteroid, etc.:https://huggingface.co/docs/hub/models-libraries.
    Below we’ll show how easy this is for a library like Transformers: Some libraries,
    like Transformers, support loadingcode from the Hub. This is a way to make your
    model work with Transformers using thetrust_remote_code=Trueflag. You may want
    to consider this option instead of a full-fledged library integration.'
- source_sentence: How can I use Shiny for Python to build and deploy a Hugging Face
    Space application?
  sentences:
  - Shiny for Pythonis a pure Python implementation of Shiny. This gives you access
    to all of the great features of Shiny like reactivity, complex layouts, and modules
    without needing to use R. Shiny for Python is ideal for Hugging Face applications
    because it integrates smoothly with other Hugging Face tools. To get started deploying
    a Space, click this button to select your hardware and specify if you want a public
    or private Space. The Space template will populate a few files to get your app
    started. app.py This file defines your app’s logic. To learn more about how to
    modify this file, seethe Shiny for Python documentation. As your app gets more
    complex, it’s a good idea to break your application logic up intomodules. Dockerfile
    The Dockerfile for a Shiny for Python app is very minimal because the library
    doesn’t have many system dependencies, but you may need to modify this file if
    your application has additional system dependencies. The one essential feature
    of this file is that it exposes and runs the app on the port specified in the
    space README file (which is 7860 by default). requirements.txt The Space will
    automatically install dependencies listed in the requirements.txt file. Note that
    you must include shiny in this file.
  - '(**kwargs) A context manager that will add each keyword argument passed toos.environand
    remove them when exiting. Will convert the values inkwargsto strings and upper-case
    all the keys. () A context manager that will temporarily clear environment variables.
    When this context exits, the previous environment variables will be back. (mixed_precision=
    ''no''save_location: str = ''/github/home/.cache/huggingface/accelerate/default_config.yaml''use_xpu:
    bool = False) Parameters Creates and saves a basic cluster config to be used on
    a local machine with potentially multiple GPUs. Will also set CPU if it is a CPU-only
    machine. When setting up 🤗 Accelerate for the first time, rather than runningaccelerate
    config[~utils.write_basic_config] can be used as an alternative for quick configuration.
    (local_process_index: intverbose: typing.Optional[bool] = None) Parameters Assigns
    the current process to a specific NUMA node. Ideally most efficient when having
    at least 2 cpus per node. This result is cached between calls. If you want to
    override it, please useaccelerate.utils.environment.override_numa_afifnity. (local_process_index:
    intverbose: typing.Optional[bool] = None) Parameters Overrides whatever NUMA affinity
    is set for the current process. This is very taxing and requires recalculating
    the affinity to set, ideally you should useutils.environment.set_numa_affinityinstead.
    (func_or_cls) Decorator to clean up accelerate environment variables set by the
    decorated class or function. In some circumstances, calling certain classes or
    functions can result in accelerate env vars being set and not being cleaned up
    afterwards. As an example, when calling: TrainingArguments(fp16=True, …) The following
    env var will be set: ACCELERATE_MIXED_PRECISION=fp16 This can affect subsequent
    code, since the env var takes precedence over TrainingArguments(fp16=False). This
    is especially relevant for unit testing, where we want to avoid the individual
    tests to have side effects on one another. Decorate the unit test function or
    whole class with this decorator to ensure that after each test, the env vars are
    cleaned up. This works for both unittest.TestCase and normal classes (pytest);
    it also works when decorating the parent class.'
  - 'Performs a real-valued forward FFT on the given input buffer and stores the result
    in the given output buffer. The input buffer must contain real values only, while
    the output buffer will contain complex values. The input and output buffers must
    be different. Kind: instance method ofP2FFTThrows: The output buffer. The input
    buffer containing real values.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
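The last two modules above can be illustrated with a minimal, dependency-free sketch: mean pooling over the non-padding token vectors, followed by L2 normalization. The helper names `mean_pool` and `l2_normalize` and the toy 2-dimensional vectors are illustrative assumptions, not part of the library; the real model pools 384-dimensional token embeddings.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                total[k] += vec[k]
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

# Toy input: three token vectors, the last one is padding.
tokens = [[3.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled, sentence_embedding)
```

Because the output is unit length, the cosine similarity between two sentence embeddings reduces to their dot product.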
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'How can I use Shiny for Python to build and deploy a Hugging Face Space application?',
'Shiny for Pythonis a pure Python implementation of Shiny. This gives you access to all of the great features of Shiny like reactivity, complex layouts, and modules without needing to use R. Shiny for Python is ideal for Hugging Face applications because it integrates smoothly with other Hugging Face tools. To get started deploying a Space, click this button to select your hardware and specify if you want a public or private Space. The Space template will populate a few files to get your app started. app.py This file defines your app’s logic. To learn more about how to modify this file, seethe Shiny for Python documentation. As your app gets more complex, it’s a good idea to break your application logic up intomodules. Dockerfile The Dockerfile for a Shiny for Python app is very minimal because the library doesn’t have many system dependencies, but you may need to modify this file if your application has additional system dependencies. The one essential feature of this file is that it exposes and runs the app on the port specified in the space README file (which is 7860 by default). requirements.txt The Space will automatically install dependencies listed in the requirements.txt file. Note that you must include shiny in this file.',
'Performs a real-valued forward FFT on the given input buffer and stores the result in the given output buffer. The input buffer must contain real values only, while the output buffer will contain complex values. The input and output buffers must be different. Kind: instance method ofP2FFTThrows: The output buffer. The input buffer containing real values.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
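For semantic search, the similarity scores are typically used to rank candidate passages against a query. The sketch below shows that ranking step on its own, assuming the embeddings are already unit-normalized (as this model's `Normalize()` module produces) so that a dot product equals cosine similarity. The function `rank_passages` and the toy 2-dimensional vectors are hypothetical stand-ins for real `model.encode(...)` output.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def rank_passages(query_embedding, passage_embeddings):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scores = [dot(query_embedding, p) for p in passage_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Toy unit vectors standing in for real embeddings.
query = [0.6, 0.8]
passages = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(index, round(score, 3))
```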
<!--
### Direct Usage (Transformers)
<details><summary>Click to see the direct usage in Transformers</summary>
</details>
-->
<!--
### Downstream Usage (Sentence Transformers)
You can finetune this model on your own dataset.
<details><summary>Click to expand</summary>
</details>
-->
<!--
### Out-of-Scope Use
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
<!--
## Bias, Risks and Limitations
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->
<!--
### Recommendations
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 6,300 training samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 8 tokens</li><li>mean: 26.77 tokens</li><li>max: 189 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 116.82 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
| anchor | positive |
|:-------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>How can I configure the `TextEncoderOnnxConfig` class for optimal ONNX export of a text encoder model intended for information retrieval?</code> | <code>(config: PretrainedConfigtask: str = 'feature-extraction'preprocessors: typing.Optional[typing.List[typing.Any]] = Noneint_dtype: str = 'int64'float_dtype: str = 'fp32'legacy: bool = False) Handles encoder-based text architectures.</code> |
| <code>How does PyTorch's shared tensor mechanism handle loading and saving, and what are its limitations?</code> | <code>The design is rather simple. We’re going to look for all shared tensors, then looking for all tensors covering the entire buffer (there can be multiple such tensors). That gives us multiple names which can be saved, we simply choose the first one Duringload_model, we are loading a bit likeload_state_dictdoes, except we’re looking into the model itself, to check for shared buffers, and ignoring the “missed keys” which were actually covered by virtue of buffer sharing (they were properly loaded since there was a buffer that loaded under the hood). Every other error is raised as-is Caveat: This means we’re dropping some keys within the file. meaning if you’re checking for the keys saved on disk, you will see some “missing tensors” or if you’re usingload_state_dict. Unless we start supporting shared tensors directly in the format there’s no real way around it.</code> |
| <code>How can I manage access tokens to secure my organization's resources?</code> | <code>Tokens Management enables organization administrators to oversee access tokens within their organization, ensuring secure access to organization resources.</code> |
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 1024
}
```
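To make the `scale` and `similarity_fct` parameters concrete, here is a minimal pure-Python sketch of the in-batch ranking objective: each anchor is scored against every positive in the batch by scaled cosine similarity, and cross-entropy pushes the matching pair to the top. This is an illustration of the loss value only, not the library's implementation; the function names are assumptions. The caching part of the loss (`mini_batch_size`) changes how gradients are computed to save memory, not the value of the loss, so it is omitted here.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

# Toy batch: aligned pairs give a loss near 0, swapped pairs a loss near `scale`.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss, swapped_loss)
```

Larger batches supply more in-batch negatives per anchor, which is why this family of losses is usually paired with a batch sampler that avoids duplicate texts within a batch.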
### Evaluation Dataset
#### Unnamed Dataset
* Size: 700 evaluation samples
* Columns: <code>anchor</code> and <code>positive</code>
* Approximate statistics based on the first 700 samples:
| | anchor | positive |
|:--------|:----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 8 tokens</li><li>mean: 26.76 tokens</li><li>max: 67 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 115.51 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
| anchor | positive |
|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code>How can I configure a DecoderSequence object for optimal information retrieval using a list of decoders and a configuration object?</code> | <code>Creates a new instance of DecoderSequence. The configuration object. The list of decoders to apply.</code> |
| <code>How can the `generationlogits_process.NoBadWordsLogitsProcessor` static class be effectively integrated into a retrieval model to improve filtering of inappropriate content?</code> | <code>Kind: static class ofgeneration/logits_process</code> |
| <code>How can I fine-tune the OpenVINO Sequence Classification model for improved information retrieval performance?</code> | <code>(model= Noneconfig= None**kwargs) Parameters OpenVINO Model with a SequenceClassifierOutput for sequence classification tasks. This model inherits fromoptimum.intel.openvino.modeling.OVBaseModel. Check the superclass documentation for the generic methods the library implements for all its model (such as downloading or saving) (input_ids: typing.Union[torch.Tensor, numpy.ndarray]attention_mask: typing.Union[torch.Tensor, numpy.ndarray]token_type_ids: typing.Union[torch.Tensor, numpy.ndarray, NoneType] = None**kwargs) Parameters TheOVModelForSequenceClassificationforward method, overrides the__call__special method. Although the recipe for forward pass needs to be defined within this function, one should call theModuleinstance afterwards instead of this since the former takes care of running the pre and post processing steps while the latter silently ignores them. Example of sequence classification usingtransformers.pipeline:</code> |
* Loss: [<code>CachedMultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 1024
}
```
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `num_train_epochs`: 5
- `warmup_ratio`: 0.1
- `warmup_steps`: 50
- `fp16`: True
- `load_best_model_at_end`: True
- `batch_sampler`: no_duplicates
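As a rough worked example of what these settings imply, the sketch below derives the step counts and a linear warmup/decay learning-rate curve. It assumes a single device, no gradient accumulation, and that the last partial batch is kept, which is consistent with the values listed under All Hyperparameters. It also assumes that an explicit `warmup_steps` of 50 takes precedence over `warmup_ratio`, which is how the Transformers trainer is understood to resolve the two. The function `linear_lr` is an illustrative re-implementation, not a library call.

```python
import math

TRAIN_SAMPLES = 6300   # size of the training dataset
BATCH_SIZE = 32        # per_device_train_batch_size (single device assumed)
EPOCHS = 5             # num_train_epochs
PEAK_LR = 2e-5         # learning_rate
WARMUP_STEPS = 50      # assumed to override warmup_ratio

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_lr(step):
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)

print(steps_per_epoch, total_steps)
for step in (0, 25, 50, 500, total_steps):
    print(step, linear_lr(step))
```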
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 50
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
</details>
### Training Logs
| Epoch | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.5076 | 100 | 0.308 | - |
| 1.0152 | 200 | 0.179 | - |
| 1.5228 | 300 | 0.127 | 0.0739 |
| 2.0305 | 400 | 0.0828 | - |
| 2.5381 | 500 | 0.0528 | - |
| 3.0457 | 600 | 0.0576 | 0.0436 |
| 3.5533 | 700 | 0.0396 | - |
| 1.0152 | 200 | 0.0262 | 0.0379 |
| 2.0305 | 400 | 0.0159 | 0.0360 |
| 3.0457 | 600 | 0.0082 | 0.0340 |
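The Epoch column can be reproduced from the Step column. The sketch below assumes 197 optimizer steps per epoch, that is 6,300 samples at batch size 32 on a single device with the last partial batch kept; the variable names are illustrative.

```python
import math

# Assumed: 6,300 training samples, batch size 32, single device.
steps_per_epoch = math.ceil(6300 / 32)

logged_steps = [100, 200, 300, 400, 500, 600, 700]
epochs = [round(step / steps_per_epoch, 4) for step in logged_steps]

for step, epoch in zip(logged_steps, epochs):
    print(step, epoch)
```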
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.0.1
- Transformers: 4.47.0
- PyTorch: 2.5.1+cu121
- Accelerate: 1.2.1
- Datasets: 3.3.1
- Tokenizers: 0.21.0
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
<!--
## Glossary
*Clearly define terms in order to be accessible across audiences.*
-->
<!--
## Model Card Authors
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->
<!--
## Model Card Contact
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->