SentenceTransformer

This model was finetuned with Unsloth.

This is a sentence-transformers model. It maps sentences and paragraphs to a 2560-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Model Size: ~4B parameters (BF16)
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 2560 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'PeftModelForFeatureExtraction'})
  (1): Pooling({'word_embedding_dimension': 2560, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': True, 'include_prompt': True})
  (2): Normalize()
)
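The Pooling module above uses last-token pooling (`pooling_mode_lasttoken: True`): each text is represented by the hidden state of its final non-padding token, which the Normalize() module then scales to unit length. A minimal numpy sketch of that behaviour (shapes and function names here are illustrative, not the library's internals):

```python
import numpy as np

def last_token_pool(token_embeddings, attention_mask):
    """Pick the hidden state of the last non-padding token of each sequence."""
    # Index of the last position where attention_mask == 1, per sequence.
    last_idx = attention_mask.sum(axis=1) - 1            # [batch]
    batch_idx = np.arange(token_embeddings.shape[0])
    return token_embeddings[batch_idx, last_idx]         # [batch, dim]

def normalize(x):
    """L2-normalize so dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy batch: 2 sequences, 4 token positions, 3-dim embeddings.
emb = np.random.randn(2, 4, 3)
mask = np.array([[1, 1, 1, 0],   # 3 real tokens, 1 pad
                 [1, 1, 1, 1]])  # 4 real tokens
pooled = normalize(last_token_pool(emb, mask))
print(pooled.shape)  # (2, 3)
```

Each pooled vector has unit length, so downstream cosine similarity reduces to a dot product.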

Evaluation Highlights

Pre-Post Train Relevancy

(Figures: relevancy plots for queries 7–10, before vs. after training.)

Pre/Post Train Spread

(Figures: similarity-score spread plots for queries 7–10, before vs. after training.)

Spread Summary

(Figure: spread summary across queries.)

Training Summary

(Figure: training statistics summary.)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Rnfudge/snapd-embedder-v1")
# Run inference
sentences = [
    'craniofacial',
    'head and face structure',
    'Anchor-positive pairs are fundamental to contrastive learning, serving to define what the model should consider as semantically similar data points, guiding it to learn meaningful representations.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 2560]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7268, 0.0036],
#         [0.7268, 1.0000, 0.0179],
#         [0.0036, 0.0179, 1.0000]])
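Because the model ends in a Normalize() module, the embeddings it returns are unit-length, so cosine similarity is just a dot product and a simple semantic-search ranking needs nothing beyond matrix multiplication. A toy sketch with hand-made unit vectors standing in for `model.encode(...)` output:

```python
import numpy as np

def unit(v):
    """Toy stand-in for a normalized embedding from model.encode(...)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

query = unit([1.0, 0.2, 0.0])
corpus = np.stack([
    unit([0.9, 0.3, 0.1]),   # close to the query
    unit([0.0, 0.1, 1.0]),   # unrelated
    unit([1.0, 0.0, 0.0]),   # also close
])

# For normalized embeddings, cosine similarity is just a dot product.
scores = corpus @ query
ranking = np.argsort(-scores)  # best match first
print(ranking)                 # [0 2 1]
```

In practice you would replace the toy vectors with `model.encode(queries)` and `model.encode(corpus)` and rank with the same dot product.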

Training Details

Training Dataset

Unnamed Dataset

  • Size: 223,748 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 2 tokens, mean 8.95 tokens, max 33 tokens
    • positive: string; min 2 tokens, mean 38.48 tokens, max 124 tokens
  • Samples (anchor → positive):
    • groupthink → Psychological tendency for group conformity
    • customs and border protection → DHS component enforcing trade and immigration laws
    • What is the meaning and purpose of the //go:noescape directive in Go functions? → The //go:noescape comment is a hint to the Go compiler. It asserts that none of the pointer parameters of the decorated function will escape the function's stack frame. This is primarily used for performance tuning in low-level code, ensuring that objects pointed to by function arguments are not allocated on the heap, thus avoiding garbage collection cycles.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
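MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as a classification problem: for anchor i, positive i is the correct "class" among all positives in the batch (the others serve as in-batch negatives), scored by scaled cosine similarity. A minimal numpy sketch under those assumptions, using the scale of 20 from the config above (not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                       # [batch, batch] cosine sims
    # Softmax cross-entropy with labels on the diagonal (pair i matches i).
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
pos = rng.normal(size=(4, 8))
aligned = mnrl_loss(pos + 0.01 * rng.normal(size=(4, 8)), pos)  # anchors ≈ positives
shuffled = mnrl_loss(rng.normal(size=(4, 8)), pos)              # unrelated anchors
print(aligned, shuffled)
```

Well-aligned anchor–positive pairs give a much lower loss than unrelated ones, which is exactly the signal that drives the training curve below toward zero.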
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • gradient_accumulation_steps: 8
  • learning_rate: 3e-05
  • num_train_epochs: 1
  • lr_scheduler_type: constant_with_warmup
  • warmup_ratio: 0.03
  • bf16: True
  • batch_sampler: no_duplicates
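With gradient accumulation, the effective batch per optimizer step is per_device_train_batch_size × gradient_accumulation_steps. Note, though, that each loss computation only sees one 64-sample forward pass, and with gather_across_devices: false no negatives are pooled across devices, so each anchor is contrasted against 63 in-batch negatives regardless of accumulation. The arithmetic:

```python
per_device_train_batch_size = 64
gradient_accumulation_steps = 8

# Samples contributing to each optimizer step (per device).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
# Negatives per anchor in MultipleNegativesRankingLoss: the rest of the micro-batch.
in_batch_negatives = per_device_train_batch_size - 1

print(effective_batch)     # 512
print(in_batch_negatives)  # 63
```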

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.0023 1 0.5184
0.0046 2 0.5683
0.0069 3 0.5821
0.0092 4 0.4948
0.0114 5 0.4001
0.0137 6 0.3097
0.0160 7 0.257
0.0183 8 0.2752
0.0206 9 0.2311
0.0229 10 0.1433
0.0252 11 0.2507
0.0275 12 0.1944
0.0297 13 0.2052
0.0320 14 0.1044
0.0343 15 0.2027
0.0366 16 0.1969
0.0389 17 0.1833
0.0412 18 0.1641
0.0435 19 0.1629
0.0458 20 0.1702
0.0480 21 0.1855
0.0503 22 0.1697
0.0526 23 0.116
0.0549 24 0.1373
0.0572 25 0.1323
0.0595 26 0.1349
0.0618 27 0.1199
0.0641 28 0.1353
0.0663 29 0.143
0.0686 30 0.1305
0.0709 31 0.1088
0.0732 32 0.0908
0.0755 33 0.1502
0.0778 34 0.1139
0.0801 35 0.1311
0.0824 36 0.1291
0.0846 37 0.0977
0.0869 38 0.0962
0.0892 39 0.1166
0.0915 40 0.0965
0.0938 41 0.1242
0.0961 42 0.0705
0.0984 43 0.0813
0.1007 44 0.1545
0.1029 45 0.0868
0.1052 46 0.0987
0.1075 47 0.0938
0.1098 48 0.1086
0.1121 49 0.0982
0.1144 50 0.0817
0.1167 51 0.0527
0.1190 52 0.0986
0.1212 53 0.098
0.1235 54 0.1074
0.1258 55 0.1396
0.1281 56 0.1101
0.1304 57 0.0829
0.1327 58 0.1261
0.1350 59 0.048
0.1373 60 0.1215
0.1395 61 0.0981
0.1418 62 0.0739
0.1441 63 0.0525
0.1464 64 0.0757
0.1487 65 0.0543
0.1510 66 0.0878
0.1533 67 0.0791
0.1556 68 0.0816
0.1578 69 0.0999
0.1601 70 0.086
0.1624 71 0.0775
0.1647 72 0.1048
0.1670 73 0.0552
0.1693 74 0.0619
0.1716 75 0.0667
0.1739 76 0.0787
0.1762 77 0.1022
0.1784 78 0.0937
0.1807 79 0.0751
0.1830 80 0.0642
0.1853 81 0.0508
0.1876 82 0.1169
0.1899 83 0.09
0.1922 84 0.0725
0.1945 85 0.0476
0.1967 86 0.0737
0.1990 87 0.0968
0.2013 88 0.0988
0.2036 89 0.0575
0.2059 90 0.0629
0.2082 91 0.0627
0.2105 92 0.0565
0.2128 93 0.0696
0.2150 94 0.0413
0.2173 95 0.0625
0.2196 96 0.0593
0.2219 97 0.0511
0.2242 98 0.1168
0.2265 99 0.0601
0.2288 100 0.0919
0.2311 101 0.0471
0.2333 102 0.0701
0.2356 103 0.1032
0.2379 104 0.0823
0.2402 105 0.0825
0.2425 106 0.0626
0.2448 107 0.0821
0.2471 108 0.0532
0.2494 109 0.1171
0.2516 110 0.0814
0.2539 111 0.1167
0.2562 112 0.0918
0.2585 113 0.0704
0.2608 114 0.0726
0.2631 115 0.0522
0.2654 116 0.0628
0.2677 117 0.0716
0.2699 118 0.0676
0.2722 119 0.0616
0.2745 120 0.0505
0.2768 121 0.0653
0.2791 122 0.051
0.2814 123 0.0888
0.2837 124 0.1061
0.2860 125 0.104
0.2882 126 0.095
0.2905 127 0.0715
0.2928 128 0.0766
0.2951 129 0.076
0.2974 130 0.1154
0.2997 131 0.0463
0.3020 132 0.0596
0.3043 133 0.0705
0.3065 134 0.0654
0.3088 135 0.0802
0.3111 136 0.0882
0.3134 137 0.0872
0.3157 138 0.0853
0.3180 139 0.0661
0.3203 140 0.0633
0.3226 141 0.0784
0.3248 142 0.0832
0.3271 143 0.0799
0.3294 144 0.0954
0.3317 145 0.0744
0.3340 146 0.0559
0.3363 147 0.0892
0.3386 148 0.0424
0.3409 149 0.0742
0.3432 150 0.1025
0.3454 151 0.0814
0.3477 152 0.051
0.3500 153 0.1313
0.3523 154 0.0645
0.3546 155 0.1006
0.3569 156 0.0524
0.3592 157 0.0635
0.3615 158 0.0467
0.3637 159 0.0741
0.3660 160 0.0593
0.3683 161 0.0698
0.3706 162 0.0835
0.3729 163 0.0715
0.3752 164 0.0628
0.3775 165 0.0772
0.3798 166 0.1167
0.3820 167 0.0981
0.3843 168 0.0595
0.3866 169 0.041
0.3889 170 0.0728
0.3912 171 0.0937
0.3935 172 0.0757
0.3958 173 0.0603
0.3981 174 0.0542
0.4003 175 0.0701
0.4026 176 0.0372
0.4049 177 0.125
0.4072 178 0.0545
0.4095 179 0.0476
0.4118 180 0.0516
0.4141 181 0.1243
0.4164 182 0.0599
0.4186 183 0.1026
0.4209 184 0.077
0.4232 185 0.0732
0.4255 186 0.0798
0.4278 187 0.0538
0.4301 188 0.0679
0.4324 189 0.0759
0.4347 190 0.0761
0.4369 191 0.0557
0.4392 192 0.0534
0.4415 193 0.0747
0.4438 194 0.0672
0.4461 195 0.0376
0.4484 196 0.0466
0.4507 197 0.0783
0.4530 198 0.0864
0.4552 199 0.0423
0.4575 200 0.0708
0.4598 201 0.0429
0.4621 202 0.0718
0.4644 203 0.0802
0.4667 204 0.073
0.4690 205 0.0628
0.4713 206 0.055
0.4735 207 0.0468
0.4758 208 0.0536
0.4781 209 0.0429
0.4804 210 0.0388
0.4827 211 0.0962
0.4850 212 0.0475
0.4873 213 0.0589
0.4896 214 0.0606
0.4919 215 0.0512
0.4941 216 0.0836
0.4964 217 0.0659
0.4987 218 0.0924
0.5010 219 0.0711
0.5033 220 0.0676
0.5056 221 0.0393
0.5079 222 0.0668
0.5102 223 0.0511
0.5124 224 0.0575
0.5147 225 0.0594
0.5170 226 0.126
0.5193 227 0.0787
0.5216 228 0.0509
0.5239 229 0.0684
0.5262 230 0.0792
0.5285 231 0.0501
0.5307 232 0.0988
0.5330 233 0.0414
0.5353 234 0.0596
0.5376 235 0.0607
0.5399 236 0.0556
0.5422 237 0.0578
0.5445 238 0.0238
0.5468 239 0.0509
0.5490 240 0.0431
0.5513 241 0.0377
0.5536 242 0.0814
0.5559 243 0.0779
0.5582 244 0.0574
0.5605 245 0.0681
0.5628 246 0.0513
0.5651 247 0.0573
0.5673 248 0.0758
0.5696 249 0.0442
0.5719 250 0.0458
0.5742 251 0.0853
0.5765 252 0.0825
0.5788 253 0.065
0.5811 254 0.0429
0.5834 255 0.0438
0.5856 256 0.1028
0.5879 257 0.04
0.5902 258 0.0406
0.5925 259 0.0465
0.5948 260 0.068
0.5971 261 0.0532
0.5994 262 0.0503
0.6017 263 0.0421
0.6039 264 0.0663
0.6062 265 0.0621
0.6085 266 0.0845
0.6108 267 0.049
0.6131 268 0.0503
0.6154 269 0.0392
0.6177 270 0.0505
0.6200 271 0.0594
0.6222 272 0.0573
0.6245 273 0.0383
0.6268 274 0.0568
0.6291 275 0.0386
0.6314 276 0.0573
0.6337 277 0.0397
0.6360 278 0.0459
0.6383 279 0.0624
0.6405 280 0.0706
0.6428 281 0.0743
0.6451 282 0.0405
0.6474 283 0.0761
0.6497 284 0.0583
0.6520 285 0.0444
0.6543 286 0.0305
0.6566 287 0.0716
0.6589 288 0.041
0.6611 289 0.043
0.6634 290 0.0574
0.6657 291 0.0479
0.6680 292 0.062
0.6703 293 0.0441
0.6726 294 0.0657
0.6749 295 0.0515
0.6772 296 0.0718
0.6794 297 0.0839
0.6817 298 0.0751
0.6840 299 0.073
0.6863 300 0.0656
0.6886 301 0.0717
0.6909 302 0.0457
0.6932 303 0.0761
0.6955 304 0.0557
0.6977 305 0.0646
0.7000 306 0.0688
0.7023 307 0.0396
0.7046 308 0.0444
0.7069 309 0.0627
0.7092 310 0.0594
0.7115 311 0.0496
0.7138 312 0.0406
0.7160 313 0.0513
0.7183 314 0.0483
0.7206 315 0.0527
0.7229 316 0.0646
0.7252 317 0.0351
0.7275 318 0.0432
0.7298 319 0.06
0.7321 320 0.0487
0.7343 321 0.0398
0.7366 322 0.0279
0.7389 323 0.0594
0.7412 324 0.0808
0.7435 325 0.0461
0.7458 326 0.0452
0.7481 327 0.0887
0.7504 328 0.057
0.7526 329 0.082
0.7549 330 0.0693
0.7572 331 0.0245
0.7595 332 0.0476
0.7618 333 0.051
0.7641 334 0.0539
0.7664 335 0.0325
0.7687 336 0.0431
0.7709 337 0.0534
0.7732 338 0.0346
0.7755 339 0.0577
0.7778 340 0.086
0.7801 341 0.0705
0.7824 342 0.0412
0.7847 343 0.0426
0.7870 344 0.0829
0.7892 345 0.0767
0.7915 346 0.0702
0.7938 347 0.0662
0.7961 348 0.0436
0.7984 349 0.0292
0.8007 350 0.0586
0.8030 351 0.0416
0.8053 352 0.0874
0.8075 353 0.0378
0.8098 354 0.036
0.8121 355 0.0426
0.8144 356 0.0375
0.8167 357 0.0296
0.8190 358 0.0535
0.8213 359 0.0654
0.8236 360 0.0756
0.8259 361 0.0591
0.8281 362 0.0603
0.8304 363 0.0664
0.8327 364 0.0403
0.8350 365 0.0418
0.8373 366 0.047
0.8396 367 0.077
0.8419 368 0.0597
0.8442 369 0.0683
0.8464 370 0.0557
0.8487 371 0.0487
0.8510 372 0.0499
0.8533 373 0.0328
0.8556 374 0.0211
0.8579 375 0.0411
0.8602 376 0.0648
0.8625 377 0.0583
0.8647 378 0.0483
0.8670 379 0.0362
0.8693 380 0.0616
0.8716 381 0.0634
0.8739 382 0.0542
0.8762 383 0.053
0.8785 384 0.0436
0.8808 385 0.0426
0.8830 386 0.0503
0.8853 387 0.0522
0.8876 388 0.083
0.8899 389 0.0317
0.8922 390 0.0571
0.8945 391 0.0464
0.8968 392 0.0179
0.8991 393 0.0389
0.9013 394 0.0317
0.9036 395 0.0605
0.9059 396 0.0389
0.9082 397 0.0407
0.9105 398 0.0478
0.9128 399 0.0304
0.9151 400 0.0572
0.9174 401 0.037
0.9196 402 0.062
0.9219 403 0.0539
0.9242 404 0.039
0.9265 405 0.0265
0.9288 406 0.0398
0.9311 407 0.0369
0.9334 408 0.053
0.9357 409 0.0503
0.9379 410 0.0535
0.9402 411 0.0645
0.9425 412 0.0328
0.9448 413 0.0438
0.9471 414 0.0435
0.9494 415 0.1018
0.9517 416 0.0403
0.9540 417 0.0577
0.9562 418 0.0234
0.9585 419 0.041
0.9608 420 0.0226
0.9631 421 0.0497
0.9654 422 0.0493
0.9677 423 0.0223
0.9700 424 0.0192
0.9723 425 0.0322
0.9745 426 0.0483
0.9768 427 0.041
0.9791 428 0.0628
0.9814 429 0.0861
0.9837 430 0.0645
0.9860 431 0.0386
0.9883 432 0.0378
0.9906 433 0.0613
0.9929 434 0.067
0.9951 435 0.049
0.9974 436 0.0644
0.9997 437 0.02
1.0000 438 0.0001

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.3.0
  • Transformers: 4.56.2
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}