SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
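
The last two modules in the listing above (mean-token pooling followed by Normalize) can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function names `mean_pool` and `l2_normalize` are hypothetical, and the toy vectors are 4-dimensional rather than 384-dimensional.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [value / count for value in summed]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit length, as a Normalize step would."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


# Toy sequence of three token vectors; the last position is padding.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)             # padding token is ignored
sentence_embedding = l2_normalize(pooled)    # unit-length sentence vector
print(pooled)
print(sentence_embedding)
```

Because the output is unit length, cosine similarity between two such embeddings reduces to a plain dot product.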

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AryehRotberg/ToS-Sentence-Transformers-V4")
# Run inference
sentences = [
    'The Services may contain links or connections to third party websites or services that are not owned or controlled by Guilded. When you access third party websites or use third party services, you accept that there are risks in doing so, and that Guilded is not responsible for such risks.',
    'This service assumes no responsibility and liability for the contents of links to other websites',
    'Copyright license limited for the purposes of that same service but transferable and sublicenseable',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.6397, -0.0500],
#         [ 0.6397,  1.0000,  0.0874],
#         [-0.0500,  0.0874,  1.0000]])
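
For semantic search, the similarity scores are typically used to rank candidates against a query. The sketch below shows that ranking step in plain Python, assuming the embeddings have already been computed. The helper `rank_by_similarity` and the 3-dimensional toy vectors are hypothetical stand-ins for real 384-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for model.encode(...) output.
query = [0.6, 0.8, 0.0]
candidates = [[0.0, 0.0, 1.0], [0.8, 0.6, 0.0], [0.6, 0.8, 0.0]]

ranking = rank_by_similarity(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```

With real embeddings, `query` would be one row of the encoded output and `candidates` the remaining rows.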

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 0.9993
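
As a rough illustration of what this metric measures: triplet cosine accuracy is commonly defined as the share of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below assumes that definition with a zero margin. The function `triplet_cosine_accuracy` and the 2-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy embeddings: three triplets are ordered correctly, one is not.
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # positive is orthogonal to the anchor
    ([1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]),
]

accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)
```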

Training Details

Training Dataset

Unnamed Dataset

  • Size: 122,856 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 3 tokens, mean 48.49 tokens, max 256 tokens
    • positive (string): min 6 tokens, mean 15.21 tokens, max 29 tokens
    • negative (string): min 6 tokens, mean 14.34 tokens, max 29 tokens
  • Samples:
    • anchor: If you ever decide to stop using Snapchat, you can just ask us to delete your account.
      positive: You have the right to leave this service at any time
      negative: Your personal information is used for many different purposes
    • anchor: you forever waive and agree not to claim or assert any entitlement to any and all moral rights of an author in any of the User Content.
      positive: You waive your moral rights
      negative: You aren’t allowed to remove or edit user-generated content
    • anchor: You agree and shall indemnify and hold Dailymotion- harmless from and against any liability, loss, damages (including punitive damages), claim, settlement payment, cost and expense, interest, award, judgment, diminution in value, fine, fee (including reasonable attorneys’ fees), and penalty, or other charge (including reasonable attorneys’ fees and all other cost of investigating, defending or asserting any claim for indemnification under these Terms) arising from or relating to (i) Your Content, (ii) Your violation of the Terms or any other policy of Dailymotion. (iii) Your use of the Dailymotion Service. and (iv) Your violation of any third party rights, including without limitation any copyright, property, publicity or privacy rights.
      positive: You agree to defend, indemnify, and hold the service harmless in case of a claim related to your use of the service
      negative: User-generated content can be blocked or censored for any reason
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
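
To make these parameters concrete, here is a minimal pure-Python sketch of the idea behind this loss: each anchor is scored against every positive and negative in the batch using cosine similarity multiplied by `scale`, and cross-entropy is taken with the anchor's own positive as the target. It is a simplified illustration under those assumptions, not the library's code, and the function name and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives act as extra negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

# Positives aligned with their anchors: loss close to zero.
good_loss = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
# Positives swapped between anchors: loss close to the scale value.
bad_loss = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(good_loss, bad_loss)
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.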
    

Evaluation Dataset

Unnamed Dataset

  • Size: 30,714 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 4 tokens, mean 49.34 tokens, max 256 tokens
    • positive (string): min 6 tokens, mean 15.13 tokens, max 29 tokens
    • negative (string): min 6 tokens, mean 14.28 tokens, max 29 tokens
  • Samples:
    • anchor: YOU AGREE THAT USE OF THE WEB SITE AND THE SERVICES IS AT YOUR SOLE RISK.
      positive: The service is provided 'as is' and to be used at your sole risk
      negative: The court of law governing the terms is in a jurisdiction that is friendlier to user privacy protection.
    • anchor: If you continue to use our services after the changes have taken effect, it means that you agree to the changes.
      positive: Terms may be changed at any time
      negative: The service is only available in some countries approved by its government
    • anchor: We may revise these Terms of Use or any of the other Terms from time to time. You are ,expected to check this page and our Terms from time to time to take notice of any changes
      positive: Terms may be changed at any time
      negative: Voice data is collected and shared with third-parties
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
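
The following sketch works through what these settings imply for the step count and the learning-rate schedule. It assumes a single device, no gradient accumulation, and a linear schedule with warmup (the default `lr_scheduler_type`). The `no_duplicates` batch sampler may change the exact number of batches, so the figures are approximate. The function `learning_rate` is a hypothetical helper, not a library call.

```python
import math

TRAIN_SAMPLES = 122_856   # size of the training dataset
BATCH_SIZE = 16           # per_device_train_batch_size
EPOCHS = 1                # num_train_epochs
BASE_LR = 2e-5            # learning_rate
WARMUP_RATIO = 0.1        # warmup_ratio

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step):
    """Linear warmup from 0 to BASE_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, warmup_steps)
print(round(100 / steps_per_epoch, 4))   # epoch fraction reached at step 100
print(learning_rate(warmup_steps))       # peak learning rate
```

Under these assumptions, one epoch is roughly 7,679 optimizer steps, which is consistent with the epoch fractions reported in the training logs.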

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev_cosine_accuracy
-1 -1 - - 0.9426
0.0130 100 1.4227 1.1709 0.9595
0.0260 200 1.1178 0.9104 0.9727
0.0391 300 0.9473 0.7546 0.9799
0.0521 400 0.7559 0.6471 0.9853
0.0651 500 0.6617 0.5684 0.9880
0.0781 600 0.5857 0.5047 0.9899
0.0912 700 0.5768 0.4578 0.9910
0.1042 800 0.493 0.4281 0.9921
0.1172 900 0.4877 0.3899 0.9931
0.1302 1000 0.4315 0.3593 0.9939
0.1432 1100 0.3894 0.3458 0.9940
0.1563 1200 0.3681 0.3215 0.9945
0.1693 1300 0.3533 0.3151 0.9951
0.1823 1400 0.3242 0.3093 0.9949
0.1953 1500 0.346 0.2820 0.9955
0.2084 1600 0.3212 0.2637 0.9960
0.2214 1700 0.2889 0.2601 0.9960
0.2344 1800 0.2855 0.2423 0.9960
0.2474 1900 0.2621 0.2396 0.9964
0.2605 2000 0.265 0.2299 0.9968
0.2735 2100 0.2401 0.2191 0.9969
0.2865 2200 0.254 0.2166 0.9966
0.2995 2300 0.2543 0.2036 0.9971
0.3125 2400 0.2667 0.1958 0.9973
0.3256 2500 0.2236 0.1937 0.9972
0.3386 2600 0.232 0.1875 0.9974
0.3516 2700 0.2021 0.1806 0.9977
0.3646 2800 0.2147 0.1787 0.9974
0.3777 2900 0.1929 0.1727 0.9975
0.3907 3000 0.1778 0.1721 0.9977
0.4037 3100 0.2031 0.1678 0.9974
0.4167 3200 0.1784 0.1645 0.9978
0.4297 3300 0.183 0.1593 0.9977
0.4428 3400 0.1878 0.1508 0.9979
0.4558 3500 0.1915 0.1478 0.9980
0.4688 3600 0.1611 0.1448 0.9983
0.4818 3700 0.1606 0.1385 0.9983
0.4949 3800 0.1604 0.1408 0.9984
0.5079 3900 0.1733 0.1327 0.9983
0.5209 4000 0.159 0.1277 0.9986
0.5339 4100 0.1554 0.1255 0.9987
0.5469 4200 0.1546 0.1225 0.9985
0.5600 4300 0.1536 0.1222 0.9984
0.5730 4400 0.1253 0.1174 0.9987
0.5860 4500 0.151 0.1137 0.9986
0.5990 4600 0.1293 0.1116 0.9988
0.6121 4700 0.1272 0.1093 0.9986
0.6251 4800 0.1326 0.1074 0.9985
0.6381 4900 0.135 0.1044 0.9987
0.6511 5000 0.1253 0.1013 0.9989
0.6641 5100 0.1466 0.0995 0.9989
0.6772 5200 0.1378 0.0993 0.9991
0.6902 5300 0.1245 0.0959 0.9989
0.7032 5400 0.1124 0.0946 0.9989
0.7162 5500 0.0937 0.0926 0.9988
0.7293 5600 0.1378 0.0907 0.9990
0.7423 5700 0.1234 0.0889 0.9991
0.7553 5800 0.1153 0.0876 0.9991
0.7683 5900 0.1172 0.0865 0.9990
0.7814 6000 0.1135 0.0855 0.9992
0.7944 6100 0.1178 0.0834 0.9991
0.8074 6200 0.1195 0.0812 0.9991
0.8204 6300 0.1068 0.0795 0.9991
0.8334 6400 0.0824 0.0791 0.9992
0.8465 6500 0.1173 0.0768 0.9992
0.8595 6600 0.1166 0.0757 0.9992
0.8725 6700 0.1119 0.0755 0.9992
0.8855 6800 0.1017 0.0750 0.9993
0.8986 6900 0.1148 0.0745 0.9993
0.9116 7000 0.0976 0.0736 0.9993
0.9246 7100 0.0973 0.0728 0.9993
0.9376 7200 0.0984 0.0726 0.9993
0.9506 7300 0.0943 0.0723 0.9993
0.9637 7400 0.0825 0.0719 0.9993
0.9767 7500 0.0961 0.0716 0.9993
0.9897 7600 0.0893 0.0715 0.9993

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.1
  • Transformers: 4.57.0
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 22.7M params (Safetensors, tensor type F32)