SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 22.7M parameters (F32, Safetensors)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
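
The trailing Normalize() module L2-normalizes every embedding, so for this model cosine similarity reduces to a plain dot product. As an illustrative sketch only (the released checkpoint should simply be loaded by name, as shown under Usage), the same three-module stack can be assembled by hand:

from sentence_transformers import SentenceTransformer, models

# Sketch: rebuild the pipeline above from its modules, starting from the base checkpoint
word = models.Transformer("sentence-transformers/all-MiniLM-L6-v2", max_seq_length=256)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")  # 384-dim mean pooling
normalize = models.Normalize()  # unit-length outputs: cosine similarity == dot product

assembled_model = SentenceTransformer(modules=[word, pooling, normalize])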

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AryehRotberg/ToS-Sentence-Transformers-V2")
# Run inference
sentences = [
    'Each customer may register only one Coinbase account.',
    'Alternative accounts are not allowed',
    'Usernames can be rejected or changed for any reason',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
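
Paraphrase mining, mentioned in the introduction, scores every sentence pair in a collection with a single call. A brief sketch reusing the model loaded above (the clause list is illustrative, not from the card):

from sentence_transformers import util

clauses = [
    "Each customer may register only one Coinbase account.",
    "Alternative accounts are not allowed",
    "Usernames can be rejected or changed for any reason",
]

# util.paraphrase_mining returns [score, i, j] triples, sorted by decreasing score
for score, i, j in util.paraphrase_mining(model, clauses):
    print(f"{score:.3f}  {clauses[i]!r} <-> {clauses[j]!r}")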

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.9993
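
The card does not include the evaluation code; a minimal sketch using the library's TripletEvaluator, with made-up triplets standing in for the actual held-out set, would look like this:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("AryehRotberg/ToS-Sentence-Transformers-V2")

# Illustrative triplets only; the reported 0.9993 comes from the evaluation dataset below
evaluator = TripletEvaluator(
    anchors=["Visits are logged by the Web server."],
    positives=["Only necessary logs are kept by the service to ensure quality"],
    negatives=["An onion site accessible over Tor is provided"],
    name="tos-dev",  # hypothetical evaluator name
)
print(evaluator(model))  # e.g. {'tos-dev_cosine_accuracy': ...}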

Training Details

Training Dataset

Unnamed Dataset

  • Size: 203,040 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

                  anchor    positive   negative
    type          string    string     string
    min tokens    5         6          4
    mean tokens   47.01     15.08      14.45
    max tokens    256       29         29

  • Samples:

    anchor:   but remains subject to the promises made in any pre-existing Privacy Policy (unless, of course, the customer consents otherwise).
    positive: Promises will be kept after a merger or acquisition
    negative: When the service wants to change its terms, you are notified a week or more in advance.

    anchor:   Visits are logged by the Web server. These logs are only used for maintenance purposes and to generate anonymous access statistics.
    positive: Only necessary logs are kept by the service to ensure quality
    negative: An onion site accessible over Tor is provided

    anchor:   You affirm that you are over the age of 13, as the FanFiction.Net Service is not intended for children under 13.
    positive: This service is only available to users over a certain age
    negative: No need to register
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
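
MultipleNegativesRankingLoss treats the positives of the other examples in a batch as in-batch negatives, which is why the no_duplicates batch sampler below matters. A minimal construction sketch matching the parameters above:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
# scale=20.0 and cos_sim mirror the parameters listed above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)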
    

Evaluation Dataset

Unnamed Dataset

  • Size: 50,760 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

                  anchor    positive   negative
    type          string    string     string
    min tokens    4         4          4
    mean tokens   45.97     14.82      14.36
    max tokens    256       29         29

  • Samples:

    anchor:   HP is not required to host, display, or distribute any User Submissions on or through This Website and may remove at any time or refuse any User Submissions for any reason.
    positive: User-generated content can be blocked or censored for any reason
    negative: The service will only respond to government requests that are reasonable

    anchor:   How we use information we collect
    positive: Information is provided about how your personal data is used
    negative: The service does not index or open files that you upload

    anchor:   your use of the LYKA Service is solely for your own personal use and you therefore must not, nor attempt to, resell or charge others for use of or access to the LYKA Service or for any business purposes;
    positive: This service is only available for use individually and non-commercially.
    negative: You cannot opt out of promotional communications
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
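
A hedged end-to-end sketch of how these non-default values plug into the SentenceTransformerTrainer API; the dataset contents and output_dir are placeholders, not the actual training setup:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (anchor, positive, negative) rows standing in for the 203,040-sample dataset
train_dataset = Dataset.from_dict({
    "anchor": ["Visits are logged by the Web server."],
    "positive": ["Only necessary logs are kept by the service to ensure quality"],
    "negative": ["An onion site accessible over Tor is provided"],
})
eval_dataset = train_dataset  # placeholder for the 50,760-sample evaluation split

args = SentenceTransformerTrainingArguments(
    output_dir="tos-sentence-transformers",  # hypothetical path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()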

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev_cosine_accuracy
-1 -1 - - 0.9547
0.0079 100 1.3098 1.1250 0.9618
0.0158 200 1.0671 0.9039 0.9726
0.0236 300 0.8861 0.7616 0.9788
0.0315 400 0.7625 0.6672 0.9824
0.0394 500 0.7217 0.5984 0.9852
0.0473 600 0.6612 0.5432 0.9875
0.0552 700 0.5484 0.5048 0.9884
0.0630 800 0.5435 0.4699 0.9898
0.0709 900 0.522 0.4319 0.9909
0.0788 1000 0.4715 0.4152 0.9915
0.0867 1100 0.4495 0.3909 0.9923
0.0946 1200 0.4552 0.3741 0.9929
0.1024 1300 0.4159 0.3559 0.9934
0.1103 1400 0.4095 0.3404 0.9937
0.1182 1500 0.3849 0.3267 0.9936
0.1261 1600 0.3357 0.3208 0.9941
0.1340 1700 0.4029 0.2989 0.9946
0.1418 1800 0.3413 0.2882 0.9949
0.1497 1900 0.3254 0.2842 0.9952
0.1576 2000 0.3123 0.2817 0.9950
0.1655 2100 0.3003 0.2652 0.9955
0.1734 2200 0.3117 0.2559 0.9959
0.1812 2300 0.332 0.2504 0.9959
0.1891 2400 0.2923 0.2481 0.9962
0.1970 2500 0.2747 0.2389 0.9961
0.2049 2600 0.2507 0.2355 0.9962
0.2128 2700 0.2563 0.2294 0.9965
0.2206 2800 0.2512 0.2228 0.9967
0.2285 2900 0.2622 0.2201 0.9967
0.2364 3000 0.234 0.2183 0.9968
0.2443 3100 0.2607 0.2158 0.9969
0.2522 3200 0.2221 0.2077 0.9973
0.2600 3300 0.2559 0.2037 0.9971
0.2679 3400 0.2261 0.2044 0.9969
0.2758 3500 0.2453 0.1985 0.9969
0.2837 3600 0.2251 0.1927 0.9975
0.2916 3700 0.2716 0.1913 0.9976
0.2994 3800 0.1949 0.1894 0.9975
0.3073 3900 0.2361 0.1868 0.9973
0.3152 4000 0.223 0.1812 0.9974
0.3231 4100 0.1846 0.1788 0.9974
0.3310 4200 0.2143 0.1771 0.9974
0.3388 4300 0.2063 0.1705 0.9976
0.3467 4400 0.2207 0.1693 0.9977
0.3546 4500 0.2053 0.1608 0.9980
0.3625 4600 0.1705 0.1603 0.9981
0.3704 4700 0.2085 0.1597 0.9980
0.3783 4800 0.2034 0.1561 0.9981
0.3861 4900 0.1765 0.1562 0.9981
0.3940 5000 0.1955 0.1497 0.9982
0.4019 5100 0.1843 0.1487 0.9981
0.4098 5200 0.186 0.1479 0.9981
0.4177 5300 0.1631 0.1498 0.9980
0.4255 5400 0.1719 0.1468 0.9980
0.4334 5500 0.1916 0.1436 0.9983
0.4413 5600 0.1706 0.1421 0.9982
0.4492 5700 0.1512 0.1372 0.9984
0.4571 5800 0.1626 0.1357 0.9984
0.4649 5900 0.1652 0.1332 0.9985
0.4728 6000 0.146 0.1325 0.9986
0.4807 6100 0.1487 0.1308 0.9986
0.4886 6200 0.1565 0.1290 0.9985
0.4965 6300 0.1567 0.1281 0.9985
0.5043 6400 0.1678 0.1264 0.9985
0.5122 6500 0.1203 0.1261 0.9986
0.5201 6600 0.1572 0.1245 0.9985
0.5280 6700 0.1539 0.1221 0.9985
0.5359 6800 0.1546 0.1226 0.9986
0.5437 6900 0.1216 0.1185 0.9987
0.5516 7000 0.1272 0.1193 0.9986
0.5595 7100 0.1321 0.1179 0.9988
0.5674 7200 0.1305 0.1144 0.9988
0.5753 7300 0.1558 0.1151 0.9987
0.5831 7400 0.1282 0.1133 0.9986
0.5910 7500 0.1442 0.1113 0.9986
0.5989 7600 0.1529 0.1094 0.9988
0.6068 7700 0.1254 0.1086 0.9987
0.6147 7800 0.1158 0.1061 0.9988
0.6225 7900 0.1127 0.1063 0.9988
0.6304 8000 0.1253 0.1052 0.9988
0.6383 8100 0.1542 0.1050 0.9989
0.6462 8200 0.1237 0.1038 0.9990
0.6541 8300 0.1307 0.1029 0.9988
0.6619 8400 0.1231 0.1022 0.9989
0.6698 8500 0.1573 0.1002 0.9990
0.6777 8600 0.1257 0.0990 0.9990
0.6856 8700 0.103 0.0986 0.9990
0.6935 8800 0.1143 0.0983 0.9990
0.7013 8900 0.1138 0.0965 0.9991
0.7092 9000 0.1158 0.0962 0.9990
0.7171 9100 0.1104 0.0960 0.9991
0.7250 9200 0.1054 0.0967 0.9991
0.7329 9300 0.1194 0.0946 0.9991
0.7407 9400 0.1245 0.0936 0.9991
0.7486 9500 0.126 0.0926 0.9991
0.7565 9600 0.1059 0.0913 0.9992
0.7644 9700 0.1101 0.0906 0.9992
0.7723 9800 0.1192 0.0898 0.9993
0.7801 9900 0.1241 0.0886 0.9993
0.7880 10000 0.1134 0.0876 0.9993
0.7959 10100 0.1071 0.0868 0.9993
0.8038 10200 0.1043 0.0869 0.9993
0.8117 10300 0.1191 0.0864 0.9993
0.8195 10400 0.1188 0.0853 0.9993
0.8274 10500 0.1014 0.0847 0.9993
0.8353 10600 0.0878 0.0846 0.9993
0.8432 10700 0.0952 0.0839 0.9993
0.8511 10800 0.1169 0.0841 0.9993
0.8589 10900 0.1032 0.0825 0.9993
0.8668 11000 0.1086 0.0823 0.9993
0.8747 11100 0.1058 0.0820 0.9993
0.8826 11200 0.0973 0.0818 0.9993
0.8905 11300 0.1166 0.0811 0.9993
0.8983 11400 0.0965 0.0807 0.9993
0.9062 11500 0.0974 0.0805 0.9993
0.9141 11600 0.0984 0.0803 0.9993
0.9220 11700 0.1199 0.0798 0.9993
0.9299 11800 0.0854 0.0794 0.9993
0.9377 11900 0.1004 0.0798 0.9993
0.9456 12000 0.1119 0.0792 0.9993
0.9535 12100 0.1171 0.0790 0.9993
0.9614 12200 0.1045 0.0787 0.9993
0.9693 12300 0.1116 0.0784 0.9993
0.9771 12400 0.091 0.0781 0.9993
0.9850 12500 0.083 0.0781 0.9993
0.9929 12600 0.1146 0.0779 0.9993

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.51.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1
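
To reproduce this environment, the listed versions can be pinned at install time (a convenience one-liner, not an official requirements file):

pip install sentence-transformers==3.4.1 transformers==4.51.3 accelerate==1.5.2 datasets==3.5.0 tokenizers==0.21.1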

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}