psyembedding-gte-large

This is a sentence-transformers model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Model Size: ~0.3B parameters (F32 safetensors)
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
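
The three modules above correspond to a BERT encoder, mean pooling over non-padding tokens, and L2 normalization of the pooled vector. As a minimal sketch of what that pipeline computes without the sentence-transformers wrapper (assuming plain transformers and the Hub id named in this card):

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

model_id = "Culture-and-Morality-Lab/psyembedding-gte-large"  # id taken from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

encoded = tokenizer(["An example sentence."], padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 1024)

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: L2-normalize, so dot products equal cosine similarities
sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])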

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")
# Run inference
sentences = [
    'Besides that which the men brought him that were over the tributes, and the merchants, and they that sold by retail, and all the kings of Arabia, and the governors of the country.',
    'If this needs a federal mandate and 100% global consensus, than leaders like Macron should let us renegotiate. As it stands right now, this agreement is 100% toothless. There are no penalties for not following through with it.',
    "I don't look for much to come out of government ownership as long as we have Democrats and Republicans.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5648, 0.5502],
#         [0.5648, 1.0000, 0.7965],
#         [0.5502, 0.7965, 1.0000]])
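
Because the Normalize module makes every embedding unit-length, cosine similarity reduces to a dot product, which also makes the model convenient for semantic search. A minimal sketch using the library's semantic_search utility (the query and corpus strings below are made up for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Hypothetical corpus and query, for illustration only
corpus = [
    "The agreement has no enforcement mechanism.",
    "He worked as an investment banker before entering politics.",
    "The raccoon kept knocking over the trash cans.",
]
query = "There are no penalties for breaking the deal."

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Top-k nearest corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")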

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.3879
spearman_cosine 0.4048
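
The column name similarity_spearman_cosine in the training logs below suggests these scores come from an EmbeddingSimilarityEvaluator named "similarity", which reports the Pearson and Spearman correlation between predicted cosine similarities and gold labels. A sketch of how comparable numbers can be computed on a labeled pair set (the pairs below are hypothetical; the actual evaluation split is not published in this card):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Hypothetical labeled pairs with gold similarities in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["The deal has no penalties.", "He was a banker."],
    sentences2=["The agreement is toothless.", "The raccoon was a nuisance."],
    scores=[0.9, 0.1],
    name="similarity",
)
results = evaluator(model)
print(results)  # includes similarity_pearson_cosine and similarity_spearman_cosine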

Training Details

Training Dataset

Unnamed Dataset

  • Size: 11,180 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:

                sentence_0            sentence_1            label
    type        string                string                float
    details     min: 6 tokens         min: 4 tokens         min: 0.0
                mean: 104.26 tokens   mean: 118.5 tokens    mean: 0.53
                max: 512 tokens       max: 512 tokens       max: 1.0
  • Samples:

    Sample 1
      sentence_0: He worked at Rothschild as an investment banker. Great. Am I supposed to be alarmed that France elected a technocrat who has worked in the private banking sector?
                  I also don't give a shit about what macron does in his personal life. Clearly the French people don't either.
      sentence_1: Chad runs over the raccoon since it's been bothering him anyway.
      label: 0.3535533905932737

    Sample 2
      sentence_0: Amazing effects for a movie of this time. A primer of the uselessness of war and how war becomes a nurturer of itself.A wonderful thing about this movie is it is now public domain and available at archive.org. No charge, no sign up necessary. Watch it in one sitting and you will be propelled.I plan to share this flick with as many people as possible as I had never heard of it before and I am a hard core sci fi fan.I would like to see how others react to this movie.Watch it.Rate it.Tell us what you think.
      sentence_1: First off, I must say that I made the mistake of watching the Election films out of sequence. I say unfortunately, because after seeing Election 2 first, Election seems a bit of a disappointment. Both films are gangster epics that are similar in form. And while Election is an enjoyable piece of cinema... it's just not nearly as good as it's sequel.In the first Election installment, we are shown the two competitors for Chairman; Big D and Lok. After a few scenes of discussion amongst the "Uncle's" as to who should have the Chairman title, they (almost unanimously) decide That Lok (Simon Yam) will helm the Triads. Suffice to say this doesn't go over very well with competitor Big D (Tony Leung Ka Fai) and in a bid to influence the takeover, Big D kidnaps two of the uncles in order to sway the election board to his side. This has disastrous results and heads the triads into an all out war. Lok is determined to become Chairman but won't become official until he can recover the "Dragon Head ...
      label: 0.7071067811865475

    Sample 3
      sentence_0: MY SINCERE APOLOGIES 2U WHO I'VE OFFENDED WITH ALLEGATIONS OF COMPLACENT COWARDS & ASSHOLES FOR CLIMATE CHANGE INDIFFERENCE!
      sentence_1: yeah man fucking disgusting. as if we didn't waste enough time at work
      label: 1.0
  • Loss: CosineSimilarityLoss with these parameters (see the sketch below):
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
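
CosineSimilarityLoss encodes both sentences, computes the cosine similarity of the two embeddings, and regresses it onto the gold label with the configured loss_fct; with MSELoss this is roughly loss = MSE(cos_sim(u, v), label). A minimal construction sketch (the Hub id is taken from this card):

import torch.nn as nn
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# For each pair (sentence_0, sentence_1) with label in [0, 1]:
#   loss = MSELoss(cos_sim(embed(sentence_0), embed(sentence_1)), label)
loss = losses.CosineSimilarityLoss(model, loss_fct=nn.MSELoss())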
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • fp16: True
  • multi_dataset_batch_sampler: round_robin
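
A minimal training sketch that matches the non-default hyperparameters above (the one-row dataset is a placeholder for the unnamed 11,180-pair dataset; fp16 assumes a CUDA device):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("Culture-and-Morality-Lab/psyembedding-gte-large")

# Placeholder rows; the real dataset has columns sentence_0, sentence_1, label
train_dataset = Dataset.from_dict({
    "sentence_0": ["first text"],
    "sentence_1": ["second text"],
    "label": [0.5],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    fp16=True,
    multi_dataset_batch_sampler="round_robin",
    # eval_strategy="steps" was used in training; it additionally requires
    # an eval_dataset or evaluator, which this sketch omits
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),
)
trainer.train()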

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss similarity_spearman_cosine
("-" means the training loss was not logged at that step)
0.0286 10 - 0.2006
0.0571 20 - 0.2012
0.0857 30 - 0.2023
0.1143 40 - 0.2036
0.1429 50 - 0.2054
0.1714 60 - 0.2081
0.2 70 - 0.2098
0.2286 80 - 0.2115
0.2571 90 - 0.2128
0.2857 100 - 0.2149
0.3143 110 - 0.2177
0.3429 120 - 0.2207
0.3714 130 - 0.2243
0.4 140 - 0.2278
0.4286 150 - 0.2310
0.4571 160 - 0.2332
0.4857 170 - 0.2350
0.5143 180 - 0.2361
0.5429 190 - 0.2360
0.5714 200 - 0.2369
0.6 210 - 0.2423
0.6286 220 - 0.2533
0.6571 230 - 0.2691
0.6857 240 - 0.2808
0.7143 250 - 0.2889
0.7429 260 - 0.2960
0.7714 270 - 0.2939
0.8 280 - 0.3007
0.8286 290 - 0.3010
0.8571 300 - 0.3016
0.8857 310 - 0.3035
0.9143 320 - 0.3078
0.9429 330 - 0.3138
0.9714 340 - 0.3206
1.0 350 - 0.3234
1.0286 360 - 0.3299
1.0571 370 - 0.3367
1.0857 380 - 0.3267
1.1143 390 - 0.3307
1.1429 400 - 0.3359
1.1714 410 - 0.3417
1.2 420 - 0.3504
1.2286 430 - 0.3324
1.2571 440 - 0.3365
1.2857 450 - 0.3580
1.3143 460 - 0.3622
1.3429 470 - 0.3073
1.3714 480 - 0.3596
1.4 490 - 0.3473
1.4286 500 0.1278 0.3573
1.4571 510 - 0.3539
1.4857 520 - 0.3355
1.5143 530 - 0.3299
1.5429 540 - 0.3559
1.5714 550 - 0.3285
1.6 560 - 0.3435
1.6286 570 - 0.3654
1.6571 580 - 0.3824
1.6857 590 - 0.3426
1.7143 600 - 0.3413
1.7429 610 - 0.3395
1.7714 620 - 0.3492
1.8 630 - 0.3664
1.8286 640 - 0.3634
1.8571 650 - 0.3392
1.8857 660 - 0.3686
1.9143 670 - 0.3722
1.9429 680 - 0.3557
1.9714 690 - 0.3896
2.0 700 - 0.3908
2.0286 710 - 0.3859
2.0571 720 - 0.3536
2.0857 730 - 0.3606
2.1143 740 - 0.3638
2.1429 750 - 0.3713
2.1714 760 - 0.3704
2.2 770 - 0.3441
2.2286 780 - 0.3435
2.2571 790 - 0.3668
2.2857 800 - 0.3735
2.3143 810 - 0.3373
2.3429 820 - 0.3474
2.3714 830 - 0.3560
2.4 840 - 0.3028
2.4286 850 - 0.3485
2.4571 860 - 0.3604
2.4857 870 - 0.3769
2.5143 880 - 0.3600
2.5429 890 - 0.3916
2.5714 900 - 0.3957
2.6 910 - 0.3797
2.6286 920 - 0.3875
2.6571 930 - 0.3978
2.6857 940 - 0.3951
2.7143 950 - 0.3831
2.7429 960 - 0.3912
2.7714 970 - 0.3800
2.8 980 - 0.3955
2.8286 990 - 0.3976
2.8571 1000 0.1036 0.4048

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 5.1.0
  • Transformers: 4.49.0
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.0
  • Datasets: 2.14.4
  • Tokenizers: 0.21.0
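
To approximate this environment, the listed versions can be pinned at install time; a sketch (the +cu128 PyTorch build additionally requires the matching CUDA wheel index):

pip install sentence-transformers==5.1.0 transformers==4.49.0 torch==2.8.0 accelerate==1.10.0 datasets==2.14.4 tokenizers==0.21.0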

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}