SentenceTransformer

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False, 'architecture': 'MPNetModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
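
The Pooling and Normalize stages above can be illustrated with a small NumPy sketch. It is not the library's implementation: the function name `mean_pool_and_normalize` and the toy 4-dimensional inputs are assumptions made for illustration (the real model produces 768-dimensional token vectors from the MPNet transformer).

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the non-padding token vectors, then scale each result to unit length.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   0/1 array of shape (batch, seq_len); 0 marks padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)       # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)       # guard against empty rows
    pooled = summed / counts                             # mean pooling
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)          # L2 normalization

# Toy input: 2 sequences, 3 token positions, 4 dimensions.
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # third token is padding
    [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
```

Because every output vector has unit length, the cosine similarity between two sentences reduces to a plain dot product.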

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'On March 7, 2018, China National Service Corporation donated 100,000 national flags, 150,000 pins, and 10,000 mugs worth Rs 16,610,000 to celebrate Mauritius 50th anniversary of independence. The Mauritius Ministry of Arts and Culture received the donation. According to the Minister of Arts and Culture, Mr. Prithvirajsing Roopun, the national flags, pins and mugs will be distributed among the population for the celebrations of Independence Day.',
    'Target 16.4 of SDG 16: By 2030, significantly reduce illicit financial and arms flows, strengthen the recovery and return of stolen assets and combat all forms of organized crime',
    'Target 4.A of SDG 4: Build and upgrade education facilities that are child, disability and gender sensitive and provide safe, non-violent, inclusive and effective learning environments for all',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7700, 0.4148],
#         [0.7700, 1.0000, 0.3792],
#         [0.4148, 0.3792, 1.0000]])
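
For semantic search, such as ranking candidate SDG targets against one project description, the similarity scores can be sorted. The sketch below is one possible way to do this. The helper `rank_by_cosine` is hypothetical, and the 3-dimensional vectors are stand-ins; in practice the inputs would be the arrays returned by `model.encode`.

```python
import numpy as np

def rank_by_cosine(query_embedding, candidate_embeddings, top_k=3):
    """Return (index, score) pairs for the top_k candidates, best first."""
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    scores = c @ q                          # cosine similarity per candidate
    order = np.argsort(-scores)[:top_k]     # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Stand-in vectors (hypothetical): one project description, three SDG targets.
project = np.array([1.0, 0.0, 0.0])
targets = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the project vector
    [2.0, 0.0, 0.0],   # same direction as the project vector
    [1.0, 1.0, 0.0],   # 45 degrees away from the project vector
])

ranking = rank_by_cosine(project, targets, top_k=2)
print(ranking)
```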

Training Details

Training Dataset

Unnamed Dataset

  • Size: 307,331 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    column      type    min        mean           max
    sentence_0  string  20 tokens  163.19 tokens  384 tokens
    sentence_1  string  15 tokens  33.44 tokens   144 tokens
  • Samples:
    sentence_0: Around April 2011, the Chinese Embassy donated three computers to schools in the Tonota North constituency of Botswana. Each computer cost 6,000 BWP (total amount of donation is 18,000 BWP).
    sentence_1: Target 4.1 of SDG 4: By 2030, ensure that all girls and boys complete free, equitable and quality primary and secondary education leading to relevant and effective learning outcomes

    sentence_0: On November 8, 2007, the Export-Import Bank of China and Kazkomemrtsbank JSC signed a buyer's credit loan agreement to finance Kazakhstani mobile operator Mobile Telecom Service LLP's purchase of equipment from Huawei Technologies Co., Ltd. The loan carried a maturity of seven years and a flexible schedule of repayment, based on the profit generated by the project. The loan sought to further the development of Mobile Telecom Service's Global System for Mobile Communications (GSM) network. Mobile Telecom Service was a new GSM-operator in Kazakhstan that provided its services under the trademark "NEO." The commercial start of its network took place February 14, 2007; its network covered 45 cities in Kazakhstan in 2007, with plans to expand to 84 by the end of that year. In 2007, 51% of Mobile Telecom Service was owned by Kazakh state-owned Kazakhtelecom JSC. It appears that Kazkommertsbank JSC then used the proceeds from the export credit agreement to on-lend to Mobile Telecom Service L...
    sentence_1: Target 9.B of SDG 9: Encourage local innovation and technology growth

    sentence_0: In 2000, the Chinese Government provided a grant of USD 19,931 to the Government of Zimbabwe for food acquisition after the flood of March 2000. This project is captured in UNOCHA Financial Tracking Service as Flow ID #2614. The exact start and end dates of this donation are unknown. This project is completed.
    sentence_1: Target 11.1 of SDG 11: Provide safe and affordable housing for everyone
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
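
    The core idea of this loss, with "scale": 20.0 and cosine similarity in the query-to-document direction, can be sketched in NumPy as follows. This is a simplified illustration, not the library code: the function name `in_batch_ranking_loss` is hypothetical, and the sketch ignores the gather_across_devices, partition_mode, and hardness settings listed above.

```python
import numpy as np

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    Row i of `anchors` should match row i of `positives`; every other row
    in the batch acts as a negative for that anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                              # (batch, batch) scaled cosine
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # correct column for row i is i

batch = 4
matched = in_batch_ranking_loss(np.eye(batch), np.eye(batch))
shifted = in_batch_ranking_loss(np.eye(batch), np.roll(np.eye(batch), 1, axis=0))
uniform = in_batch_ranking_loss(np.ones((batch, 3)), np.ones((batch, 3)))
print(matched, shifted, uniform)
```

    In this sketch, perfectly matched pairs give a loss near zero, pairs aligned with the wrong row give a loss near the scale value, and a batch in which all similarities are equal gives ln(batch size).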
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
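
With lr_scheduler_type: linear, learning_rate: 5e-05, and warmup_steps: 0, the learning rate decays in a straight line from its initial value to zero over the run. A minimal sketch of that schedule follows. The function `linear_lr` is hypothetical, and the total of 28,815 steps is an assumption derived from 307,331 samples at batch size 32 on a single device over 3 epochs; the card does not state it.

```python
def linear_lr(step, base_lr=5e-5, warmup_steps=0, total_steps=28_815):
    """Learning rate at a given optimizer step under linear warmup and decay.

    total_steps=28_815 is an assumed value, not one reported in the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)        # linear warmup (unused here)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 9_605, 19_210, 28_815):
    print(step, linear_lr(step))
```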

Training Logs

Epoch Step Training Loss
0.0521 500 2.6494
0.1041 1000 2.4342
0.1562 1500 2.4027
0.2082 2000 2.3301
0.2603 2500 2.3267
0.3123 3000 2.2793
0.3644 3500 2.2734
0.4164 4000 2.2402
0.4685 4500 2.2295
0.5206 5000 2.2161
0.5726 5500 2.2258
0.6247 6000 2.2128
0.6767 6500 2.2012
0.7288 7000 2.1859
0.7808 7500 2.2036
0.8329 8000 2.1855
0.8850 8500 2.1702
0.9370 9000 2.1671
0.9891 9500 2.1593
1.0411 10000 2.1199
1.0932 10500 2.1572
1.1452 11000 2.1436
1.1973 11500 2.1294
1.2493 12000 2.148
1.3014 12500 2.1336
1.3535 13000 2.153
1.4055 13500 2.1291
1.4576 14000 2.1353
1.5096 14500 2.1161
1.5617 15000 2.1274
1.6137 15500 2.1181
1.6658 16000 2.1137
1.7179 16500 2.1303
1.7699 17000 2.1069
1.8220 17500 2.1177
1.8740 18000 2.1066
1.9261 18500 2.1066
1.9781 19000 2.1156
2.0302 19500 2.0948
2.0822 20000 2.1024
2.1343 20500 2.0983
2.1864 21000 2.1061
2.2384 21500 2.1009
2.2905 22000 2.1179
2.3425 22500 2.0958
2.3946 23000 2.0668
2.4466 23500 2.0807
2.4987 24000 2.1011
2.5508 24500 2.0767
2.6028 25000 2.0838
2.6549 25500 2.0873
2.7069 26000 2.1036
2.7590 26500 2.0815
2.8110 27000 2.0853
2.8631 27500 2.09
2.9151 28000 2.086
2.9672 28500 2.0795
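
The Epoch column can be reproduced from the dataset size and batch size. The sketch below assumes training ran on a single device, which the card does not state; under that assumption the computed epoch fractions agree with the logged values.

```python
import math

train_samples = 307_331   # from the Training Dataset section
batch_size = 32           # per_device_train_batch_size
epochs = 3                # num_train_epochs
devices = 1               # assumption: device count is not given in the card

# dataloader_drop_last is False, so the final partial batch counts as a step.
steps_per_epoch = math.ceil(train_samples / (batch_size * devices))
total_steps = steps_per_epoch * epochs

def epoch_at(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps)
print(epoch_at(500), epoch_at(9500), epoch_at(28500))
```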

Framework Versions

  • Python: 3.12.2
  • Sentence Transformers: 5.3.0
  • Transformers: 4.49.0
  • PyTorch: 2.3.1.post100
  • Accelerate: 1.12.0
  • Datasets: 3.3.2
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}