Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
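The `Pooling` module above uses mean pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged into a single 384-dimensional sentence vector, with padding tokens excluded via the attention mask. A minimal NumPy sketch of that computation — the shapes and random inputs are purely illustrative, not taken from the model:

```python
import numpy as np

# Illustrative stand-ins for a transformer's output:
# token_embeddings: (batch, seq_len, 384); attention_mask: (batch, seq_len)
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 8, 384))
attention_mask = np.ones((2, 8))
attention_mask[1, 5:] = 0  # second sequence is padded after 5 tokens

# Mean pooling: sum only the non-padding token embeddings,
# then divide by the number of real tokens per sequence.
mask = attention_mask[..., None]                      # (2, 8, 1)
summed = (token_embeddings * mask).sum(axis=1)        # (2, 384)
counts = np.clip(mask.sum(axis=1), 1e-9, None)        # (2, 1)
sentence_embeddings = summed / counts
print(sentence_embeddings.shape)  # (2, 384)
```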
First install the Sentence Transformers library:
```
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("along26/mpnet_manglish-sentence-transformer")
# Run inference
sentences = [
    'Why have critics accused Najib Razak of mishandling the economy and what evidence supports these claims?',
    'Mengapa pengkritik menuduh Najib Razak salah mengendalikan ekonomi dan bukti apa yang menyokong dakwaan ini?',
    "How does adding more reactant or product affect the equilibrium position of a chemical reaction? Explain using Le Chatelier's principle with at least three different examples.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.8678,  0.9288],
#         [-0.8678,  1.0000, -0.8804],
#         [ 0.9288, -0.8804,  1.0000]])
```
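For semantic search, the same embeddings are used to rank a corpus against a query by similarity. A minimal NumPy sketch of the ranking step, using small random vectors as stand-ins for real `model.encode` output (only the 384-dimensional shape matches this model; the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=(384,))      # stand-in for model.encode("some query")
corpus = rng.normal(size=(5, 384))   # stand-in for model.encode(corpus_texts)

def normalize(x):
    # L2-normalize so that a dot product equals cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(corpus) @ normalize(query)  # cosine similarity per corpus entry
ranking = np.argsort(-scores)                  # indices sorted best match first
print(ranking[0], scores[ranking[0]])
```

In practice you would encode the corpus once, cache the embeddings, and only encode each incoming query.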
The training data has three columns: sentence_0, sentence_1, and sentence_2.

| | sentence_0 | sentence_1 | sentence_2 |
|---|---|---|---|
| type | string | string | string |
| sentence_0 | sentence_1 | sentence_2 |
|---|---|---|
| Suppose there are 6 guests at a party. Among them, some are friends with each other and some are strangers. What is the minimum number of guests who must be friends with each other or at least strangers with each other, in order to guarantee that there are 3 guests who are all friends or 3 guests who are all strangers? | Eh, imagine got 6 people go party lah. Some of them kakis, some don't know each other one. How many people must be friends or at least don't know each other, so can confirm got 3 people all kakis or 3 people all don't know each other? Aiyoh, very headache leh! | Why is there still a lack of emphasis on science, technology, engineering, and mathematics (STEM) education in Malaysia? |
| A photon at point A is entangled with a photon at point B. Using the quantum teleportation protocol, how can you transfer the quantum state of the photon at point A to point C, without physically moving the photon from point A to point C, given that a shared entangled pair of photons is available between point A and point C? Provide a step-by-step explanation of the procedure. | Foton pada titik A terikat dengan foton pada titik B. Menggunakan protokol teleportasi kuantum, bagaimana anda boleh memindahkan keadaan kuantum foton pada titik A ke titik C, tanpa memindahkan foton secara fizikal dari titik A ke titik C, diberikan bahawa sepasang foton terjerat yang dikongsi tersedia antara titik A dan titik C? Berikan penjelasan langkah demi langkah tentang prosedur. | Civil society groups and activists in Malaysia have expressed concern about the government's handling of the 1MDB scandal and the prosecution of those involved for several reasons. The 1MDB scandal involves allegations of massive corruption and money laundering at the state-owned investment fund, with billions of dollars allegedly misappropriated and used for personal gain. |
| To solve this problem, we can use the generating functions method. Let's represent each number in the set {1, 2, 3, ..., 10} as a variable x raised to the power of that number. The generating function for this set is: | Untuk menyelesaikan masalah ini, kita boleh menggunakan kaedah fungsi penjanaan. Mari kita wakili setiap nombor dalam set {1, 2, 3, ..., 10} sebagai pembolehubah x dinaikkan kepada kuasa nombor itu. Fungsi penjanaan untuk set ini ialah: | Why does the Malaysian government still insist on implementing the controversial National Security Council Act, which gives the government excessive power to declare a security area and restrict civil liberties? |
The model was trained with `TripletLoss` with these parameters:

```json
{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
```
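With Euclidean distance and a margin of 5, the triplet loss penalizes an anchor whenever its positive is not at least `margin` closer than its negative. A NumPy sketch of the per-triplet computation — the 384-dimensional vectors below are illustrative, not real model outputs:

```python
import numpy as np

MARGIN = 5.0  # triplet_margin from the config above

def triplet_loss(anchor, positive, negative, margin=MARGIN):
    # Euclidean-distance triplet loss:
    # max(0, d(anchor, positive) - d(anchor, negative) + margin)
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.zeros(384)
p = np.full(384, 0.1)   # close to the anchor
n = np.full(384, 1.0)   # far from the anchor
print(triplet_loss(a, p, n))  # → 0.0 (this triplet already satisfies the margin)
```

During training, the loss is averaged over a batch of (sentence_0, sentence_1, sentence_2) triplets, pulling paraphrase/translation pairs together and pushing unrelated sentences apart.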
Non-default hyperparameters:

- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss |
|---|---|---|
| 0.0574 | 500 | 3.8344 |
| 0.1148 | 1000 | 0.2259 |
| 0.1722 | 1500 | 0.0166 |
| 0.2295 | 2000 | 0.0025 |
| 0.2869 | 2500 | 0.0024 |
| 0.3443 | 3000 | 0.0022 |
| 0.4017 | 3500 | 0.0014 |
| 0.4591 | 4000 | 0.0016 |
| 0.5165 | 4500 | 0.0003 |
| 0.5739 | 5000 | 0.0002 |
| 0.6312 | 5500 | 0.0013 |
| 0.6886 | 6000 | 0.0002 |
| 0.7460 | 6500 | 0.0 |
| 0.8034 | 7000 | 0.0 |
| 0.8608 | 7500 | 0.0011 |
| 0.9182 | 8000 | 0.0 |
| 0.9756 | 8500 | 0.0007 |
| 1.0329 | 9000 | 0.0008 |
| 1.0903 | 9500 | 0.0 |
| 1.1477 | 10000 | 0.0 |
| 1.2051 | 10500 | 0.0008 |
| 1.2625 | 11000 | 0.0 |
| 1.3199 | 11500 | 0.0 |
| 1.3773 | 12000 | 0.0003 |
| 1.4346 | 12500 | 0.0009 |
| 1.4920 | 13000 | 0.0 |
| 1.5494 | 13500 | 0.0012 |
| 1.6068 | 14000 | 0.0012 |
| 1.6642 | 14500 | 0.0 |
| 1.7216 | 15000 | 0.0 |
| 1.7790 | 15500 | 0.0 |
| 1.8363 | 16000 | 0.0 |
| 1.8937 | 16500 | 0.0015 |
| 1.9511 | 17000 | 0.0004 |
| 2.0085 | 17500 | 0.0 |
| 2.0659 | 18000 | 0.0 |
| 2.1233 | 18500 | 0.0007 |
| 2.1806 | 19000 | 0.0001 |
| 2.2380 | 19500 | 0.0006 |
| 2.2954 | 20000 | 0.0006 |
| 2.3528 | 20500 | 0.0001 |
| 2.4102 | 21000 | 0.0 |
| 2.4676 | 21500 | 0.0 |
| 2.5250 | 22000 | 0.0003 |
| 2.5823 | 22500 | 0.0001 |
| 2.6397 | 23000 | 0.0 |
| 2.6971 | 23500 | 0.0 |
| 2.7545 | 24000 | 0.0 |
| 2.8119 | 24500 | 0.0006 |
| 2.8693 | 25000 | 0.0 |
| 2.9267 | 25500 | 0.0 |
| 2.9840 | 26000 | 0.0 |
Sentence Transformers:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
TripletLoss:

```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
Base model: sentence-transformers/all-MiniLM-L6-v2