Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model fine-tuned from sentence-transformers/all-mpnet-base-v2 on the csv dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
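The Pooling and Normalize modules above mean-pool the transformer's token embeddings over non-padding positions and scale the result to unit length. A minimal numpy sketch of those two steps on toy data (the array shapes and values are illustrative, not model output):

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token embeddings over real (non-padding) tokens, then L2-normalize.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array of 0/1 padding indicators
    """
    mask = attention_mask[:, None].astype(float)      # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)    # sum over real tokens only
    pooled = summed / mask.sum()                      # mean over real tokens
    return pooled / np.linalg.norm(pooled)            # unit length

# Toy check: two real tokens and one padding token that must be ignored
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]])
mask = np.array([1, 1, 0])
vec = mean_pool_and_normalize(tokens, mask)
```

Because of the final Normalize step, cosine similarity between two embeddings is just their dot product.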
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/mpnet-base-eedi-2024")
# Run inference
sentences = [
'Construct: Solve coordinate geometry questions involving ratio.\n\nQuestion: A straight line on squared paper. Points P, Q and R lie on this line. The leftmost end of the line is labelled P. If you travel right 4 squares and up 1 square you get to point Q. If you then travel 8 squares right and 2 squares up from Q you reach point R. What is the ratio of P Q: P R ?\n\nOptions:\nA. 1: 12\nB. 1: 4\nC. 1: 2\nD. 1: 3\n\nCorrect Answer: 1: 3\n\nIncorrect Answer: 1: 2\n\nPredicted Misconception: Misunderstanding the ratio calculation by not considering the correct horizontal and vertical distances between points P, Q, and R.',
'May have estimated when using ratios with geometry',
'Thinks x = y is an axis',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
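For the misconception-retrieval use case shown in the example sentences, a typical pattern is to embed the question text and every candidate misconception name, then rank candidates by cosine similarity. A minimal numpy sketch of that ranking step (random vectors stand in for real `model.encode` output; since the model ends in a Normalize module, cosine similarity reduces to a dot product):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize along the last axis, mimicking the model's Normalize module."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for model.encode(...) output: unit vectors of dimension 768
query = normalize(rng.standard_normal(768))            # embedded question text
candidates = normalize(rng.standard_normal((5, 768)))  # embedded misconception names

scores = candidates @ query       # cosine similarity = dot product of unit vectors
ranking = np.argsort(-scores)     # candidate indices, best match first
top_k = ranking[:3]
```

With real embeddings, `candidates` would hold one vector per misconception in the bank, and `top_k` would index the most plausible misconceptions for the question.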
Training dataset (csv), columns qa_pair_text, MisconceptionName, and negative:

|  | qa_pair_text | MisconceptionName | negative |
|---|---|---|---|
| type | string | string | string |

Sample rows:

| qa_pair_text | MisconceptionName | negative |
|---|---|---|
| Construct: Construct frequency tables. | Frequency 0 | 4 1 |
| Construct: Convert between any other time periods. | Answers as if there are 60 hours in a day | Confuses an equation with an expression |
| Construct: Given information about one part, work out other parts. | Thinks a difference of one part in a ratio means the quantities will differ by one unit | Believes dividing two positives will give a negative answer |
Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
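MultipleNegativesRankingLoss scores each anchor against every positive in the batch: the scaled cosine similarities form logits for a softmax cross-entropy in which the anchor's own positive is the correct class, so all other in-batch positives act as negatives. A small numpy sketch of that computation, using the scale of 20.0 configured above:

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss for a batch of (anchor, positive) pairs.

    Row i of `positives` is the true match for row i of `anchors`; every
    other row in the batch serves as an in-batch negative.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                       # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # cross-entropy on the diagonal

# Perfectly matched pairs: the diagonal dominates and the loss is near zero
loss_matched = mnr_loss(np.eye(4), np.eye(4))
# Every anchor paired with the wrong positive: the loss approaches the scale (20)
loss_mismatched = mnr_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
```

The configured no_duplicates batch sampler complements this loss: it prevents two identical positives from landing in the same batch, which would otherwise make a correct match count as a negative.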
Evaluation dataset (csv), columns qa_pair_text, MisconceptionName, and negative:

|  | qa_pair_text | MisconceptionName | negative |
|---|---|---|---|
| type | string | string | string |

Sample rows:

| qa_pair_text | MisconceptionName | negative |
|---|---|---|
| Construct: Identify when rounding a calculation will give an over or under approximation. | Believes that the larger the dividend, the smaller the answer. | Does not know how to calculate the mean |
| Construct: Substitute negative integer values into expressions involving no powers or roots. | y_1 | x_2 |
| Construct: Round numbers to three or more decimal places. | Rounds up instead of down | When dividing decimals, does not realize that the order and position of the digits (relative to each other) has to remain constant. |

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 1e-05
- weight_decay: 0.01
- num_train_epochs: 40
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 40
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | loss |
|---|---|---|---|
| 0.5026 | 12 | 1.6224 | - |
| 1.0026 | 24 | 1.4736 | 1.6492 |
| 1.5052 | 36 | 1.3341 | - |
| 2.0052 | 48 | 1.1563 | 1.3401 |
| 2.5079 | 60 | 1.0641 | - |
| 3.0079 | 72 | 0.9238 | 1.1597 |
| 3.5105 | 84 | 0.8253 | - |
| 4.0105 | 96 | 0.7101 | 1.0224 |
| 4.5131 | 108 | 0.6285 | - |
| 5.0131 | 120 | 0.5821 | 0.9944 |
| 5.5157 | 132 | 0.5676 | - |
| 6.0157 | 144 | 0.5018 | 0.9471 |
| 6.5183 | 156 | 0.4599 | - |
| 7.0183 | 168 | 0.4403 | 0.9292 |
| 7.5209 | 180 | 0.4161 | - |
| 8.0209 | 192 | 0.3784 | 0.9107 |
| 8.5236 | 204 | 0.3503 | - |
| 9.0236 | 216 | 0.3451 | 0.9042 |
| 9.5262 | 228 | 0.3141 | - |
| 10.0262 | 240 | 0.2916 | 0.9012 |
| 10.5288 | 252 | 0.2863 | - |
| 11.0288 | 264 | 0.2713 | 0.8977 |
| 11.5314 | 276 | 0.244 | - |
| 12.0314 | 288 | 0.2323 | 0.8922 |
| 12.5340 | 300 | 0.2293 | - |
| 13.0340 | 312 | 0.211 | 0.8933 |
| 13.5366 | 324 | 0.1972 | - |
| 14.0366 | 336 | 0.1918 | 0.9024 |
| 14.5393 | 348 | 0.1868 | - |
| 15.0393 | 360 | 0.1704 | 0.8930 |
| 15.5419 | 372 | 0.1661 | - |
| 16.0419 | 384 | 0.1666 | 0.9077 |
| 16.5445 | 396 | 0.1558 | - |
| 17.0445 | 408 | 0.1459 | 0.9153 |
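The non-default hyperparameters listed earlier map onto the Sentence Transformers training API roughly as follows (a sketch only; `output_dir` is a placeholder, not the path actually used for this model):

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-eedi-2024",      # placeholder output path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,         # effective train batch size of 256
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=40,
    lr_scheduler_type="cosine",
    lr_scheduler_kwargs={"num_cycles": 20},  # restart-style cosine over 40 epochs
    warmup_ratio=0.1,
    fp16=True,
    load_best_model_at_end=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```

These arguments would then be passed to a SentenceTransformerTrainer together with the model, datasets, and loss.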
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: sentence-transformers/all-mpnet-base-v2