Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the csv dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
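The last two modules above (mean-token pooling followed by `Normalize()`) can be illustrated with a small, dependency-free sketch. It assumes toy 4-dimensional token vectors instead of the model's 384 dimensions, and the helper names `mean_pool` and `l2_normalize` are hypothetical, not part of the Sentence Transformers API.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [value / max(count, 1) for value in total]

def l2_normalize(vec):
    """Scale a vector to unit length, as a Normalize() step would."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input (assumption): two real tokens and one padding token, dimension 4.
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 2.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_vec = l2_normalize(mean_pool(tokens, mask))
print(sentence_vec)
```

Because the output is unit-length, the cosine similarity between two such vectors reduces to their dot product.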
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/all-MiniLM-eedi-2024")
# Run inference
sentences = [
'Construct: Solve coordinate geometry questions involving ratio.\n\nQuestion: A straight line on squared paper. Points P, Q and R lie on this line. The leftmost end of the line is labelled P. If you travel right 4 squares and up 1 square you get to point Q. If you then travel 8 squares right and 2 squares up from Q you reach point R. What is the ratio of P Q: P R ?\n\nOptions:\nA. 1: 12\nB. 1: 4\nC. 1: 2\nD. 1: 3\n\nCorrect Answer: 1: 3\n\nIncorrect Answer: 1: 2\n\nPredicted Misconception: Misunderstanding the ratio calculation by not considering the correct horizontal and vertical distances between points P, Q, and R.',
'May have estimated when using ratios with geometry',
'Thinks x = y is an axis',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
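For retrieval, one natural use of these scores is to rank candidate misconception names against a question embedding. The sketch below shows that ranking step on made-up 3-dimensional unit vectors; the vectors and the helper `rank_candidates` are assumptions for illustration only and are not outputs of this model.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit-length."""
    return sum(x * y for x, y in zip(a, b))

def rank_candidates(query_vec, candidate_vecs, names):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(dot(query_vec, vec), name) for vec, name in zip(candidate_vecs, names)]
    return sorted(scored, reverse=True)

# Hypothetical unit vectors standing in for real 384-dimensional embeddings.
query = [0.6, 0.8, 0.0]
candidates = [[0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
names = [
    "May have estimated when using ratios with geometry",
    "Thinks x = y is an axis",
]
ranking = rank_candidates(query, candidates, names)
for score, name in ranking:
    print(f"{score:.2f}  {name}")
```

With real embeddings, the first row of `model.similarity(embeddings, embeddings)` would play the role of these scores.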
Columns: qa_pair_text, MisconceptionName, and negative

| | qa_pair_text | MisconceptionName | negative |
|---|---|---|---|
| type | string | string | string |

| qa_pair_text | MisconceptionName | negative |
|---|---|---|
| Construct: Construct frequency tables. | Frequency 0 | 4 1 |
| Construct: Convert between any other time periods. | Answers as if there are 60 hours in a day | Confuses an equation with an expression |
| Construct: Given information about one part, work out other parts. | Thinks a difference of one part in a ratio means the quantities will differ by one unit | Believes dividing two positives will give a negative answer |
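The `qa_pair_text` values are truncated in the table, but the full input in the usage snippet suggests a fixed template (Construct, Question, Options, Correct Answer, Incorrect Answer, Predicted Misconception). The following is a minimal sketch of a formatter for that template; the function name, its arguments, and the sample content are hypothetical, and the exact preprocessing used to build the dataset may differ.

```python
def build_qa_pair_text(construct, question, options, correct, incorrect, predicted_misconception):
    """Assemble a query string in the layout seen in the usage example."""
    option_lines = "\n".join(f"{label}. {text}" for label, text in options)
    return (
        f"Construct: {construct}\n\n"
        f"Question: {question}\n\n"
        f"Options:\n{option_lines}\n\n"
        f"Correct Answer: {correct}\n\n"
        f"Incorrect Answer: {incorrect}\n\n"
        f"Predicted Misconception: {predicted_misconception}"
    )

# Hypothetical example content, not taken from the dataset.
text = build_qa_pair_text(
    construct="Convert between any other time periods.",
    question="How many hours are there in 2 days?",
    options=[("A", "24"), ("B", "48"), ("C", "120"), ("D", "60")],
    correct="48",
    incorrect="120",
    predicted_misconception="Uses 60 as the number of hours in a day.",
)
print(text)
```

Keeping inference-time queries in the same layout as the training inputs is generally a safe default for a fine-tuned embedding model.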
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
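To make the two parameters concrete, here is a dependency-free sketch of the idea behind a multiple-negatives ranking loss: each anchor is scored against every positive and every explicit negative in the batch using cosine similarity multiplied by `scale`, and cross-entropy pushes the anchor's own positive to the top. This is an illustration under those assumptions, with hypothetical helper names and toy 2-dimensional vectors, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i should pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives plus explicit negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch (assumption): each anchor matches its own positive exactly.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]
good_loss = mnr_loss(anchors, positives, negatives)
# Swapping the positives makes every anchor prefer the wrong candidate.
bad_loss = mnr_loss(anchors, positives[::-1], negatives)
print(good_loss, bad_loss)
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.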
Columns: qa_pair_text, MisconceptionName, and negative

| | qa_pair_text | MisconceptionName | negative |
|---|---|---|---|
| type | string | string | string |

| qa_pair_text | MisconceptionName | negative |
|---|---|---|
| Construct: Identify when rounding a calculation will give an over or under approximation. | Believes that the larger the dividend, the smaller the answer. | Does not know how to calculate the mean |
| Construct: Substitute negative integer values into expressions involving no powers or roots. | y_1 | x_2 |
| Construct: Round numbers to three or more decimal places. | Rounds up instead of down | When dividing decimals, does not realize that the order and position of the digits (relative to each other) has to remain constant. |
MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- gradient_accumulation_steps: 8
- learning_rate: 1e-05
- weight_decay: 0.01
- num_train_epochs: 40
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 40
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {'num_cycles': 20}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: {'use_reentrant': False}
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.5026 | 12 | 2.2789 | - |
| 1.0052 | 24 | 2.1642 | 1.9746 |
| 1.4974 | 36 | 2.0463 | - |
| 2.0 | 48 | 1.8955 | 1.6808 |
| 2.4921 | 60 | 1.7692 | - |
| 2.9948 | 72 | 1.6528 | 1.4532 |
| 3.4869 | 84 | 1.5298 | - |
| 3.9895 | 96 | 1.4338 | 1.2853 |
| 4.4817 | 108 | 1.3374 | - |
| 4.9843 | 120 | 1.3084 | 1.2465 |
| 5.4764 | 132 | 1.2921 | - |
| 5.9791 | 144 | 1.2143 | 1.1766 |
| 6.4712 | 156 | 1.1689 | - |
| 6.9738 | 168 | 1.1656 | 1.1518 |
| 7.4660 | 180 | 1.1172 | - |
| 7.9686 | 192 | 1.0737 | 1.1080 |
| 8.4607 | 204 | 1.0373 | - |
| 8.9634 | 216 | 1.0445 | 1.0874 |
| 9.4555 | 228 | 0.9707 | - |
| 9.9581 | 240 | 0.9644 | 1.0649 |
| 10.4503 | 252 | 0.9252 | - |
| 10.9529 | 264 | 0.9211 | 1.0367 |
| 11.4450 | 276 | 0.8645 | - |
| 11.9476 | 288 | 0.8635 | 1.0297 |
| 12.4398 | 300 | 0.8279 | - |
| 12.9424 | 312 | 0.819 | 1.0161 |
| 13.4346 | 324 | 0.7684 | - |
| 13.9372 | 336 | 0.7842 | 1.0016 |
| 14.4293 | 348 | 0.7448 | - |
| 14.9319 | 360 | 0.7321 | 0.9951 |
| 15.4241 | 372 | 0.7064 | - |
| 15.9267 | 384 | 0.7161 | 0.9835 |
| 16.4188 | 396 | 0.6692 | - |
| 16.9215 | 408 | 0.6594 | 0.9774 |
| 17.4136 | 420 | 0.6405 | - |
| 17.9162 | 432 | 0.638 | 0.9723 |
| 18.4084 | 444 | 0.6 | - |
| 18.9110 | 456 | 0.6122 | 0.9706 |
| 19.4031 | 468 | 0.5763 | - |
| 19.9058 | 480 | 0.5787 | 0.9732 |
| 20.3979 | 492 | 0.5432 | - |
| 20.9005 | 504 | 0.5599 | 0.9618 |
| 21.3927 | 516 | 0.5245 | - |
| 21.8953 | 528 | 0.5278 | 0.9626 |
| 22.3874 | 540 | 0.4989 | - |
| 22.8901 | 552 | 0.509 | 0.9583 |
| 23.3822 | 564 | 0.4674 | - |
| 23.8848 | 576 | 0.4854 | 0.9573 |
| 24.3770 | 588 | 0.4619 | - |
| 24.8796 | 600 | 0.4631 | 0.9615 |
| 25.3717 | 612 | 0.4339 | - |
| 25.8743 | 624 | 0.4427 | 0.9593 |
| 26.3665 | 636 | 0.4225 | - |
| 26.8691 | 648 | 0.4245 | 0.9694 |
| 27.3613 | 660 | 0.3936 | - |
| 27.8639 | 672 | 0.4168 | 0.9586 |
| 28.3560 | 684 | 0.3835 | - |
| 28.8586 | 696 | 0.3921 | 0.9629 |
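The interaction of `per_device_train_batch_size`, `gradient_accumulation_steps`, `warmup_ratio`, and `lr_scheduler_kwargs` listed in the hyperparameters can be sketched numerically. The snippet below assumes a single device and roughly 960 optimizer steps in total (about 24 steps per epoch in the log, times 40 epochs); both are assumptions, not values stated in the card. The schedule formula is a plain-Python rendering of a linear warmup followed by a cosine curve with `num_cycles` oscillations, which is how the `cosine` scheduler in `transformers` is generally understood to behave.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-5
WARMUP_RATIO = 0.1
NUM_CYCLES = 20
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 8

# Assumptions for illustration only.
TOTAL_STEPS = 960   # ~24 optimizer steps per epoch x 40 epochs, inferred from the log
NUM_DEVICES = 1     # the card does not state the device count

# Number of examples contributing to each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)

def lr_at(step):
    """Learning rate at a given optimizer step: linear warmup, then cosine cycles."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    factor = 0.5 * (1.0 + math.cos(math.pi * NUM_CYCLES * 2.0 * progress))
    return LEARNING_RATE * max(0.0, factor)

print("effective batch size:", effective_batch)
for step in (0, warmup_steps // 2, warmup_steps, TOTAL_STEPS // 2):
    print(step, lr_at(step))
```

With `num_cycles` set to 20 the learning rate rises and falls repeatedly after warmup rather than decaying once.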
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: sentence-transformers/all-MiniLM-L6-v2