This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the csv dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
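The architecture above pools with the CLS token (`pooling_mode_cls_token: True`) and then applies `Normalize()`. The following is a minimal NumPy sketch of those two steps only, not the library's implementation. The function name `cls_pool_and_normalize` and the random input standing in for the `BertModel` output are assumptions made for illustration.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Sketch of CLS pooling followed by L2 normalization.

    token_embeddings: array of shape (batch, seq_len, hidden).
    Returns an array of shape (batch, hidden) with unit-length rows.
    """
    # CLS pooling: keep only the first token's vector for each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # L2 normalization, guarding against division by zero.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.clip(norms, 1e-12, None)

# Hypothetical stand-in for transformer output: 2 sequences, 4 tokens, 768 dims.
rng = np.random.default_rng(0)
fake_hidden = rng.normal(size=(2, 4, 768))
sentence_embeddings = cls_pool_and_normalize(fake_hidden)
print(sentence_embeddings.shape)  # (2, 768)
```

Because the output rows have unit length, the dot product of two embeddings equals their cosine similarity.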
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Gurveer05/bge-base-eedi-2024")
# Run inference
sentences = [
    'Question: Solve quadratic equations using the quadratic formula where the coefficient of x² is 1. Vera wants to solve this equation using the quadratic formula.\n(\nh^2+4=5 h\n)\n\nWhat should replace the triangle? The image shows the structure of the quadratic formula. It says plus or minus the square root, and the triangle is the first thing after the square root sign, with a minus sign after it.\n\nOptions:\nA. 8\nB. -10\nC. 16\nD. 25\n\nAnswer: -10',
    'Mixes up squaring and multiplying by 2 or doubling',
    'Does not realise a quadratic must be in the form ax^2+bx+c=0 to find the values for the quadratic formula',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
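In the example above, the first sentence is a question with a chosen answer and the other two are candidate misconceptions, so one natural use of the similarity scores is ranking candidates against the question. The sketch below shows that ranking step in plain NumPy on toy vectors. The helper `rank_by_cosine` and the toy data are hypothetical; with the real model you would pass `embeddings[0]` as the query and `embeddings[1:]` as the candidates.

```python
import numpy as np

def rank_by_cosine(query_embedding, candidate_embeddings):
    """Return (candidate_index, cosine_score) pairs, best match first."""
    # Normalize both sides so the dot product is the cosine similarity.
    q = query_embedding / np.linalg.norm(query_embedding)
    c = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)  # descending by score
    return [(int(i), float(scores[i])) for i in order]

# Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
toy_query = np.array([1.0, 0.0, 0.0])
toy_candidates = np.array([
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.1, 0.0],  # nearly parallel to the query
])
ranking = rank_by_cosine(toy_query, toy_candidates)
print(ranking[0][0])  # 1
```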
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| Question: Add algebraic fractions where the denominators are single terms and are not multiples of each other. Express the following as a single fraction, writing your answer as simply as possible: (t / s)+(2 s / t). | When adding/subtracting fractions, adds/subtracts the denominators and multiplies the numerators | Thinks can combine the numerator and denominator after simplifying an algebraic fraction |
| Question: Calculate the volume of a cone where the dimensions are all given in the same units. STEP 2 | When using Pythagoras to find the height of an isosceles triangle, uses the whole base instead of half | Has multiplied base by slant height and perpendicular height to find area of a triangle |
| Question: Convert from hours to minutes. 3 hours is the same as ___________ minutes. | Thinks there are 30 minutes in a hour | Answers as if there are 100 minutes in an hour when changing from hours to minutes |

Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
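To make the two parameters concrete, here is a rough NumPy sketch of how a multiple-negatives ranking loss is usually computed: cosine similarities between each anchor and every candidate in the batch are multiplied by `scale`, and a cross-entropy loss treats the anchor's own positive as the correct class. This is an illustration under those assumptions, not the library's code; the function name `mnr_loss` and the toy inputs are hypothetical.

```python
import numpy as np

def mnr_loss(anchors, positives, negatives=None, scale=20.0):
    """Sketch of a multiple-negatives ranking loss with cosine similarity.

    anchors, positives, negatives: arrays of shape (batch, dim).
    Candidates for each anchor are all positives in the batch plus,
    if given, all explicit negatives.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = normalize(anchors)
    candidates = positives if negatives is None else np.vstack([positives, negatives])
    c = normalize(candidates)

    # Scaled cosine similarity: shape (batch, n_candidates).
    logits = scale * (a @ c.T)
    # Numerically stable log-softmax over candidates.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct candidate for anchor i is positive i (column i).
    idx = np.arange(len(a))
    return float(-log_probs[idx, idx].mean())

# Toy data: anchors that match their positives exactly versus a shifted pairing.
toy_anchors = np.eye(3)
aligned = mnr_loss(toy_anchors, np.eye(3))
misaligned = mnr_loss(toy_anchors, np.roll(np.eye(3), 1, axis=0))
print(aligned < misaligned)  # True
```

A larger `scale` sharpens the softmax, so small differences in cosine similarity produce larger differences in loss.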
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 24
- per_device_eval_batch_size: 24
- learning_rate: 2e-05
- weight_decay: 0.01
- num_train_epochs: 20
- lr_scheduler_type: cosine_with_restarts
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 24
- per_device_eval_batch_size: 24
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 20
- max_steps: -1
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 0.2581 | 16 | 3.3202 |
| 0.5 | 31 | - |
| 0.5161 | 32 | 2.9432 |
| 0.7742 | 48 | 2.6014 |
| 1.0 | 62 | - |
| 1.0323 | 64 | 2.1029 |
| 1.1613 | 80 | 1.5757 |
| 1.3710 | 93 | - |
| 1.4194 | 96 | 2.0139 |
| 1.6774 | 112 | 1.8208 |
| 1.8710 | 124 | - |
| 1.9355 | 128 | 1.6599 |
| 2.0645 | 144 | 0.7017 |
| 2.2419 | 155 | - |
| 2.3226 | 160 | 1.4833 |
| 2.5806 | 176 | 1.3274 |
| 2.7419 | 186 | - |
| 2.8387 | 192 | 1.1951 |
| 3.0968 | 208 | 0.5799 |
| 3.1129 | 217 | - |
| 3.2258 | 224 | 0.9517 |
| 3.4839 | 240 | 1.0177 |
| 3.6129 | 248 | - |
| 3.7419 | 256 | 0.8864 |
| 4.0 | 272 | 0.7591 |
| 4.1129 | 279 | - |
| 4.1290 | 288 | 0.4319 |
| 4.3871 | 304 | 0.7878 |
| 4.4839 | 310 | - |
| 4.6452 | 320 | 0.7483 |
| 4.9032 | 336 | 0.6432 |
| 4.9839 | 341 | - |
| 5.0323 | 352 | 0.2496 |
| 5.2903 | 368 | 0.6689 |
| 5.3548 | 372 | - |
| 5.5484 | 384 | 0.628 |
| 5.8065 | 400 | 0.4981 |
| 5.8548 | 403 | - |
| 6.0645 | 416 | 0.3208 |
| 6.1935 | 432 | 0.4169 |
| 6.2258 | 434 | - |
| 6.4516 | 448 | 0.5049 |
| 6.7097 | 464 | 0.4402 |
| 6.7258 | 465 | - |
| 6.9677 | 480 | 0.3819 |
| 7.0968 | 496 | 0.1854 |
| 7.3548 | 512 | 0.4292 |
| 7.5968 | 527 | - |
| 7.6129 | 528 | 0.4171 |
| 7.8710 | 544 | 0.318 |
| 8.0968 | 558 | - |
| 8.1290 | 560 | 0.1318 |
| 8.2581 | 576 | 0.3829 |
| 8.4677 | 589 | - |
| 8.5161 | 592 | 0.4097 |
| 8.7742 | 608 | 0.2676 |
| 8.9677 | 620 | - |
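
The hyperparameters listed before the training log combine `learning_rate: 2e-05`, `warmup_ratio: 0.1`, and `lr_scheduler_type: cosine_with_restarts`. The sketch below approximates the learning-rate curve such a configuration typically produces: linear warmup followed by cosine decay with hard restarts. It assumes a single cycle (`num_cycles=1`, since `lr_scheduler_kwargs` is empty) and uses a made-up `total_steps=1000`, which is not taken from this run; the function name `lr_at_step` is also hypothetical.

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1, num_cycles=1):
    """Sketch of linear warmup followed by cosine decay with hard restarts."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Cosine decay that restarts at the start of each cycle.
    cycle_position = (num_cycles * progress) % 1.0
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_position))

# Illustrative total step count (an assumption, not a value from this model card).
total_steps = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, total_steps))
```

With one cycle the curve never restarts, so it behaves like a plain cosine decay after warmup.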
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}