How to use vinay-pepakayala/embedding_model_custom with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("vinay-pepakayala/embedding_model_custom")
sentences = [
"SD-V3_01_RD00.0 FORM_SD_INVOICE_EN <Banking data> VF02/ VF03 *** Only printed on the last page ***",
"FSCM01 AR Accounting Create four debit dispute cases, for four approval levels in EUR/PLN/GBP for 4 different countries\nAttach backup, mark X for attachment, Create case title, Add information in Text field AR Accounting FBL5N Validate that the dispute case is created against the open line item on FBL5N.\nNote the Case ID# to find the dispute case in Dispute Management.\nCheck if automatic posting was done on a proper account linked to Special GL Indicator",
"SD-V3_01_RD00.0 FORM_SD_INVOICE_EN <Company Code data on the footer> VF02/ VF03 *** Printed on all pages *** Name and address data of the issuing Company Code\n\n",
"FA03_Depreciation run Recalculate Depreciation - it might be nessesary to recalculate deprec. if depreciation key has been changed (executed after depreciation run) Asset Accounting AFAR all proceed assets shouldn't have errors and changes"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
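The architecture printout above shows mean pooling (pooling_mode_mean_tokens is True) followed by a Normalize step. As a rough illustration of what those two stages compute, here is a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` and the toy vectors are hypothetical; the library itself works on batched tensors, so treat this as a sketch of the idea rather than its implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping positions whose mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an input that is all padding
    return [t / count for t in total]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # [2.0, 4.0]
sentence_vec = l2_normalize(pooled)   # unit-length sentence embedding
print(pooled, sentence_vec)
```

Because the output is unit length, the cosine similarity between two such embeddings reduces to a plain dot product.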
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("vinay-pepakayala/embedding_model_custom")
# Run inference
sentences = [
'CO02-01 Create/Change/Display original budget on internal orders Create/Change original budget KO22 CSA master data Budget is posted to internal order',
'CO02-01 Create/Change/Display original budget on internal orders Display original budget KO23 CSA master data Budger for internal order is displayed',
"NA_SD_NISSAN Prod_Multiple Parts OPTIONAL: Validate part quantity and HU's successfully transfer to staging lane ZSDSPE68 Net difference for each part is 0 (zero quantity)",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, 0.9881, -0.1206],
# [ 0.9881, 1.0000, -0.1301],
# [-0.1206, -0.1301, 1.0000]])
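One way to read the similarity matrix above is as a nearest-neighbour lookup: for each sentence, find the other sentence with the highest score. The sketch below copies the printed values into a plain list so it runs without downloading the model; the helper `nearest_neighbor` is a hypothetical name, not part of the library.

```python
# Similarity matrix copied from the printed output above (rounded to 4 decimals).
similarities = [
    [1.0000, 0.9881, -0.1206],
    [0.9881, 1.0000, -0.1301],
    [-0.1206, -0.1301, 1.0000],
]

def nearest_neighbor(sim_matrix, i):
    """Return (index, score) of the item most similar to item i, excluding i itself."""
    candidates = [(j, s) for j, s in enumerate(sim_matrix[i]) if j != i]
    return max(candidates, key=lambda pair: pair[1])

for i in range(len(similarities)):
    j, score = nearest_neighbor(similarities, i)
    print(f"sentence {i} -> sentence {j} (score {score:.4f})")
# sentence 0 -> sentence 1 (score 0.9881)
# sentence 1 -> sentence 0 (score 0.9881)
# sentence 2 -> sentence 0 (score -0.1206)
```

The two CO02-01 budget steps pair up with each other, while the unrelated NA_SD_NISSAN step has no close match, which is why its best score is still negative.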
Columns: sentence_0 and sentence_1

| | sentence_0 | sentence_1 |
|---|---|---|
| type | string | string |

Samples:
| sentence_0 | sentence_1 |
|---|---|
| SD06-Sale Schedule agreement delivery process Stock Review-Stock Checking MMBE/LS24 D0P_ECMM_MATDOCD_TASKROLE Check stock before delivery | SD06-Sale Schedule agreement delivery process DN - Create delivery based on SA Type:ZLZ VL01N/VL10E/ZSDSPE93 D0P_ECSD_DLVERYA_TASKROLE Delivery XXXXX created |
| NA_SD_DENSO Execute shippinng due list (VL10E) : Provide the input parameters like Shipping point, Horizon and Until Date along with other parameters like "Ship to party", Unloading points, sales document types etc and display shipping due list VL10E Shipping due list is displayed. | NA_SD_DENSO VL10E Message displayed: See log for deliveries created. Delivery number displayed |
| GL02-GL Journal Entries: manual/GLSU/load Excel file with journal entries In the transaction selection screen, enter all relevant data (type of surplus, max variance amount.....), making sure you tick the "Prepare List" option at the bottom so you can view the proposed clearings rather than automatic posting. Execute to see list of proposed options MR11 | GL02-GL Journal Entries: manual/GLSU/load Excel file with journal entries In the list proposed, use filter options to refine the criteriae. Select the items you want the system to clear MR11SHOW Log screen listing the new accounting documents created by the MR8M postings. Check the document by going to the PO history. From the PO History, go to the corresponding Accounting document which will be a KP posting on the GR/NI GL account. |
Loss: MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim",
"gather_across_devices": false
}
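To make these parameters concrete, here is a minimal pure-Python sketch of the idea behind a multiple-negatives ranking objective: each sentence_0 should score highest against its own sentence_1, and every other sentence_1 in the batch serves as a negative. Cosine similarities are multiplied by the scale (20.0 here) and fed to a cross-entropy over the batch. The function names and toy vectors are hypothetical, and the library's own implementation operates on batched tensors, so this is only an illustration under those assumptions.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and all other positives in the batch act as in-batch negatives."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-dimensional "embeddings" (made up for illustration).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each anchor is close to its own positive
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each anchor is close to the wrong positive

print(mnr_loss(anchors, aligned) < mnr_loss(anchors, swapped))  # True
```

A larger scale sharpens the softmax, so well-separated pairs contribute almost no loss while confused pairs are penalised heavily. Bigger batches also supply more in-batch negatives per anchor.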
Non-default hyperparameters:
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss |
|---|---|---|
| 1.3193 | 500 | 0.1066 |
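The single log row above, combined with the listed hyperparameters, allows a rough back-of-the-envelope estimate of the training run's size. The sketch below assumes training ran on a single device (the card does not state this) and that one optimizer step consumes one batch of 32 pairs, which matches gradient_accumulation_steps of 1. The helper `linear_lr` is a hypothetical illustration of a linear decay with no warmup, not the scheduler code itself.

```python
# Values read from the training log row and the hyperparameter list.
logged_epoch = 1.3193
logged_step = 500
num_train_epochs = 2
batch_size = 32           # per_device_train_batch_size
learning_rate = 5e-05

# Assumption: one device, so one step corresponds to one batch of 32 pairs.
steps_per_epoch = round(logged_step / logged_epoch)
total_steps = steps_per_epoch * num_train_epochs

# dataloader_drop_last is False, so the final batch of an epoch may be partial.
max_pairs = steps_per_epoch * batch_size
min_pairs = (steps_per_epoch - 1) * batch_size + 1

def linear_lr(step, base_lr=learning_rate, total=total_steps):
    """Linear decay from base_lr to zero with no warmup (warmup_steps is 0)."""
    return base_lr * max(0.0, (total - step) / total)

print(steps_per_epoch, total_steps)   # 379 758
print(min_pairs, max_pairs)           # 12097 12128
print(linear_lr(logged_step))         # approximate learning rate at the logged step
```

Under these assumptions the training set would hold roughly 12,100 sentence pairs, and the logged loss of 0.1066 was recorded about two thirds of the way through the run.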
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: nreimers/MiniLM-L6-H384-uncased