Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (paper: arXiv:1908.10084)
This is a sentence-transformers model finetuned from sentence-transformers/all-mpnet-base-v2. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
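The three modules run in sequence: the Transformer produces per-token embeddings, the Pooling module averages them (`pooling_mode_mean_tokens`), and Normalize scales the result to unit length. A minimal numpy sketch of the last two stages, using made-up token embeddings (toy 3-dimensional vectors rather than the model's 768):

```python
import numpy as np

# Hypothetical token embeddings for one sentence: (num_tokens, dim).
# The real model uses 768-dim MPNet outputs; 4 tokens x 3 dims here for illustration.
token_embeddings = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],   # e.g. a padding token
    [2.0, 2.0, 2.0],
])
attention_mask = np.array([1, 1, 0, 1])  # padding tokens are masked out

# Pooling with pooling_mode_mean_tokens=True: mean over non-masked tokens.
mask = attention_mask[:, None]
pooled = (token_embeddings * mask).sum(axis=0) / mask.sum()

# Normalize: scale to unit L2 norm, so dot product == cosine similarity.
sentence_embedding = pooled / np.linalg.norm(pooled)

print(pooled)  # [2. 2. 2.]
print(round(float(np.linalg.norm(sentence_embedding)), 6))  # 1.0
```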
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
"TechnicalProficiencies DB: Oracle 11g Domains: Investment Banking, Advertising, Insurance. Programming Skills: SQL, PLSQL BI Tools: Informatica 9.1 OS: Windows, Unix Professional Development Trainings â\x80¢ Concepts in Data Warehousing, Business Intelligence, ETL. â\x80¢ BI Tools -Informatica 9X Education Details \r\n BCA Nanded, Maharashtra Nanded University\r\nETL Developer \r\n\r\nETL Developer - Sun Trust Bank NY\r\nSkill Details \r\nETL- Exprience - 39 months\r\nEXTRACT, TRANSFORM, AND LOAD- Exprience - 39 months\r\nINFORMATICA- Exprience - 39 months\r\nORACLE- Exprience - 39 months\r\nUNIX- Exprience - 39 monthsCompany Details \r\ncompany - Sun Trust Bank NY\r\ndescription - Sun Trust Bank, NY JAN 2018 to present\r\nClient: Sun Trust Bank NY\r\nEnvironment: Informatica Power Center 9.1, Oracle 11g, unix.\r\n\r\nRole: ETL Developer\r\n\r\nProject Profile:\r\nSun Trust Bank is a US based multinational financial services holding company, headquarters in NY that operates the Bank in New York and other financial services investments. 
The company is organized as a stock corporation with four divisions: investment banking, private banking, Retail banking and a shared services group that provides\r\nFinancial services and support to the other divisions.\r\nThe objective of the first module was to create a DR system for the bank with a central point of communication and storage for Listed, Cash securities, Loans, Bonds, Notes, Equities, Rates, Commodities, and\r\nFX asset classes.\r\nContribution / Highlights:\r\n\r\nâ\x80¢ Liaising closely with Project Manager, Business Analysts, Product Architects, and Requirements Modelers (CFOC) to define Technical requirements and create project documentation.\r\nâ\x80¢ Development using Infa 9.1, 11g/Oracle, UNIX.\r\nâ\x80¢ Use Informatica PowerCenter for extraction, transformation and loading (ETL) of data in the Database.\r\nâ\x80¢ Created and configured Sessions in Informatica workflow Manager for loading data into Data base tables from various heterogeneous database sources like Flat Files, Oracle etc.\r\nâ\x80¢ Unit testing and system integration testing of the developed mappings.\r\nâ\x80¢ Providing production Support of the deployed code.\r\nâ\x80¢ Providing solutions to the business for the Production issues.\r\nâ\x80¢ Had one to One interaction with the client throughout the project and in daily meetings.\r\n\r\nProject #2\r\ncompany - Marshall Multimedia\r\ndescription - JUN 2016 to DEC 2017\r\n\r\nClient: Marshall Multimedia\r\nEnvironment: Informatica Power Center 9.1, Oracle 11g, unix.\r\n\r\nRole: ETL Developer\r\n\r\nProject Profile:\r\nMarshall Multimedia is a US based multimedia advertisement services based organization which has\r\nhead courter in New York. 
EGC interface systems are advert management, Customer Management, Billing and\r\nProvisioning Systems for Consumer& Enterprise Customers.\r\nThe main aim of the project was to create an enterprise data warehouse which would suffice the need of reports belonging to the following categories: Financial reports, management reports and\r\nrejection reports. The professional reports were created by Cognos and ETL work was performed by\r\nInformatica. This project is to load the advert details and magazine details coming in Relational tables into data warehouse and calculate the compensation and incentive amount monthly twice as per business\r\nrules.\r\n\r\nContribution / Highlights:\r\nâ\x80¢ Developed mappings using different sources by using Informatica transformations.\r\nâ\x80¢ Created and configured Sessions in Informatica workflow Manager for loading data into Data Mart tables from various heterogeneous database sources like Flat Files, Oracle etc.\r\n\r\n2\r\nâ\x80¢ Unit testing and system integration testing of the developed mappings.\r\nâ\x80¢ Providing solutions to the business for the Production issues.\r\n\r\nProject #3\r\ncompany - Assurant healthcare/Insurance Miami USA\r\ndescription - Assurant, USA NOV 2015 to MAY 2016\r\n\r\nProject: ACT BI - State Datamart\r\nClient: Assurant healthcare/Insurance Miami USA\r\nEnvironment: Informatica Power Center 9.1, Oracle 11g, unix.\r\n\r\nRole: ETL Developer\r\n\r\nProject Profile:\r\nAssurant, Inc. is a holding company with businesses that provide a diverse set of specialty, niche-market insurance\r\nproducts in the property, casualty, life and health insurance sectors. The company's four operating segments are Assurant\r\nEmployee Benefits, Assurant Health, Assurant Solutions and Assurant Specialty Property.\r\nThe project aim at building State Datamart for enterprise solution. 
I am part of team which is responsible for ETL\r\nDesign & development along with testing.\r\n\r\nContribution / Highlights:\r\nâ\x80¢ Performed small enhancement\r\nâ\x80¢ Daily load monitoring\r\nâ\x80¢ Attend to Informatica job failures by analyzing the root cause, resolving the failure using standard\r\ndocumented process.\r\nâ\x80¢ Experience in writing SQL statements.\r\nâ\x80¢ Strong Problem Analysis & Resolution skills and ability to work in Multi Platform Environments\r\nâ\x80¢ Scheduled the Informatica jobs using Informatica scheduler\r\nâ\x80¢ Extensively used ETL methodology for developing and supporting data extraction, transformations and loading process, in a corporate-wide-ETL Solution using Informatica.\r\nâ\x80¢ Involved in creating the Unit cases and uploaded in to Quality Center for Unit Testing and UTR\r\nâ\x80¢ Ensure that daily support tasks are done in accordance with the defined SLA.",
'I am looking for an opportunity that would provide me with a chance to learn and enhance my skills in the Oracle Financials domain. I have 4+ years of experience in the domain and have worked with various clients. I have been working in the finance domain for 9+ years. I have worked in Oracle Apps Financials and have experience in Oracle Financials 11i, R12. I am also proficient in Financial Services',
"The incumbent would be responsible for testing and maintenance of the Transformers, BPCB's, Transformer, PCC, MCC, HV cables, LV cables with respect to the electrical and mechanical aspects.\n\nJob Requirements:\n- B.E. / B.Tech. (Electrical/Mechanical) with minimum 60% aggregate.\n- Minimum 2 years of experience in testing and maintenance of transformers, BPCB's, Transformer, PCC, HV cables, LV cables.\n- Knowledge of transformer ratio test, transformer vector group test, transformer magnetic balance test, transformer tripping protection command, etc.\n- Knowledge of working of electrical/mechanical systems and related components (like motors, starters, etc.)\n- Knowledge of electrical/mechanical maintenance of transformers etc.\n- Ability to check transformer/MCC/PCC/HV cables/LV cables for defects and to work on them to fix",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
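Because the pipeline ends with a `Normalize()` module, the embeddings are unit-length, so the cosine similarity that `model.similarity` computes by default reduces to a plain dot product. A numpy sketch with random stand-in embeddings (toy 4-dimensional vectors, not real model output):

```python
import numpy as np

# Stand-in for model.encode() output: 3 unit-normalized embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 4))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# model.similarity() defaults to cosine similarity; for unit vectors this is
# just the matrix of pairwise dot products.
similarities = embeddings @ embeddings.T

print(similarities.shape)                       # (3, 3)
print(np.allclose(np.diag(similarities), 1.0))  # True: each vector vs. itself
```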
Evaluated with EmbeddingSimilarityEvaluator on the validation split:

| Metric | Value |
|---|---|
| pearson_cosine | 0.8837 |
| spearman_cosine | 0.8724 |
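pearson_cosine and spearman_cosine are the Pearson and Spearman rank correlations between the model's cosine similarity scores and the gold labels. A self-contained sketch with toy scores (the helper functions and numbers are illustrative, not the evaluator's actual implementation):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def spearman(x, y):
    """Spearman = Pearson on the ranks (toy version: assumes no ties)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

# Toy cosine similarities vs. gold labels (made up, not the validation data).
cosine_scores = [0.9, 0.1, 0.5, 0.7]
gold_labels   = [1.0, 0.0, 0.4, 0.8]

print(round(pearson(cosine_scores, gold_labels), 4))
print(round(spearman(cosine_scores, gold_labels), 4))  # 1.0: rankings agree exactly
```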
Training data: columns sentence_0, sentence_1, and label.

| | sentence_0 | sentence_1 | label |
|---|---|---|---|
| type | string | string | float |

Samples:
| sentence_0 | sentence_1 | label |
|---|---|---|
| KEY SKILLS: • Computerized accounting with tally • Sincere & hard working • Management accounting & income tax • Good communication & leadership • Two and four wheeler driving license • Internet & Ecommerce management COMPUTER SKILLS: • C Language • Web programing • Tally • Dbms Education Details<br>June 2017 to June 2019 Mba Finance/hr India Mlrit<br>June 2014 to June 2017 Bcom Computer Hyderabad, Telangana Osmania university<br>June 2012 to April 2014 Inter MEC India Srimedhav<br>Hr<br>Nani<br>Skill Details<br>accounting- Exprience - 6 months<br>DATABASE MANAGEMENT SYSTEM- Exprience - 6 months<br>Dbms- Exprience - 6 months<br>Management accounting- Exprience - 6 months<br>Ecommerce- Exprience - 6 monthsCompany Details<br>company - Valuelabs<br>description - They will give the RRF form the required DLT then the hand over to RLT then scrum master will take the form from the RLT then scrum master will give the forms to trainee which we can work on the requirement till the candidate rece... | We are looking for a hardworking and self-motivated candidate who can implement strategies to maximize sales. Key responsibilities will include: | 0.5287528648371803 |
| IT SKILLS • Well versed with MS Office and Internet Applications and various ERP systems implemented in the company ie.SAGE, Flotilla, LM ERP, Tally 9, WMS, Exceed 4000 etc PERSONAL DOSSIER Permanent Address: Bandra West, Mumbai 400 050Education Details<br>B.Com commerce Mumbai, Maharashtra Bombay University<br>Mumbai, Maharashtra St. Andrews College<br>DIM Business Management IGNOU<br>Operations Manager<br>Operations Manager - Landmark Insurance Brokers Pvt Ltd<br>Skill Details<br>EMPLOYEE RESOURCE GROUP- Exprience - 6 months<br>ENTERPRISE RESOURCE PLANNING- Exprience - 6 months<br>ERP- Exprience - 6 months<br>MS OFFICE- Exprience - 6 months<br>Tally- Exprience - 6 monthsCompany Details<br>company - Landmark Insurance Brokers Pvt Ltd<br>description - Jan 2019 till Date<br>About the Company<br>One of India Largest Insurance Brokerage firms with offices across 24 states PAN India and a part of the LandmarkGroup with an annual turnover of 2200 cr<br>Position: Operations Manager<br>Leading and overseeing a... | • A company with a very strong reputation for a high performance culture and strong customer focus is looking to recruit talented and motivated individuals to work within the Customer Service Team. | 0.3646167498289064 |
| TECHNICAL STRENGTHS Computer Language Java/J2EE, Swift, HTML, Shell script, MySQL Databases MySQL Tools SVN, Jenkins, Hudson, Weblogic12c Software Android Studio, Eclipse, Oracle, Xcode Operating Systems Win 10, Mac (High Sierra) Education Details<br>June 2016 B.E. Information Technology Goregaon, MAHARASHTRA, IN Vidyalankar Institute of Technology<br>May 2013 Mumbai, Maharashtra Thakur Polytechnic<br>May 2010 Mumbai, Maharashtra St. John's Universal School<br>Java developer<br>Java developer - Tech Mahindra<br>Skill Details<br>JAVA- Exprience - 21 months<br>MYSQL- Exprience - 21 months<br>DATABASES- Exprience - 17 months<br>J2EE- Exprience - 17 months<br>ANDROID- Exprience - 6 monthsCompany Details<br>company - Tech Mahindra<br>description - Team Size: 5<br>Environment: Java, Mysql, Shell script.<br>Webserver: Jenkins.<br>Description: OR-Formatter is an application which takes the input file as Geneva Modified File GMF from Geneva server and reads the data to generate Bill backup and Bill Invoices for Clie... | We are looking for a Java Developer to join our growing team. We will be looking for a highly skilled developer with experience in Java/J2EE, Shell script, HTML, MYSQL, Databases, Java Tools, Android, and iOS. | 0.5360567140232494 |
Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
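CosineSimilarityLoss computes the cosine similarity of each embedding pair and feeds it, together with the gold label, to the configured loss_fct (torch.nn.MSELoss here). A numpy sketch of that objective on made-up embeddings (not the library's actual torch implementation):

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, labels):
    """MSE between pairwise cosine similarities and gold labels (toy sketch)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)                   # cosine similarity per pair
    return float(((cos - labels) ** 2).mean())  # the MSELoss step

# Made-up embedding pairs and similarity labels (toy 2-dim vectors).
emb_a = np.array([[1.0, 0.0], [1.0, 1.0]])
emb_b = np.array([[1.0, 0.0], [-1.0, 1.0]])
labels = np.array([1.0, 0.0])

print(cosine_similarity_loss(emb_a, emb_b, labels))  # 0.0: predictions match labels
```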
Non-default hyperparameters:
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | validation_spearman_cosine |
|---|---|---|
| 1.0 | 54 | 0.8040 |
| 1.8519 | 100 | 0.8637 |
| 2.0 | 108 | 0.8596 |
| 3.0 | 162 | 0.8724 |
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: sentence-transformers/all-mpnet-base-v2