This is a sentence-transformers model fine-tuned from EuroBERT/EuroBERT-210m on the matching_rh_train10 dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'EuroBertModel'})
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
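The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence vector is the average of the token vectors. The following is a minimal sketch of that idea in plain Python, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are assumptions made for illustration (the real model uses 768 dimensions).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    # Column-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three tokens with 2-dimensional vectors; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)
```

Masking matters because padded positions would otherwise drag the average toward arbitrary values.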
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("gguichard/matching-rh-peft3")
# Run inference
sentences = [
'{"type": "opportunity", "customer_code": "", "opportunity_title": ".NET Developer", "opportunity_place": "", "opportunity_expertise_area": "Autres", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": ".NET\\nReact", "opportunity_criteria": "", "opportunity_extract": 1}',
'{"type": "candidate", "customer_code": "", "title": "Agile Back end Developer", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "", "expertise_area": "", "activity_area": "", "list_diplomes": "", "typeOf": "0", "source": "", "informationComments": "", "extract": 1, "experiences": "[]"}',
'{"type": "candidate", "customer_code": "", "title": "Consultant Data", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "mondeeuropefrancerhonealpes", "expertise_area": "", "activity_area": "", "list_diplomes": "", "typeOf": "-1", "source": "3", "informationComments": "pas à l\'écoute", "extract": 1, "experiences": "[]"}',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8869, 0.1913],
# [0.8869, 1.0000, 0.2530],
# [0.1913, 0.2530, 1.0000]])
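The similarity matrix above can be used to rank candidates against an opportunity. As a rough sketch, assuming the scores are cosine similarities (the Sentence Transformers default unless the model is configured otherwise), the ranking step looks like the code below. The helper names `cosine` and `rank_candidates` and the toy 2-dimensional vectors are hypothetical; in practice the vectors would come from `model.encode`.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    opportunity: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate_index, score) pairs sorted from best to worst match."""
    scored = [(i, cosine(opportunity, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
opportunity_vec = [1.0, 0.0]
candidate_vecs = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
ranking = rank_candidates(opportunity_vec, candidate_vecs)
print(ranking)
```

Cosine similarity ignores vector length, which is why the second candidate scores highest even though its norm differs from the opportunity's.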
Columns: label, sentence1, and sentence2

| | label | sentence1 | sentence2 |
|---|---|---|---|
| type | float | string | string |

Samples:
| label | sentence1 | sentence2 |
|---|---|---|
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "SIENNA - DEV DOT NET", "opportunity_place": "", "opportunity_expertise_area": "Banque", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "Consultant Sénior Microsoft .NET", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "", "expertise_area": "", "activity_area": "", "list_diplomes": "2007 - Master Management des projets informatiques et systèmes d'information, 2004 - Filière Informatique et Réseaux - ENSICAEN", "typeOf": "1", "source": "1", "informationComments": "", "extract": 1, "experiences": "[{'skills': '', 'startMonth': '6', 'endDate': '', 'startYear': '2004', 'description': 'AUTRES MISSIONS\nA\nIngénieur Conception et développement CALCIA\nAnalyste - Responsable d’applications chez EDF\nIngénieur Conception et Développement chez EDF\nIngénieur Conception et Développement chez BNPPARIBAS', 'company': 'AUTRES MISSIONS', 'location': '', 'id': '2536', 'title': 'Ingénieur Conception et développement', 'endMonth': '11', 'endYear': '2008', 'startDate': ''}, {'skills': '.net, .net 2.0, asp.net, c#, front office, gamaweb... |
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "Consultant Mainframe - DGFIP - ONEPOINT", "opportunity_place": "", "opportunity_expertise_area": "Autres", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "Ingénieur de développement\nPACBASE/COBOL/MAINFRAME\n2 ans et ½ d’expérience", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "français, anglais", "mobility": "mondeeuropefranceiledefranceparis, mondeeuropefranceiledefranceseineetmarne, mondeeuropefranceiledefranceyvelines, mondeeuropefranceiledefranceessone, mondeeuropefranceiledefrancehautsdeseine92, mondeeuropefranceiledefranceseinesaintdenis, mondeeuropefranceiledefrancevaldemarne, mondeeuropefranceiledefrancevaloise", "expertise_area": "", "activity_area": "", "list_diplomes": "2018 - Formation PACBASE - Banque Populaire Dijon, 2018 - Formation Cobol en alternance appliqué au contexte Descours & Cabaud - Alteca Lyon et Informatique, 2018 - Formation interne VBA EXCEL, 2018 - Formation Mainframe IBM/COBOL et Qualification logiciel - INTI Formation, 2016 - Master international Science de la matière - Université de Rouen", "typeOf": "1", "source": "3",... |
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "STIME responsable application adjoint", "opportunity_place": "", "opportunity_expertise_area": "Grande distribution", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "Consultant AMOA- Chef de projet SI", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "anglais, espagnol", "mobility": "", "expertise_area": "", "activity_area": "", "list_diplomes": "2020 - CERTYOU Paris, 2019 - Certification SCRUM Master - Actinuum Paris, 2017 - Cycle Project Management Professional V5 PMP, 2015 - Urbanisation et architecture SI, 2014 - ITIL Fondation", "typeOf": "1", "source": "1", "informationComments": "", "extract": 1, "experiences": "[{'skills': 'crm, oracle parties, mep, dba, infrastructure, crm people soft, uml, power amc, sql query, oracle, hp quality', 'startMonth': '4', 'endDate': '', 'startYear': '2007', 'description': 'INWI\nà\nSynthèse :\nParticipation à la mise en place du CRM pepoleSoft Oracle parties : vue 360°\nclient , facture et réclamations.\nRôle :\nConsultant AMOA homologation\nRéalisation :\n\uf0b7\nCollecte de besoin métier.\n\uf0b7\nRédaction de spéc... |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
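With `loss_fct` set to MSELoss, the objective compares the cosine similarity of each sentence pair with its float label and penalizes the squared difference. The sketch below illustrates that computation in plain Python on toy vectors; it is not the library's code, and the function names and inputs are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_mse_loss(
    pairs: List[Tuple[Sequence[float], Sequence[float]]], labels: List[float]
) -> float:
    """Mean squared error between predicted cosine similarities and labels."""
    if len(pairs) != len(labels):
        raise ValueError("pairs and labels must have the same length")
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Two toy pairs, both labelled 1.0 (a match):
# the first pair is identical (no error), the second is orthogonal (error 1).
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 1.0]
loss = cosine_mse_loss(pairs, labels)
print(loss)
```

Minimizing this quantity pulls embeddings of pairs labelled 1.0 toward the same direction.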
Columns: label, sentence1, and sentence2

| | label | sentence1 | sentence2 |
|---|---|---|---|
| type | float | string | string |

Samples:
| label | sentence1 | sentence2 |
|---|---|---|
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "DATA MANAGER - La POSTE", "opportunity_place": "", "opportunity_expertise_area": "Services", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "Senior Consultant/Project Manager - Data Management", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "", "expertise_area": "", "activity_area": "", "list_diplomes": "BACHELOR - Mathématiques Appliquées - stratégique Université Paris I Panthéon Sorbonne, DEUG - Option Statistique - stratégique Université Paris I Panthéon Sorbonne", "typeOf": "-1", "source": "1", "informationComments": "adresse perso consultant : 99 rue Alfred DININ 92000 Nanterre", "extract": 1, "experiences": "[{'skills': '', 'startMonth': '', 'endDate': '', 'startYear': '', 'description': "Avril ❖Mission : * Automatisation et fiabilisation des calculs de l'inventaire de réassurance sur les produits de prévoyance individuelle commercialisés par les partenaires d'Axa France (SAS/SQL) * Etude de l'efficience et de la rentabilité des traités de réassurance mis en place pour sécuriser le portefeuille de ces produits (SAS/C++... |
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "BABILOU - Responsable infra", "opportunity_place": "", "opportunity_expertise_area": "Autres", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "CHEF DE PROJET INFRASTRUCTURE", "skills": "", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "", "expertise_area": "", "activity_area": "", "list_diplomes": "2020 - Microsoft Azure Artificial Intelligence - Microsoft Azure Fundamentals, 2014 - DEA - Probabilités et Applications - Université, 2003 - Diplôme d'ingénieur - Télécoms ENST ParisTech, 2003 - DEA - Signal et Communications Numériques - Université de Nice Sophia-Antipolis", "typeOf": "-1", "source": "1", "informationComments": "", "extract": 1, "experiences": "[{'skills': '', 'startMonth': '', 'endDate': '', 'startYear': '', 'description': '23 mois Études, architecture, ingénierie et paramétrage des réseaux de signalisation et de transit', 'company': '', 'location': '', 'id': '1947', 'title': 'Ingénieur accès fixe et mobile - Contexte - 01/10/2005 - 01/08/2007', 'endMonth': '', 'endYear': '', 'startDate': ''}, {'skills': '', 'startMonth': '', '... |
1.0 |
{"type": "opportunity", "customer_code": "", "opportunity_title": "DGFIP - ONEPOINT - Consultant JCL", "opportunity_place": "", "opportunity_expertise_area": "Autres", "opportunity_tools": "", "opportunity_activity_area": "", "opportunity_type": "1", "opportunity_description": "", "opportunity_criteria": "", "opportunity_extract": 1} |
{"type": "candidate", "customer_code": "", "title": "analyste developpeur pacbase cobol db2", "skills": "cobol, pacbase, db2, cics", "education": "", "experience": "-1", "tools": "", "languages": "", "mobility": "mondeeuropefranceiledefranceparis, mondeeuropefranceiledefranceseineetmarne, mondeeuropefranceiledefranceyvelines, mondeeuropefranceiledefranceessone, mondeeuropefranceiledefrancehautsdeseine92, mondeeuropefranceiledefranceseinesaintdenis, mondeeuropefranceiledefrancevaldemarne, mondeeuropefranceiledefrancevaloise", "expertise_area": "", "activity_area": "", "list_diplomes": "", "typeOf": "0", "source": "", "informationComments": "Sabrina Kadrie\n06 83 65 01 64\nsabrina20@orange.fr", "extract": 1, "experiences": "[]"} |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss"
}
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- log_level: error
- log_level_replica: passive
- log_on_each_node: False
- logging_nan_inf_filter: False
- bf16: True

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: error
- log_level_replica: passive
- log_on_each_node: False
- logging_nan_inf_filter: False
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0067 | 500 | 0.2078 | - |
| 0.0134 | 1000 | 0.1805 | - |
| 0.0202 | 1500 | 0.1644 | - |
| 0.0269 | 2000 | 0.1455 | - |
| 0.0336 | 2500 | 0.1326 | - |
| 0.0403 | 3000 | 0.132 | 0.1514 |
| 0.0471 | 3500 | 0.1292 | - |
| 0.0538 | 4000 | 0.1199 | - |
| 0.0605 | 4500 | 0.1223 | - |
| 0.0672 | 5000 | 0.1219 | - |
| 0.0740 | 5500 | 0.1116 | - |
| 0.0807 | 6000 | 0.1149 | 0.1483 |
| 0.0874 | 6500 | 0.1149 | - |
| 0.0941 | 7000 | 0.1243 | - |
| 0.1009 | 7500 | 0.1204 | - |
| 0.1076 | 8000 | 0.1116 | - |
| 0.1143 | 8500 | 0.109 | - |
| 0.1210 | 9000 | 0.111 | 0.1289 |
| 0.1278 | 9500 | 0.1168 | - |
| 0.1345 | 10000 | 0.1121 | - |
| 0.1412 | 10500 | 0.1054 | - |
| 0.1479 | 11000 | 0.1031 | - |
| 0.1547 | 11500 | 0.0994 | - |
| 0.1614 | 12000 | 0.0968 | 0.1204 |
| 0.1681 | 12500 | 0.0932 | - |
| 0.1748 | 13000 | 0.0978 | - |
| 0.1816 | 13500 | 0.0996 | - |
| 0.1883 | 14000 | 0.0974 | - |
| 0.1950 | 14500 | 0.095 | - |
| 0.2017 | 15000 | 0.0926 | 0.1139 |
| 0.2085 | 15500 | 0.0928 | - |
| 0.2152 | 16000 | 0.1007 | - |
| 0.2219 | 16500 | 0.0933 | - |
| 0.2286 | 17000 | 0.0903 | - |
| 0.2354 | 17500 | 0.0912 | - |
| 0.2421 | 18000 | 0.0927 | 0.1124 |
| 0.2488 | 18500 | 0.0927 | - |
| 0.2555 | 19000 | 0.1001 | - |
| 0.2623 | 19500 | 0.0951 | - |
| 0.2690 | 20000 | 0.0893 | - |
| 0.2757 | 20500 | 0.0874 | - |
| 0.2824 | 21000 | 0.0854 | 0.1100 |
| 0.2892 | 21500 | 0.0905 | - |
| 0.2959 | 22000 | 0.0858 | - |
| 0.3026 | 22500 | 0.0906 | - |
| 0.3093 | 23000 | 0.0899 | - |
| 0.3161 | 23500 | 0.0861 | - |
| 0.3228 | 24000 | 0.0934 | 0.1063 |
| 0.3295 | 24500 | 0.0995 | - |
| 0.3362 | 25000 | 0.0905 | - |
| 0.3430 | 25500 | 0.0875 | - |
| 0.3497 | 26000 | 0.074 | - |
| 0.3564 | 26500 | 0.0875 | - |
| 0.3631 | 27000 | 0.0821 | 0.1043 |
| 0.3699 | 27500 | 0.0877 | - |
| 0.3766 | 28000 | 0.0837 | - |
| 0.3833 | 28500 | 0.0854 | - |
| 0.3900 | 29000 | 0.0754 | - |
| 0.3968 | 29500 | 0.0803 | - |
| 0.4035 | 30000 | 0.0872 | 0.1029 |
| 0.4102 | 30500 | 0.0829 | - |
| 0.4169 | 31000 | 0.0841 | - |
| 0.4237 | 31500 | 0.0861 | - |
| 0.4304 | 32000 | 0.0827 | - |
| 0.4371 | 32500 | 0.0867 | - |
| 0.4438 | 33000 | 0.0808 | 0.1028 |
| 0.4506 | 33500 | 0.081 | - |
| 0.4573 | 34000 | 0.0789 | - |
| 0.4640 | 34500 | 0.0774 | - |
| 0.4707 | 35000 | 0.084 | - |
| 0.4775 | 35500 | 0.0866 | - |
| 0.4842 | 36000 | 0.0839 | 0.1010 |
| 0.4909 | 36500 | 0.0849 | - |
| 0.4976 | 37000 | 0.0834 | - |
| 0.5044 | 37500 | 0.0832 | - |
| 0.5111 | 38000 | 0.0739 | - |
| 0.5178 | 38500 | 0.077 | - |
| 0.5245 | 39000 | 0.0799 | 0.1016 |
| 0.5313 | 39500 | 0.0775 | - |
| 0.5380 | 40000 | 0.0788 | - |
| 0.5447 | 40500 | 0.0821 | - |
| 0.5514 | 41000 | 0.0796 | - |
| 0.5582 | 41500 | 0.0795 | - |
| 0.5649 | 42000 | 0.0836 | 0.0976 |
| 0.5716 | 42500 | 0.0783 | - |
| 0.5783 | 43000 | 0.082 | - |
| 0.5851 | 43500 | 0.0788 | - |
| 0.5918 | 44000 | 0.0849 | - |
| 0.5985 | 44500 | 0.0754 | - |
| 0.6052 | 45000 | 0.0764 | 0.0989 |
| 0.6120 | 45500 | 0.0736 | - |
| 0.6187 | 46000 | 0.0805 | - |
| 0.6254 | 46500 | 0.0788 | - |
| 0.6321 | 47000 | 0.0724 | - |
| 0.6389 | 47500 | 0.0833 | - |
| 0.6456 | 48000 | 0.0752 | 0.0972 |
| 0.6523 | 48500 | 0.0733 | - |
| 0.6590 | 49000 | 0.0686 | - |
| 0.6658 | 49500 | 0.0802 | - |
| 0.6725 | 50000 | 0.0817 | - |
| 0.6792 | 50500 | 0.0772 | - |
| 0.6859 | 51000 | 0.0746 | 0.0958 |
| 0.6927 | 51500 | 0.0742 | - |
| 0.6994 | 52000 | 0.0732 | - |
| 0.7061 | 52500 | 0.0711 | - |
| 0.7128 | 53000 | 0.0773 | - |
| 0.7196 | 53500 | 0.0782 | - |
| 0.7263 | 54000 | 0.0774 | 0.0953 |
| 0.7330 | 54500 | 0.0788 | - |
| 0.7397 | 55000 | 0.0667 | - |
| 0.7465 | 55500 | 0.0721 | - |
| 0.7532 | 56000 | 0.074 | - |
| 0.7599 | 56500 | 0.0698 | - |
| 0.7666 | 57000 | 0.0703 | 0.0948 |
| 0.7734 | 57500 | 0.0718 | - |
| 0.7801 | 58000 | 0.0764 | - |
| 0.7868 | 58500 | 0.078 | - |
| 0.7935 | 59000 | 0.0784 | - |
| 0.8003 | 59500 | 0.0771 | - |
| 0.8070 | 60000 | 0.0766 | 0.0937 |
| 0.8137 | 60500 | 0.0758 | - |
| 0.8204 | 61000 | 0.0747 | - |
| 0.8272 | 61500 | 0.0814 | - |
| 0.8339 | 62000 | 0.0719 | - |
| 0.8406 | 62500 | 0.067 | - |
| 0.8473 | 63000 | 0.0717 | 0.0937 |
| 0.8541 | 63500 | 0.0732 | - |
| 0.8608 | 64000 | 0.0755 | - |
| 0.8675 | 64500 | 0.0749 | - |
| 0.8742 | 65000 | 0.072 | - |
| 0.8810 | 65500 | 0.071 | - |
| 0.8877 | 66000 | 0.0702 | 0.0923 |
| 0.8944 | 66500 | 0.0676 | - |
| 0.9011 | 67000 | 0.0753 | - |
| 0.9079 | 67500 | 0.0734 | - |
| 0.9146 | 68000 | 0.0654 | - |
| 0.9213 | 68500 | 0.073 | - |
| 0.9280 | 69000 | 0.0703 | 0.0922 |
| 0.9348 | 69500 | 0.07 | - |
| 0.9415 | 70000 | 0.0716 | - |
| 0.9482 | 70500 | 0.0811 | - |
| 0.9549 | 71000 | 0.0722 | - |
| 0.9617 | 71500 | 0.0697 | - |
| 0.9684 | 72000 | 0.0746 | 0.0915 |
| 0.9751 | 72500 | 0.0768 | - |
| 0.9818 | 73000 | 0.0691 | - |
| 0.9886 | 73500 | 0.0718 | - |
| 0.9953 | 74000 | 0.0707 | - |
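The hyperparameters listed for this run (`learning_rate: 2e-05`, `warmup_ratio: 0.1`, `lr_scheduler_type: linear`) describe a learning rate that ramps up linearly during warmup and then decays linearly to zero. The sketch below reproduces that shape in plain Python. It assumes warmup steps are derived as the ceiling of `warmup_ratio * total_steps`, and `TOTAL_STEPS = 1000` is a hypothetical round number chosen for readability, not the step count of this training run.

```python
import math


def linear_schedule_lr(
    step: int, total_steps: int, base_lr: float = 2e-5, warmup_ratio: float = 0.1
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    # Assumption: warmup length is the ceiling of ratio * total steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay from base_lr down to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical value for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the peak rate of 2e-05 is reached at the end of warmup, one tenth of the way through training.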
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: EuroBERT/EuroBERT-210m