This is a sentence-transformers model finetuned from CALDISS-AAU/DA-BERT_Old_News_V3. It is the first version of a sentence transformer designed to embed texts according to the key actions they describe, built to support the project "Run Away" at Aalborg University. It was trained on sentences from runaway advertisements that were tagged by key verbs, grouped into larger clusters of verbs. The key idea is to foreground action.
Model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
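The Pooling block averages token embeddings into one sentence vector (`pooling_mode_mean_tokens: True`). A minimal NumPy sketch of that operation, using made-up token vectors and a made-up attention mask rather than real model output:

```python
import numpy as np

# Hypothetical token embeddings for one sentence: 4 tokens, dimension 768,
# with an attention mask marking the last position as padding.
token_embeddings = np.random.rand(4, 768)
attention_mask = np.array([1, 1, 1, 0])

# Mean pooling: average only the non-padding token vectors,
# as the Pooling module does when pooling_mode_mean_tokens is True.
masked = token_embeddings * attention_mask[:, None]
sentence_embedding = masked.sum(axis=0) / attention_mask.sum()

print(sentence_embedding.shape)  # (768,)
```

The padding token is zeroed out before summing, so it contributes nothing to the average.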
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JohanHeinsen/Run_Away_Action_Embedding_model_v0.1")

# Run inference
sentences = [
    'saa finde vi os beføyet, saavel for at besørge de publiqve Midlers Stkkerhed, som i Anledning af den Nød og Armod hans Undvigelse har foraarsaget hans efterladte Kone og smaa Børn her, at lade ham herved eftelyse og indkalde til at indfinde sig her Byen',
    'thi anmodes enhver som træffer denne Karl at paagribe ham og imod de sædvanlige Indbringerpenge at levere ham til Compagniet.',
    'men ingen veed om ham at sige.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
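`model.similarity` defaults to cosine similarity. A small NumPy sketch of what that call computes, on made-up stand-in embeddings instead of real model output:

```python
import numpy as np

# Stand-in "sentence embeddings": 3 sentences, dimension 4.
embeddings = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.1, 0.9, 0.0],
    [0.0, 1.0, 0.0, 1.0],
])

# Cosine similarity matrix: L2-normalize each row, then take dot products.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
normalized = embeddings / norms
similarities = normalized @ normalized.T

print(similarities.shape)  # (3, 3)
```

The diagonal is 1.0 (each vector compared with itself), and the first two rows, which point in nearly the same direction, score far higher with each other than with the third.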
Evaluation with the EmbeddingSimilarityEvaluator on the validation dataset:

| Metric | Value |
|---|---|
| pearson_cosine | 0.8932 |
| spearman_cosine | 0.8380 |
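The evaluator compares the cosine similarities of embedding pairs against the gold labels and reports Pearson and Spearman correlations. A sketch of that computation on toy scores, assuming `scipy` is available (the numbers below are illustrative, not the model's):

```python
from scipy.stats import pearsonr, spearmanr

# Toy cosine similarities predicted for five sentence pairs,
# and their gold labels (1 = same action cluster, 0 = different).
predicted = [0.95, 0.10, 0.80, 0.30, 0.60]
gold = [1, 0, 1, 0, 1]

# Pearson measures linear correlation of the raw scores;
# Spearman measures correlation of their ranks.
pearson_cosine = pearsonr(predicted, gold)[0]
spearman_cosine = spearmanr(predicted, gold)[0]
```

High values on both mean the model's similarity scores track the labels closely, both in magnitude and in ordering.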
Training data: an unnamed dataset of sentence pairs with columns sentence_0, sentence_1, and label (string, string, int). Samples:

| sentence_0 | sentence_1 | label |
|---|---|---|
| Han er født i Kjøbenhavn, er 33 Aar gammel, har staaet ved Kongens Regiment som Tambour i 9 Aar, og gaaer derfor skjæbt paa det venstre Bern har tjent Hr. Fabriker Schrøder, og paa Maderup gaard som Gaardskarl, en kort Tid. | som tiener paa Hiørnet af gl. Mynt og Svertegaden | 1 |
| efterlyses han herved med Anmodning, at han, af hvem han skulde forekomme, maatte anholdes og mig derom meddeles Underretning | Da min Lærredreng Friderich Senggrav er den 12 Aug. undvigt af sin Lære | 0 |
| give det tilkiende hos A. J. Schottlænder, boende i Pilestræde Nr. 11 Litr. Dførste Sal. | Niels Pedersen Eistrup, som i 4 Aar har tjent hos mig for Under-Knegt, er fra mig undvigt den 24 December uden at giøre Rigtighed for det ham var betroet | 0 |
Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
```
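CosineSimilarityLoss with an MSELoss `loss_fct` fits the cosine similarity of the two sentence embeddings to the label. A NumPy sketch of the quantity being minimized, with made-up embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up sentence embeddings for one pair, and its gold label.
u = np.array([1.0, 0.0, 1.0])
v = np.array([1.0, 0.0, 0.9])
label = 1.0

# CosineSimilarityLoss with MSELoss: squared error between
# the pair's cosine similarity and the label.
loss = (cosine(u, v) - label) ** 2
```

Training pushes the cosine similarity of same-cluster pairs toward 1 and of different-cluster pairs toward 0.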
Non-default hyperparameters:
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 2
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | Training Loss | validation_spearman_cosine |
|---|---|---|---|
| 0.0961 | 500 | 0.1933 | - |
| 0.1922 | 1000 | 0.1191 | - |
| 0.2883 | 1500 | 0.1016 | - |
| 0.3844 | 2000 | 0.0914 | - |
| 0.4805 | 2500 | 0.0841 | - |
| 0.5766 | 3000 | 0.079 | - |
| 0.6727 | 3500 | 0.0757 | - |
| 0.7688 | 4000 | 0.0732 | - |
| 0.8649 | 4500 | 0.0686 | - |
| 0.9610 | 5000 | 0.0665 | - |
| 1.0 | 5203 | - | 0.8284 |
| 1.0571 | 5500 | 0.0627 | - |
| 1.1532 | 6000 | 0.0548 | - |
| 1.2493 | 6500 | 0.0557 | - |
| 1.3454 | 7000 | 0.0562 | - |
| 1.4415 | 7500 | 0.0557 | - |
| 1.5376 | 8000 | 0.0527 | - |
| 1.6337 | 8500 | 0.0504 | - |
| 1.7298 | 9000 | 0.0539 | - |
| 1.8259 | 9500 | 0.054 | - |
| 1.9220 | 10000 | 0.0501 | - |
| 2.0 | 10406 | - | 0.8380 |
Citation:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```