SentenceTransformer based on CALDISS-AAU/DA-BERT_Old_News_V3

This is a sentence-transformers model finetuned from CALDISS-AAU/DA-BERT_Old_News_V3. It is the first version of a sentence transformer designed to embed texts according to the key actions they describe. It was trained to support the "Run Away" project at Aalborg University, on sentences from historical runaway advertisements that were tagged by key verbs and grouped into larger clusters of verbs. The key idea is to foreground action: sentences describing similar actions should receive similar embeddings, even when their surface wording differs.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: CALDISS-AAU/DA-BERT_Old_News_V3
  • Maximum Sequence Length: 514 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 514, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
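
The same two-module stack can be rebuilt by hand via the models API, which is useful if you want to experiment with a different pooling strategy. A minimal sketch mirroring the configuration above:

from sentence_transformers import SentenceTransformer, models

# Transformer module wrapping the XLM-RoBERTa-based base checkpoint
word_embedding_model = models.Transformer("CALDISS-AAU/DA-BERT_Old_News_V3", max_seq_length=514)
# Mean pooling over token embeddings (768 dimensions), as configured above
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean")

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])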

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("JohanHeinsen/Run_Away_Action_Embedding_model_v0.1")
# Run inference
sentences = [
    'saa finde vi os beføyet, saavel for at besørge de publiqve Midlers Stkkerhed, som i Anledning af den Nød og Armod hans Undvigelse har foraarsaget hans efterladte Kone og smaa Børn her, at lade ham herved eftelyse og indkalde til at indfinde sig her Byen',
    'thi anmodes enhver som træffer denne Karl at paagribe ham og imod de sædvanlige Indbringerpenge at levere ham til Compagniet.',
    'men ingen veed om ham at sige.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
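
Beyond pairwise similarity, the embeddings can drive retrieval over a larger pool of advertisement snippets. A brief sketch using the library's util.semantic_search; the corpus reuses sentences from the example above, and the query string is adapted from the sample pairs further below, purely for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("JohanHeinsen/Run_Away_Action_Embedding_model_v0.1")

corpus = [
    'thi anmodes enhver som træffer denne Karl at paagribe ham og imod de sædvanlige Indbringerpenge at levere ham til Compagniet.',
    'men ingen veed om ham at sige.',
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode('efterlyses han herved med Anmodning, at han maatte anholdes', convert_to_tensor=True)

# Rank corpus snippets by cosine similarity to the query action
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])  # list of {'corpus_id': ..., 'score': ...} dicts, best match first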

Evaluation

Metrics

Semantic Similarity

Metric           Value
pearson_cosine   0.8932
spearman_cosine  0.8380
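
These figures compare the model's cosine similarities against the gold pair labels. A minimal sketch of computing such scores with the library's EmbeddingSimilarityEvaluator; the eval_pairs below are invented placeholders, since the actual validation split is not published with this card:

from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("JohanHeinsen/Run_Away_Action_Embedding_model_v0.1")

# Toy held-out (sentence_0, sentence_1, label) triples, for illustration only
eval_pairs = [
    ("han er undvigt af sin Tjeneste", "han er bortrømt fra sit Herskab", 1),
    ("han er undvigt af sin Tjeneste", "men ingen veed om ham at sige.", 0),
]

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[p[0] for p in eval_pairs],
    sentences2=[p[1] for p in eval_pairs],
    scores=[float(p[2]) for p in eval_pairs],
    main_similarity=SimilarityFunction.COSINE,
    name="validation",
)
print(evaluator(model))  # dict including pearson_cosine and spearman_cosine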

Training Details

Training Dataset

Unnamed Dataset

  • Size: 83,246 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
               sentence_0          sentence_1          label
    type       string              string              int
    details    min: 6 tokens       min: 4 tokens       0: ~47.20%
               mean: 28.53 tokens  mean: 30.7 tokens   1: ~52.80%
               max: 176 tokens     max: 514 tokens
  • Samples:
    • sentence_0: "Han er født i Kjøbenhavn, er 33 Aar gammel, har staaet ved Kongens Regiment som Tambour i 9 Aar, og gaaer derfor skjæbt paa det venstre Bern har tjent Hr. Fabriker Schrøder, og paa Maderup gaard som Gaardskarl, en kort Tid."
      sentence_1: "som tiener paa Hiørnet af gl. Mynt og Svertegaden"
      label: 1
    • sentence_0: "efterlyses han herved med Anmodning, at han, af hvem han skulde forekomme, maatte anholdes og mig derom meddeles Underretning"
      sentence_1: "Da min Lærredreng Friderich Senggrav er den 12 Aug. undvigt af sin Lære"
      label: 0
    • sentence_0: "give det tilkiende hos A. J. Schottlænder, boende i Pilestræde Nr. 11 Litr. D, første Sal."
      sentence_1: "Niels Pedersen Eistrup, som i 4 Aar har tjent hos mig for Under-Knegt, er fra mig undvigt den 24 December uden at giøre Rigtighed for det ham var betroet"
      label: 0
  • Loss: CosineSimilarityLoss (see the training sketch below) with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
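
Given the binary labels, CosineSimilarityLoss pushes the cosine similarity of each pair towards its label (1 for pairs from the same verb cluster, 0 otherwise) through the MSE objective above. A minimal training sketch under that reading; the two pairs are invented placeholders, not records from the dataset:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import CosineSimilarityLoss

model = SentenceTransformer("CALDISS-AAU/DA-BERT_Old_News_V3")

# Invented placeholder pairs; the real dataset holds 83,246 tagged sentences
train_dataset = Dataset.from_dict({
    "sentence_0": ["han er undvigt af sin Tjeneste", "han er undvigt af sin Tjeneste"],
    "sentence_1": ["han er bortrømt fra sit Herskab", "men ingen veed om ham at sige."],
    "label": [1.0, 0.0],  # cosine-similarity targets
})

# MSELoss between cos(u, v) and the label, as configured above
loss = CosineSimilarityLoss(model)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()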
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin
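
Expressed with SentenceTransformerTrainingArguments, these overrides look roughly as follows (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import MultiDatasetBatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="runaway-action-embedding",  # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)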

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch    Step    Training Loss    validation_spearman_cosine
0.0961   500     0.1933           -
0.1922   1000    0.1191           -
0.2883   1500    0.1016           -
0.3844   2000    0.0914           -
0.4805   2500    0.0841           -
0.5766   3000    0.0790           -
0.6727   3500    0.0757           -
0.7688   4000    0.0732           -
0.8649   4500    0.0686           -
0.9610   5000    0.0665           -
1.0      5203    -                0.8284
1.0571   5500    0.0627           -
1.1532   6000    0.0548           -
1.2493   6500    0.0557           -
1.3454   7000    0.0562           -
1.4415   7500    0.0557           -
1.5376   8000    0.0527           -
1.6337   8500    0.0504           -
1.7298   9000    0.0539           -
1.8259   9500    0.0540           -
1.9220   10000   0.0501           -
2.0      10406   -                0.8380

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.7.0
  • Accelerate: 1.6.0
  • Datasets: 2.19.2
  • Tokenizers: 0.21.1
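
To reproduce this environment, pinning the versions listed above should work:

pip install "sentence-transformers==4.1.0" "transformers==4.51.3" "torch==2.7.0" "accelerate==1.6.0" "datasets==2.19.2" "tokenizers==0.21.1"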

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}