SentenceTransformer based on ltg/norbert4-base

This is a sentence-transformers model finetuned from ltg/norbert4-base on the all-nli-norwegian dataset. It maps sentences & paragraphs to a 640-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: ltg/norbert4-base
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 640 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: all-nli-norwegian
  • Language: no (Norwegian)

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'GptBertModel'})
  (1): Pooling({'word_embedding_dimension': 640, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
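
The Pooling module converts the transformer's per-token embeddings into a single 640-dimensional sentence vector by mean pooling over non-padding tokens (pooling_mode_mean_tokens: True). As an illustration only, a minimal sketch of that operation; mean_pool is a hypothetical helper, not a library function:

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average the remaining token embeddings.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, 640)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # guard against all-padding rows
    return summed / counts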

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("thivy/norbert4-base-nli-norwegian")
# Run inference
sentences = [
    'En mann lager et sandmaleri på gulvet.',
    'En mann lager kunst.',
    'En kvinne ødelegger et sandmaleri.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 640)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6251, 0.2931],
#         [0.6251, 1.0000, 0.1305],
#         [0.2931, 0.1305, 1.0000]])
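
Beyond pairwise similarity, the same embeddings support semantic search over a corpus, as mentioned above. A minimal sketch using util.semantic_search from the library; the corpus and query below are illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thivy/norbert4-base-nli-norwegian")

# Illustrative corpus and query (any Norwegian text works).
corpus = [
    "En mann lager kunst.",
    "En kvinne ødelegger et sandmaleri.",
    "Barna leker i parken.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Hvem lager noe kreativt?", convert_to_tensor=True)

# Retrieve the top-2 most similar corpus entries for the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))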

Evaluation

Metrics

Triplet

Metric           Value
cosine_accuracy  0.9547
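
This accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is closer to the positive than to the negative under cosine similarity. A sketch of computing such a score with the library's TripletEvaluator, presumably the evaluator behind this metric; the one-element lists below stand in for the real evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("thivy/norbert4-base-nli-norwegian")

# Placeholder triplets; in practice, use the all-nli-norwegian evaluation split.
anchors = ["En mann lager et sandmaleri på gulvet."]
positives = ["En mann lager kunst."]
negatives = ["En kvinne ødelegger et sandmaleri."]

evaluator = TripletEvaluator(anchors, positives, negatives, name="all-nli-norwegian-dev")
print(evaluator(model))  # e.g. {'all-nli-norwegian-dev_cosine_accuracy': ...}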

Training Details

Training Dataset

all-nli-norwegian

  • Dataset: all-nli-norwegian at 98cabde
  • Size: 556,367 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                   anchor   positive   negative
    type           string   string     string
    min (tokens)   6        5          5
    mean (tokens)  9.53     12.03      12.7
    max (tokens)   47       40         49
  • Samples:
    anchor:   En person på en hest hopper over et havarert fly.
    positive: En person er utendørs, på en hest.
    negative: En person er på en diner og bestiller en omelett.

    anchor:   Barn smiler og vinker til kameraet
    positive: Det er barn til stede
    negative: Barna rynker pannen

    anchor:   En gutt hopper på skateboard midt på en rød bro.
    positive: Gutten gjør et skateboardtriks.
    negative: Gutten skater nedover fortauet.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
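
MultipleNegativesRankingLoss treats each anchor's paired positive as the correct match and, within a batch, every other example's positive (plus the explicit negatives) as incorrect matches, with similarity logits scaled by the factor above. A minimal sketch of instantiating the loss with these parameters; trust_remote_code=True is an assumption here, since the base model uses a custom GptBertModel architecture:

from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.util import cos_sim

# trust_remote_code=True is assumed because of the custom base architecture.
model = SentenceTransformer("ltg/norbert4-base", trust_remote_code=True)

# scale=20.0 and cosine similarity mirror the parameters listed above.
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=cos_sim)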
    

Evaluation Dataset

all-nli-norwegian

  • Dataset: all-nli-norwegian at 98cabde
  • Size: 6,561 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
                   anchor   positive   negative
    type           string   string     string
    min (tokens)   5        4          3
    mean (tokens)  17.72    8.98       9.5
    max (tokens)   74       31         29
  • Samples:
    anchor:   To kvinner klemmer mens de holder take-away pakker.
    positive: To kvinner holder pakker.
    negative: Mennene slåss utenfor en deli.

    anchor:   To små barn i blå drakter, en med nummer 9 og en med nummer 2, står på trinn i et bad og vasker hendene i en vask.
    positive: To barn i nummererte drakter vasker hendene.
    negative: To barn i jakker går til skolen.

    anchor:   En mann selger donuts til en kunde under et verdensutstillingsarrangement holdt i byen Angeles
    positive: En mann selger donuts til en kunde.
    negative: En kvinne drikker kaffen sin på en liten kafé.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
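
A minimal sketch of how these non-default values map onto the library's SentenceTransformerTrainingArguments; output_dir is a placeholder:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="norbert4-base-nli-norwegian",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    learning_rate=2e-5,
    weight_decay=0.01,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
)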

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_accuracy
0.0058 100 4.0493 - -
0.0115 200 3.0097 - -
0.0173 300 1.4324 - -
0.0230 400 1.0791 - -
0.0288 500 0.8985 0.7151 0.8682
0.0345 600 0.7899 - -
0.0403 700 0.7379 - -
0.0460 800 0.7333 - -
0.0518 900 0.6676 - -
0.0575 1000 0.6593 0.4987 0.9137
0.0633 1100 0.6162 - -
0.0690 1200 0.6153 - -
0.0748 1300 0.5763 - -
0.0805 1400 0.6055 - -
0.0863 1500 0.5504 0.4496 0.9207
0.0920 1600 0.5622 - -
0.0978 1700 0.5484 - -
0.1035 1800 0.5263 - -
0.1093 1900 0.5789 - -
0.1150 2000 0.5462 0.4225 0.9273
0.1208 2100 0.5521 - -
0.1265 2200 0.5368 - -
0.1323 2300 0.5079 - -
0.1380 2400 0.5437 - -
0.1438 2500 0.5123 0.4020 0.9346
0.1495 2600 0.4835 - -
0.1553 2700 0.473 - -
0.1610 2800 0.4957 - -
0.1668 2900 0.4935 - -
0.1725 3000 0.4894 0.3775 0.9383
0.1783 3100 0.4894 - -
0.1840 3200 0.5203 - -
0.1898 3300 0.4907 - -
0.1955 3400 0.464 - -
0.2013 3500 0.461 0.3808 0.9387
0.2071 3600 0.4486 - -
0.2128 3700 0.4753 - -
0.2186 3800 0.4591 - -
0.2243 3900 0.4496 - -
0.2301 4000 0.428 0.3680 0.9383
0.2358 4100 0.433 - -
0.2416 4200 0.4525 - -
0.2473 4300 0.4119 - -
0.2531 4400 0.4335 - -
0.2588 4500 0.4378 0.3586 0.9407
0.2646 4600 0.4073 - -
0.2703 4700 0.3997 - -
0.2761 4800 0.381 - -
0.2818 4900 0.4064 - -
0.2876 5000 0.4211 0.3577 0.9438
0.2933 5100 0.4338 - -
0.2991 5200 0.3951 - -
0.3048 5300 0.3813 - -
0.3106 5400 0.4165 - -
0.3163 5500 0.405 0.3464 0.9428
0.3221 5600 0.395 - -
0.3278 5700 0.3869 - -
0.3336 5800 0.3758 - -
0.3393 5900 0.4021 - -
0.3451 6000 0.374 0.3511 0.9460
0.3508 6100 0.3696 - -
0.3566 6200 0.377 - -
0.3623 6300 0.37 - -
0.3681 6400 0.3584 - -
0.3738 6500 0.3485 0.3399 0.9470
0.3796 6600 0.3841 - -
0.3853 6700 0.3674 - -
0.3911 6800 0.3843 - -
0.3968 6900 0.3753 - -
0.4026 7000 0.3533 0.3435 0.9448
0.4084 7100 0.3577 - -
0.4141 7200 0.3442 - -
0.4199 7300 0.3539 - -
0.4256 7400 0.3723 - -
0.4314 7500 0.3666 0.3383 0.9456
0.4371 7600 0.3644 - -
0.4429 7700 0.3644 - -
0.4486 7800 0.3474 - -
0.4544 7900 0.3538 - -
0.4601 8000 0.3733 0.3316 0.9508
0.4659 8100 0.3587 - -
0.4716 8200 0.347 - -
0.4774 8300 0.3809 - -
0.4831 8400 0.3222 - -
0.4889 8500 0.3408 0.3281 0.9492
0.4946 8600 0.3345 - -
0.5004 8700 0.3492 - -
0.5061 8800 0.3311 - -
0.5119 8900 0.3576 - -
0.5176 9000 0.3377 0.3215 0.9488
0.5234 9100 0.3405 - -
0.5291 9200 0.3243 - -
0.5349 9300 0.351 - -
0.5406 9400 0.3547 - -
0.5464 9500 0.3438 0.3241 0.9500
0.5521 9600 0.3384 - -
0.5579 9700 0.3306 - -
0.5636 9800 0.353 - -
0.5694 9900 0.299 - -
0.5751 10000 0.3064 0.3173 0.9509
0.5809 10100 0.3292 - -
0.5866 10200 0.292 - -
0.5924 10300 0.3599 - -
0.5981 10400 0.3271 - -
0.6039 10500 0.3002 0.3225 0.9492
0.6097 10600 0.3455 - -
0.6154 10700 0.2981 - -
0.6212 10800 0.3255 - -
0.6269 10900 0.3 - -
0.6327 11000 0.304 0.3170 0.9512
0.6384 11100 0.3136 - -
0.6442 11200 0.3348 - -
0.6499 11300 0.3255 - -
0.6557 11400 0.3101 - -
0.6614 11500 0.314 0.3149 0.9500
0.6672 11600 0.3157 - -
0.6729 11700 0.3149 - -
0.6787 11800 0.2966 - -
0.6844 11900 0.3145 - -
0.6902 12000 0.2928 0.3075 0.9532
0.6959 12100 0.3035 - -
0.7017 12200 0.3142 - -
0.7074 12300 0.3289 - -
0.7132 12400 0.3046 - -
0.7189 12500 0.311 0.3103 0.9529
0.7247 12600 0.2942 - -
0.7304 12700 0.295 - -
0.7362 12800 0.2802 - -
0.7419 12900 0.3258 - -
0.7477 13000 0.28 0.3027 0.9518
0.7534 13100 0.2887 - -
0.7592 13200 0.2729 - -
0.7649 13300 0.2936 - -
0.7707 13400 0.2883 - -
0.7764 13500 0.2972 0.3048 0.9549
0.7822 13600 0.2806 - -
0.7879 13700 0.2851 - -
0.7937 13800 0.3097 - -
0.7994 13900 0.2663 - -
0.8052 14000 0.2743 0.3004 0.9529
0.8110 14100 0.2911 - -
0.8167 14200 0.2955 - -
0.8225 14300 0.2892 - -
0.8282 14400 0.2796 - -
0.8340 14500 0.2674 0.3000 0.9528
0.8397 14600 0.2604 - -
0.8455 14700 0.2816 - -
0.8512 14800 0.2711 - -
0.8570 14900 0.2897 - -
0.8627 15000 0.2495 0.3008 0.9544
0.8685 15100 0.3126 - -
0.8742 15200 0.3151 - -
0.8800 15300 0.2664 - -
0.8857 15400 0.2884 - -
0.8915 15500 0.263 0.2984 0.9552
0.8972 15600 0.2733 - -
0.9030 15700 0.2755 - -
0.9087 15800 0.2818 - -
0.9145 15900 0.2853 - -
0.9202 16000 0.2742 0.2980 0.9544
0.9260 16100 0.269 - -
0.9317 16200 0.257 - -
0.9375 16300 0.2637 - -
0.9432 16400 0.2752 - -
0.9490 16500 0.2719 0.2971 0.9546
0.9547 16600 0.282 - -
0.9605 16700 0.2461 - -
0.9662 16800 0.2673 - -
0.9720 16900 0.2646 - -
0.9777 17000 0.2665 0.2960 0.9547
0.9835 17100 0.258 - -
0.9892 17200 0.2562 - -
0.9950 17300 0.2511 - -
  • The row at step 17000 (validation loss 0.2960, eval_cosine_accuracy 0.9547, matching the metric reported above) denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.2.0
  • Transformers: 4.57.3
  • PyTorch: 2.9.1
  • Accelerate: 1.12.0
  • Datasets: 4.4.2
  • Tokenizers: 0.22.1
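
To approximate this environment, the versions above can be pinned at install time; an illustrative pin set (adjust for your platform as needed):

pip install sentence-transformers==5.2.0 transformers==4.57.3 torch==2.9.1 accelerate==1.12.0 datasets==4.4.2 tokenizers==0.22.1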

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}