modernbert-embed-base

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
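
The Pooling and Normalize modules above amount to averaging the token vectors over non-padding positions and then scaling each result to unit length. The following is a minimal NumPy sketch of that idea on toy data; the helper name `mean_pool_and_normalize` and the 4-dimensional vectors are illustrative assumptions, not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 3 token positions, 4 dimensions (the real model uses 768).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 2.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
sentence_embeddings = mean_pool_and_normalize(tokens, mask)
print(sentence_embeddings.shape)
```

Because the output vectors have unit length, a dot product between two of them equals their cosine similarity.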

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("IoannisKat1/modernbert-embed-base-new2")
# Run inference
sentences = [
    'Who may carry out the monitoring of compliance with a code of conduct according to Article 40?',
    '1.Without prejudice to the tasks and powers of the competent supervisory authority under Articles 57 and 58, the monitoring of compliance with a code of conduct pursuant to Article 40 may be carried out by a body which has an appropriate level of expertise in relation to the subject-matter of the code and is accredited for that purpose by the competent supervisory authority.\n2.A body as referred to in paragraph 1 may be accredited to monitor compliance with a code of conduct where that body has: (a)  demonstrated its independence and expertise in relation to the subject-matter of the code to the satisfaction of the competent supervisory authority; (b)  established procedures which allow it to assess the eligibility of controllers and processors concerned to apply the code, to monitor their compliance with its provisions and to periodically review its operation; (c)  established procedures and structures to handle complaints about infringements of the code or the manner in which the code has been, or is being, implemented by a controller or processor, and to make those procedures and structures transparent to data subjects and the public; and (d)  demonstrated to the satisfaction of the competent supervisory authority that its tasks and duties do not result in a conflict of interests.\n3.The competent supervisory authority shall submit the draft criteria for accreditation of a body as referred to in paragraph 1 of this Article to the Board pursuant to the consistency mechanism referred to in Article 63\n4.Without prejudice to the tasks and powers of the competent supervisory authority and the provisions of Chapter VIII, a body as referred to in paragraph 1 of this Article shall, subject to appropriate safeguards, take appropriate action in cases of infringement of the code by a controller or processor, including suspension or exclusion of the controller or processor concerned from the code. It shall inform the competent supervisory authority of such actions and the reasons for taking them.\n5.The competent supervisory authority shall revoke the accreditation of a body as referred to in paragraph 1 if the conditions for accreditation are not, or are no longer, met or where actions taken by the body infringe this Regulation.\n6.This Article shall not apply to processing carried out by public authorities and bodies.',
    'It should be ascertained whether all appropriate technological protection and organisational measures have been implemented to establish immediately whether a personal data breach has taken place and to inform promptly the supervisory authority and the data subject. The fact that the notification was made without undue delay should be established taking into account in particular the nature and gravity of the personal data breach and its consequences and adverse effects for the data subject. Such notification may result in an intervention of the supervisory authority in accordance with its tasks and powers laid down in this Regulation.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6245, 0.2334],
#         [0.6245, 1.0000, 0.3201],
#         [0.2334, 0.3201, 1.0000]])
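
Since the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64, the leading components of an embedding can in principle serve as a smaller embedding. The sketch below shows one way to truncate, renormalize, and rank passages by cosine similarity. It uses random vectors as stand-ins for real `model.encode(...)` output, and the helper names `truncate_and_normalize` and `rank_by_cosine` are hypothetical.

```python
import numpy as np

def truncate_and_normalize(embeddings, dim):
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

def rank_by_cosine(query_vec, passage_vecs):
    """Return passage indices sorted from most to least similar, plus the scores."""
    scores = passage_vecs @ query_vec  # dot product equals cosine for unit vectors
    return np.argsort(-scores), scores

# Stand-in data: random vectors in place of real model.encode(...) output.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))
full[1] = full[0] + 0.1 * rng.normal(size=768)  # make row 1 a near-duplicate of the query in row 0

small = truncate_and_normalize(full, 256)
order, scores = rank_by_cosine(small[0], small[1:])
print(small.shape, order[0])
```

Sentence Transformers also accepts a `truncate_dim` argument when loading a model, which may be simpler in practice than truncating by hand.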

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.4722
cosine_accuracy@3 0.5177
cosine_accuracy@5 0.5606
cosine_accuracy@10 0.6263
cosine_precision@1 0.4722
cosine_precision@3 0.4571
cosine_precision@5 0.4192
cosine_precision@10 0.3687
cosine_recall@1 0.1074
cosine_recall@3 0.2661
cosine_recall@5 0.3441
cosine_recall@10 0.4659
cosine_ndcg@10 0.5402
cosine_mrr@10 0.5073
cosine_map@100 0.5999

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.4444
cosine_accuracy@3 0.4848
cosine_accuracy@5 0.5455
cosine_accuracy@10 0.6035
cosine_precision@1 0.4444
cosine_precision@3 0.4251
cosine_precision@5 0.3894
cosine_precision@10 0.3455
cosine_recall@1 0.107
cosine_recall@3 0.2625
cosine_recall@5 0.3362
cosine_recall@10 0.45
cosine_ndcg@10 0.5158
cosine_mrr@10 0.4801
cosine_map@100 0.585

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.4394
cosine_accuracy@3 0.4848
cosine_accuracy@5 0.5328
cosine_accuracy@10 0.596
cosine_precision@1 0.4394
cosine_precision@3 0.4242
cosine_precision@5 0.3919
cosine_precision@10 0.3424
cosine_recall@1 0.1027
cosine_recall@3 0.2525
cosine_recall@5 0.3299
cosine_recall@10 0.4382
cosine_ndcg@10 0.5075
cosine_mrr@10 0.475
cosine_map@100 0.5754

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4091
cosine_accuracy@3 0.4495
cosine_accuracy@5 0.5025
cosine_accuracy@10 0.5606
cosine_precision@1 0.4091
cosine_precision@3 0.3872
cosine_precision@5 0.3591
cosine_precision@10 0.3124
cosine_recall@1 0.1018
cosine_recall@3 0.2438
cosine_recall@5 0.3185
cosine_recall@10 0.4262
cosine_ndcg@10 0.4788
cosine_mrr@10 0.4431
cosine_map@100 0.5381

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.3207
cosine_accuracy@3 0.3636
cosine_accuracy@5 0.4091
cosine_accuracy@10 0.4823
cosine_precision@1 0.3207
cosine_precision@3 0.3081
cosine_precision@5 0.2869
cosine_precision@10 0.2548
cosine_recall@1 0.0791
cosine_recall@3 0.1965
cosine_recall@5 0.2592
cosine_recall@10 0.3567
cosine_ndcg@10 0.389
cosine_mrr@10 0.3549
cosine_map@100 0.4498
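
To make the metric names above concrete, here is a minimal per-query sketch of accuracy@k, precision@k, recall@k, reciprocal rank, and NDCG@k with binary relevance. The reported values are averages over all evaluation queries; the document ids and the helper name `ir_metrics_at_k` are hypothetical, and this is an illustration of the definitions rather than the evaluator's actual code.

```python
import math

def ir_metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute per-query retrieval metrics for the top-k ranked document ids."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]

    accuracy = 1.0 if any(hits) else 0.0    # at least one relevant doc in the top k
    precision = sum(hits) / k               # fraction of the top k that is relevant
    recall = sum(hits) / len(relevant_ids)  # fraction of relevant docs retrieved

    # Reciprocal rank of the first relevant document (0 if none in the top k).
    rr = next((1.0 / (rank + 1) for rank, hit in enumerate(hits) if hit), 0.0)

    # NDCG with binary relevance: DCG divided by the best achievable DCG.
    dcg = sum(hit / math.log2(rank + 2) for rank, hit in enumerate(hits))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    ndcg = dcg / idcg if idcg > 0 else 0.0

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": rr, "ndcg": ndcg}

# Hypothetical query with three relevant passages; the ranking finds two of them.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d5"}
metrics = ir_metrics_at_k(ranked, relevant, k=5)
print(metrics)
```

Under these definitions accuracy@1 and precision@1 coincide, which is consistent with the tables above, and recall@k stays low at small k whenever a query has several relevant passages.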

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,580 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 7 tokens, mean 15.21 tokens, max 35 tokens
    • positive (string): min 25 tokens, mean 648.23 tokens, max 2429 tokens
  • Samples:
    • anchor: What bodies or sources shall the Commission take into account?
      positive: 1.By 25 May 2020 and every four years thereafter, the Commission shall submit a report on the evaluation and review of this Regulation to the European Parliament and to the Council. The reports shall be made public.
      2.In the context of the evaluations and reviews referred to in paragraph 1, the Commission shall examine, in particular, the application and functioning of: (a) Chapter V on the transfer of personal data to third countries or international organisations with particular regard to decisions adopted pursuant to Article 45(3) of this Regulation and decisions adopted on the basis of Article 25(6) of Directive 95/46/EC; (b) Chapter VII on cooperation and consistency.
      3.For the purpose of paragraph 1, the Commission may request information from Member States and supervisory authorities.
      4.In carrying out the evaluations and reviews referred to in paragraphs 1 and 2, the Commission shall take into account the positions and findings of the European Parliament, of the Council, and ...
    • anchor: What enables researchers within social science to obtain essential knowledge about the long-term correlation of social conditions?
      positive: By coupling information from registries, researchers can obtain new knowledge of great value with regard to widespread medical conditions such as cardiovascular disease, cancer and depression. On the basis of registries, research results can be enhanced, as they draw on a larger population. Within social science, research on the basis of registries enables researchers to obtain essential knowledge about the long-term correlation of a number of social conditions such as unemployment and education with other life conditions. Research results obtained through registries provide solid, high-quality knowledge which can provide the basis for the formulation and implementation of knowledge-based policy, improve the quality of life for a number of people and improve the efficiency of social services. In order to facilitate scientific research, personal data can be processed for scientific research purposes, subject to appropriate conditions and safeguards set out in Union or Member State law.
    • anchor: What is the article that pertains to approving binding corporate rules?
      positive: 1.Each supervisory authority shall have all of the following investigative powers: (a) to order the controller and the processor, and, where applicable, the controller's or the processor's representative to provide any information it requires for the performance of its tasks; (b) to carry out investigations in the form of data protection audits; (c) to carry out a review on certifications issued pursuant to Article 42(7); (d) to notify the controller or the processor of an alleged infringement of this Regulation; (e) to obtain, from the controller and the processor, access to all personal data and to all information necessary for the performance of its tasks; (f) to obtain access to any premises of the controller and the processor, including to any data processing equipment and means, in accordance with Union or Member State procedural law.
      2.Each supervisory authority shall have all of the following corrective powers: (a) to issue warnings to a controller or processor that inte...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
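
The configuration above can be read as: evaluate MultipleNegativesRankingLoss on each truncated prefix of the embeddings and sum the results with equal weights (`n_dims_per_step` of -1 meaning all listed dimensions are used at every step). The NumPy sketch below illustrates that reading on random stand-in data. It is not the library's implementation, and the scale of 20.0 is an assumption based on the library default, since the card does not list it.

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: cross-entropy over scaled cosine similarities."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # (batch, batch); diagonal holds the true pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss evaluated on each truncated prefix."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))

# Stand-in batch of 8 anchor/positive pairs; real training uses model outputs.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 768))
positives = anchors + 0.5 * rng.normal(size=(8, 768))
dims = [768, 512, 256, 128, 64]
weights = [1, 1, 1, 1, 1]

aligned = matryoshka_loss(anchors, positives, dims, weights)
shuffled = matryoshka_loss(anchors, np.roll(positives, 1, axis=0), dims, weights)
print(aligned < shuffled)
```

Correctly paired batches should score a much lower loss than mismatched ones at every prefix length, which is what pushes the leading dimensions to carry most of the ranking signal.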
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • gradient_accumulation_steps: 4
  • learning_rate: 3e-05
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
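
As a rough worked example of what these settings imply, the sketch below derives the optimizer steps per epoch and the warmup-plus-cosine learning-rate curve. It combines the values above with the 1,580 training samples and the `per_device_train_batch_size` of 8 listed under All Hyperparameters, and it assumes a single device and a batch sampler that yields every sample once per epoch. The function `lr_at` is a hypothetical helper that follows the usual linear-warmup, cosine-decay shape.

```python
import math

num_samples = 1580     # training set size from the card
per_device_batch = 8   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
epochs = 20            # num_train_epochs
warmup_ratio = 0.1
peak_lr = 3e-05

# Assumes a single device and that every sample is seen once per epoch.
batches_per_epoch = math.ceil(num_samples / per_device_batch)  # 198
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)    # 50
total_steps = steps_per_epoch * epochs                         # 1000
warmup_steps = math.ceil(total_steps * warmup_ratio)           # 100

def lr_at(step):
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(steps_per_epoch, total_steps, warmup_steps, lr_at(50), lr_at(100))
```

The 50 steps per epoch agree with the Training Logs, where step 50 is logged at epoch 1.0. Each optimizer step covers an effective batch of 8 × 4 = 32 samples under the same single-device assumption.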

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
-1 -1 - 0.3515 0.3509 0.3285 0.3016 0.2617
0.2020 10 20.9258 - - - - -
0.4040 20 20.6577 - - - - -
0.6061 30 20.6479 - - - - -
0.8081 40 21.0398 - - - - -
1.0 50 20.2131 0.3647 0.3809 0.3475 0.3206 0.2865
1.2020 60 19.2345 - - - - -
1.4040 70 18.6065 - - - - -
1.6061 80 16.8382 - - - - -
1.8081 90 17.4581 - - - - -
2.0 100 16.8996 0.4571 0.4535 0.4513 0.4101 0.3576
2.2020 110 17.4694 - - - - -
2.4040 120 14.7442 - - - - -
2.6061 130 12.601 - - - - -
2.8081 140 13.037 - - - - -
3.0 150 13.0811 0.4993 0.5003 0.4866 0.4555 0.3709
3.2020 160 11.8374 - - - - -
3.4040 170 12.5389 - - - - -
3.6061 180 14.3829 - - - - -
3.8081 190 13.8871 - - - - -
4.0 200 10.3684 0.5054 0.5020 0.4947 0.4597 0.3739
4.2020 210 12.6792 - - - - -
4.4040 220 10.6044 - - - - -
4.6061 230 12.015 - - - - -
4.8081 240 10.7804 - - - - -
5.0 250 9.439 0.5190 0.5098 0.5063 0.4589 0.3753
5.2020 260 10.8849 - - - - -
5.4040 270 11.2237 - - - - -
5.6061 280 9.7149 - - - - -
5.8081 290 10.5259 - - - - -
6.0 300 9.1578 0.5227 0.5169 0.5062 0.4667 0.3777
6.2020 310 10.6102 - - - - -
6.4040 320 10.1176 - - - - -
6.6061 330 8.3092 - - - - -
6.8081 340 9.5087 - - - - -
7.0 350 11.525 0.5252 0.5144 0.5092 0.4747 0.3706
7.2020 360 10.3263 - - - - -
7.4040 370 9.7615 - - - - -
7.6061 380 9.1261 - - - - -
7.8081 390 9.6996 - - - - -
8.0 400 8.4646 0.5324 0.5158 0.5082 0.4759 0.3719
8.2020 410 9.6561 - - - - -
8.4040 420 9.504 - - - - -
8.6061 430 7.4925 - - - - -
8.8081 440 8.749 - - - - -
9.0 450 9.5831 0.5282 0.5215 0.5038 0.4741 0.3721
9.2020 460 8.5261 - - - - -
9.4040 470 9.2267 - - - - -
9.6061 480 8.3529 - - - - -
9.8081 490 8.391 - - - - -
10.0 500 9.2313 0.5374 0.5219 0.5093 0.4768 0.3749
10.2020 510 10.6238 - - - - -
10.4040 520 8.9972 - - - - -
10.6061 530 8.0452 - - - - -
10.8081 540 8.2937 - - - - -
11.0 550 8.0842 0.5402 0.5158 0.5075 0.4788 0.389
11.2020 560 7.9855 - - - - -
11.4040 570 9.1783 - - - - -
11.6061 580 8.5681 - - - - -
11.8081 590 9.0004 - - - - -
12.0 600 7.8016 0.5402 0.5199 0.5078 0.4745 0.3836
12.2020 610 8.1169 - - - - -
12.4040 620 8.7016 - - - - -
12.6061 630 8.6899 - - - - -
12.8081 640 8.1782 - - - - -
13.0 650 7.8024 0.5361 0.5178 0.5065 0.4751 0.3864
-1 -1 - 0.5402 0.5158 0.5075 0.4788 0.3890
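  • The saved checkpoint is the epoch 11.0 (step 550) row, shown in bold in the original table; its scores match the final evaluation row (epoch -1, step -1).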
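
One way to read the final evaluation row is as a size versus quality trade-off. The short sketch below expresses each truncated dimension's NDCG@10 as a fraction of the 768-dimension score, using only the numbers from the last row of the table; the variable names are illustrative.

```python
# NDCG@10 from the final evaluation row of the training log, keyed by embedding dimension.
final_ndcg = {768: 0.5402, 512: 0.5158, 256: 0.5075, 128: 0.4788, 64: 0.3890}

baseline = final_ndcg[768]
retention = {dim: score / baseline for dim, score in final_ndcg.items()}

for dim, ratio in retention.items():
    print(f"dim {dim}: {ratio:.1%} of the 768-dimension NDCG@10")
```

On these figures the 256-dimension embeddings keep roughly 94% of the full-size NDCG@10, while the 64-dimension embeddings keep about 72%.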

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}