SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the code-retrieval-combined dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: code-retrieval-combined

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
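The Pooling module above builds the sentence embedding by mean-pooling the token embeddings (pooling_mode_mean_tokens is the only active mode). A minimal NumPy sketch of that operation, with toy shapes and values for illustration:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average the token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) per-token vectors from the transformer
    attention_mask:   (seq_len,) 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    return summed / mask.sum()                      # divide by real-token count

# Toy example: two real tokens and one padding token
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 3.]
```

Because padding positions are masked out, the embedding depends only on the real tokens, regardless of how much the input was padded.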

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("benjamintli/modernbert-code")
# Run inference
queries = [
    "function onActiveEditorChanged(event, current, previous) {\n        if (current && !current._codeMirror._lineFolds) {\n            enableFoldingInEditor(current);\n   ",
]
documents = [
    '     }\n        if (previous) {\n            saveLineFolds(previous);\n        }\n    }',
    'Save config data.\n\n@param string $path\n@param string $value\n@param string $scope\n@param int $scopeId\n\n@return null',
    'Get playback settings such as shuffle and repeat.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.6443, 0.0381, 0.0291]])
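The similarity matrix can be used directly for retrieval by ranking documents per query. A sketch using the scores printed above (pure NumPy, no model required):

```python
import numpy as np

# Similarity scores from the example above: 1 query x 3 documents
similarities = np.array([[0.6443, 0.0381, 0.0291]])

# Rank documents for each query, highest score first
ranking = np.argsort(-similarities, axis=1)
best_doc = ranking[0, 0]
print(best_doc)  # 0: the snippet that completes the query's function
```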

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.9167
cosine_accuracy@3 0.9643
cosine_accuracy@5 0.9738
cosine_accuracy@10 0.9822
cosine_precision@1 0.9167
cosine_precision@3 0.3214
cosine_precision@5 0.1948
cosine_precision@10 0.0982
cosine_recall@1 0.9167
cosine_recall@3 0.9643
cosine_recall@5 0.9738
cosine_recall@10 0.9822
cosine_ndcg@10 0.9519
cosine_mrr@10 0.9419
cosine_map@100 0.9426
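With a single relevant document per query, as in this dataset's query/positive pairs, recall@k equals accuracy@k and precision@k is accuracy@k divided by k, which is why the recall and accuracy rows above match. A minimal sketch of how accuracy@k and MRR@k fall out of per-query rankings (illustrative, not the evaluator's actual code):

```python
def ir_metrics(rankings, relevant, k=10):
    """accuracy@k and MRR@k for queries with a single relevant document.

    rankings: one ranked list of document ids per query
    relevant: the relevant document id for each query
    """
    hits, reciprocal_ranks = 0, 0.0
    for ranked, rel in zip(rankings, relevant):
        if rel in ranked[:k]:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(rel) + 1)
    n = len(rankings)
    return {"accuracy@k": hits / n, "mrr@k": reciprocal_ranks / n}

# Toy example: two queries over three documents
print(ir_metrics([[0, 2, 1], [1, 0, 2]], relevant=[0, 0], k=3))
# {'accuracy@k': 1.0, 'mrr@k': 0.75}
```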

Training Details

Training Dataset

code-retrieval-combined

  • Dataset: code-retrieval-combined at 4403b52
  • Size: 193,623 training samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query: string; min: 6 tokens, mean: 143.24 tokens, max: 1024 tokens
    • positive: string; min: 5 tokens, mean: 64.75 tokens, max: 937 tokens
  • Samples:

    query:
        protected function sendMusicMsgToJsonString(WxSendMusicMsg $msg)
        {
            $formatStr = '{
                "touser":"%s",
                "msgtype":"%s",
                "music":
                {
                    "title":"%s",
                    "description":"%s",
                    "musicurl":"%s",
                    "hqmusicurl":"%s",
                    "thumb_media_id":"%s"
                }
            }';
            $result = sprintf($formatStr, $msg->getToUserName(),
                $msg->getMsgType(),
                $msg->getTitle(),
                $msg->getDescription(),
                $msg->getMusicUrl(),
                $msg->getHQMusicUrl(),
                $msg->getThumbMediaId()
            );

            return $result;
        }
    positive:
        formatter WxSendMusicMsg to Json string
        @param WxSendMusicMsg $msg
        @return string

    query:
        def getBlocks(self):
            """
            Get the blocks that need to be migrated
            """
            try:
                conn = self.dbi.connection()
                result = self.buflistblks.execute(conn)
                return result
            finally:
                if conn:
                    conn.close()

    query:
        function obj(/*key,value, key,value ...*/) {
            var result = {}
            for(var n = 0; n < arguments.length; n += 2) {
                result[arguments[n]] = arguments[n+1]
            }
            return result
        }
    positive:
        builds an object immediate where keys can be expressions
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
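MultipleNegativesRankingLoss treats, for each query, every other positive in the batch as a negative, and applies cross-entropy over scaled cosine similarities with the matched pair as the target; the cached variant computes the same objective in mini-batches (mini_batch_size above) so large batches fit in memory. A minimal NumPy sketch of the underlying objective (illustrative, not the library's implementation):

```python
import numpy as np

def mnrl_loss(query_emb, pos_emb, scale=20.0):
    """In-batch-negatives ranking loss over cosine similarities.

    query_emb, pos_emb: (batch, dim) arrays; row i of pos_emb is the
    positive for row i of query_emb, all other rows act as negatives.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pos_emb / np.linalg.norm(pos_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)  # (batch, batch) scaled cosine similarities
    # Cross-entropy with the matched pair (the diagonal) as the target class
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
loss = mnrl_loss(queries, queries)  # matched pairs sit on the diagonal
print(loss)
```

Larger batches yield more in-batch negatives per query, which is why this model trains with per_device_train_batch_size of 1024.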
    

Evaluation Dataset

code-retrieval-combined

  • Dataset: code-retrieval-combined at 4403b52
  • Size: 21,514 evaluation samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples:
    • query: string; min: 7 tokens, mean: 140.91 tokens, max: 1024 tokens
    • positive: string; min: 5 tokens, mean: 71.36 tokens, max: 1024 tokens
  • Samples:

    query:
        def save
          self.attributes.stringify_keys!
          self.attributes.delete('customer')
          self.attributes.delete('product')
          self.attributes.delete('credit_card')
          self.attributes.delete('bank_account')
          self.attributes.delete('paypal_account')

          self.attributes, options = extract_uniqueness_token(attributes)
          self.prefix_options.merge!(options)
          super
        end

    query:
        def _update_summary(self, summary=None):
            """Update all parts of the summary or clear when no summary."""
            board_image_label = self._parts['board image label']
            # get content for update or use blanks when no summary
            if summary:
                # make a board image with the swap drawn on it
                # board, action, text = summary.board, summary.action, summary.text
                board_image_cv = self._create_board_image_cv(summary.board)
                self._draw_swap_cv(board_image_cv, summary.action)
                board_image_tk = self._convert_cv_to_tk(board_image_cv)
                text = ''
                if not summary.score is None:
                    text += 'Score: {:3.1f}'.format(summary.score)
                if (not summary.mana_drain_leaves is None) and \
                   (not summary.total_leaves is None):
                    text += ' Mana Drains: {}/{}' \
                            ''.format(summary.mana_drain_leaves,
                                      summary.total_leaves)
            else:
                # clear any stored state image and use the blank
                board_image_tk = board_image_label._blank_image
                text = ''
            # update the UI parts with the content
            board_image_label._board_image = board_image_tk
            board_image_label.config(image=board_image_tk)
            # update the summary text
            summary_label = self._parts['summary label']
            summary_label.config(text=text)
            # refresh the UI
            self._base.update()

    query:
        def chi_p(mass1, mass2, spin1x, spin1y, spin2x, spin2y):
            """Returns the effective precession spin from mass1, mass2, spin1x,
            spin1y, spin2x, and spin2y.
            """
            xi1 = secondary_xi(mass1, mass2, spin1x, spin1y, spin2x, spin2y)
            xi2 = primary_xi(mass1, mass2, spin1x, spin1y, spin2x, spin2y)
            return chi_p_from_xi1_xi2(xi1, xi2)
    positive:
        Returns the effective precession spin from mass1, mass2, spin1x,
        spin1y, spin2x, and spin2y.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1024
  • num_train_epochs: 1
  • learning_rate: 8e-05
  • warmup_steps: 0.05
  • bf16: True
  • eval_strategy: steps
  • per_device_eval_batch_size: 1024
  • push_to_hub: True
  • hub_model_id: modernbert-code
  • load_best_model_at_end: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates

All Hyperparameters

  • per_device_train_batch_size: 1024
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 8e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.05
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 1024
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: True
  • hub_private_repo: None
  • hub_model_id: modernbert-code
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ndcg@10
0.0526 10 5.2457 2.4469 0.4195
0.1053 20 1.3973 0.6956 0.7742
0.1579 30 0.5500 0.4000 0.8560
0.2105 40 0.3429 0.2878 0.8891
0.2632 50 0.2487 0.2250 0.9104
0.3158 60 0.2080 0.1872 0.9256
0.3684 70 0.1768 0.1656 0.9312
0.4211 80 0.1525 0.1501 0.9352
0.4737 90 0.1402 0.1374 0.9397
0.5263 100 0.1343 0.1317 0.9413
0.5789 110 0.1217 0.1242 0.9444
0.6316 120 0.1180 0.1199 0.9454
0.6842 130 0.1164 0.1149 0.9476
0.7368 140 0.1146 0.1106 0.9494
0.7895 150 0.1091 0.1080 0.9494
0.8421 160 0.1085 0.1055 0.9506
0.8947 170 0.1062 0.1041 0.9511
0.9474 180 0.1130 0.1030 0.9517
1.0 190 0.0924 0.1024 0.9519
  • The saved checkpoint is the one from step 190, which achieves the best eval_cosine_ndcg@10 (0.9519, the value reported in the Metrics section above).

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.3
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model tree for benjamintli/modernbert-code

Finetuned
(1156)
this model

Dataset used to train benjamintli/modernbert-code

Papers for benjamintli/modernbert-code

Evaluation results