# SentenceTransformer based on answerdotai/ModernBERT-base
This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the code-retrieval-combined dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description

- Model Type: Sentence Transformer
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 1024 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: code-retrieval-combined
### Model Sources
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
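The `Pooling` module is configured for mean pooling (`pooling_mode_mean_tokens: True`): the final-layer token embeddings are averaged, with padding positions masked out. A minimal NumPy sketch of that operation on toy tensors (not real model outputs):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = mask.sum(axis=1).clip(min=1e-9)                         # (batch, 1)
    return summed / counts

# Toy batch: one sequence of 3 tokens (last one is padding), embedding dim 2.
tok = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
pooled = mean_pool(tok, mask)  # averages only the two real tokens -> [[2.0, 3.0]]
```

For the real model the same averaging runs over 768-dimensional ModernBERT token embeddings, producing the 768-dimensional sentence vector.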
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("modernbert-code")

queries = [
    "function onActiveEditorChanged(event, current, previous) {\n if (current && !current._codeMirror._lineFolds) {\n enableFoldingInEditor(current);\n ",
]
documents = [
    ' }\n if (previous) {\n saveLineFolds(previous);\n }\n }',
    'Save config data.\n\n@param string $path\n@param string $value\n@param string $scope\n@param int $scopeId\n\n@return null',
    'Get playback settings such as shuffle and repeat.',
]

query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
```
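`model.similarity` applies the model's configured similarity function, cosine similarity. For illustration, the same computation in plain NumPy on hypothetical 2-dimensional toy vectors (real embeddings are 768-dimensional):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # (len(a), len(b)) similarity matrix

q = np.array([[1.0, 0.0], [0.0, 1.0]])  # two toy "query" vectors
d = np.array([[1.0, 1.0]])              # one toy "document" vector
sims = cosine_similarity(q, d)          # both queries are 45 degrees from d -> ~0.7071
```

The result is a query-by-document matrix; ranking each row descending gives the retrieval order used in the evaluation below.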
## Evaluation

### Metrics

#### Information Retrieval

| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.9167 |
| cosine_accuracy@3   | 0.9643 |
| cosine_accuracy@5   | 0.9738 |
| cosine_accuracy@10  | 0.9822 |
| cosine_precision@1  | 0.9167 |
| cosine_precision@3  | 0.3214 |
| cosine_precision@5  | 0.1948 |
| cosine_precision@10 | 0.0982 |
| cosine_recall@1     | 0.9167 |
| cosine_recall@3     | 0.9643 |
| cosine_recall@5     | 0.9738 |
| cosine_recall@10    | 0.9822 |
| cosine_ndcg@10      | 0.9519 |
| cosine_mrr@10       | 0.9419 |
| cosine_map@100      | 0.9426 |
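With exactly one relevant document per query, accuracy@k and recall@k coincide (as in the table above), and every metric reduces to a function of the 1-based rank at which each query's positive document is retrieved. A minimal sketch using hypothetical ranks, not the actual evaluation data:

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose positive appears within the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank: 1/rank of the positive, 0 if it falls outside top k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

ranks = [1, 1, 2, 5, 12]          # hypothetical first-hit ranks for 5 queries
acc1 = accuracy_at_k(ranks, 1)    # 2/5 = 0.4
mrr = mrr_at_k(ranks, 10)         # (1 + 1 + 0.5 + 0.2 + 0) / 5 = 0.54
```

NDCG@10 additionally discounts hits logarithmically by position, which is why it sits between accuracy@1 and accuracy@10 in the table.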
## Training Details

### Training Dataset

#### code-retrieval-combined

- Dataset: code-retrieval-combined at 4403b52
- Size: 193,623 training samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:

  |         | query                                                | positive                                           |
  |:--------|:-----------------------------------------------------|:---------------------------------------------------|
  | type    | string                                               | string                                             |
  | details | min: 6 tokens, mean: 143.24 tokens, max: 1024 tokens | min: 5 tokens, mean: 64.75 tokens, max: 937 tokens |

- Samples:

  | query | positive |
  |:------|:---------|
  | `protected function sendMusicMsgToJsonString(WxSendMusicMsg $msg) { $formatStr = '{ "touser":"%s", "msgtype":"%s", "music": { "title":"%s", "description":"%s", "musicurl":"%s", "hqmusicurl":"%s", "thumb_media_id":"%s" } }'; $result = sprintf($formatStr, $msg->getToUserName(), $msg->getMsgType(), $msg->getTitle(), $msg->getDescription(), $msg->getMusicUrl(), $msg->getHQMusicUrl(), $msg->getThumbMediaId() ); return $result; }` | formatter WxSendMusicMsg to Json string @param WxSendMusicMsg $msg @return string |
  | `def getBlocks(self): """ Get the blocks that need to be migrated """ try: conn = self.dbi.connection() result =` | `self.buflistblks.execute(conn) return result finally: if conn: conn.close()` |
  | `function obj(/key,value, key,value .../) { var result = {} for(var n=0; n result[arguments[n]] = arguments[n+1] } return result }` | builds an object immediate where keys can be expressions |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 128,
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```
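CachedMultipleNegativesRankingLoss treats each query's paired `positive` as the target and every other in-batch document as a negative, applying cross-entropy over scaled cosine similarities; the caching only reduces memory, not the loss value. A minimal uncached NumPy sketch with toy vectors, covering the `query_to_doc` direction used here:

```python
import numpy as np

def mnrl_loss(query_emb: np.ndarray, doc_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: for row i, doc i is the positive,
    all other docs are negatives; cross-entropy over scaled cosine similarities."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = scale * (q @ d.T)  # (batch, batch) similarity logits
    # log-softmax over each row; the target for row i is column i
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.1], [0.1, 1.0]])
loss = mnrl_loss(q, d)  # near zero: each query is far closer to its own positive
```

The `scale` of 20.0 sharpens the softmax so that small cosine differences still produce a strong training signal; the cached variant computes the same gradients in `mini_batch_size` chunks of 128.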
### Evaluation Dataset

#### code-retrieval-combined

- Dataset: code-retrieval-combined at 4403b52
- Size: 21,514 evaluation samples
- Columns: query and positive
- Approximate statistics based on the first 1000 samples:

  |         | query                                                | positive                                            |
  |:--------|:-----------------------------------------------------|:----------------------------------------------------|
  | type    | string                                               | string                                              |
  | details | min: 7 tokens, mean: 140.91 tokens, max: 1024 tokens | min: 5 tokens, mean: 71.36 tokens, max: 1024 tokens |

- Samples:

  | query | positive |
  |:------|:---------|
  | `def save self.attributes.stringify_keys! self.attributes.delete('customer') self.attributes.delete('product') self.attributes.delete('credit_card') self.attributes.delete('bank_account') self.attributes.delete('paypal_account')` | `self.attributes, options = extract_uniqueness_token(attributes) self.prefix_options.merge!(options) super end` |
  | `def _update_summary(self, summary=None): """Update all parts of the summary or clear when no summary.""" board_image_label = self._parts['board image label'] # get content for update or use blanks when no summary if summary: # make a board image with the swap drawn on it # board, action, text = summary.board, summary.action, summary.text board_image_cv = self._create_board_image_cv(summary.board) self._draw_swap_cv(board_image_cv, summary.action) board_image_tk = self._convert_cv_to_tk(board_image_cv) text = '' if not summary.score is None: text += 'Score: {:3.1f}'.format(summary.score) if (not summary.mana_drain_leaves is None) and (not summary.total_leaves is None): text += ' Mana Drains: {}/{}' ''.format(summary.mana_drain_leaves,` | `summary.total_leaves) else: #clear any stored state image and use the blank board_image_tk = board_image_label._blank_image text = '' # update the UI parts with the content board_image_label._board_image = board_image_tk board_image_label.config(image=board_image_tk) # update the summary text summary_label = self._parts['summary label'] summary_label.config(text=text) # refresh the UI self._base.update()` |
  | `def chi_p(mass1, mass2, spin1x, spin1y, spin2x, spin2y): """Returns the effective precession spin from mass1, mass2, spin1x, spin1y, spin2x, and spin2y. """ xi1 = secondary_xi(mass1, mass2, spin1x, spin1y, spin2x, spin2y) xi2 = primary_xi(mass1, mass2, spin1x, spin1y, spin2x, spin2y) return chi_p_from_xi1_xi2(xi1, xi2)` | Returns the effective precession spin from mass1, mass2, spin1x, spin1y, spin2x, and spin2y. |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 128,
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```
### Training Hyperparameters

#### Non-Default Hyperparameters

- per_device_train_batch_size: 1024
- num_train_epochs: 1
- learning_rate: 8e-05
- warmup_steps: 0.05
- bf16: True
- eval_strategy: steps
- per_device_eval_batch_size: 1024
- push_to_hub: True
- hub_model_id: modernbert-code
- load_best_model_at_end: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
#### All Hyperparameters

<details><summary>Click to expand</summary>

- per_device_train_batch_size: 1024
- num_train_epochs: 1
- max_steps: -1
- learning_rate: 8e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.05
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: steps
- per_device_eval_batch_size: 1024
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: True
- hub_private_repo: None
- hub_model_id: modernbert-code
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

</details>
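Note that `warmup_steps: 0.05` reads as a fraction of total steps rather than a step count. Assuming it acts as a warmup ratio together with the `linear` scheduler (linear warmup to the peak learning rate, then linear decay to zero), a hypothetical sketch of the resulting schedule over this run's 190 steps:

```python
def linear_schedule(step: int, total_steps: int,
                    warmup_ratio: float = 0.05, base_lr: float = 8e-5) -> float:
    """Linear warmup to base_lr over warmup_ratio * total_steps,
    then linear decay to 0 over the remaining steps (assumed behavior)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 190                       # optimizer steps in the run logged below
peak = linear_schedule(9, total)  # warmup ends after int(190 * 0.05) = 9 steps
```

Under this assumption the learning rate peaks at 8e-05 very early and decays for nearly the whole epoch.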
### Training Logs

| Epoch      | Step    | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
|:----------:|:-------:|:-------------:|:---------------:|:-------------------:|
| 0.0526     | 10      | 5.2457        | 2.4469          | 0.4195              |
| 0.1053     | 20      | 1.3973        | 0.6956          | 0.7742              |
| 0.1579     | 30      | 0.5500        | 0.4000          | 0.8560              |
| 0.2105     | 40      | 0.3429        | 0.2878          | 0.8891              |
| 0.2632     | 50      | 0.2487        | 0.2250          | 0.9104              |
| 0.3158     | 60      | 0.2080        | 0.1872          | 0.9256              |
| 0.3684     | 70      | 0.1768        | 0.1656          | 0.9312              |
| 0.4211     | 80      | 0.1525        | 0.1501          | 0.9352              |
| 0.4737     | 90      | 0.1402        | 0.1374          | 0.9397              |
| 0.5263     | 100     | 0.1343        | 0.1317          | 0.9413              |
| 0.5789     | 110     | 0.1217        | 0.1242          | 0.9444              |
| 0.6316     | 120     | 0.1180        | 0.1199          | 0.9454              |
| 0.6842     | 130     | 0.1164        | 0.1149          | 0.9476              |
| 0.7368     | 140     | 0.1146        | 0.1106          | 0.9494              |
| 0.7895     | 150     | 0.1091        | 0.1080          | 0.9494              |
| 0.8421     | 160     | 0.1085        | 0.1055          | 0.9506              |
| 0.8947     | 170     | 0.1062        | 0.1041          | 0.9511              |
| 0.9474     | 180     | 0.1130        | 0.1030          | 0.9517              |
| **1.0**    | **190** | **0.0924**    | **0.1024**      | **0.9519**          |

- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.3.0
- Transformers: 5.3.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.8.3
- Tokenizers: 0.22.2
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### CachedMultipleNegativesRankingLoss

```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```