SentenceTransformer based on benjamintli/modernbert-cosqa

This is a sentence-transformers model finetuned from benjamintli/modernbert-cosqa on the cosqa-llm-filtered-hard-negatives dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
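The Pooling module above uses mean-token pooling (`pooling_mode_mean_tokens: True`): token embeddings are averaged over non-padding positions to produce one 768-dimensional sentence vector. As an illustration only (not the library's internal code), masked mean pooling can be sketched in NumPy:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

# Two toy sequences; the second is padded after its first token
emb = np.array([[[1.0, 2.0], [3.0, 4.0]],
                [[5.0, 6.0], [0.0, 0.0]]])
mask = np.array([[1, 1], [1, 0]])
print(mean_pool(emb, mask))  # [[2. 3.] [5. 6.]]
```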

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("benjamintli/modernbert-cosqa-hard-negatives")
# Run inference
sentences = [
    'python strftime miliseconds fixed width',
    'def fmt_duration(secs):\n    """Format a duration in seconds."""\n    return \' \'.join(fmt.human_duration(secs, 0, precision=2, short=True).strip().split())',
    'def seconds_to_hms(seconds):\n    """\n    Converts seconds float to \'hh:mm:ss.ssssss\' format.\n    """\n    hours = int(seconds / 3600.0)\n    minutes = int((seconds / 60.0) % 60.0)\n    secs = float(seconds % 60.0)\n    return "{0:02d}:{1:02d}:{2:02.6f}".format(hours, minutes, secs)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7076, 0.6960],
#         [0.7076, 1.0000, 0.7423],
#         [0.6960, 0.7423, 1.0000]])
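Semantic code search with these embeddings reduces to a cosine-similarity ranking of corpus vectors against the query vector. A minimal sketch with toy vectors (any real use would produce `query` and `corpus` with `model.encode` as above; the 2-dimensional vectors here stand in for the real 768-dimensional ones):

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between rows of a (queries) and rows of b (corpus)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def top_k(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2) -> np.ndarray:
    """Indices of the k most similar corpus rows for each query row."""
    sims = cosine_sim(query_emb, corpus_emb)
    return np.argsort(-sims, axis=1)[:, :k]

# Toy stand-ins for model.encode output
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([[0.9, 0.1]])
print(top_k(query, corpus))  # [[0 2]]
```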

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.5521
cosine_accuracy@3 0.8598
cosine_accuracy@5 0.9311
cosine_accuracy@10 0.9752
cosine_precision@1 0.5521
cosine_precision@3 0.2866
cosine_precision@5 0.1862
cosine_precision@10 0.0975
cosine_recall@1 0.5521
cosine_recall@3 0.8598
cosine_recall@5 0.9311
cosine_recall@10 0.9752
cosine_ndcg@10 0.7789
cosine_mrr@10 0.714
cosine_map@100 0.7154
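With a single relevant document per query (as in this dataset, where each anchor has one positive), accuracy@k and recall@k coincide, which is why the two rows match above, and every metric follows from the rank at which the positive is retrieved. A minimal sketch under that one-relevant-document assumption (not the sentence-transformers evaluator itself):

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant item appears within the top k.
    ranks: 1-based rank of the relevant item for each query."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting 0 when the item falls outside the top k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

ranks = [1, 3, 2, 11]  # relevant item's rank for four toy queries
print(accuracy_at_k(ranks, 3))  # 0.75
print(mrr_at_k(ranks))          # (1 + 1/3 + 1/2 + 0) / 4 ≈ 0.4583
```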

Training Details

Training Dataset

cosqa-llm-filtered-hard-negatives

  • Dataset: cosqa-llm-filtered-hard-negatives at 1585731
  • Size: 19,963 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6, mean 9.57, max 22 tokens
    • positive (string): min 37, mean 88.3, max 512 tokens
    • negative (string): min 36, mean 94.31, max 512 tokens
  • Samples:

    Sample 1
      anchor: python 2d array to dict
      positive:
          def to_dicts(recarray):
              """convert record array to a dictionaries"""
              for rec in recarray:
                  yield dict(zip(recarray.dtype.names, rec.tolist()))
      negative:
          def multidict_to_dict(d):
              """
              Turns a werkzeug.MultiDict or django.MultiValueDict into a dict with
              list values
              :param d: a MultiDict or MultiValueDict instance
              :return: a dict instance
              """
              return dict((k, v[0] if len(v) == 1 else v) for k, v in iterlists(d))

    Sample 2
      anchor: how to send dns request message in python
      positive:
          def _request_modify_dns_record(self, record):
              """Sends Modify_DNS_Record request"""
              return self._request_internal("Modify_DNS_Record",
                                            domain=self.domain,
                                            record=record)
      negative:
          def request(self, method, url, body=None, headers={}):
              """Send a complete request to the server."""
              self._send_request(method, url, body, headers)

    Sample 3
      anchor: how to cast string to uint8 in python
      positive:
          def b2u(string):
              """ bytes to unicode """
              if (isinstance(string, bytes) or
                      (PY2 and isinstance(string, str))):
                  return string.decode('utf-8')
              return string
      negative:
          def to_bytes(s, encoding="utf-8"):
              """Convert a string to bytes."""
              if isinstance(s, six.binary_type):
                  return s
              if six.PY3:
                  return bytes(s, encoding)
              return s.encode(encoding)
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 64,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
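CachedMultipleNegativesRankingLoss scores each anchor against every positive in the batch (plus the explicit hard negatives), treats all non-matching documents as in-batch negatives, and applies a temperature-scaled cross-entropy over the cosine similarities; the "cached" variant chunks this computation so the effective batch can exceed GPU memory. The core scoring step, with `scale=20.0` as configured above, can be sketched as follows (an illustration, not the library implementation):

```python
import math

def mnrl_loss(sim_rows, scale=20.0):
    """Multiple-negatives ranking loss for one batch.

    sim_rows[i][j] = cosine similarity between query i and document j;
    document i is the positive for query i, all others act as negatives.
    """
    total = 0.0
    for i, row in enumerate(sim_rows):
        logits = [scale * s for s in row]
        log_norm = math.log(sum(math.exp(z) for z in logits))
        total += log_norm - logits[i]  # cross-entropy with target class i
    return total / len(sim_rows)

# A well-separated toy batch: positives score 0.9, negatives 0.1
sims = [[0.9, 0.1], [0.1, 0.9]]
print(mnrl_loss(sims))  # near zero, since positives dominate
```

The `scale` factor of 20.0 sharpens the softmax, so a 0.05 gap in cosine similarity already translates into a logit gap of 1.0.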
    

Evaluation Dataset

cosqa-llm-filtered-hard-negatives

  • Dataset: cosqa-llm-filtered-hard-negatives at 1585731
  • Size: 2,219 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 6, mean 9.65, max 21 tokens
    • positive (string): min 35, mean 88.87, max 512 tokens
    • negative (string): min 37, mean 94.56, max 512 tokens
  • Samples:

    Sample 1
      anchor: way to change the string "python" to have all uppercase letters
      positive:
          def uppercase_chars(string: any) -> str:
              """Return all (and only) the uppercase chars in the given string."""
              return ''.join([c if c.isupper() else '' for c in str(string)])
      negative:
          def to_capitalized_camel_case(snake_case_string):
              """
              Convert a string from snake case to camel case with the first letter capitalized. For example, "some_var"
              would become "SomeVar".

              :param snake_case_string: Snake-cased string to convert to camel case.
              :returns: Camel-cased version of snake_case_string.
              """
              parts = snake_case_string.split('_')
              return ''.join([i.title() for i in parts])

    Sample 2
      anchor: how to make intercept zero in python
      positive:
          def prox_zero(X, step):
              """Proximal operator to project onto zero
              """
              return np.zeros(X.shape, dtype=X.dtype)
      negative:
          def _adjust_offset(self, real_wave_mfcc, algo_parameters):
              """
              OFFSET
              """
              self.log(u"Called _adjust_offset")
              self._apply_offset(offset=algo_parameters[0])

    Sample 3
      anchor: stop running function and passing to other variable python
      positive:
          def stop(self) -> None:
              """Stops the analysis as soon as possible."""
              if self._stop and not self._posted_kork:
                  self._stop()
                  self._stop = None
      negative:
          def stop(self, dummy_signum=None, dummy_frame=None):
              """ Shutdown process (this method is also a signal handler) """
              logging.info('Shutting down ...')
              self.socket.close()
              sys.exit(0)
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 64,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 1024
  • num_train_epochs: 10
  • learning_rate: 2e-06
  • warmup_steps: 0.1
  • bf16: True
  • eval_strategy: epoch
  • per_device_eval_batch_size: 1024
  • push_to_hub: True
  • hub_model_id: modernbert-cosqa-hard-negatives
  • load_best_model_at_end: True
  • dataloader_num_workers: 4
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • per_device_train_batch_size: 1024
  • num_train_epochs: 10
  • max_steps: -1
  • learning_rate: 2e-06
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: epoch
  • per_device_eval_batch_size: 1024
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: True
  • hub_private_repo: None
  • hub_model_id: modernbert-cosqa-hard-negatives
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ndcg@10
0.5 10 1.5380 - -
1.0 20 1.4167 0.9702 0.7440
1.5 30 1.4515 - -
2.0 40 1.3789 0.9269 0.7499
2.5 50 1.3920 - -
3.0 60 1.2849 0.8898 0.7581
3.5 70 1.3585 - -
4.0 80 1.2197 0.8572 0.7653
4.5 90 1.2825 - -
5.0 100 1.2078 0.8350 0.7686
5.5 110 1.2496 - -
6.0 120 1.1569 0.8104 0.7720
6.5 130 1.2119 - -
7.0 140 1.1278 0.7952 0.7754
7.5 150 1.1812 - -
8.0 160 1.1018 0.7835 0.7770
8.5 170 1.1696 - -
9.0 180 1.0972 0.7788 0.7786
9.5 190 1.1655 - -
10.0* 200 1.0796 0.7755 0.7789
  • The row marked with * denotes the saved checkpoint (best eval_cosine_ndcg@10, loaded at the end of training).

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.8.2
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}