SentenceTransformer based on benjamintli/modernbert-cosqa
This is a sentence-transformers model finetuned from benjamintli/modernbert-cosqa on the cosqa-llm-filtered-hard-negatives dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: benjamintli/modernbert-cosqa
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: cosqa-llm-filtered-hard-negatives
- Language: en
Model Sources
Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
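The Pooling module above uses masked mean pooling (`pooling_mode_mean_tokens`): per-token embeddings are averaged while padding positions are ignored. A minimal NumPy sketch of that operation, using made-up toy token embeddings:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding).
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    counts = mask.sum()                                            # number of real tokens
    return summed / counts

# Toy example: 4 positions, the last one is padding, dim = 3.
tokens = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0],
                   [2.0, 2.0, 2.0],
                   [9.0, 9.0, 9.0]])   # padding row; must not affect the mean
mask = np.array([1, 1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 2. 2.]
```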
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the Hugging Face Hub
model = SentenceTransformer("modernbert-cosqa-hard-negatives")
# Run inference
sentences = [
    'python strftime miliseconds fixed width',
    'def fmt_duration(secs):\n    """Format a duration in seconds."""\n    return \' \'.join(fmt.human_duration(secs, 0, precision=2, short=True).strip().split())',
    'def seconds_to_hms(seconds):\n    """\n    Converts seconds float to \'hh:mm:ss.ssssss\' format.\n    """\n    hours = int(seconds / 3600.0)\n    minutes = int((seconds / 60.0) % 60.0)\n    secs = float(seconds % 60.0)\n    return "{0:02d}:{1:02d}:{2:02.6f}".format(hours, minutes, secs)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
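Since this model's similarity function is cosine similarity, `model.similarity` amounts to normalizing each embedding and taking dot products. A NumPy equivalent of that step, shown on toy 2-D embeddings rather than real model output:

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
sims = cosine_similarity_matrix(emb, emb)
print(np.round(sims, 4))
# Diagonal is 1.0; [1, 0] vs [1, 1] gives 1/sqrt(2) ≈ 0.7071.
```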
Evaluation
Metrics
Information Retrieval
| Metric | Value |
|:--------------------|:-------|
| cosine_accuracy@1 | 0.5521 |
| cosine_accuracy@3 | 0.8598 |
| cosine_accuracy@5 | 0.9311 |
| cosine_accuracy@10 | 0.9752 |
| cosine_precision@1 | 0.5521 |
| cosine_precision@3 | 0.2866 |
| cosine_precision@5 | 0.1862 |
| cosine_precision@10 | 0.0975 |
| cosine_recall@1 | 0.5521 |
| cosine_recall@3 | 0.8598 |
| cosine_recall@5 | 0.9311 |
| cosine_recall@10 | 0.9752 |
| cosine_ndcg@10 | 0.7789 |
| cosine_mrr@10 | 0.714 |
| cosine_map@100 | 0.7154 |
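For reference, accuracy@k and MRR@k above are computed from each query's ranked candidate list. A small sketch with toy rankings, assuming (as in this dataset) exactly one relevant document per query:

```python
def rank_of_relevant(ranked_ids, relevant_id):
    """1-based rank of the relevant document, or None if absent."""
    return ranked_ids.index(relevant_id) + 1 if relevant_id in ranked_ids else None

def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant doc appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r for r in ranks if r is not None and r <= k) / len(ranks)

# Toy: three queries whose relevant doc landed at ranks 1, 3, and 12.
ranks = [1, 3, 12]
print(accuracy_at_k(ranks, 10))  # 2/3 ≈ 0.667
print(mrr_at_k(ranks, 10))       # (1 + 1/3 + 0) / 3 ≈ 0.444
```

With one relevant document per query, recall@k equals accuracy@k and precision@k is accuracy@k divided by k, which matches the table above (e.g. 0.8598 / 3 ≈ 0.2866).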
Training Details
Training Dataset
cosqa-llm-filtered-hard-negatives
- Dataset: cosqa-llm-filtered-hard-negatives at 1585731
- Size: 19,963 training samples
- Columns:
anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:--------|:---------|:---------|
| type | string | string | string |
| details | min: 6 tokens, mean: 9.57 tokens, max: 22 tokens | min: 37 tokens, mean: 88.3 tokens, max: 512 tokens | min: 36 tokens, mean: 94.31 tokens, max: 512 tokens |
- Samples:
| anchor | positive | negative |
|:-------|:---------|:---------|
| python 2d array to dict | `def to_dicts(recarray): """convert record array to a dictionaries""" for rec in recarray: yield dict(zip(recarray.dtype.names, rec.tolist()))` | `def multidict_to_dict(d): """ Turns a werkzeug.MultiDict or django.MultiValueDict into a dict with list values :param d: a MultiDict or MultiValueDict instance :return: a dict instance """ return dict((k, v[0] if len(v) == 1 else v) for k, v in iterlists(d))` |
| how to send dns request message in python | `def _request_modify_dns_record(self, record): """Sends Modify_DNS_Record request""" return self._request_internal("Modify_DNS_Record", domain=self.domain, record=record)` | `def request(self, method, url, body=None, headers={}): """Send a complete request to the server.""" self._send_request(method, url, body, headers)` |
| how to cast string to uint8 in python | `def b2u(string): """ bytes to unicode """ if (isinstance(string, bytes) or (PY2 and isinstance(string, str))): return string.decode('utf-8') return string` | `def to_bytes(s, encoding="utf-8"): """Convert a string to bytes.""" if isinstance(s, six.binary_type): return s if six.PY3: return bytes(s, encoding) return s.encode(encoding)` |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 64,
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
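At its core, CachedMultipleNegativesRankingLoss is an InfoNCE-style cross-entropy over in-batch candidates: for each anchor query, its positive plus every other in-batch positive and hard negative are scored with scaled cosine similarity, and the loss pushes the true positive to the top. A simplified (uncached, single-batch) NumPy sketch under those assumptions; it is illustrative, not the library's implementation:

```python
import numpy as np

def mnrl_loss(queries, docs, scale=20.0):
    """Multiple-negatives ranking loss, uncached, for one batch.

    queries: (n, dim) query embeddings.
    docs:    (m, dim) candidate embeddings, where docs[i] is the positive
             for queries[i] and all other rows (other positives plus any
             explicit hard negatives) act as negatives.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = scale * (q @ d.T)  # (n, m) scaled cosine similarities
    # Cross-entropy with target i for query i (row-wise log-softmax).
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    n = len(q)
    return -log_probs[np.arange(n), np.arange(n)].mean()

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
d = np.concatenate([q + 0.1 * rng.normal(size=(4, 8)),  # positives near queries
                    rng.normal(size=(4, 8))])           # hard negatives
print(mnrl_loss(q, d))  # small, since each positive scores highest
```

The cached variant computes the same loss in `mini_batch_size` chunks so the effective batch (1024 here) fits in memory.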
Evaluation Dataset
cosqa-llm-filtered-hard-negatives
- Dataset: cosqa-llm-filtered-hard-negatives at 1585731
- Size: 2,219 evaluation samples
- Columns:
anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:--------|:---------|:---------|
| type | string | string | string |
| details | min: 6 tokens, mean: 9.65 tokens, max: 21 tokens | min: 35 tokens, mean: 88.87 tokens, max: 512 tokens | min: 37 tokens, mean: 94.56 tokens, max: 512 tokens |
- Samples:
| anchor | positive | negative |
|:-------|:---------|:---------|
| way to change the string "python" to have all uppercase letters | `def uppercase_chars(string: any) -> str: """Return all (and only) the uppercase chars in the given string.""" return ''.join([c if c.isupper() else '' for c in str(string)])` | `def to_capitalized_camel_case(snake_case_string): """ Convert a string from snake case to camel case with the first letter capitalized. For example, "some_var" would become "SomeVar". :param snake_case_string: Snake-cased string to convert to camel case. :returns: Camel-cased version of snake_case_string. """ parts = snake_case_string.split('_') return ''.join([i.title() for i in parts])` |
| how to make intercept zero in python | `def prox_zero(X, step): """Proximal operator to project onto zero """ return np.zeros(X.shape, dtype=X.dtype)` | `def _adjust_offset(self, real_wave_mfcc, algo_parameters): """ OFFSET """ self.log(u"Called _adjust_offset") self._apply_offset(offset=algo_parameters[0])` |
| stop running function and passing to other variable python | `def stop(self) -> None: """Stops the analysis as soon as possible.""" if self._stop and not self._posted_kork: self._stop() self._stop = None` | `def stop(self, dummy_signum=None, dummy_frame=None): """ Shutdown process (this method is also a signal handler) """ logging.info('Shutting down ...') self.socket.close() sys.exit(0)` |
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "mini_batch_size": 64,
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 1024
- num_train_epochs: 10
- learning_rate: 2e-06
- warmup_steps: 0.1
- bf16: True
- eval_strategy: epoch
- per_device_eval_batch_size: 1024
- push_to_hub: True
- hub_model_id: modernbert-cosqa-hard-negatives
- load_best_model_at_end: True
- dataloader_num_workers: 4
- batch_sampler: no_duplicates
All Hyperparameters
Click to expand
- per_device_train_batch_size: 1024
- num_train_epochs: 10
- max_steps: -1
- learning_rate: 2e-06
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: epoch
- per_device_eval_batch_size: 1024
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: True
- hub_private_repo: None
- hub_model_id: modernbert-cosqa-hard-negatives
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}
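With `lr_scheduler_type: linear`, the learning rate ramps up over the warmup steps and then decays linearly to zero by the final step (200 here, per the training log: 20 steps per epoch for 10 epochs). A small sketch of that schedule; `linear_lr` is a hypothetical helper, and since the logged `warmup_steps: 0.1` looks like a ratio rather than a step count, `warmup_steps=20` below is purely an illustrative assumption:

```python
def linear_lr(step, total_steps, peak_lr=2e-6, warmup_steps=20):
    """Linear warmup to peak_lr, then linear decay to 0 (hypothetical helper)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

total = 200  # 20 steps/epoch x 10 epochs, per the training log
for s in (0, 10, 20, 110, 200):
    print(s, linear_lr(s, total))  # ramps to 2e-6 at step 20, 0 at step 200
```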
Training Logs
| Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
|:-----:|:----:|:-------------:|:---------------:|:-------------------:|
| 0.5 | 10 | 1.5380 | - | - |
| 1.0 | 20 | 1.4167 | 0.9702 | 0.7440 |
| 1.5 | 30 | 1.4515 | - | - |
| 2.0 | 40 | 1.3789 | 0.9269 | 0.7499 |
| 2.5 | 50 | 1.3920 | - | - |
| 3.0 | 60 | 1.2849 | 0.8898 | 0.7581 |
| 3.5 | 70 | 1.3585 | - | - |
| 4.0 | 80 | 1.2197 | 0.8572 | 0.7653 |
| 4.5 | 90 | 1.2825 | - | - |
| 5.0 | 100 | 1.2078 | 0.8350 | 0.7686 |
| 5.5 | 110 | 1.2496 | - | - |
| 6.0 | 120 | 1.1569 | 0.8104 | 0.7720 |
| 6.5 | 130 | 1.2119 | - | - |
| 7.0 | 140 | 1.1278 | 0.7952 | 0.7754 |
| 7.5 | 150 | 1.1812 | - | - |
| 8.0 | 160 | 1.1018 | 0.7835 | 0.7770 |
| 8.5 | 170 | 1.1696 | - | - |
| 9.0 | 180 | 1.0972 | 0.7788 | 0.7786 |
| 9.5 | 190 | 1.1655 | - | - |
| **10.0** | **200** | **1.0796** | **0.7755** | **0.7789** |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.12
- Sentence Transformers: 5.3.0
- Transformers: 5.3.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.8.2
- Tokenizers: 0.22.2
Citation
BibTeX
Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
CachedMultipleNegativesRankingLoss
```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```