# modernbert-code-v4-hard-negatives

Related paper: [Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup](https://arxiv.org/abs/2101.06983) (arXiv:2101.06983).
This is a sentence-transformers model finetuned from benjamintli/modernbert-code-v3-hard-negatives on the code-retrieval-hard-negatives-llm-verified-merged dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
The full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
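The pooling layer above uses mean pooling (`pooling_mode_mean_tokens`): token embeddings are averaged into a single 768-dimensional sentence vector, with padding positions masked out. A minimal NumPy sketch of that operation (the `mean_pool` helper and the toy tensors are ours for illustration, not library code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence axis, ignoring padding.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid divide-by-zero
    return summed / counts

# Two toy "sentences"; the second has one padded position holding garbage.
tokens = np.ones((2, 3, 4))
tokens[1, 2] = 100.0                        # padded slot, must be ignored
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 4)
```

Because the padded position is masked, both pooled vectors come out as all-ones despite the garbage values.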
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("modernbert-code-v4-hard-negatives")
# Run inference
queries = [
    "If MultiTenantMiddleware is used, filter queryset by request.site_id",
]
documents = [
    "def get_queryset(self):\n '''\n If MultiTenantMiddleware is used, filter queryset by request.site_id\n '''\n queryset = super(PageList, self).get_queryset()\n if hasattr(self.request, 'site_id'):\n queryset = queryset.filter(site_id=self.request.site_id)\n return queryset",
    'def reduce_ticks(ax, which, maxticks=3):\n """Given a pyplot axis, resamples its `which`-axis ticks such that are at most\n `maxticks` left.\n\n Parameters\n ----------\n ax : axis\n The axis to adjust.\n which : {\'x\' | \'y\'}\n Which axis to adjust.\n maxticks : {3, int}\n Maximum number of ticks to use.\n\n Returns\n -------\n array\n An array of the selected ticks.\n """\n ticks = getattr(ax, \'get_{}ticks\'.format(which))()\n if len(ticks) > maxticks:\n # make sure the left/right value is not at the edge\n minax, maxax = getattr(ax, \'get_{}lim\'.format(which))()\n dw = abs(maxax-minax)/10.\n start_idx, end_idx = 0, len(ticks)\n if ticks[0] < minax + dw:\n start_idx += 1\n if ticks[-1] > maxax - dw:\n end_idx -= 1\n # get reduction factor\n fac = int(len(ticks) / maxticks)\n ticks = ticks[start_idx:end_idx:fac]\n return ticks',
    'function (isPublic, name, data, ttl, published_at, coreid) {\n var rawFn = function (msg) {\n try {\n msg.setMaxAge(parseInt((ttl && (ttl >= 0)) ? ttl : 60));\n if (published_at) {\n msg.setTimestamp(moment(published_at).toDate());\n }\n }\n catch (ex) {\n logger.error("onCoreHeard - " + ex);\n }\n return msg;\n };\n\n var msgName = (isPublic) ? "PublicEvent" : "PrivateEvent";\n var userID = (this.userID || "").toLowerCase() + "/";\n name = (name) ? name.toString() : name;\n if (name && name.indexOf && (name.indexOf(userID) == 0)) {\n name = name.substring(userID.length);\n }\n\n data = (data) ? data.toString() : data;\n this.sendNONTypeMessage(msgName, { event_name: name, _raw: rawFn }, data);\n }',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.8836, -0.0275, 0.0176]])
```
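`model.similarity` defaults to cosine similarity (`cos_sim`) between query and document embeddings. A self-contained sketch of that computation, using toy 3-d vectors as stand-ins for the real 768-d embeddings (`cos_sim` here is our re-implementation for illustration, not the library function):

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-in embeddings (in practice, the encode_query/encode_document outputs above).
query_embeddings = np.array([[1.0, 0.0, 0.0]])
document_embeddings = np.array([
    [0.9, 0.1, 0.0],   # near-duplicate of the query
    [0.0, 1.0, 0.0],   # unrelated
    [0.0, 0.0, 1.0],   # unrelated
])
similarities = cos_sim(query_embeddings, document_embeddings)  # shape (1, 3)
best = int(similarities[0].argmax())
print(best)  # 0 -> the first document is retrieved
```

For semantic search over a large corpus, the same argmax-over-similarities step is what ranks documents per query.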
## Evaluation

### Information Retrieval

Evaluated with `InformationRetrievalEvaluator` (evaluator name: `eval`).

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.8943 |
| cosine_accuracy@3 | 0.943 |
| cosine_accuracy@5 | 0.963 |
| cosine_accuracy@10 | 0.976 |
| cosine_precision@1 | 0.8943 |
| cosine_precision@3 | 0.3143 |
| cosine_precision@5 | 0.1926 |
| cosine_precision@10 | 0.0976 |
| cosine_recall@1 | 0.8943 |
| cosine_recall@3 | 0.943 |
| cosine_recall@5 | 0.963 |
| cosine_recall@10 | 0.976 |
| cosine_ndcg@10 | 0.9359 |
| cosine_mrr@10 | 0.9229 |
| cosine_map@100 | 0.924 |
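Because each query in this benchmark has exactly one relevant document, `accuracy@k` and `recall@k` coincide, which is why those rows match above. For reference, here is a small sketch of how a rank-based metric such as `cosine_mrr@10` is computed (the `mrr_at_k` helper and toy rankings are illustrative, not the evaluator's code):

```python
def mrr_at_k(ranked_relevance: list, k: int = 10) -> float:
    """Mean reciprocal rank: average of 1/rank of the first relevant
    document within the top-k results, or 0 if none is found."""
    total = 0.0
    for rels in ranked_relevance:
        for rank, rel in enumerate(rels[:k], start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Three toy queries; 1 marks the position of the correct document in the ranking.
rankings = [
    [1, 0, 0],  # hit at rank 1 -> contributes 1.0
    [0, 1, 0],  # hit at rank 2 -> contributes 0.5
    [0, 0, 0],  # no hit        -> contributes 0.0
]
print(mrr_at_k(rankings, k=10))  # 0.5
```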
## Training Details

### Training Dataset

#### code-retrieval-hard-negatives-llm-verified-merged

* Columns: `query`, `positive`, `negative_0`, `negative_1`, `negative_2`, `negative_3`, `negative_4`, and `negative_5` (all of type string)
* Each sample pairs a natural-language query (a docstring summary or a competitive-programming problem statement, e.g. "A valid parentheses sequence is a non-empty string where each character is either '(' or ')' …", "Chef has a cubic die with 6 faces kept on an infinite plane …", or "DevuLand is a very strange place. There are n villages in it …") with one positive code snippet and six hard-negative code snippets (Python solutions in the examples).
* Loss: `CachedMultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 128,
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```
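`CachedMultipleNegativesRankingLoss` optimizes an InfoNCE-style objective: for each query, its paired document must score higher than every other in-batch document under scaled cosine similarity. The "Cached" variant computes the same loss with the gradient-caching trick from the paper cited above, so the effective batch size is not limited by GPU memory. A minimal NumPy sketch of the underlying objective (the `mnr_loss` function and toy data are ours, not the library implementation):

```python
import numpy as np

def mnr_loss(query_emb: np.ndarray, doc_emb: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: softmax cross-entropy over scaled
    cosine similarities, where document i is the positive for query i."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = scale * (q @ d.T)                    # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs.diagonal().mean())    # targets lie on the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
loss_random = mnr_loss(q, rng.normal(size=(4, 8)))   # unrelated pairs
loss_aligned = mnr_loss(q, q)                        # each query paired with itself
print(loss_aligned < loss_random)  # True
```

The `scale` of 20.0 acts as an inverse temperature: it sharpens the softmax so the model is pushed hard toward ranking the true pair first.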
### Evaluation Dataset

#### code-retrieval-hard-negatives-llm-verified-merged

* Columns: `query` and `positive` (both of type string)
* Samples:

| query | positive |
|---|---|
| This gets the version of OpenALPR | <code>def get_version(self):</code> |
| Remove all unnecessary comments from a lexer or parser file | <code>public String stripUnnecessaryComments(String javaContent, AntlrOptions options) {</code> |
| Serialize reply to array or JSON. | <code>function reply(packet, json) {</code> |
* Loss: `CachedMultipleNegativesRankingLoss` with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "mini_batch_size": 128,
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 1024
- `per_device_eval_batch_size`: 1024
- `num_train_epochs`: 1
- `warmup_steps`: 0.05
- `bf16`: True
- `dataloader_num_workers`: 4
- `load_best_model_at_end`: True
- `push_to_hub`: True
- `hub_model_id`: modernbert-code-v4-hard-negatives
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 1024
- `per_device_eval_batch_size`: 1024
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: None
- `warmup_steps`: 0.05
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `enable_jit_checkpoint`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `use_cpu`: False
- `seed`: 42
- `data_seed`: None
- `bf16`: True
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: -1
- `ddp_backend`: None
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 4
- `dataloader_prefetch_factor`: None
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: modernbert-code-v4-hard-negatives
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `auto_find_batch_size`: False
- `full_determinism`: False
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `use_cache`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

### Training Logs

| Epoch | Step | Training Loss | Validation Loss | eval_cosine_ndcg@10 |
|---|---|---|---|---|
| 0.0738 | 20 | 0.9880 | - | - |
| 0.1476 | 40 | 0.9529 | 0.3465 | 0.9286 |
| 0.2214 | 60 | 0.9726 | - | - |
| 0.2952 | 80 | 0.9299 | 0.3351 | 0.9296 |
| 0.3690 | 100 | 0.9130 | - | - |
| 0.4428 | 120 | 0.9187 | 0.3253 | 0.9325 |
| 0.5166 | 140 | 0.8940 | - | - |
| 0.5904 | 160 | 0.9037 | 0.3186 | 0.9354 |
| 0.6642 | 180 | 0.8951 | - | - |
| 0.7380 | 200 | 0.8816 | 0.3121 | 0.9361 |
| 0.8118 | 220 | 0.8753 | - | - |
| 0.8856 | 240 | 0.8649 | 0.3106 | 0.9359 |
| 0.9594 | 260 | 0.8575 | - | - |
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### CachedMultipleNegativesRankingLoss

```bibtex
@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```