SentenceTransformer based on benjamintli/modernbert-code-v3-hard-negatives

This is a sentence-transformers model finetuned from benjamintli/modernbert-code-v3-hard-negatives on the code-retrieval-hard-negatives-llm-verified-merged dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'OptimizedModule'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
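The Pooling block above uses mean pooling (pooling_mode_mean_tokens: True): token embeddings are averaged, with padding positions masked out via the attention mask. A minimal sketch of that step on toy arrays (hypothetical values, not the model's actual tensors):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    count = np.clip(mask.sum(), 1e-9, None)         # avoid division by zero
    return summed / count

# Toy example: 3 tokens of dimension 2; the last token is padding
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]])
mask = np.array([1, 1, 0])
print(mean_pool(tokens, mask))  # [2. 3.]
```

In the real model the same averaging runs over 768-dimensional token embeddings produced by the Transformer module.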

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("benjamintli/modernbert-code-v4-hard-negatives")
# Run inference
queries = [
    "If MultiTenantMiddleware is used, filter queryset by request.site_id",
]
documents = [
    "def get_queryset(self):\n        '''\n        If MultiTenantMiddleware is used, filter queryset by request.site_id\n        '''\n        queryset = super(PageList, self).get_queryset()\n        if hasattr(self.request, 'site_id'):\n            queryset = queryset.filter(site_id=self.request.site_id)\n        return queryset",
    'def reduce_ticks(ax, which, maxticks=3):\n    """Given a pyplot axis, resamples its `which`-axis ticks such that are at most\n    `maxticks` left.\n\n    Parameters\n    ----------\n    ax : axis\n        The axis to adjust.\n    which : {\'x\' | \'y\'}\n        Which axis to adjust.\n    maxticks : {3, int}\n        Maximum number of ticks to use.\n\n    Returns\n    -------\n    array\n        An array of the selected ticks.\n    """\n    ticks = getattr(ax, \'get_{}ticks\'.format(which))()\n    if len(ticks) > maxticks:\n        # make sure the left/right value is not at the edge\n        minax, maxax = getattr(ax, \'get_{}lim\'.format(which))()\n        dw = abs(maxax-minax)/10.\n        start_idx, end_idx = 0, len(ticks)\n        if ticks[0] < minax + dw:\n            start_idx += 1\n        if ticks[-1] > maxax - dw:\n            end_idx -= 1\n        # get reduction factor\n        fac = int(len(ticks) / maxticks)\n        ticks = ticks[start_idx:end_idx:fac]\n    return ticks',
    'function (isPublic, name, data, ttl, published_at, coreid) {\n        var rawFn = function (msg) {\n            try {\n                msg.setMaxAge(parseInt((ttl && (ttl >= 0)) ? ttl : 60));\n                if (published_at) {\n                    msg.setTimestamp(moment(published_at).toDate());\n                }\n            }\n            catch (ex) {\n                logger.error("onCoreHeard - " + ex);\n            }\n            return msg;\n        };\n\n        var msgName = (isPublic) ? "PublicEvent" : "PrivateEvent";\n        var userID = (this.userID || "").toLowerCase() + "/";\n        name = (name) ? name.toString() : name;\n        if (name && name.indexOf && (name.indexOf(userID) == 0)) {\n            name = name.substring(userID.length);\n        }\n\n        data = (data) ? data.toString() : data;\n        this.sendNONTypeMessage(msgName, { event_name: name, _raw: rawFn }, data);\n    }',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.8836, -0.0275,  0.0176]])
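model.similarity defaults to cosine similarity between the query and document embeddings. The same scores can be reproduced from the raw embedding matrices; here is a sketch with NumPy on toy 3-dimensional vectors standing in for the model's 768-dimensional embeddings:

```python
import numpy as np

def cosine_similarity(queries, documents):
    """Cosine similarity matrix: dot products of L2-normalized rows."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = documents / np.linalg.norm(documents, axis=1, keepdims=True)
    return q @ d.T

# Toy embeddings (made up for illustration)
query_embeddings = np.array([[1.0, 0.0, 0.0]])
document_embeddings = np.array([[1.0, 0.0, 0.0],    # same direction  -> 1.0
                                [0.0, 1.0, 0.0],    # orthogonal      -> 0.0
                                [-1.0, 0.0, 0.0]])  # opposite        -> -1.0
print(cosine_similarity(query_embeddings, document_embeddings))
# [[ 1.  0. -1.]]
```

Ranking documents by these scores (highest first) is exactly what semantic search over this model's embeddings does.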

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.8943
cosine_accuracy@3 0.943
cosine_accuracy@5 0.963
cosine_accuracy@10 0.976
cosine_precision@1 0.8943
cosine_precision@3 0.3143
cosine_precision@5 0.1926
cosine_precision@10 0.0976
cosine_recall@1 0.8943
cosine_recall@3 0.943
cosine_recall@5 0.963
cosine_recall@10 0.976
cosine_ndcg@10 0.9359
cosine_mrr@10 0.9229
cosine_map@100 0.924
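For readers unfamiliar with these metrics: accuracy@k is the fraction of queries whose relevant document appears in the top k results, and MRR@10 averages the reciprocal rank of the first hit within the top 10. A small sketch of both definitions on made-up ranks (not the evaluation data):

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant document is ranked within the top k.
    `ranks` holds the 1-based rank of the relevant document for each query."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting only hits within the top k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical ranks of the correct document for 4 queries
ranks = [1, 1, 3, 12]
print(accuracy_at_k(ranks, 1))   # 0.5
print(accuracy_at_k(ranks, 10))  # 0.75
print(mrr_at_k(ranks, 10))       # (1 + 1 + 1/3 + 0) / 4 ~= 0.583
```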

Training Details

Training Dataset

code-retrieval-hard-negatives-llm-verified-merged

  • Dataset: code-retrieval-hard-negatives-llm-verified-merged at 459ec4b
  • Size: 277,492 training samples
  • Columns: query, positive, negative_0, negative_1, negative_2, negative_3, negative_4, and negative_5
  • Approximate statistics based on the first 1000 samples (all columns are strings):
    query:      min 6 tokens,  mean 298.97 tokens, max 1024 tokens
    positive:   min 9 tokens,  mean 191.06 tokens, max 1024 tokens
    negative_0: min 16 tokens, mean 215.07 tokens, max 1024 tokens
    negative_1: min 17 tokens, mean 214.94 tokens, max 1024 tokens
    negative_2: min 16 tokens, mean 215.65 tokens, max 1024 tokens
    negative_3: min 17 tokens, mean 215.93 tokens, max 1024 tokens
    negative_4: min 15 tokens, mean 219.31 tokens, max 1024 tokens
    negative_5: min 15 tokens, mean 219.5 tokens,  max 1024 tokens
  • Samples:
    Sample 1

    query:
      A valid parentheses sequence is a non-empty string where each character is either '(' or ')', which satisfies the following constraint:

      You can find a way to repeat erasing adjacent pairs of parentheses '()' until it becomes empty.

      For example, '(())' and '()((()()))' are valid parentheses sequences, but ')()(' and '(()' are not.

      Mike has a valid parentheses sequence. He really likes everything about his sequence, except the fact that it is quite long. So Mike has recently decided that he will replace his parentheses sequence with a new one in the near future. But not every valid parentheses sequence will satisfy him. To help you understand his requirements we'll introduce the pseudocode of function F(S):

      FUNCTION F( S - a valid parentheses sequence )
      BEGIN
          balance = 0
          max_balance = 0
          FOR index FROM 1 TO LENGTH(S)
          BEGIN
              if S[index] == '(' then balance = balance + 1
              if S[index] == ')' then balance = balance - 1
              max_balance = max( max_balance, balance )
          END
      ...

    positive:
      try:
          for i in range(int(input())):
              s=input()
              balance=0
              max_balance=0
              for i in s:
                  if i=='(':balance+=1
                  else:
                      balance-=1
                  max_balance=max(max_balance,balance)
              print('('*max_balance,')'*max_balance,sep="")
      except Exception as e:
          print(e)

    negative_0:
      t=int(input())
      for tt in range(t):
          a,b,p=map(int,input().split())
          s=input()
          n=len(s)
          cost = [0]*n
          cost[-1] = 0
          typ = ''
          i=n-2
          while i>=0:
              if s[i]==typ:
                  cost[i] = cost[i+1]
              else:
                  typ = s[i]
                  cost[i] = cost[i+1] + (a if typ=='A' else b)
              i-=1
          i=0
          while cost[i] > p:
              i+=1
          print(i+1)

    negative_1:
      test=int(input())
      for i in range(test):
          s=input()
          b=len(s)
          list1=[]
          for j in range(len(s)):
              if s[j]=='.':
                  list1.append(j)
          for i in list1:
              if b-i-1 in list1:
                  if i!=b-i-1 and ((s[i] and s[b-i-1]) != 'a'):
                      s=s[:i]+'a'+s[i+1:b-i-1]+'a'+s[b-i:]
                  else:
                      s=s[:i]+'a'+s[i+1:]
              else:
                  s=s[:i]+s[b-i-1]+s[i+1:]
          if s==s[::-1]:
              print(s)
          else:
              print(-1)

    negative_2:
      from collections import Counter

      def solve(A,B):
          a = Counter(A)
          b = Counter(B)
          ans = 0
          for i in a:
              if i in b:
                  ans += min(a[i],b[i])
          return ans

      t = int(input())
      for _ in range(t):
          A = input()
          B = input()
          print(solve(A,B))

    negative_3:
      l=list(map(int,input()))
      t=-1
      x=-1
      y=-1
      for i in range(len(l)):
          s=l[i]
          a=i+1
          b=i+1
          for j in range(i+1,len(l)):
              if l[i] s=s+l[j]
                  b=j+1
              else:
                  break
          if s>t:
              t=s
              x=a
              y=b
      print(t,end=":")
      print(x,y,sep="-")

    negative_4:
      t=eval(input())
      a=[]
      b=[]
      top=-1
      for __ in range(0,t):
          x=input().split()
          if(x[0]!="-1" and x[0]!="0"):
              add=int(x[0])
              if top!=-1 and add>a[top][0]:
                  b[top]+=1
              else:
                  a.append((add,x[1]))
                  b.append(0)
                  top+=1
          elif (x[0]=="-1"):
              #print("%s %s" %(b[top],a[top][1]))
              print((b[top]), end=' ')
              print(a[top][1])
              foo=a.pop()
              bar=b.pop()
              top-=1

    negative_5: identical to negative_4.
    Sample 2

    query:
      Chef has a cubic die with 6 faces kept on an infinite plane. Each face has a distinct integer in the range [1,6] written on it, but the exact arrangement of the numbers on the faces of the die is unknown to Chef. Curiosity gets the better of Chef and he wants to find out o(1), o(2), ..., o(6), where o(i) is the number written opposite to the number i.

      Chef performs the following N-1 steps to learn the exact arrangement of the numbers on the die. In the i-th step, Chef pushes the die in some direction (there are 4 possible directions), and the die rolls 90° in this direction. The picture below demonstrates a die and the result that it produced after rolling in each of the 4 directions respectively. For this die, we have o(1)=4, o(2)=5, o(3)=6, o(4)=1, o(5)=2, o(6)=3.

      Chef records N numbers A1, A2, ..., AN, where Ai is the number written on the top of the die before the i-th step. However, the information on the direction in which he pushes the die each time is lost. Can you help h...

    positive:
      from itertools import permutations

      def solve(n,a):
          ans=[]
          for des in desire:
              check=1
              for i in range(n-1):
                  if (a[i]==a[i+1]):
                      return [-1]
                  if a[i+1]==des[a[i]-1]:
                      check=0
                      break
              if check:
                  ans=des
                  break
          if ans:
              return ans
          return [-1]

      per=permutations([1,2,3,4,5,6])
      desire=[]
      for p in per:
          check=1
          for i in range(1,7):
              if p[i-1]==i:
                  check=0
                  break
          if check:
              doublecheck=1
              for i in range(6):
                  if p[p[i]-1]!=i+1:
                      doublecheck=0
                      break
              if doublecheck:
                  desire.append(p)
      #print(desire)
      for _ in range(int(input())):
          n=int(input())
          a=list(map(int,input().split( )))
          print(*solve(n,a))

    negative_0:
      def solve():
          n = int(input())
          lst = list(map(int,input().split()))
          if sum(lst) <= n // 2:
              print(n//2)
              print("0 " * (n // 2))
          else:
              print(n//2 + (n // 2) % 2)
              print("1 " * (n//2 + (n // 2) % 2))
      for i in range(int(input())):
          solve()

    negative_1:
      import sys
      input = lambda: sys.stdin.readline().rstrip()

      T = int(input())
      for _ in range(T):
          N = int(input())
          A = [int(a) for a in input().split()]

          if max(A) == min(A):
              print(1)
              print(([1] * N))
          elif N % 2 == 0:
              print(2)
              print(([1, 2] * (N // 2)))
          else:
              for i in range(N):
                  if A[i-1] == A[i]:
                      print(2)
                      print((([1, 2] * N)[:i][::-1] + ([1, 2] * N)[:N-i]))
                      break
              else:
                  print(3)
                  print(([3] + [1, 2] * (N // 2)))

    negative_2:
      import numpy as np

      N=10**6+1
      t=eval(input())
      inp = ()

      t1=ord('z')
      #bag=[[0 for _ in xrange(t1)] for _ in xrange(N+1)]
      bag=np.zeros((N+1,t1),dtype=np.int)
      #print bag
      while t:
          t-=1
          inp=input().split()
          t2=ord(inp[3]) - ord('a')
          t3=int(inp[1])
          t4=int(inp[2]) + 1
          if inp[0]=="1":
              #print "enter"
              bag[t3][t2]+=int(inp[2])
          if inp[0]=="2":
              sum=0
              for i in range(t3,t4):
                  sum+=bag[i][t2]
              print(sum)

      #
      # for j in range(ord('z')-ord('a')):
      #     for i in range(N+1):
      #         if bag[i][j]!=0:
      #             print bag[i][j], i, j

    negative_3:
      # from math import log2
      # N = 10000
      # for i in range(1,N):
      #     # print(i)
      #     for m in range(i):
      #         if( (m^(m+1))==i ):
      #             print(i)
      #             print(m,m+1,bin(m)[2:])
      #             print()
      #             break
      #     # else:
      #     #     print(-1)
      #     #     print()
      T = int(input())
      ans = []

      for _ in range(T):
          N = int(input())

          # x = log2(N+1)
          if(N==1):
              ans.append(2)
          elif('0' not in bin(N)[2:]):
              ans.append(N//2)
          else:
              ans.append(-1)

      for i in ans:
          print(i)

    negative_4: identical to negative_3.

    negative_5: identical to negative_3.
    Sample 3

    query:
      DevuLand is a very strange place. There are n villages in it. Some of the villages are occupied by dinosaurs while the remaining ones by villagers. You are given the information of DevuLand by an array D of size n. If D[i] is non-negative, it means that there are D[i] villagers in that village. Otherwise, it means that there are -D[i] dinosaurs in that village.

      It is also guaranteed that total number of villagers in DevuLand is equal to total number of dinosaurs.

      Once dinosaurs got very hungry and started eating villagers. Frightened villagers gathered immediately and met their Sarpanch Deviji. Deviji, being a very daring and negotiable person, met to the head of dinosaurs. Soon both parties called a truce. It was decided that the villagers will provide laddus to the dinosaurs. So everyday, each villager will take exactly one laddu to one of the dinosaurs in such a way that no dinosaur remains hungry (note that this is possible because number of villagers is the same as the numbe...

    positive:
      # cook your dish here
      for _ in range(int(input())):
          n = int(input())
          a = list(map(int, input().split()))
          curr = 0
          ans = 0
          for x in a:
              curr += x
              ans += abs(curr)
          print(ans)

    negative_0:
      from collections import deque

      T=int(input())

      def break_down(num):
          count=0
          while(len(num)!=1):
              temp=0
              for i in range(0,len(num)):
                  temp=temp+int(num[i])
              num=str(temp)
              count=count+1
          return (int(num),count)

      def digit_sum(num):
          temp=0
          for i in range(0,len(num)):
              temp=temp+int(num[i])
          num=temp
          return (num)

      while(T):
          queue=deque()
          count_n=0
          count_d=0
          T=T-1
          N,d=[i for i in input().split()]
          n,count_n=break_down(N)
          D,count_D=break_down(d)
          dic={}
          if(D==1 or D==2 or D==4 or D==5 or D==7 or D==8):
              mini=1
          elif(D==3 or D==6):
              mini=min(digit_sum(str(n+3)),digit_sum(str(n+6)),digit_sum(str(n+9)))
          else:
              mini=n
          queue.append((int(N),0))
          ele=int(N)
          count=0
          while(len(queue)!=0):
              ele,count=queue.popleft()
              if(ele==mini):
                  break
              else:
                  if(len(str(ele))==1):
                      temp1=ele+int(d)
                      queue.append((temp1,count+1))...

    negative_1:
      # cook your dish here
      test_cases = int(input())
      for i in range(test_cases):
          no_of_elements = int(input())
          sequence = list(map(int, input().split()))
          d1 = sequence[1] - sequence[0]
          d2 = sequence[2] - sequence[1]
          d3 = (sequence[3] - sequence[0])/3
          d4 = (sequence[3] - sequence[1])/2
          d5 = (sequence[2] - sequence[0])/2

          if (d2 == d4):
              d = d2
          elif(d3 == d5):
              d = d3
          elif(d1 == d3):
              d = d1
          elif(d1 == d5):
              d = d1

          if (d == d1):
              for i in range(no_of_elements):
                  sequence[i] = int(sequence[0] + i*d)
          else:
              for i in range(no_of_elements):
                  sequence[i] = int(sequence[-1] - ((no_of_elements - i - 1)*d))

          for i in sequence:
              print(i, end=" ")

          print('\n')

    negative_2:
      from collections import Counter
      try:
          for _ in range(int(input())):
              n=int(input())
              s=input()
              d1=dict(Counter(s))

              u,d,r,l=0,0,0,0
              if 'U' in d1:
                  u=d1['U']
              else:
                  u=0
              if 'D' in d1:
                  d=d1['D']
              else:
                  d=0
              if 'R' in d1:
                  r=d1['R']
              else:
                  r=0
              if 'L' in d1:
                  l=d1['L']
              else:
                  l=0
              x=0
              y=0
              if l==r:
                  x=0
              elif l>r:
                  x=-(l-r)
              elif r>l:
                  x=r-l
              if u==d:
                  y=0
              elif d>u:
                  y=-(d-u)
              elif u>d:
                  y=u-d
              # print(x,y)
              if x==0 and y==0:
                  print(n)
                  continue

              print(n-(abs(x)+abs(y)))
      except:
          pass

    negative_3:
      from bisect import bisect_left, insort_left
      a = []
      n = int(input())
      for _ in range(n):
          #print(a)
          s, d = list(map(int, input().split()))
          if len(a) == 0:
              print(s, s+d - 1)
              a.append((s, s + d - 1))
              continue
          p = bisect_left(a, (s, s + d - 1))
          #print('p', p)
          ok = True
          if p > 0 and a[p-1][1] >= s:
              ok = False
          if p < len(a) and a[p][0] <= s + d - 1:
              ok = False
          if ok:
              insort_left(a, (s, s + d - 1))
              print(s, s + d - 1)
          else:
              ok = False
              for i in range(len(a)):
                  if i == 0:
                      if a[0][0] > d:
                          print(1,d)
                          a = [(1, d)] + a
                          ok = True
                          break
                  else:
                      if a[i - 1][1] + d < a[i][0]:
                          print(a[i - 1][1] + 1, a[i - 1][1] + d)
                          insort_left(a, (a[i - 1][1] + 1, a[i - 1][1] + d))
                          ok = True
                          break
          ...

    negative_4:
      import fractions
      for t in range(int(input())):
          h,u,d = list(map(int,input().split()))
          g = fractions.gcd(u,d)
          if (h%g!=0):
              print(-1)
          else:
              m = 0
              n = 0
              while (True):
                  n = (float(m)*u-h)/d
                  if (n>0 and int(n) == n):
                      break
                  m+=1
              print(int(m+n))

    negative_5: identical to negative_4.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
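CachedMultipleNegativesRankingLoss scores each query against every document in the batch with scaled cosine similarity (scale: 20.0 above) and applies cross-entropy with the paired positive as the target class; the "cached" variant additionally chunks the batch (mini_batch_size: 128) to bound memory. A minimal NumPy sketch of the uncached core idea, run on toy orthonormal embeddings (not the library's implementation):

```python
import numpy as np

def mnrl_loss(query_emb, doc_emb, scale=20.0):
    """Multiple-negatives ranking loss: row i's positive is column i;
    every other column in the batch acts as a negative.

    query_emb: (batch, dim) query embeddings
    doc_emb:   (batch, dim) matching document embeddings
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    scores = scale * (q @ d.T)  # (batch, batch) scaled cosine similarities
    # Cross-entropy with the diagonal (the true pairs) as the target class
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

q = np.eye(4, 8)  # four orthonormal "query" embeddings
loss_aligned = mnrl_loss(q, q)                       # every query matches its doc
loss_shuffled = mnrl_loss(q, np.roll(q, 1, axis=0))  # every pair mismatched
print(round(loss_aligned, 4), round(loss_shuffled, 4))  # 0.0 20.0
```

Training drives the loss toward the aligned case: the paired query/positive embeddings move together while the hard negatives are pushed apart.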
    

Evaluation Dataset

code-retrieval-combined-v2

  • Dataset: code-retrieval-combined-v2 at 2b971a6
  • Size: 31,516 evaluation samples
  • Columns: query and positive
  • Approximate statistics based on the first 1000 samples (both columns are strings):
    query:    min 5 tokens,  mean 42.73 tokens,  max 834 tokens
    positive: min 30 tokens, mean 180.42 tokens, max 1024 tokens
  • Samples:

    Sample 1

    query:
      This gets the version of OpenALPR

      :return: Version information

    positive:
      def get_version(self):
          """
          This gets the version of OpenALPR

          :return: Version information
          """

          ptr = self._get_version_func(self.alpr_pointer)
          version_number = ctypes.cast(ptr, ctypes.c_char_p).value
          version_number = _convert_from_charp(version_number)
          self._free_json_mem_func(ctypes.c_void_p(ptr))
          return version_number

    Sample 2

    query:
      Remove all unnecessary comments from a lexer or parser file

    positive:
      public String stripUnnecessaryComments(String javaContent, AntlrOptions options) {
          if (!options.isOptimizeCodeQuality()) {
              return javaContent;
          }
          javaContent = stripMachineDependentPaths(javaContent);
          if (options.isStripAllComments()) {
              javaContent = stripAllComments(javaContent);
          }
          return javaContent;
      }

    Sample 3

    query:
      Serialize reply to array or JSON.

      @param {Object} packet
      @param {String} packet.method "get", "search", "post", "put", "delete", "sub", "unsub".
      @param {String} packet.resource
      @param {String} packet.id
      @param {*} packet.body
      @param {Number} [packet.status]
      @param {Number|String} [packet.date]
      @param {Object} [packet.headers]
      @param {Boolean} [json] true to generate JSON instead of array.
      @returns {Array|String|null}

    positive:
      function reply(packet, json) {
          return _create(packet, packet.status || 500, (METHODS[packet.method] || '') + packet.resource, json);
      }
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 128,
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 1024
  • num_train_epochs: 1
  • warmup_steps: 0.05
  • bf16: True
  • dataloader_num_workers: 4
  • load_best_model_at_end: True
  • push_to_hub: True
  • hub_model_id: modernbert-code-v4-hard-negatives
  • batch_sampler: no_duplicates
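The no_duplicates batch sampler exists because of the in-batch-negatives loss: if the same text appeared twice in a single batch, a query's duplicated positive would wrongly be scored as a negative. A simplified sketch of the idea (not the actual sentence_transformers implementation):

```python
def no_duplicate_batches(samples, batch_size):
    """Greedily build batches so that no text appears twice within a batch.

    samples: list of (query, positive) string pairs
    """
    remaining = list(samples)
    batches = []
    while remaining:
        batch, seen, leftover = [], set(), []
        for pair in remaining:
            # Admit a pair only if none of its texts are already in this batch
            if len(batch) < batch_size and not (set(pair) & seen):
                batch.append(pair)
                seen.update(pair)
            else:
                leftover.append(pair)  # retry in a later batch
        batches.append(batch)
        remaining = leftover
    return batches

pairs = [("q1", "def f(): pass"),
         ("q2", "def g(): pass"),
         ("q3", "def f(): pass")]  # duplicate positive of q1
batches = no_duplicate_batches(pairs, batch_size=4)
print([len(b) for b in batches])  # [2, 1] -> the duplicate is deferred to the next batch
```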

All Hyperparameters

Click to expand
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 1024
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0.05
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: modernbert-code-v4-hard-negatives
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss eval_cosine_ndcg@10
0.0738 20 0.9880 - -
0.1476 40 0.9529 0.3465 0.9286
0.2214 60 0.9726 - -
0.2952 80 0.9299 0.3351 0.9296
0.3690 100 0.9130 - -
0.4428 120 0.9187 0.3253 0.9325
0.5166 140 0.8940 - -
0.5904 160 0.9037 0.3186 0.9354
0.6642 180 0.8951 - -
0.7380 200 0.8816 0.3121 0.9361
0.8118 220 0.8753 - -
0.8856 240 0.8649 0.3106 0.9359
0.9594 260 0.8575 - -
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.3.0
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Model size: 0.1B params (Safetensors, F32)