tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:3375201
  - loss:MSELoss
widget:
  - source_sentence: >-
      What is Weboob. Weboob is a collection of applications able to interact
      with websites, without requiring the user to open them in a browser. It
      also provides well-defined APIs to talk to websites lacking one.
    sentences:
      - >-
        Moreno and colleagues (Mossio et al. 2009; Moreno & Mossio 2015) have
        also claimed that their organizational approach unifies across
        backwardlooking and forward-looking accounts by describing activities
        that atemporally account for the continuing persistence of traits.
      - average cost of a dj for a wedding 2015
      - >-
        CIALIS tablets should not be split, crushed or separated in any way. Do
        not split CIALIS tablets; the entire dose should be taken. Splitting or
        crushing may result in the patient receiving more or less than the
        desired dose. References. CIALIS [package insert].
  - source_sentence: >-
      Cement Impregnated Particle Board is a revolutionary, waterproof, cement
      impregnated acoustic floor panel designed to improve impact and airborne
      noise transfer through separating floors. Cement Impregnated Particle
      Board installed on top of R10 resilient insulation provides a very
      efficient and stable floating floor.
    sentences:
      - >-
        1. to estimate officially the value of (property) for tax purposes. 2.
        to determine the amount of (damages, a fine, etc.). 3. to impose a tax
        or other charge on:to assess members for painting the clubhouse. 4. to
        estimate or judge the value, character, etc., of; evaluate: to assess
        one's efforts.
      - "# Lageia\nLageia (Greek: Λάγεια [ˈlaʝa]; Turkish: Laya) is a small village in the Larnaca District of Cyprus, 7\_km west of Pano Lefkara. Its population in 2011 was 28.\n"
      - >-
        The carbohydrates in pineapples are mostly simple sugars, such as
        sucrose, fructose and glucose. They also contain some fiber. A cup (165
        grams) of pineapples contains 21.7 grams of carbs, and 2.3 grams of
        fiber, so there are 19.4 grams of digestible (net) carbs in each cup.
        The glycemic index value of pineapples can range from 45-66, which is in
        the medium range (4).
  - source_sentence: >-
      Representations (concepts) can be portrayed as partitions in
      multi-dimensional vector spaces. One example is a neuron activation vector
      space, where a point in this space represents one possible pattern of
      activity in all neurons in the network.
    sentences:
      - |-
        # Listeners Bounce Back After Being Laid Off 

         Last week we asked listeners who have been laid off to share their success stories. Many tolds us tales of finding work despite the tough economy.
      - >-
        The neurobiological uniqueness of the pain inhibitory system, contrasted
        with the mechanisms of other sensory modalities, renders pain processing
        atypical, which leads to the conclusion that pain experiences are
        atypical conscious events.
      - "# Robert Hampton Gray\nRobert Hampton \"Hammy\" Gray, VC, DSC (November 2, 1917 – August 9, 1945) was a Canadian naval officer, pilot, and recipient of the Victoria Cross during World War II. He and Eugene Esmonde are the only personnel of the Royal Navy's Fleet Air Arm to be decorated the VC in the war. Gray is the last Canadian to be awarded the Victoria Cross.\n\n## Early life\nGray was born in Trail, British Columbia, Canada, but resided from an early age in Nelson, where his father was a jeweller.\nHe completed one year at the University of Alberta before transferring to the Bachelor of Arts program at The University of British Columbia where he was a member of the Phi Delta Theta fraternity.\nBefore completing university, he enlisted in the Royal Canadian Naval Volunteer Reserve (RCNVR) at HMCS\_Tecumseh in Calgary, Alberta on July 18, 1940. Originally sent to England for training in September Gray decided to join the Fleet Air Arm. Gray began his training at HMS\_St Vincent in January 1941 then 24th Elementary Flying Training School in Luton by March. Gray was sent back to Canada to train at RCAF Station Kingston in June. Once completing his training in September, Gray was given the rank of sub-lieutenant and by November was sent back to England to train on the Hawker Hurricane at HMS\_Heron. While at HMS Heron Gray had the chance to meet his brother Jack, who played the role of an RCAF air gunner in the film Target for Tonight, before being killed in a air accident not long after. \nGray initially joined 757 Naval Air Squadron at Winchester, England at the end of February 1942 where he conducted further training.\n\n## War service\n\n### Africa and Norway\nGray was assigned to the African theatre in May 1942, flying Hawker Hurricanes for shore-based squadrons, nos. 795, 803, and 877, where he spent two years at Nairobi. 
In December Gray served for a brief time aboard the aircraft carrier HMS\_Illustrious and on December 31 was promoted to lieutenant.\nIn February 1944 Gray was transferred back to England where trained to fly the Vought F4U Corsair fighter with 748 Naval Air Squadron at HMS Heron and on August 14 he joined 1841 NAS, based on HMS\_Formidable. From August 24–29, Gray took part in the unsuccessful Operation Goodwood raids against the German battleship\_Tirpitz, in Norway. On August 29, Gray was Mentioned in Dispatches for his participation in an attack on three German destroyers, during which his plane's rudder was shot off. On January 16, 1945, he received a further Mention, \"For undaunted courage, skill and determination in carrying out daring attacks on the German battleship Tirpitz.\"\n\n### Japan\nOn April 4, 1945, Formidable joined the British Pacific Fleet which was involved in the invasion of Okinawa. On April 16, Gray led a flight of Corsairs during the attacks against Ishigaki and Miyako airfields on Okinawa. Gray only conducted combat air patrols for the remainder of April and into May. In the aftermath of the kamikaze strikes on Formidable, the ship returned to Sydney, Australia, on May 22 where Gray helped train replacements from May to July before returning to combat on July 17. On July 18, Gray led a strafing mission against airfields in the Tokyo area and another flight to the inland sea on July 24, which damaged one merchant ship, and damaged two seaplane bases and one airbase. Gray earned a Distinguished Service Cross for aiding in sinking a Japanese destroyer in the area of Tokyo on July 28. 
The award was not announced until August 21, 1945, when the notice appeared in the London Gazette with the citation, \"For determination and address in air attacks on targets in Japan\".\n\n#### VC action\nOn August 9, 1945, Gray original mission was to attack Matsushima airfield, however when it was realized the airfield was out of commission Gray was ordered to attack targets of opportunity. Having spotted Japanese shipping at Onagawa Bay, Miyagi Prefecture, Japan, early in the flight, Gray led the strike force towards the bay. A few hours after the atomic bombing of Nagasaki, Lieutenant Gray (flying Vought F4U Corsair KD658, with 151 as his insignia and an X on the aircraft's tail) led an attack on a group of Japanese naval vessels. Gray scored a direct hit upon the Etorofu-class escort ship Amakusa with a 500-lb bomb which passed through the engine room and detonated a magazine below the after gun turret. The resultant explosion blew out the ship's side and caused it to sink rapidly with the loss of 71 crewmen. Gray's plane was damaged by anti-aircraft fire and crashed into the bay.\nThe citation for his VC, gazetted on November 13, 1945, described as being:\nfor great valour in leading an attack on a Japanese destroyer in Onagawa Wan, on 9 August 1945. In the face of fire from shore batteries and a heavy concentration of fire from some five warships Lieutenant Gray pressed home his attack, flying very low in order to ensure success, and, although he was hit and his aircraft was in flames, he obtained at least one direct hit, sinking the destroyer. Lieutenant Gray has consistently shown a brilliant fighting spirit and most inspiring leadership.\nGray was one of the last Canadians to die during World War II, and was the last Canadian to be awarded the Victoria Cross. 
His VC is owned by the Gray family.\n\n## Awards and decorations\nGray's personal awards and decorations include the following:\n| Ribbon | Description                                      | Notes                                            |\n|        | Victoria Cross                                   | - Citation for Victoria Cross (VC)               |\n|        | Distinguished Service Cross (DSC)                | - Citation for Distinguished Service Cross (DSC) |\n|        | 1939–1945 Star                                   | - WWII 1939–1945                                 |\n|        | Atlantic Star                                    | - WWII 1939–1945                                 |\n|        | Africa Star                                      | - WWII 1939–1945                                 |\n|        | Pacific Star                                     | - WWII 1939–1945                                 |\n|        | Defence Medal (United Kingdom)                   | - WWII 1939–1945                                 |\n|        | Canadian Volunteer Service Medal                 | - WWII 1939–1945 with Overseas Service bar       |\n|        | War Medal 1939–1945 with Mentioned in dispatches | - WWII 1939-1945                                 |\n\n\n## Legacy\nAs Gray's remains were never found, he was listed as missing in action and presumed dead. He is commemorated, with other Canadians who died or were buried at sea during the First and Second World Wars, at the Halifax Memorial in Point Pleasant Park, Halifax, Nova Scotia. \nThe War Memorial Gym at University of British Columbia, Royal Canadian Legion hall in Nelson, numerous other sites in Nelson, and the wardroom of HMCS Tecumseh (his RCNVR home unit) also bear plaques in his honour.\nGray is one of fourteen figures commemorated at the Valiants Memorial in Ottawa.\nA memorial for Gray was erected at Onagawa Bay in 1989 in Sakiyama Park. 
This is the only memorial dedicated to a foreign soldier on Japanese soil. Following the devastation of the March 11, 2011 earthquake (during which the granite monument itself was knocked over), the monument (with new plaque) was moved from its original location in Sakiyama Park to one beside the hospital (Onagawacho Community Medicine Center) in Onagawa Town. A rededication ceremony was held August 24, 2012.\nTo celebrate the Centennial of the Canadian Navy, during the 2010 air show season, Vintage Wings of Canada flew at events across Canada in a Corsair bearing the markings of the plane Gray was likely flying that fateful day.\nHis life is recorded in A Formidable Hero: Lt. R.H. Gray, VC, DSC, RCNVR by Stuart E. Soward, published by Trafford Neptune.\n\n### Grays Peak, British Columbia\nOn March 12, 1946, the Geographic Board of Canada named a mountain in Kokanee Glacier Provincial Park, British Columbia, after Gray and his brother, Flt Sgt John Balfour Gray, RCAF, who was also killed in World War II. Rising to a height of 2,753\_m (9,032\_ft), Grays Peak is well known in Canada as the mountain pictured on the label of Kokanee Beer.\n\n### Hampton Gray Memorial Elementary\nThe elementary school at CFB Shearwater is named after Gray.\n\n### Kingston Norman Rogers Airport\nGray completed his training at No. 31 Service Flying Training School in Kingston, Ontario. There is a Harvard aircraft, same type of trainer he flew at Kingston, mounted on a pedestal with a memorial dedicated to him. Additionally, the road leading to the airport terminal has been named Hampton Gray Gate.\n\n### Royal Canadian Sea Cadets\nThe Royal Canadian Sea Cadet Corps in Nelson, BC is named 81 Hampton Gray, VC Royal Canadian Sea Cadet Corps.\n\n### Royal Canadian Air Cadets\nIn 2012, the Royal Canadian Air Cadets created a new squadron in his honour called 789 Lt. R. 
Hampton Gray VC Squadron which is located in Mississauga, Ontario.\n\n### Harry DeWolf-class offshore patrol vessel\nThe sixth Harry DeWolf-class offshore patrol vessel for the Royal Canadian Navy will be named for Gray.\n\n### Brechin, Angus, Scotland\nThe Gray family headstone in Brechin Cemetery was completely restored in 2021 after it had fallen into a state of disrepair. (The main headstone had been removed from its plinth and positioned on the adjacent grass). The work was carried out and funded by locals. On the 76th anniversary of his death and VC action a short service was conducted at the family grave. The headstone carries the inscriptions for Robert and his brother Flight Sergeant John (Jack) Balfour Gray, RCAF. He was killed on February 27, 1942 serving with 144 Squadron RAF. He is buried in Doncaster (Rosehill) Cemetery.\nA new housing development in Brechin will feature a street named after Robert Hampton Gray, Hampton Gray Way.\n"
  - source_sentence: >+
      # The Wishing-Table

      The Wishing-Table (German: Tischlein, deck dich) is a 1956 West German
      family film directed by Fritz Genschow and starring Werner Stock, Wolfgang
      Draeger and Harald Dietl. It is based on the story of the same name by the
      Brothers Grimm.


      ## Cast

      - Werner Stock as Tailor

      - Wolfgang Draeger as Peter

      - Harald Dietl as Paul

      - Horst Keitel as Hans

      - Rita-Maria Nowotny as Kathy

      - Wulf Rittscher as Innkeeper

      - Fritz Genschow as Woodworker

      - Sigrid Hackenberg as Marie

      - Renée Stobrawa as Kathy's aunt

      - Karola Ebeling as Liesel

      - York Bertram as Charburner

      - Otto Czarski as Robber

      - Joachim Rödel as Robber

      - Alexander Welbat as Robber

      - Lutz Götz as Mayor

      - Theodor Vogeler as Carpenter

      - Nora Brand as Neighbor

      - Otto Lengwinat as Miller

      - Egon Stief as Servant



      ## Bibliography

      - Jill Nelmes & Jule Selbo. Women Screenwriters: An International Guide.
      Palgrave Macmillan, 2015.

    sentences:
      - >-
        Low-Cost Feline Spay/Neuter The Michigan Humane Society offers low-cost
        cat and kitten spay/neuter services for the pets of residents of
        southeast Michigan. At an everyday price of just $50 per male cat or
        kitten, and $65 per female cat or kitten, a savings of more than $100
        from the regular price, the price includes the procedure,
        hospitalization, and anesthesia.
      - >-
        Annual ryegrass is primarily used for pastures and quick cover in
        erosion control plantings. In the South, it is used as a winter annual
        for overseeding warm season grasses. Annual ryegrass is quite similar to
        perennial ryegrass except it is an annual or biennial, depending on
        climate and/or length or growing season.
      - >-
        An AA meeting may take one of several forms, but at any meeting you will
        find alcoholics talking about what drinking did to their lives and
        personalities, what actions they took to help themselves, and how they
        are living their lives today. Click here to learn more about AA
        meetings.
  - source_sentence: >
      # Breda Holmes

      Breda Holmes is a former camogie player, winner of the B+I Star of the
      Year award in 1987 and seven All Ireland medals in succession between 1984
      and 1991, celebrating the seventh by scoring the match-turning goal from
      Ann Downey’s sideline ball against Cork in the 1991 final.


      ## Career

      She captained Carysfort Training College in their 1984 Purcell Cup
      campaign and won six All Ireland club medals with St Paul’s camogie club,
      based in Kilkenny city.
    sentences:
      - >-
        What is Intellectual Property? Intellectual property (IP) refers to
        creations of the mind, such as inventions; literary and artistic works;
        designs; and symbols, names and images used in commerce. IP is protected
        in law by, for example, patents, copyright and trademarks, which enable
        people to earn recognition or financial benefit from what they invent or
        create.
      - "# Kieran Djilali\nKieran Stephen Larbi Allen-Djilali (born 1 January 1991), more commonly known as Kieran Djilali, is an English former footballer who played as a midfielder. He played in the Football League with Crystal Palace, Chesterfield, AFC Wimbledon and Portsmouth.\n\n## Early life\nDjilali attended Dunraven School in Streatham.\n\n## Club career\n\n### Crystal Palace and loans\nBorn in Lambeth, London, Djilali came through the academy at Crystal Palace, going on trial with Manchester United in mid-2007.\nDjilali made his Palace debut aged 17, as a substitute in a 2–1 Football League Cup victory over Hereford United. This was followed quickly by a string of first-team appearances in which he impressed.\nDjilali joined Conference Premier side Crawley Town on a month-long loan on 1 September 2009. He returned from his loan spell early in late September, having made 5 league appearances.\nOn 13 November, he moved on loan to League Two side Chesterfield, where he scored his first career goal in a game against Darlington on 21 November 2009. On 15 December 2009, his loan was extended by a further month.\nHe returned to Crystal Palace following his loan spell on 12 January 2010, and scored his first goal for Palace against Doncaster Rovers on 27 February 2010. He began the following season in Palace's first team but dropped out as manager George Burley sought to bring in more experienced players.\nIn February, he returned to Chesterfield for a second loan spell. On 23 March, his loan at Chesterfield was extended to 16 April. Djilali scored once in 10 matches as Chesterfield were promoted to League One at the end of the season.\nWhen his contract at parent club Crystal Palace expired, he opted to leave Selhurst Park in the summer of 2011 to seek more game time.\n\n### AFC Wimbledon\nIn July 2011, Djilali played on trial for Scunthorpe United, but ended up signing for League Two club AFC Wimbledon on 26 August. 
On 3 September, he made his debut for the club, against Port Vale. On 10 March 2012, he scored his first goal for the club, against Dagenham & Redbridge. In May 2012, Djilali was released from the club as his contract expired.\n\n### Portsmouth\nOn 16 August 2012, Djilali signed a one-month contract with League One side Portsmouth. He made his debut in a 1–1 draw with Bournemouth on the opening day of the League One season, but was released after just two weeks due to Portsmouth's tight wage budget, with manager Michael Appleton putting Djilali's release down to his lack of fitness.\n\n### Return to AFC Wimbledon\nOn 16 November 2012, Djilali re-signed for AFC Wimbledon on a short-term deal. Manager Neal Ardley said of the move: \"Kieran has been with us for a month now. He has trained well and showed a very good attitude. He has the potential to play at a higher level but first he needs to prove himself with us. With the busy winter period coming on, we thought we should augment the squad and take the chance to have a good look at him in competitive action.\" Djilali was released by AFC Wimbledon on 31 January 2013.\n\n### Sligo Rovers\nIn March 2013, Djilali signed a contract with League of Ireland champions Sligo Rovers. He made his debut on 8 March, against Derry City. On 18 March, he scored his first goal for Sligo, against Bray Wanderers.\nDjilali... North... ELDING\n\n### Limerick\nIn July 2014, Djilali signed with League of Ireland side Limerick.\n\n### Cork City\nOn 21 November 2014, Cork City announced the signing of Kieran Djilali from Munster rivals, Limerick ahead of the 2015 season. The winger made his debut as a substitute against former club, Sligo Rovers in a 1–1 draw at The Showgrounds. 
He scored his first goal for the Rebel Army after coming on late against Bray Wanderers, scoring the vital winning goal in a dramatic 1–0 victory.\nWhilst at Cork, Djilali suffered a knee injury which he never fully recovered from and led to him leaving full-time football following his departure from the club.\n\n### Dulwich Hamlet\nAfter leaving Cork City, and following a period out of the game whilst he recovered from injury, Djilali joined Dulwich Hamlet of the Isthmian League Premier Division in September 2016, going on to make his debut as a substitute against Grays Athletic in the Isthmian League Cup on 13 September 2016.\n\n### Three Bridges\nAfter making three substitute appearances in all competitions for Dulwich Hamlet, Djilali joined Three Bridges of the Isthmian League South Division on 17 October 2016.\n\n## After football\nAfter Djilali left the League of Ireland and full-time football, he took up youth football coaching. He attained a UEFA B License and worked as a coach at Fulham's academy, and also operated his own coaching business.\n\n## Honours\nChesterfield\n- Football League Two (1): 2010–11\n\nSligo Rovers\n- FAI Cup (1): 2013\n- Setanta Sports Cup (1): 2014\n\n\n## Statistics\nAs of 20 July 2013\n| Club                | Season       | League | League | Cup  | Cup   | League Cup | League Cup | Other[A] | Other[A] | Total | Total |\n| Club                | Season       | Apps   | Goals  | Apps | Goals | Apps       | Goals      | Apps     | Goals    | Apps  | Goals |\n| ------------------- | ------------ | ------ | ------ | ---- | ----- | ---------- | ---------- | -------- | -------- | ----- | ----- |\n| Crystal Palace      | 2008–09      | 6      | 0      | 0    | 0     | 2          | 0          | –        | –        | 8     | 0     |\n| Crystal Palace      | 2009–10      | 8      | 1      | 2    | 0     | 1          | 0          | –        | –        | 11    | 1     |\n| Crystal Palace      | 2010–11      | 14     | 0      | 0    | 0     | 2  
        | 0          | –        | –        | 16    | 0     |\n| Crystal Palace      | Total        | 28     | 1      | 2    | 0     | 5          | 0          | –        | –        | 35    | 1     |\n| Crawley (loan)      | 2009–10      | 5      | 0      | 0    | 0     | 0          | 0          | –        | –        | 5     | 0     |\n| Chesterfield (loan) | 2009–10      | 8      | 1      | 0    | 0     | 0          | 0          | –        | –        | 8     | 1     |\n| Chesterfield (loan) | 2010–11      | 10     | 1      | 0    | 0     | 0          | 0          | –        | –        | 10    | 1     |\n| AFC Wimbledon       | 2011–12      | 12     | 1      | 1    | 0     | 0          | 0          | 1        | 0        | 14    | 1     |\n| Portsmouth          | 2012–13      | 1      | 0      | 0    | 0     | 0          | 0          | –        | –        | 1     | 0     |\n| AFC Wimbledon       | 2012–13      | 5      | 0      | 0    | 0     | 0          | 0          | –        | –        | 5     | 0     |\n| Sligo Rovers        | 2013         | 17     | 3      | 1    | 0     | 2          | 0          | 3        | 0        | 23    | 3     |\n| Career total        | Career total | 86     | 7      | 4    | 0     | 7          | 0          | 4        | 0        | 101   | 7     |\n\nA.\_^  The \"Other\" column constitutes appearances (including substitutions) and goals in either the Football League Trophy, the Setanta Cup and the UEFA Champions League.\n"
      - >-
        10 Most Famous Soccer Stadiums in the World. The Camp Nou with its
        capacity of 99,354 is the largest stadium in Europe and also the fourth
        largest soccer stadium in the world. It is situated in Barcelona,
        Catalonia, Spain, and is the home of Spanish club Barcelona since 1957.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - negative_mse
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: SentenceTransformer
    results:
      - task:
          type: knowledge-distillation
          name: Knowledge Distillation
        dataset:
          name: mse dev
          type: mse-dev
        metrics:
          - type: negative_mse
            value: -77.74003601074219
            name: Negative Mse
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: NanoMSMARCO
          type: NanoMSMARCO
        metrics:
          - type: cosine_accuracy@1
            value: 0.32
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.52
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.6
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.76
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.32
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.1733333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.12000000000000002
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.07600000000000001
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.32
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.52
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.6
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.76
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5250944624924359
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.4523412698412697
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4623987053582962
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: NanoHotpotQA
          type: NanoHotpotQA
        metrics:
          - type: cosine_accuracy@1
            value: 0.52
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.76
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.78
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.84
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.52
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.33333333333333326
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.22
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.122
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.26
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.5
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.55
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.61
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5456863439791646
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.6494444444444444
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.47358422601023775
            name: Cosine Map@100
      - task:
          type: nano-beir
          name: Nano BEIR
        dataset:
          name: NanoBEIR mean
          type: NanoBEIR_mean
        metrics:
          - type: cosine_accuracy@1
            value: 0.42000000000000004
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.64
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.69
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.42000000000000004
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.2533333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.17
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.099
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.29000000000000004
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.51
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.575
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.685
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5353904032358002
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.5508928571428571
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.46799146568426697
            name: Cosine Map@100

# ModernBERT-small-v2

ModernBERT-small-v2 is a compact, accurate dense vector encoder. It combines a small ModernBERT architecture, simple MLM training, and distillation from a larger, more performant model to approach the quality of standard large models at a much lower computational cost.

## Key Features & Training Methodology

This model was created using a specialized four-stage pipeline:

1. **Deep & Narrow Architecture:** Unlike typical small models (e.g., 6 layers), this student has 12 Transformer layers but a narrow 384-dimensional embedding space. The depth supports the multi-hop reasoning needed for high-accuracy retrieval, while the narrow width keeps encoding fast and index sizes small.
2. **Guided Initialization (GUIDE):** The model did not start from random weights. It inherited structural and semantic knowledge from a larger teacher model (answerdotai/ModernBERT-base) via a Principal Component Analysis (PCA) projection, which compressed the teacher's 768-dimensional representations into the student's 384-dimensional space, providing a substantial head start.
3. **Extensive MLM Pre-training:** Following initialization, the model underwent comprehensive Masked Language Modeling (MLM) pre-training on a diverse corpus combining:
   - search data (MS MARCO)
   - academic texts (Stanford Encyclopedia of Philosophy)
   - general knowledge (NPR, FineWiki)
4. **Knowledge Distillation (STS Tuning):** The final, critical stage optimized the model for semantic similarity: it was trained to mimic the output embeddings of a strong retrieval teacher (Alibaba-NLP/gte-modernbert-base) using Mean Squared Error (MSE) loss, so its 384-dimensional vectors excel at similarity and retrieval tasks.
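The PCA-projection initialization can be sketched as follows. This is an illustrative reconstruction, not the actual training code: it assumes a teacher matrix (e.g., the token-embedding table) is compressed to the student's width by projecting its rows onto their top principal directions, and `pca_project` is a hypothetical helper name. It also assumes the matrix has more rows than the target dimension, so enough principal directions exist.

```python
import torch

def pca_project(teacher_weight: torch.Tensor, student_dim: int) -> torch.Tensor:
    """Compress a (rows, 768) teacher matrix to (rows, student_dim) by
    projecting its centered rows onto the top principal directions."""
    centered = teacher_weight - teacher_weight.mean(dim=0, keepdim=True)
    # Rows of Vh from the SVD are the principal directions of the row space
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    components = vh[:student_dim]        # (student_dim, 768)
    return centered @ components.T       # (rows, student_dim)

teacher_embeddings = torch.randn(1000, 768)  # stand-in for a real teacher matrix
student_init = pca_project(teacher_embeddings, 384)
print(student_init.shape)                    # torch.Size([1000, 384])
```

In a full GUIDE-style initialization, a projection like this would be applied to each teacher matrix whose width must shrink from 768 to 384; this sketch shows only the core compression step.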

## Training

The final model, ModernBERT-small-v2, was trained using a curated combination of four distinct datasets during the MLM Pre-training phase to ensure broad general knowledge acquisition before the final distillation tuning.

GitHub: semantic-search-models/ModernBERT-small-v2

The following datasets were integrated and processed:

  1. MS MARCO Triplets (sentence-transformers/msmarco-msmarco-MiniLM-L6-v3, "triplet" split)
    • Source Focus: Query/Document ranking (Search Relevance).
  2. Stanford Encyclopedia of Philosophy Triplets (johnnyboycurtis/Philosophical-Triplets-Retrieval)
    • Source Focus: Deep, technical, and abstract academic reasoning.
  3. NPR Articles (sentence-transformers/npr)
    • Source Focus: Modern news, journalistic style, and general current events.
  4. FineWiki (English) (HuggingFaceFW/finewiki, "en" split)
    • Source Focus: Encyclopedic, factual knowledge spanning a wide range of topics.
    • Only used in distillation training; not used in MLM.

(Note: During the final Knowledge Distillation phase, the targets were generated using embeddings from the teacher model (Alibaba-NLP/gte-modernbert-base) based on the combined text content of this merged corpus.)

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • parquet

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
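The Pooling module above uses masked mean pooling (`pooling_mode_mean_tokens`): token embeddings are averaged, with padding positions excluded via the attention mask. A minimal NumPy sketch of that operation, with illustrative shapes:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid div by zero
    return summed / counts

tokens = np.ones((2, 4, 384))
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 1]])
pooled = mean_pool(tokens, mask)
print(pooled.shape)  # (2, 384)
```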

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

import torch
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("johnnyboycurtis/ModernBERT-small-v2", model_kwargs={"attn_implementation": "flash_attention_2", "dtype": torch.bfloat16}) # or use "sdpa"

# Run inference
sentences = [
    '# Breda Holmes\nBreda Holmes is a former camogie player, winner of the B+I Star of the Year award in 1987 and seven All Ireland medals in succession between 1984 and 1991, celebrating the seventh by scoring the match-turning goal from Ann Downey’s sideline ball against Cork in the 1991 final.\n\n## Career\nShe captained Carysfort Training College in their 1984 Purcell Cup campaign and won six All Ireland club medals with St Paul’s camogie club, based in Kilkenny city.\n',
    'What is Intellectual Property? Intellectual property (IP) refers to creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce. IP is protected in law by, for example, patents, copyright and trademarks, which enable people to earn recognition or financial benefit from what they invent or create.',
    '10 Most Famous Soccer Stadiums in the World. The Camp Nou with its capacity of 99,354 is the largest stadium in Europe and also the fourth largest soccer stadium in the world. It is situated in Barcelona, Catalonia, Spain, and is the home of Spanish club Barcelona since 1957.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2616, 0.5490],
#         [0.2616, 1.0000, 0.3196],
#         [0.5490, 0.3196, 1.0000]])

Evaluation

Metrics

Knowledge Distillation

Metric Value
negative_mse -77.74
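As reported by sentence-transformers' MSE evaluator, negative_mse is the negated mean squared error between student and teacher embeddings, scaled by 100 so that higher (closer to 0) is better. A self-contained sketch with dummy embeddings:

```python
import numpy as np

def negative_mse(student: np.ndarray, teacher: np.ndarray) -> float:
    """Negated MSE between embedding matrices, scaled by 100 (higher is better)."""
    return float(-100.0 * np.mean((student - teacher) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 384))
student = teacher + rng.normal(scale=0.1, size=(8, 384))  # small perturbation
score = negative_mse(student, teacher)
print(score)  # roughly -1.0 (noise variance 0.01, times 100)
```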

Information Retrieval

Metric NanoMSMARCO NanoHotpotQA
cosine_accuracy@1 0.32 0.52
cosine_accuracy@3 0.52 0.76
cosine_accuracy@5 0.6 0.78
cosine_accuracy@10 0.76 0.84
cosine_precision@1 0.32 0.52
cosine_precision@3 0.1733 0.3333
cosine_precision@5 0.12 0.22
cosine_precision@10 0.076 0.122
cosine_recall@1 0.32 0.26
cosine_recall@3 0.52 0.5
cosine_recall@5 0.6 0.55
cosine_recall@10 0.76 0.61
cosine_ndcg@10 0.5251 0.5457
cosine_mrr@10 0.4523 0.6494
cosine_map@100 0.4624 0.4736
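For reference, the accuracy@k and MRR@10 figures above can be computed from a cosine-similarity ranking as follows. This is a generic sketch, not the evaluator's exact code; `ranking` and `gold` are illustrative.

```python
def accuracy_at_k(ranked_ids, relevant, k):
    """1.0 if any relevant doc appears in the top-k ranking, else 0.0."""
    return 1.0 if any(d in relevant for d in ranked_ids[:k]) else 0.0

def mrr_at_k(ranked_ids, relevant, k=10):
    """Reciprocal rank of the first relevant doc within the top-k, else 0.0."""
    for rank, d in enumerate(ranked_ids[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

ranking = ["d3", "d7", "d1", "d9"]   # documents ordered by cosine similarity
gold = {"d1"}
print(accuracy_at_k(ranking, gold, 1))  # 0.0
print(accuracy_at_k(ranking, gold, 3))  # 1.0
print(mrr_at_k(ranking, gold))          # 0.333... (first hit at rank 3)
```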

Nano BEIR

  • Dataset: NanoBEIR_mean
  • Evaluated with NanoBEIREvaluator with these parameters:
    {
        "dataset_names": [
            "MSMARCO",
            "HotpotQA"
        ],
        "dataset_id": "sentence-transformers/NanoBEIR-en"
    }
    
Metric Value
cosine_accuracy@1 0.42
cosine_accuracy@3 0.64
cosine_accuracy@5 0.69
cosine_accuracy@10 0.8
cosine_precision@1 0.42
cosine_precision@3 0.2533
cosine_precision@5 0.17
cosine_precision@10 0.099
cosine_recall@1 0.29
cosine_recall@3 0.51
cosine_recall@5 0.575
cosine_recall@10 0.685
cosine_ndcg@10 0.5354
cosine_mrr@10 0.5509
cosine_map@100 0.468

Training Details

Training Dataset

parquet

  • Dataset: parquet
  • Size: 3,375,201 training samples
  • Columns: text and label
  • Approximate statistics based on the first 1000 samples:
    text label
    type string list
    details
    • min: 5 tokens
    • mean: 280.41 tokens
    • max: 1024 tokens
    • size: 384 elements
  • Samples:
    text label
    # Scientists Link Diamonds To Earth's Quick Cooling

    Scientists say they have evidence the Earth was bombarded by meteors about 13,000 years ago, triggering a 1,000-year cold spell. Researchers write in the journal Science that they have found a layer of microscopic diamonds scattered across North America. An abrupt cooling may have caused many large mammals to become extinct.
    [4.6171875, 2.515625, 2.439453125, -1.4853515625, -6.328125, ...]
    # Brad Giffen
    Brad Giffen is a retired Canadian news anchor who has worked on television in both Canada and the United States.
    Over his broadcasting career he has also worked as a radio personality, disc jockey, VJ, television reporter, television producer and voice-over artist.

    ## Broadcasting career
    Giffen studied at the Poynter Institute for Advanced Journalism Study. In the late 1980s he was a broadcaster on CHUM-FM radio station in Toronto, Ontario, Canada. He previously was John Majhor's successor veejay on CITY-TV's music video program Toronto Rocks. and he hosted the CBC Television battle of the bands competition Rock Wars.
    In 1990, Giffen pivoted to news journalism and became a reporter for CFTO's nightly news program World Beat News (later rebranded as CFTO News in early 1998, and CTV News in 2005).
    In 1993, Giffen moved to the United States and became co-anchor of the nightly news on the Fox affiliate KSTU, in Salt Lake City, Utah. Giffen left that post in 1995 to accept ...
    [-1.693359375, 13.3828125, 4.50390625, 0.41064453125, -2.884765625, ...]
    # How Trump Won, According To The Exit Polls

    Donald Trump will be the next president of the United States. That's remarkable for all sorts of reasons: He has no governmental experience, for example. And many times during his campaign, Trump's words inflamed large swaths of Americans, whether it was his comments from years ago talking about grabbing women's genitals or calling Mexican immigrants in the U.S. illegally "rapists" and playing up crimes committed by immigrants, including drug crimes and murders. But right now, it's also remarkable because almost no one saw it coming. All major forecasters predicted a Hillary Clinton win, whether moderately or by a landslide. So what happened? We don't know just yet why pollsters and forecasters got it wrong, but here's what made this electorate so different from the one that elected Barack Obama by 4 points in 2012. To be clear, it's impossible to break any election results out into fully discrete demographic groups or trends — race, gend...
    [3.4296875, 12.828125, 2.8203125, -5.47265625, -5.390625, ...]
  • Loss: MSELoss
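The MSELoss objective trains the student to reproduce the teacher's precomputed embedding (the `label` column above) for each text. Conceptually, with dummy truncated vectors in place of real 384-dimensional embeddings:

```python
import numpy as np

def mse_distillation_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Mean squared error between student and teacher sentence embeddings.

    Both matrices are (batch, dim); the teacher targets are precomputed,
    like the `label` vectors in the training samples above.
    """
    return float(np.mean((student_emb - teacher_emb) ** 2))

teacher = np.array([[4.62, 2.52, 2.44]])   # truncated label vector, illustrative
student = np.array([[4.00, 2.00, 2.00]])
loss = mse_distillation_loss(student, teacher)
print(round(loss, 4))  # 0.2828
```

Minimizing this loss pushes the student's 384-dimensional vectors toward the teacher's, which is why the student inherits the teacher's similarity structure.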

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • learning_rate: 0.0001
  • num_train_epochs: 2
  • warmup_steps: 0.1
  • fp16: True
  • load_best_model_at_end: True

All Hyperparameters

Click to expand
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0.1
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Click to expand
Epoch Step Training Loss mse-dev_negative_mse NanoMSMARCO_cosine_ndcg@10 NanoHotpotQA_cosine_ndcg@10 NanoBEIR_mean_cosine_ndcg@10
0.0019 100 4.2698 - - - -
0.0038 200 4.2304 - - - -
0.0057 300 4.1280 - - - -
0.0076 400 3.8576 - - - -
0.0095 500 3.1561 - - - -
0.0114 600 2.5527 - - - -
0.0133 700 2.3275 - - - -
0.0152 800 2.2656 - - - -
0.0171 900 2.2401 - - - -
0.0190 1000 2.2256 -221.2144 0.0514 0.0577 0.0545
0.0209 1100 2.2140 - - - -
0.0228 1200 2.1920 - - - -
0.0247 1300 2.1840 - - - -
0.0265 1400 2.1662 - - - -
0.0284 1500 2.1598 - - - -
0.0303 1600 2.1452 - - - -
0.0322 1700 2.1226 - - - -
0.0341 1800 2.1068 - - - -
0.0360 1900 2.0941 - - - -
0.0379 2000 2.0796 -206.8865 0.1481 0.0672 0.1077
0.0398 2100 2.0621 - - - -
0.0417 2200 2.0545 - - - -
0.0436 2300 2.0382 - - - -
0.0455 2400 2.0267 - - - -
0.0474 2500 2.0167 - - - -
0.0493 2600 2.0041 - - - -
0.0512 2700 1.9902 - - - -
0.0531 2800 1.9746 - - - -
0.0550 2900 1.9650 - - - -
0.0569 3000 1.9539 -194.5440 0.1243 0.1242 0.1243
0.0588 3100 1.9401 - - - -
0.0607 3200 1.9317 - - - -
0.0626 3300 1.9181 - - - -
0.0645 3400 1.9098 - - - -
0.0664 3500 1.8983 - - - -
0.0683 3600 1.8924 - - - -
0.0702 3700 1.8806 - - - -
0.0721 3800 1.8717 - - - -
0.0740 3900 1.8591 - - - -
0.0758 4000 1.8525 -184.2026 0.1647 0.1745 0.1696
0.0777 4100 1.8416 - - - -
0.0796 4200 1.8359 - - - -
0.0815 4300 1.8256 - - - -
0.0834 4400 1.8131 - - - -
0.0853 4500 1.8063 - - - -
0.0872 4600 1.7950 - - - -
0.0891 4700 1.7846 - - - -
0.0910 4800 1.7762 - - - -
0.0929 4900 1.7620 - - - -
0.0948 5000 1.7605 -175.1685 0.1960 0.2024 0.1992
0.0967 5100 1.7481 - - - -
0.0986 5200 1.7419 - - - -
0.1005 5300 1.7301 - - - -
0.1024 5400 1.7280 - - - -
0.1043 5500 1.7131 - - - -
0.1062 5600 1.7063 - - - -
0.1081 5700 1.6959 - - - -
0.1100 5800 1.6884 - - - -
0.1119 5900 1.6801 - - - -
0.1138 6000 1.6700 -166.4924 0.2493 0.2150 0.2321
0.1157 6100 1.6637 - - - -
0.1176 6200 1.6543 - - - -
0.1195 6300 1.6451 - - - -
0.1214 6400 1.6382 - - - -
0.1233 6500 1.6278 - - - -
0.1251 6600 1.6235 - - - -
0.1270 6700 1.6150 - - - -
0.1289 6800 1.6054 - - - -
0.1308 6900 1.6007 - - - -
0.1327 7000 1.5874 -158.1013 0.2809 0.2349 0.2579
0.1346 7100 1.5824 - - - -
0.1365 7200 1.5724 - - - -
0.1384 7300 1.5669 - - - -
0.1403 7400 1.5535 - - - -
0.1422 7500 1.5450 - - - -
0.1441 7600 1.5345 - - - -
0.1460 7700 1.5340 - - - -
0.1479 7800 1.5242 - - - -
0.1498 7900 1.5181 - - - -
0.1517 8000 1.5086 -150.1032 0.2957 0.2454 0.2705
0.1536 8100 1.5007 - - - -
0.1555 8200 1.4950 - - - -
0.1574 8300 1.4829 - - - -
0.1593 8400 1.4780 - - - -
0.1612 8500 1.4737 - - - -
0.1631 8600 1.4603 - - - -
0.1650 8700 1.4510 - - - -
0.1669 8800 1.4500 - - - -
0.1688 8900 1.4408 - - - -
0.1707 9000 1.4372 -142.8462 0.3033 0.2824 0.2929
0.1726 9100 1.4270 - - - -
0.1744 9200 1.4233 - - - -
0.1763 9300 1.4135 - - - -
0.1782 9400 1.4074 - - - -
0.1801 9500 1.3981 - - - -
0.1820 9600 1.3919 - - - -
0.1839 9700 1.3844 - - - -
0.1858 9800 1.3741 - - - -
0.1877 9900 1.3685 - - - -
0.1896 10000 1.3668 -135.7081 0.3194 0.3059 0.3127
0.1915 10100 1.3568 - - - -
0.1934 10200 1.3505 - - - -
0.1953 10300 1.3433 - - - -
0.1972 10400 1.3338 - - - -
0.1991 10500 1.3295 - - - -
0.2010 10600 1.3275 - - - -
0.2029 10700 1.3149 - - - -
0.2048 10800 1.3119 - - - -
0.2067 10900 1.3055 - - - -
0.2086 11000 1.2952 -129.2064 0.3109 0.3434 0.3272
0.2105 11100 1.2920 - - - -
0.2124 11200 1.2851 - - - -
0.2143 11300 1.2769 - - - -
0.2162 11400 1.2747 - - - -
0.2181 11500 1.2686 - - - -
0.2200 11600 1.2684 - - - -
0.2219 11700 1.2582 - - - -
0.2237 11800 1.2582 - - - -
0.2256 11900 1.2479 - - - -
0.2275 12000 1.2418 -123.6261 0.3439 0.3547 0.3493
0.2294 12100 1.2400 - - - -
0.2313 12200 1.2330 - - - -
0.2332 12300 1.2288 - - - -
0.2351 12400 1.2230 - - - -
0.2370 12500 1.2164 - - - -
0.2389 12600 1.2157 - - - -
0.2408 12700 1.2166 - - - -
0.2427 12800 1.2045 - - - -
0.2446 12900 1.2035 - - - -
0.2465 13000 1.1968 -118.8691 0.3282 0.3329 0.3306
0.2484 13100 1.1942 - - - -
0.2503 13200 1.1895 - - - -
0.2522 13300 1.1843 - - - -
0.2541 13400 1.1755 - - - -
0.2560 13500 1.1756 - - - -
0.2579 13600 1.1707 - - - -
0.2598 13700 1.1637 - - - -
0.2617 13800 1.1684 - - - -
0.2636 13900 1.1628 - - - -
0.2655 14000 1.1585 -115.4122 0.3779 0.3579 0.3679
0.2674 14100 1.1602 - - - -
0.2693 14200 1.1504 - - - -
0.2712 14300 1.1483 - - - -
0.2730 14400 1.1488 - - - -
0.2749 14500 1.1392 - - - -
0.2768 14600 1.1343 - - - -
0.2787 14700 1.1363 - - - -
0.2806 14800 1.1342 - - - -
0.2825 14900 1.1327 - - - -
0.2844 15000 1.1219 -111.9139 0.3794 0.3791 0.3793
0.2863 15100 1.1246 - - - -
0.2882 15200 1.1152 - - - -
0.2901 15300 1.1196 - - - -
0.2920 15400 1.1097 - - - -
0.2939 15500 1.1067 - - - -
0.2958 15600 1.0994 - - - -
0.2977 15700 1.1077 - - - -
0.2996 15800 1.1057 - - - -
0.3015 15900 1.0949 - - - -
0.3034 16000 1.0981 -109.2994 0.3867 0.3855 0.3861
0.3053 16100 1.0933 - - - -
0.3072 16200 1.0873 - - - -
0.3091 16300 1.0851 - - - -
0.3110 16400 1.0840 - - - -
0.3129 16500 1.0831 - - - -
0.3148 16600 1.0755 - - - -
0.3167 16700 1.0733 - - - -
0.3186 16800 1.0724 - - - -
0.3205 16900 1.0698 - - - -
0.3223 17000 1.0710 -106.3769 0.4092 0.4066 0.4079
0.3242 17100 1.0699 - - - -
0.3261 17200 1.0642 - - - -
0.3280 17300 1.0576 - - - -
0.3299 17400 1.0597 - - - -
0.3318 17500 1.0572 - - - -
0.3337 17600 1.0547 - - - -
0.3356 17700 1.0502 - - - -
0.3375 17800 1.0467 - - - -
0.3394 17900 1.0485 - - - -
0.3413 18000 1.0455 -103.7698 0.4510 0.4237 0.4374
0.3432 18100 1.0433 - - - -
0.3451 18200 1.0404 - - - -
0.3470 18300 1.0397 - - - -
0.3489 18400 1.0352 - - - -
0.3508 18500 1.0318 - - - -
0.3527 18600 1.0302 - - - -
0.3546 18700 1.0330 - - - -
0.3565 18800 1.0220 - - - -
0.3584 18900 1.0223 - - - -
0.3603 19000 1.0254 -101.5743 0.4439 0.4265 0.4352
0.3622 19100 1.0186 - - - -
0.3641 19200 1.0216 - - - -
0.3660 19300 1.0152 - - - -
0.3679 19400 1.0139 - - - -
0.3698 19500 1.0125 - - - -
0.3716 19600 1.0087 - - - -
0.3735 19700 1.0045 - - - -
0.3754 19800 1.0032 - - - -
0.3773 19900 1.0013 - - - -
0.3792 20000 1.0017 -99.6613 0.4554 0.4374 0.4464
0.3811 20100 1.0007 - - - -
0.3830 20200 0.9959 - - - -
0.3849 20300 0.9965 - - - -
0.3868 20400 0.9909 - - - -
0.3887 20500 0.9902 - - - -
0.3906 20600 0.9903 - - - -
0.3925 20700 0.9927 - - - -
0.3944 20800 0.9865 - - - -
0.3963 20900 0.9843 - - - -
0.3982 21000 0.9809 -97.4922 0.4689 0.4462 0.4575
0.4001 21100 0.9801 - - - -
0.4020 21200 0.9785 - - - -
0.4039 21300 0.9718 - - - -
0.4058 21400 0.9725 - - - -
0.4077 21500 0.9705 - - - -
0.4096 21600 0.9729 - - - -
0.4115 21700 0.9714 - - - -
0.4134 21800 0.9647 - - - -
0.4153 21900 0.9623 - - - -
0.4172 22000 0.9579 -95.7813 0.4642 0.4549 0.4595
0.4191 22100 0.9553 - - - -
0.4209 22200 0.9558 - - - -
0.4228 22300 0.9584 - - - -
0.4247 22400 0.9544 - - - -
0.4266 22500 0.9520 - - - -
0.4285 22600 0.9516 - - - -
0.4304 22700 0.9543 - - - -
0.4323 22800 0.9502 - - - -
0.4342 22900 0.9477 - - - -
0.4361 23000 0.9405 -93.9238 0.4856 0.4521 0.4688
0.4380 23100 0.9448 - - - -
0.4399 23200 0.9424 - - - -
0.4418 23300 0.9369 - - - -
0.4437 23400 0.9318 - - - -
0.4456 23500 0.9342 - - - -
0.4475 23600 0.9392 - - - -
0.4494 23700 0.9358 - - - -
0.4513 23800 0.9303 - - - -
0.4532 23900 0.9306 - - - -
0.4551 24000 0.9277 -92.2427 0.4946 0.4798 0.4872
0.4570 24100 0.9267 - - - -
0.4589 24200 0.9228 - - - -
0.4608 24300 0.9239 - - - -
0.4627 24400 0.9225 - - - -
0.4646 24500 0.9169 - - - -
0.4665 24600 0.9170 - - - -
0.4684 24700 0.9195 - - - -
0.4702 24800 0.9153 - - - -
0.4721 24900 0.9138 - - - -
0.4740 25000 0.9108 -90.7635 0.4622 0.4812 0.4717
0.4759 25100 0.9133 - - - -
0.4778 25200 0.9076 - - - -
0.4797 25300 0.9081 - - - -
0.4816 25400 0.9093 - - - -
0.4835 25500 0.9037 - - - -
0.4854 25600 0.9025 - - - -
0.4873 25700 0.9058 - - - -
0.4892 25800 0.9018 - - - -
0.4911 25900 0.9014 - - - -
0.4930 26000 0.8946 -89.2562 0.4745 0.4957 0.4851
0.4949 26100 0.8982 - - - -
0.4968 26200 0.8946 - - - -
0.4987 26300 0.8941 - - - -
0.5006 26400 0.8925 - - - -
0.5025 26500 0.8947 - - - -
0.5044 26600 0.8906 - - - -
0.5063 26700 0.8895 - - - -
0.5082 26800 0.8866 - - - -
0.5101 26900 0.8840 - - - -
0.5120 27000 0.8764 -87.8039 0.5011 0.5173 0.5092
0.5139 27100 0.8859 - - - -
0.5158 27200 0.8839 - - - -
0.5177 27300 0.8794 - - - -
0.5195 27400 0.8790 - - - -
0.5214 27500 0.8788 - - - -
0.5233 27600 0.8780 - - - -
0.5252 27700 0.8749 - - - -
0.5271 27800 0.8742 - - - -
0.5290 27900 0.8700 - - - -
0.5309 28000 0.8691 -86.4419 0.4936 0.4776 0.4856
0.5328 28100 0.8747 - - - -
0.5347 28200 0.8644 - - - -
0.5366 28300 0.8673 - - - -
0.5385 28400 0.8670 - - - -
0.5404 28500 0.8638 - - - -
0.5423 28600 0.8649 - - - -
0.5442 28700 0.8629 - - - -
0.5461 28800 0.8629 - - - -
0.5480 28900 0.8591 - - - -
0.5499 29000 0.8566 -85.0408 0.4792 0.4918 0.4855
0.5518 29100 0.8588 - - - -
0.5537 29200 0.8545 - - - -
0.5556 29300 0.8534 - - - -
0.5575 29400 0.8543 - - - -
0.5594 29500 0.8534 - - - -
0.5613 29600 0.8519 - - - -
0.5632 29700 0.8486 - - - -
0.5651 29800 0.8530 - - - -
0.5670 29900 0.8477 - - - -
0.5688 30000 0.8465 -83.9435 0.4986 0.5097 0.5042
0.5707 30100 0.8425 - - - -
0.5726 30200 0.8437 - - - -
0.5745 30300 0.8430 - - - -
0.5764 30400 0.8431 - - - -
0.5783 30500 0.8424 - - - -
0.5802 30600 0.8403 - - - -
0.5821 30700 0.8347 - - - -
0.5840 30800 0.8344 - - - -
0.5859 30900 0.8348 - - - -
0.5878 31000 0.8351 -82.8113 0.4999 0.5088 0.5043
0.5897 31100 0.8362 - - - -
0.5916 31200 0.8307 - - - -
0.5935 31300 0.8315 - - - -
0.5954 31400 0.8311 - - - -
0.5973 31500 0.8305 - - - -
0.5992 31600 0.8304 - - - -
0.6011 31700 0.8277 - - - -
0.6030 31800 0.8249 - - - -
0.6049 31900 0.8262 - - - -
0.6068 32000 0.8236 -81.7389 0.4811 0.5256 0.5034
0.6087 32100 0.8209 - - - -
0.6106 32200 0.8226 - - - -
0.6125 32300 0.8207 - - - -
0.6144 32400 0.8224 - - - -
0.6163 32500 0.8163 - - - -
0.6182 32600 0.8181 - - - -
0.6200 32700 0.8147 - - - -
0.6219 32800 0.8170 - - - -
0.6238 32900 0.8156 - - - -
0.6257 33000 0.8141 -80.4979 0.5042 0.5085 0.5064
0.6276 33100 0.8088 - - - -
0.6295 33200 0.8098 - - - -
0.6314 33300 0.8133 - - - -
0.6333 33400 0.8087 - - - -
0.6352 33500 0.8086 - - - -
0.6371 33600 0.8094 - - - -
0.6390 33700 0.8054 - - - -
0.6409 33800 0.8043 - - - -
0.6428 33900 0.8035 - - - -
0.6447 34000 0.7990 -79.5726 0.4990 0.5166 0.5078
0.6466 34100 0.8035 - - - -
0.6485 34200 0.7990 - - - -
0.6504 34300 0.7996 - - - -
0.6523 34400 0.8005 - - - -
0.6542 34500 0.8000 - - - -
0.6561 34600 0.7975 - - - -
0.6580 34700 0.7959 - - - -
0.6599 34800 0.7921 - - - -
0.6618 34900 0.7916 - - - -
0.6637 35000 0.7933 -78.7884 0.5104 0.5139 0.5122
0.6656 35100 0.7908 - - - -
0.6675 35200 0.7913 - - - -
0.6693 35300 0.7921 - - - -
0.6712 35400 0.7929 - - - -
0.6731 35500 0.7915 - - - -
0.6750 35600 0.7871 - - - -
0.6769 35700 0.7836 - - - -
0.6788 35800 0.7805 - - - -
0.6807 35900 0.7870 - - - -
0.6826 36000 0.7797 -77.7400 0.5251 0.5457 0.5354

Framework Versions

  • Python: 3.11.13
  • Sentence Transformers: 5.2.2
  • Transformers: 5.1.0
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.9.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MSELoss

@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}

ModernBERT Model Architecture

@misc{warner2024smarterbetterfasterlonger,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

Model Weight Initialization

@misc{trinh2025guideguidedinitializationdistillation,
      title={GUIDE: Guided Initialization and Distillation of Embeddings}, 
      author={Khoa Trinh and Gaurav Menghani and Erik Vee},
      year={2025},
      eprint={2510.06502},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.06502}, 
}