language:
  - en
license: mit
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:64147
  - loss:CachedMultipleNegativesRankingLoss
base_model: BAAI/bge-large-en-v1.5
widget:
  - source_sentence: who is the second prime minister of india
    sentences:
      - >-
        List of Prime Ministers of India Since 1947, India has had fourteen
        Prime Ministers, fifteen including Gulzarilal Nanda who twice acted in
        the role. The first was Jawaharlal Nehru of the Indian National Congress
        party, who was sworn-in on 15 August 1947, when India gained
        independence from the British. Serving until his death in May 1964,
        Nehru remains India's longest-serving prime minister. He was succeeded
        by fellow Congressman Lal Bahadur Shastri, whose 19-month term also
        ended in death. Indira Gandhi, Nehru's daughter, succeeded Shastri in
        1966 to become the country's first woman premier. Eleven years later,
        she was voted out of power in favour of the Janata Party, whose leader
        Morarji Desai became the first non-Congress prime minister. After he
        resigned in 1979, his former deputy Charan Singh briefly held office
        until Indira Gandhi was voted back six months later. Indira Gandhi's
        second stint as Prime Minister ended five years later on the morning of
        31 October 1984, when she was gunned down by her own bodyguards. That
        evening, her son Rajiv Gandhi was sworn-in as India's youngest premier,
        and the third from his family. Thus far, members of Nehru–Gandhi dynasty
        have been Prime Minister for a total of 37 years and 303 days.[1]
      - >-
        Can You Feel the Love Tonight The song was performed in the film by
        Kristle Edwards, Joseph Williams, Sally Dworsky, Nathan Lane, and Ernie
        Sabella, while the end title version was performed by Elton John. It won
        the 1994 Academy Award for Best Original Song,[1] and the Golden Globe
        Award for Best Original Song. It also earned Elton John the Grammy Award
        for Best Male Pop Vocal Performance.
      - >-
        Sam Worthington Samuel Henry John Worthington[1] (born 2 August 1976) is
        an English born, Australian actor and writer. He portrayed Jake Sully in
        the 2009 film Avatar, Marcus Wright in Terminator Salvation, and Perseus
        in Clash of the Titans as well as its sequel Wrath of the Titans before
        transitioning to more dramatic roles in Everest (2015), Hacksaw Ridge
        (2016), The Shack, and Manhunt: Unabomber (both in 2017). He also played
        the main protagonist, Captain Alex Mason, in Call of Duty: Black Ops.
  - source_sentence: who drafted most of the declaration of independence
    sentences:
      - >-
        United States Declaration of Independence John Adams persuaded the
        committee to select Thomas Jefferson to compose the original draft of
        the document,[3] which Congress would edit to produce the final version.
        The Declaration was ultimately a formal explanation of why Congress had
        voted on July 2 to declare independence from Great Britain, more than a
        year after the outbreak of the American Revolutionary War. The next day,
        Adams wrote to his wife Abigail: "The Second Day of July 1776, will be
        the most memorable Epocha, in the History of America."[4] But
        Independence Day is actually celebrated on July 4, the date that the
        Declaration of Independence was approved.
      - Luke Cage (season 2) The season is set to premiere in 2018.
      - >-
        Politics of the European Union The competencies of the European Union
        stem from the original Coal and Steel Community, which had as its goal
        an integrated market. The original competencies were regulatory in
        nature, restricted to matters of maintaining a healthy business
        environment. Rulings were confined to laws covering trade, currency, and
        competition. Increases in the number of EU competencies result from a
        process known as functional spillover. Functional spillover resulted in,
        first, the integration of banking and insurance industries to manage
        finance and investment. The size of the bureaucracies increased,
        requiring modifications to the treaty system as the scope of
        competencies integrated more and more functions. While member states
        hold their sovereignty inviolate, they remain within a system to which
        they have delegated the tasks of managing the marketplace. These tasks
        have expanded to include the competencies of free movement of persons,
        employment, transportation, and environmental regulation.
  - source_sentence: is there a difference between 300 blackout and 300 aac blackout
    sentences:
      - >-
        Call of Duty: World at War Call of Duty: World at War is a 2008
        first-person shooter video game developed by Treyarch and published by
        Activision for Microsoft Windows, PlayStation 3, Wii, and Xbox 360. The
        game is the fifth mainstream game of the Call of Duty series and returns
        the setting to World War II for the last time until Call of Duty: WWII
        almost nine years later. The game is also the first title in the Black
        Ops story line. The game was released in North America on November 11,
        2008, and in Europe on November 14, 2008. A Windows Mobile version was
        also made available by Glu Mobile and different storyline versions for
        the Nintendo DS and PlayStation 2 were also produced, but remain in the
        World War II setting. The game is based on an enhanced version of the
        Call of Duty 4: Modern Warfare game engine developed by Infinity Ward
        with increased development on audio and visual effects.
      - >-
        Vincent van Gogh Van Gogh suffered from psychotic episodes and delusions
        and though he worried about his mental stability, he often neglected his
        physical health, did not eat properly and drank heavily. His friendship
        with Gauguin ended after a confrontation with a razor, when in a rage,
        he severed part of his own left ear. He spent time in psychiatric
        hospitals, including a period at Saint-Rémy. After he discharged himself
        and moved to the Auberge Ravoux in Auvers-sur-Oise near Paris, he came
        under the care of the homoeopathic doctor Paul Gachet. His depression
        continued and on 27 July 1890, Van Gogh shot himself in the chest with a
        revolver. He died from his injuries two days later.
      - >-
        .300 AAC Blackout The .300 AAC Blackout (designated as the 300 BLK by
        the SAAMI[1] and 300 AAC Blackout by the C.I.P.[2]), also known as
        7.62×35mm is a carbine cartridge developed in the United States by
        Advanced Armament Corporation (AAC) for use in the M4 carbine. Its
        purpose is to achieve ballistics similar to the 7.62×39mm Soviet
        cartridge in an AR-15 while using standard AR-15 magazines at their
        normal capacity. It can be seen as a SAAMI-certified copy of J. D.
        Jones' wildcat .300 Whisper. Care should be taken not to use 300 BLK
        ammunition in a rifle chambered for 7.62×40mm Wilson Tactical.[3]
  - source_sentence: when does the new army uniform come out
    sentences:
      - >-
        United States v. Paramount Pictures, Inc. The case reached the U.S.
        Supreme Court in 1948; their verdict went against the movie studios,
        forcing all of them to divest themselves of their movie theater
        chains.[8] This, coupled with the advent of television and the attendant
        drop in movie ticket sales, brought about a severe slump in the movie
        business, a slump that would not be reversed until 1972, with the
        release of The Godfather, the first modern blockbuster.
      - >-
        E. L. James James says the idea for the Fifty Shades trilogy began as a
        response to the vampire novel series Twilight. In late 2008 James saw
        the movie Twilight, and then became intensely absorbed with the novels
        that the movie was based on. She read the novels several times over in a
        period of a few days, and then, for the first time in her life, sat down
        to write a book: basically a sequel to the Twilight novels. Between
        January and August 2009 she wrote two such books in quick succession.
        She says she then discovered the phenomenon of fan fiction, and this
        inspired her to publish her novels as Kindle books under the pen name
        "Snowqueens Icedragon". Beginning in August 2009 she then began to write
        the Fifty Shades books.[12][13]
      - >-
        Army Combat Uniform In May 2014, the Army unofficially announced that
        the Operational Camouflage Pattern (OCP) would replace UCP on the ACU.
        The original "Scorpion" pattern was developed at United States Army
        Soldier Systems Center by Crye Precision in 2002 for the Objective Force
        Warrior program. Crye later modified and trademarked their version of
        the pattern as MultiCam, which was selected for use by U.S. soldiers in
        Afghanistan in 2010. After talks to officially adopt MultiCam broke down
        over costs in late 2013, the Army began experimenting with the original
        Scorpion pattern, creating a variant code named "Scorpion W2", noting
        that while a pattern can be copyrighted, a color palette cannot and that
        beyond 50 meters the actual pattern is "not that relevant." The pattern
        resembles MultiCam with muted greens, light beige, and dark brown
        colors, but uses fewer beige and brown patches and no vertical twig and
        branch elements.[12] On 31 July 2014, the Army formally announced that
        the pattern would begin being issued in uniforms in summer 2015. The
        official name is intended to emphasize its use beyond Afghanistan to all
        combatant commands.[13] The UCP pattern is planned to be fully replaced
        by the OCP on the ACU by 1 October 2019.[14] ACUs printed in OCP first
        became available for purchase on 1 July 2015, with deployed soldiers
        already being issued uniforms and equipment in the new pattern.[15]
  - source_sentence: what was agenda 21 of earth summit of rio de janeiro
    sentences:
      - >-
        Jab Harry Met Sejal Jab Harry Met Sejal (English: When Harry Met Sejal)
        is a 2017 Indian romantic comedy film written and directed by Imtiaz
        Ali. It features Shah Rukh Khan and Anushka Sharma in the lead roles,[1]
        their third collaboration after Rab Ne Bana Di Jodi (2008) and Jab Tak
        Hai Jaan (2012). Pre-production of the film begun in April 2015 and
        principal photography commenced in August 2016 in Prague, Amsterdam,
        Vienna, Lisbon and Budapest.
      - >-
        Agenda 21 Agenda 21 is a non-binding, action plan of the United Nations
        with regard to sustainable development.[1] It is a product of the Earth
        Summit (UN Conference on Environment and Development) held in Rio de
        Janeiro, Brazil, in 1992. It is an action agenda for the UN, other
        multilateral organizations, and individual governments around the world
        that can be executed at local, national, and global levels.
      - >-
        Pencil Most manufacturers, and almost all in Europe, designate their
        pencils with the letters H (commonly interpreted as "hardness") to B
        (commonly "blackness"), as well as F (usually taken to mean "fineness",
        although F pencils are no more fine or more easily sharpened than any
        other grade. also known as "firm" in Japan[68]). The standard writing
        pencil is graded HB.[69] This designation might have been first used in
        the early 20th century by Brookman, an English pencil maker. It used B
        for black and H for hard; a pencil's grade was described by a sequence
        or successive Hs or Bs such as BB and BBB for successively softer leads,
        and HH and HHH for successively harder ones.[70] The Koh-i-Noor
        Hardtmuth pencil manufacturers claim to have first used the HB
        designations, with H standing for Hardtmuth, B for the company's
        location of Budějovice, and F for Franz Hardtmuth, who was responsible
        for technological improvements in pencil manufacture.[71][72]
datasets:
  - sentence-transformers/natural-questions
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: bge-large-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: NanoQuoraRetrieval
          type: NanoQuoraRetrieval
        metrics:
          - type: cosine_accuracy@1
            value: 0.88
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.96
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.98
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.88
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3999999999999999
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.25999999999999995
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.13599999999999998
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.7673333333333332
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.922
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.966
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9933333333333334
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9311833586321692
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.9228888888888889
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.9056754689754689
            name: Cosine Map@100

bge-large-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-large-en-v1.5 on the natural-questions dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-large-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: natural-questions
  • Language: en
  • License: mit

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
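Because the final Normalize() module rescales every embedding to unit L2 norm, cosine similarity between two output embeddings reduces to a plain dot product. A minimal NumPy sketch of that equivalence, with random vectors standing in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=1024)  # stand-in for a query embedding
d = rng.normal(size=1024)  # stand-in for a document embedding

# What Normalize() does: rescale each vector to unit L2 norm.
q_n = q / np.linalg.norm(q)
d_n = d / np.linalg.norm(d)

cosine = np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d))
assert abs(cosine - np.dot(q_n, d_n)) < 1e-12  # identical after normalization
```

This is why the dot product of normalized embeddings can be used interchangeably with cosine similarity for this model.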

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("DannyAI/embedding_fine_tuning_with_prompts_bge_large_en_v1.5")
# Run inference
queries = [
    "what was agenda 21 of earth summit of rio de janeiro",
]
documents = [
    'Agenda 21 Agenda 21 is a non-binding, action plan of the United Nations with regard to sustainable development.[1] It is a product of the Earth Summit (UN Conference on Environment and Development) held in Rio de Janeiro, Brazil, in 1992. It is an action agenda for the UN, other multilateral organizations, and individual governments around the world that can be executed at local, national, and global levels.',
    'Jab Harry Met Sejal Jab Harry Met Sejal (English: When Harry Met Sejal) is a 2017 Indian romantic comedy film written and directed by Imtiaz Ali. It features Shah Rukh Khan and Anushka Sharma in the lead roles,[1] their third collaboration after Rab Ne Bana Di Jodi (2008) and Jab Tak Hai Jaan (2012). Pre-production of the film begun in April 2015 and principal photography commenced in August 2016 in Prague, Amsterdam, Vienna, Lisbon and Budapest.',
    'Pencil Most manufacturers, and almost all in Europe, designate their pencils with the letters H (commonly interpreted as "hardness") to B (commonly "blackness"), as well as F (usually taken to mean "fineness", although F pencils are no more fine or more easily sharpened than any other grade. also known as "firm" in Japan[68]). The standard writing pencil is graded HB.[69] This designation might have been first used in the early 20th century by Brookman, an English pencil maker. It used B for black and H for hard; a pencil\'s grade was described by a sequence or successive Hs or Bs such as BB and BBB for successively softer leads, and HH and HHH for successively harder ones.[70] The Koh-i-Noor Hardtmuth pencil manufacturers claim to have first used the HB designations, with H standing for Hardtmuth, B for the company\'s location of Budějovice, and F for Franz Hardtmuth, who was responsible for technological improvements in pencil manufacture.[71][72]',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.9017, 0.2307, 0.2148]])

Evaluation

Metrics

Information Retrieval

  • Dataset: NanoQuoraRetrieval
  • Evaluated with InformationRetrievalEvaluator with these parameters:
    {
        "query_prompt": "query: ",
        "corpus_prompt": "document: "
    }
    
Metric Value
cosine_accuracy@1 0.88
cosine_accuracy@3 0.96
cosine_accuracy@5 0.98
cosine_accuracy@10 1.0
cosine_precision@1 0.88
cosine_precision@3 0.4
cosine_precision@5 0.26
cosine_precision@10 0.136
cosine_recall@1 0.7673
cosine_recall@3 0.922
cosine_recall@5 0.966
cosine_recall@10 0.9933
cosine_ndcg@10 0.9312
cosine_mrr@10 0.9229
cosine_map@100 0.9057
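For reference, the @k metrics in the table above can be computed directly from ranked result lists. A toy sketch of accuracy@k and MRR@k with hypothetical doc ids (not the evaluator's actual implementation):

```python
def accuracy_at_k(ranked, relevant, k):
    # Fraction of queries with at least one relevant doc in the top-k.
    hits = [any(doc in rel for doc in docs[:k]) for docs, rel in zip(ranked, relevant)]
    return sum(hits) / len(hits)

def mrr_at_k(ranked, relevant, k):
    # Mean reciprocal rank of the first relevant doc within the top-k.
    total = 0.0
    for docs, rel in zip(ranked, relevant):
        for rank, doc in enumerate(docs[:k], start=1):
            if doc in rel:
                total += 1.0 / rank
                break
    return total / len(ranked)

ranked = [["d1", "d2", "d3"], ["d9", "d4", "d5"]]  # hypothetical rankings
relevant = [{"d1"}, {"d4"}]                        # gold docs per query
print(accuracy_at_k(ranked, relevant, 1))  # 0.5
print(mrr_at_k(ranked, relevant, 10))      # 0.75
```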

Training Details

Training Dataset

natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 64,147 training samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query: string; min 10, mean 11.81, max 26 tokens
    • answer: string; min 21, mean 137.28, max 512 tokens
  • Samples:
    • Query: the internal revenue code is part of federal statutory law. true false
      Answer: Internal Revenue Code The Internal Revenue Code (IRC), formally the Internal Revenue Code of 1986, is the domestic portion of federal statutory tax law in the United States, published in various volumes of the United States Statutes at Large, and separately as Title 26 of the United States Code (USC).[1] It is organized topically, into subtitles and sections, covering income tax (see Income tax in the United States), payroll taxes, estate taxes, gift taxes, and excise taxes; as well as procedure and administration. Its implementing agency is the Internal Revenue Service.
    • Query: where is the pyramid temple at borobudur located
      Answer: Borobudur Approximately 40 kilometres (25 mi) northwest of Yogyakarta and 86 kilometres (53 mi) west of Surakarta, Borobudur is located in an elevated area between two twin volcanoes, Sundoro-Sumbing and Merbabu-Merapi, and two rivers, the Progo and the Elo. According to local myth, the area known as Kedu Plain is a Javanese "sacred" place and has been dubbed "the garden of Java" due to its high agricultural fertility.[19] During the restoration in the early 20th century, it was discovered that three Buddhist temples in the region, Borobudur, Pawon and Mendut, are positioned along a straight line.[20] A ritual relationship between the three temples must have existed, although the exact ritual process is unknown.[14]
    • Query: what does uncle stand for in the show man from uncle
      Answer: The Man from U.N.C.L.E. Originally, co-creator Sam Rolfe wanted to leave the meaning of U.N.C.L.E. ambiguous so it could refer to either "Uncle Sam" or the United Nations.[2]:14 Concerns by Metro-Goldwyn-Mayer's (MGM) legal department about using "U.N." for commercial purposes resulted in the producers' clarification that U.N.C.L.E. was an acronym for the United Network Command for Law and Enforcement.[3] Each episode had an "acknowledgement" to the U.N.C.L.E. in the end titles.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
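CachedMultipleNegativesRankingLoss treats, for each query, its paired answer as the positive and every other answer in the batch as a negative, applying cross-entropy over scaled cosine similarities; the caching (via mini_batch_size) only reduces memory, not the objective. A minimal NumPy sketch of the uncached objective, with random matrices standing in for encoder outputs:

```python
import numpy as np

def mnrl_loss(q, d, scale=20.0):
    # Normalize so q @ d.T holds cosine similarities.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = scale * (q @ d.T)  # (batch, batch): query i vs every answer
    # Row-wise log-softmax (numerically stable); the target for row i is column i.
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    return -np.mean(np.diag(log_probs))

# Perfectly separated embeddings: each query matches only its own answer,
# so the loss is close to zero.
print(mnrl_loss(np.eye(4), np.eye(4)))
```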
    

Evaluation Dataset

natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 16,037 evaluation samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query: string; min 10, mean 11.67, max 22 tokens
    • answer: string; min 12, mean 134.64, max 512 tokens
  • Samples:
    • Query: when did last harry potter movie come out
      Answer: Harry Potter (film series) Harry Potter is a British-American film series based on the Harry Potter novels by author J. K. Rowling. The series is distributed by Warner Bros. and consists of eight fantasy films, beginning with Harry Potter and the Philosopher's Stone (2001) and culminating with Harry Potter and the Deathly Hallows – Part 2 (2011).[2][3] A spin-off prequel series will consist of five films, starting with Fantastic Beasts and Where to Find Them (2016). The Fantastic Beasts films mark the beginning of a shared media franchise known as J. K. Rowling's Wizarding World.[4]
    • Query: where did the saying debbie downer come from
      Answer: Debbie Downer The character's name, Debbie Downer, is a slang phrase which refers to someone who frequently adds bad news and negative feelings to a gathering, thus bringing down the mood of everyone around them. Dratch's character would usually appear at social gatherings and interrupt the conversation to voice negative opinions and pronouncements. She is especially concerned about the rate of feline AIDS, a subject that she would bring up on more than one occasion, saying it was the number one killer of domestic cats.
    • Query: the financial crisis of 2008 was caused by
      Answer: Financial crisis of 2007–2008 It began in 2007 with a crisis in the subprime mortgage market in the United States, and developed into a full-blown international banking crisis with the collapse of the investment bank Lehman Brothers on September 15, 2008.[5] Excessive risk-taking by banks such as Lehman Brothers helped to magnify the financial impact globally.[6] Massive bail-outs of financial institutions and other palliative monetary and fiscal policies were employed to prevent a possible collapse of the world financial system. The crisis was nonetheless followed by a global economic downturn, the Great Recession. The European debt crisis, a crisis in the banking system of the European countries using the euro, followed later.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "mini_batch_size": 16,
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 5
  • per_device_eval_batch_size: 5
  • learning_rate: 2e-05
  • max_steps: 100
  • warmup_ratio: 0.1
  • seed: 30
  • bf16: True
  • load_best_model_at_end: True
  • prompts: {'query': 'query: ', 'answer': 'document: '}
  • batch_sampler: no_duplicates
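The prompts hyperparameter means each training text is prefixed with its column's prompt string before encoding ("query: " for queries, "document: " for answers), matching the query_prompt/corpus_prompt used at evaluation time. A hypothetical helper sketching that behavior (the library applies prompts internally; this is only illustrative):

```python
prompts = {"query": "query: ", "answer": "document: "}

def apply_prompt(texts, column):
    # Prepend the column-specific prompt before tokenization.
    prefix = prompts.get(column, "")
    return [prefix + t for t in texts]

print(apply_prompt(["who wrote hamlet"], "query"))
# ['query: who wrote hamlet']
```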

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 5
  • per_device_eval_batch_size: 5
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 100
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 30
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: {'query': 'query: ', 'answer': 'document: '}
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss NanoQuoraRetrieval_cosine_ndcg@10
-1 -1 - - 0.9583
0.0078 100 0.0063 0.0029 0.9312
-1 -1 - - 0.9312
  • The step-100 row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}