metadata
language:
- en
license: mit
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:80184
- loss:CachedMultipleNegativesRankingLoss
base_model: BAAI/bge-large-en-v1.5
widget:
- source_sentence: who does lennie choose in the sky is everywhere
sentences:
- >-
Bank of England £1 note The Bank of England £1 note was a banknote of
the pound sterling. After the ten shilling note was withdrawn in 1970 it
became the smallest denomination note issued by the Bank of England. The
one pound note was issued by the Bank of England for the first time in
1797 and continued to be printed until 1984. The note was withdrawn in
1988 in favour of the one pound coin.
- >-
The Sky Is Everywhere Lennie tries to make up with Joe by taking him
some of Gram's roses, but doesn't succeed. Gram becomes furious with
Lennie for cutting her roses and criticizes her for being selfish.
Lennie realizes that she needs to change, apologizes to her grandmother,
and tells her about the situation with Joe. Gram reassures Lennie that
Joe is in love with her. Lennie writes Joe a letter expressing her
feelings, and Joe ultimately forgives her and they reconcile. Toby and
Lennie become good friends and visit Bailey's grave together to
apologize to her. Lennie walks away from the grave with a smile, knowing
that her sister would have forgiven her and that the only way to deal
with grief is to accept that it is a part of you and to look ahead to
the future.
- >-
Senate of the Philippines The Senate of the Philippines (Filipino:
Senado ng Pilipinas, also Mataas na Kapulungan ng Pilipinas or "upper
chamber") is the upper house of the bicameral legislature of the
Philippines, the Congress; the House of Representatives is the lower
house. The Senate is composed of 24 senators who are elected at-large
with the country as one district under plurality-at-large voting.
- source_sentence: who played charlie in charlie and the chocolate factory 2005
sentences:
- >-
Charlie and the Chocolate Factory (film) Charlie and the Chocolate
Factory is a 2005 musical fantasy comedy film directed by Tim Burton and
written by John August, based on the 1964 British novel of the same name
by Roald Dahl. The film stars Johnny Depp as Willy Wonka and Freddie
Highmore as Charlie Bucket. The storyline follows Charlie, who wins a
contest and is along with four other contest winners, subsequently led
by Wonka on a tour of his chocolate factory, the most magnificent in the
world.
- >-
The Punisher (TV series) The Punisher is scheduled to be released on
November 17, 2017.
- >-
The Vampire Diaries (season 2) The Vampire Diaries, an American
supernatural drama, was officially renewed by The CW for a full
22-episode season on February 16, 2010.[1] The first episode premiered
on September 9, 2010, at 8 p.m. ET.[2] The season picks up immediately
after the events of the season one finale. All the series regulars
returned.[3] Season two focuses on the return of Elena Gilbert's (Nina
Dobrev) doppelgänger, Katherine Pierce, the introduction of werewolves,
the sun and moon curse, and the arrival of the original vampires. Tyler
Lockwood's (Michael Trevino) uncle, Mason Lockwood (Taylor Kinney),
arrives in town searching for the moonstone, a family heirloom. Tyler
later learns of his family's werewolf curse. Meanwhile, Caroline Forbes
(Candice Accola) is killed by Katherine while having Damon Salvatore's
(Ian Somerhalder) blood in her system, turning her into a vampire. The
arrival of the original vampires, Elijah (Daniel Gillies) and Klaus
Mikaelson (Joseph Morgan), also bring about complications. Klaus is a
vampire-werewolf hybrid, but his werewolf side had been forced into
dormancy by witches, as nature would not stand for such an imbalance in
power. Therefore, Klaus arrives in town with plans to break the curse
and unleash his werewolf side by channelling the power of the full moon
into the moonstone, sacrificing a vampire and a werewolf, and drinking
the blood of the doppelgänger. The season is currently on air in Urdu on
filmax channel in Pakistan. It became available on DVD and Blu-ray on
August 30, 2011.[4]
- source_sentence: most of the really good agricultural land in mexico is owned by
sentences:
- >-
State of the art The origin of the concept of "state of the art" took
place in the beginning of the twentieth century.[3] The earliest use of
the term "state of the art" documented by the Oxford English Dictionary
dates back to 1910, from an engineering manual by Henry Harrison Suplee
(1856-post 1943), an engineering graduate (University of Pennsylvania,
1876), titled Gas Turbine: progress in the design and construction of
turbines operated by gases of combustion. The relevant passage reads:
"In the present state of the art this is all that can be done". The term
"art" refers to technics, rather than performing or fine arts.[4]
- "London sewerage system Joseph Bazalgette, a civil engineer and Chief Engineer of the Metropolitan Board of Works, was given responsibility for the work. He designed an extensive underground sewerage system that diverted waste to the Thames Estuary, downstream of the main centre of population. Six main interceptor sewers, totalling almost 160\_km (100\_miles) in length, were constructed, some incorporating stretches of London's \"lost\" rivers. Three of these sewers were north of the river, the southernmost, low-level one being incorporated in the Thames Embankment. The Embankment also allowed new roads, new public gardens, and the Circle line of the London Underground. Victoria Embankment was finally officially opened on 13 July 1870.[3][4]"
- >-
Agriculture in Mexico During the early colonial period, the Spanish
introduced more plants and the concept of domesticated animals,
principally cattle, horses, donkeys, mules, goats and sheep, and barn
yard animals such as chickens and pigs. Farming from the colonial period
until the Mexican Revolution was focused on large private properties.
After the Revolution these were broken up and the land redistributed.
Since the latter 20th century NAFTA and economic policies have again
favored large scale commercial agricultural holdings.
- source_sentence: who is the person who plays black panther
sentences:
- >-
United States Capitol The United States Capitol, often called the
Capitol Building, is the home of the United States Congress, and the
seat of the legislative branch of the U.S. federal government. It is
located on Capitol Hill at the eastern end of the National Mall in
Washington, D.C. Though not at the geographic center of the Federal
District, the Capitol forms the origin point for the District's
street-numbering system and the District's four quadrants.
- >-
Supreme Court of the United States The Supreme Court of the United
States is the highest federal court of the United States. Established
pursuant to Article Three of the United States Constitution in 1789, it
has ultimate (and largely discretionary) appellate jurisdiction over all
federal courts and state court cases involving issues of federal law
plus original jurisdiction over a small range of cases. In the legal
system of the United States, the Supreme Court is generally the final
interpreter of federal law including the United States Constitution, but
it may act only within the context of a case in which it has
jurisdiction. The Court may decide cases having political overtones but
does not have power to decide nonjusticiable political questions, and
its enforcement arm is in the executive rather than judicial branch of
government.
- >-
Chadwick Boseman Chadwick Aaron Boseman[1] (born November 29,
1977)[2][3] is an American actor. He is known for portraying Jackie
Robinson in 42 (2013), James Brown in Get on Up (2014), Black Panther in
the Marvel Cinematic Universe (since 2016), and Thurgood Marshall in
Marshall (2017). He also had roles in the television series Lincoln
Heights (2008) and Persons Unknown (2010), and the films The Express
(2008), Draft Day (2014), and Message from the King (2016).
- source_sentence: can you find a pearl in a mussel
sentences:
- >-
Freshwater pearl mussel Although the name "freshwater pearl mussel" is
often used for this species, other freshwater mussel species can also
create pearls and some can also be used as a source of mother of pearl.
In fact, most cultured pearls today come from Hyriopsis species in Asia,
or Amblema species in North America, both members of the related family
Unionidae; pearls are also found within species in the genus Unio.
- >-
Ellis Island Generally, those immigrants who were approved spent from
two to five hours at Ellis Island. Arrivals were asked 29 questions
including name, occupation, and the amount of money carried. It was
important to the American government that the new arrivals could support
themselves and have money to get started. The average the government
wanted the immigrants to have was between 18 and 25 dollars ($600 in
2015 adjusted for inflation). Those with visible health problems or
diseases were sent home or held in the island's hospital facilities for
long periods of time. More than 3,000 would-be immigrants died on Ellis
Island while being held in the hospital facilities. Some unskilled
workers were rejected because they were considered "likely to become a
public charge." About 2% were denied admission to the U.S. and sent back
to their countries of origin for reasons such as having a chronic
contagious disease, criminal background, or insanity.[43] Ellis Island
was sometimes known as "The Island of Tears" or "Heartbreak Island"[44]
because of those 2% who were not admitted after the long transatlantic
voyage. The Kissing Post is a wooden column outside the Registry Room,
where new arrivals were greeted by their relatives and friends,
typically with tears, hugs, and kisses.[45][46]
- "Glee (season 1) The first season of the musical comedy-drama television series Glee originally aired on Fox in the United States. The pilot episode was broadcast as an advanced preview of the series on May 19, 2009, with the remainder of the season airing between September 9, 2009 and June 8, 2010. The season consisted of 22 episodes; the first 13 aired on Wednesdays at 9\_pm (ET) and the final 9 aired on Tuesdays at 9\_pm (ET). The season was executive produced by Ryan Murphy, Brad Falchuk, and Dante Di Loreto; Murphy's production company helped co-produce the series alongside 20th Century Fox."
datasets:
- sentence-transformers/natural-questions
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: bge-large-en-v1.5
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: NanoQuoraRetrieval
type: NanoQuoraRetrieval
metrics:
- type: cosine_accuracy@1
value: 0.88
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.98
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.98
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.88
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.4133333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.25199999999999995
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.13999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7673333333333332
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9520000000000001
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9553333333333334
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9435612217207588
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9295238095238095
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.919404761904762
name: Cosine Map@100
- type: cosine_accuracy@1
value: 0.88
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.98
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.98
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.88
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.4133333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.25199999999999995
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.13999999999999999
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7673333333333332
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.9520000000000001
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9553333333333334
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9435612217207588
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9295238095238095
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.919404761904762
name: Cosine Map@100
bge-large-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-large-en-v1.5 on the natural-questions dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-large-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- Language: en
- License: mit
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("DannyAI/embedding_fine_tuning_with_peft_bge_large_en_v1.5")
# Run inference
queries = [
"can you find a pearl in a mussel",
]
documents = [
'Freshwater pearl mussel Although the name "freshwater pearl mussel" is often used for this species, other freshwater mussel species can also create pearls and some can also be used as a source of mother of pearl. In fact, most cultured pearls today come from Hyriopsis species in Asia, or Amblema species in North America, both members of the related family Unionidae; pearls are also found within species in the genus Unio.',
'Ellis Island Generally, those immigrants who were approved spent from two to five hours at Ellis Island. Arrivals were asked 29 questions including name, occupation, and the amount of money carried. It was important to the American government that the new arrivals could support themselves and have money to get started. The average the government wanted the immigrants to have was between 18 and 25 dollars ($600 in 2015 adjusted for inflation). Those with visible health problems or diseases were sent home or held in the island\'s hospital facilities for long periods of time. More than 3,000 would-be immigrants died on Ellis Island while being held in the hospital facilities. Some unskilled workers were rejected because they were considered "likely to become a public charge." About 2% were denied admission to the U.S. and sent back to their countries of origin for reasons such as having a chronic contagious disease, criminal background, or insanity.[43] Ellis Island was sometimes known as "The Island of Tears" or "Heartbreak Island"[44] because of those 2% who were not admitted after the long transatlantic voyage. The Kissing Post is a wooden column outside the Registry Room, where new arrivals were greeted by their relatives and friends, typically with tears, hugs, and kisses.[45][46]',
"Glee (season 1) The first season of the musical comedy-drama television series Glee originally aired on Fox in the United States. The pilot episode was broadcast as an advanced preview of the series on May 19, 2009, with the remainder of the season airing between September 9, 2009 and June 8, 2010. The season consisted of 22 episodes; the first 13 aired on Wednesdays at 9\xa0pm (ET) and the final 9 aired on Tuesdays at 9\xa0pm (ET). The season was executive produced by Ryan Murphy, Brad Falchuk, and Dante Di Loreto; Murphy's production company helped co-produce the series alongside 20th Century Fox.",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 1024] [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.7103, 0.3918, 0.2758]])
Evaluation
Metrics
Information Retrieval
- Dataset:
NanoQuoraRetrieval - Evaluated with
InformationRetrievalEvaluatorwith these parameters:{ "query_prompt": "query: ", "corpus_prompt": "document: " }
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.88 |
| cosine_accuracy@3 | 0.98 |
| cosine_accuracy@5 | 0.98 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.88 |
| cosine_precision@3 | 0.4133 |
| cosine_precision@5 | 0.252 |
| cosine_precision@10 | 0.14 |
| cosine_recall@1 | 0.7673 |
| cosine_recall@3 | 0.952 |
| cosine_recall@5 | 0.9553 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9436 |
| cosine_mrr@10 | 0.9295 |
| cosine_map@100 | 0.9194 |
Information Retrieval
- Dataset:
NanoQuoraRetrieval - Evaluated with
InformationRetrievalEvaluatorwith these parameters:{ "query_prompt": "query: ", "corpus_prompt": "document: " }
| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.88 |
| cosine_accuracy@3 | 0.98 |
| cosine_accuracy@5 | 0.98 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.88 |
| cosine_precision@3 | 0.4133 |
| cosine_precision@5 | 0.252 |
| cosine_precision@10 | 0.14 |
| cosine_recall@1 | 0.7673 |
| cosine_recall@3 | 0.952 |
| cosine_recall@5 | 0.9553 |
| cosine_recall@10 | 1.0 |
| cosine_ndcg@10 | 0.9436 |
| cosine_mrr@10 | 0.9295 |
| cosine_map@100 | 0.9194 |
Training Details
Training Dataset
natural-questions
- Dataset: natural-questions at f9e894e
- Size: 80,184 training samples
- Columns:
queryandanswer - Approximate statistics based on the first 1000 samples:
query answer type string string details - min: 10 tokens
- mean: 11.72 tokens
- max: 24 tokens
- min: 11 tokens
- mean: 132.91 tokens
- max: 512 tokens
- Samples:
query answer who wrote i came in like a wrecking ballWrecking Ball (Miley Cyrus song) "Wrecking Ball" is a song recorded by American singer Miley Cyrus for her fourth studio album Bangerz (2013). It was released on August 25, 2013, by RCA Records as the album's second single. The song was written by MoZella, Stephan Moccio, Sacha Skarbek, Kiyanu Kim,[2] Lukasz Gottwald, and Henry Russell Walter;[3] production was helmed by the last two. "Wrecking Ball" is a pop ballad which lyrically discusses the deterioration of a relationship.what was the purpose of the three-field systemThree-field system The three-field system is a regime of crop rotation that was used in medieval and early-modern Europe. Crop rotation is the practice of growing a series of different types of crops in the same area in sequential seasons. Under this system, the arable land of an estate or village was divided into three large fields: one was planted in the autumn with winter wheat or rye; the second field was planted with other crops such as peas, lentils, or beans; and the third was left fallow, in order to allow the soil of that field to regain its nutrients. With each rotation, the field would be used differently, so that a field would be planted for two out of the three years used, whilst one year it "rested". Previously a "two field system" had been in place, with half the land being left fallow. The three field system allowed farmers to plant more crops and therefore to increase production and legumes have the ability to fix nitrogen and so fertilize the soil. With more crops ava...who is the main person in the legislative branchArticle One of the United States Constitution Section 1 is a vesting clause that bestows federal legislative power exclusively to Congress. Similar clauses are found in Articles II and III. The former confers executive power upon the President alone, and the latter grants judicial power solely to the federal judiciary. These three articles create a separation of powers among the three branches of the federal government. This separation of powers, by which each department may exercise only its own constitutional powers and no others,[1][2] is fundamental to the idea of a limited government accountable to the people. - Loss:
CachedMultipleNegativesRankingLosswith these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 16, "gather_across_devices": false }
Evaluation Dataset
natural-questions
- Dataset: natural-questions at f9e894e
- Size: 20,047 evaluation samples
- Columns:
queryandanswer - Approximate statistics based on the first 1000 samples:
query answer type string string details - min: 10 tokens
- mean: 11.79 tokens
- max: 25 tokens
- min: 7 tokens
- mean: 135.48 tokens
- max: 512 tokens
- Samples:
query answer when did call of duty ww2 come outCall of Duty: WWII Call of Duty: WWII is a first-person shooter video game developed by Sledgehammer Games and published by Activision. It is the fourteenth main installment in the Call of Duty series and was released worldwide on November 3, 2017 for Microsoft Windows, PlayStation 4 and Xbox One. It is the first title in the series to be set primarily during World War II since Call of Duty: World at War in 2008.[2] The game is set in the European theatre, and is centered around a squad in the 1st Infantry Division, following their battles on the Western Front, and set mainly in the historical events of Operation Overlord; the multiplayer expands to different fronts not seen in the campaign.who is doing the half time super bowlSuper Bowl LII halftime show The Super Bowl LII Halftime Show (officially known as the Pepsi Super Bowl LII Halftime Show) took place on February 4, 2018 at U.S. Bank Stadium in Minneapolis, Minnesota, as part of Super Bowl LII. Justin Timberlake was the featured performer, as confirmed by the National Football League (NFL) on October 22, 2017.[1] It was televised nationally by NBC.when was the sewage system built in londonLondon sewerage system Joseph Bazalgette, a civil engineer and Chief Engineer of the Metropolitan Board of Works, was given responsibility for the work. He designed an extensive underground sewerage system that diverted waste to the Thames Estuary, downstream of the main centre of population. Six main interceptor sewers, totalling almost 160 km (100 miles) in length, were constructed, some incorporating stretches of London's "lost" rivers. Three of these sewers were north of the river, the southernmost, low-level one being incorporated in the Thames Embankment. The Embankment also allowed new roads, new public gardens, and the Circle line of the London Underground. Victoria Embankment was finally officially opened on 13 July 1870.[3][4] - Loss:
CachedMultipleNegativesRankingLosswith these parameters:{ "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 16, "gather_across_devices": false }
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: stepsper_device_train_batch_size: 5per_device_eval_batch_size: 5learning_rate: 2e-05max_steps: 100warmup_ratio: 0.1seed: 30bf16: Trueload_best_model_at_end: Trueprompts: {'query': 'query: ', 'answer': 'document: '}batch_sampler: no_duplicates
All Hyperparameters
Click to expand
overwrite_output_dir: Falsedo_predict: Falseeval_strategy: stepsprediction_loss_only: Trueper_device_train_batch_size: 5per_device_eval_batch_size: 5per_gpu_train_batch_size: Noneper_gpu_eval_batch_size: Nonegradient_accumulation_steps: 1eval_accumulation_steps: Nonetorch_empty_cache_steps: Nonelearning_rate: 2e-05weight_decay: 0.0adam_beta1: 0.9adam_beta2: 0.999adam_epsilon: 1e-08max_grad_norm: 1.0num_train_epochs: 3.0max_steps: 100lr_scheduler_type: linearlr_scheduler_kwargs: {}warmup_ratio: 0.1warmup_steps: 0log_level: passivelog_level_replica: warninglog_on_each_node: Truelogging_nan_inf_filter: Truesave_safetensors: Truesave_on_each_node: Falsesave_only_model: Falserestore_callback_states_from_checkpoint: Falseno_cuda: Falseuse_cpu: Falseuse_mps_device: Falseseed: 30data_seed: Nonejit_mode_eval: Falseuse_ipex: Falsebf16: Truefp16: Falsefp16_opt_level: O1half_precision_backend: autobf16_full_eval: Falsefp16_full_eval: Falsetf32: Nonelocal_rank: 0ddp_backend: Nonetpu_num_cores: Nonetpu_metrics_debug: Falsedebug: []dataloader_drop_last: Falsedataloader_num_workers: 0dataloader_prefetch_factor: Nonepast_index: -1disable_tqdm: Falseremove_unused_columns: Truelabel_names: Noneload_best_model_at_end: Trueignore_data_skip: Falsefsdp: []fsdp_min_num_params: 0fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}fsdp_transformer_layer_cls_to_wrap: Noneaccelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}parallelism_config: Nonedeepspeed: Nonelabel_smoothing_factor: 0.0optim: adamw_torch_fusedoptim_args: Noneadafactor: Falsegroup_by_length: Falselength_column_name: lengthddp_find_unused_parameters: Noneddp_bucket_cap_mb: Noneddp_broadcast_buffers: Falsedataloader_pin_memory: Truedataloader_persistent_workers: Falseskip_memory_metrics: Trueuse_legacy_prediction_loop: Falsepush_to_hub: Falseresume_from_checkpoint: Nonehub_model_id: Nonehub_strategy: every_savehub_private_repo: Nonehub_always_push: Falsehub_revision: Nonegradient_checkpointing: Falsegradient_checkpointing_kwargs: Noneinclude_inputs_for_metrics: Falseinclude_for_metrics: []eval_do_concat_batches: Truefp16_backend: autopush_to_hub_model_id: Nonepush_to_hub_organization: Nonemp_parameters:auto_find_batch_size: Falsefull_determinism: Falsetorchdynamo: Noneray_scope: lastddp_timeout: 1800torch_compile: Falsetorch_compile_backend: Nonetorch_compile_mode: Noneinclude_tokens_per_second: Falseinclude_num_input_tokens_seen: Falseneftune_noise_alpha: Noneoptim_target_modules: Nonebatch_eval_metrics: Falseeval_on_start: Falseuse_liger_kernel: Falseliger_kernel_config: Noneeval_use_gather_object: Falseaverage_tokens_across_devices: Falseprompts: {'query': 'query: ', 'answer': 'document: '}batch_sampler: no_duplicatesmulti_dataset_batch_sampler: proportionalrouter_mapping: {}learning_rate_mapping: {}
Training Logs
| Epoch | Step | Training Loss | Validation Loss | NanoQuoraRetrieval_cosine_ndcg@10 |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.9583 |
| 0.0062 | 100 | 0.0156 | 0.0067 | 0.9436 |
| -1 | -1 | - | - | 0.9436 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu126
- Accelerate: 1.10.1
- Datasets: 4.0.0
- Tokenizers: 0.22.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}