---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:5489
- loss:MultipleNegativesRankingLoss
base_model: zacbrld/MNLP_M2_document_encoder
widget:
- source_sentence: Military activity affects the physical geology. This was first
    noted through the intensive shelling on the Western Front during World War I,
    which caused the shattering of the bedrock and changed the rocks' permeability.
    New minerals, rocks, and land-forms are also a byproduct of nuclear testing.
  sentences:
  - 'Silicon can form sigma bonds to other silicon atoms (and disilane is the parent
    of this class of compounds). However, it is difficult to prepare and isolate SinH2n+2
    (analogous to the saturated alkane hydrocarbons) with n greater than about 8,
    as their thermal stability decreases with increases in the number of silicon atoms. Silanes
    higher in molecular weight than disilane decompose to polymeric polysilicon hydride
    and hydrogen. But with a suitable pair of organic substituents in place of hydrogen
    on each silicon it is possible to prepare polysilanes (sometimes, erroneously
    called polysilenes) that are analogues of alkanes. These long chain compounds
    have surprising electronic properties - high electrical conductivity, for example
    - arising from sigma delocalization of the electrons in the chain.

    Even silicon–silicon pi bonds are possible. However, these bonds are less stable
    than the carbon analogues. Disilane and longer silanes are quite reactive compared
    to alkanes. Disilene and disilynes are quite rare, unlike alkenes and alkynes.
    Examples of disilynes, long thought to be too unstable to be isolated were reported
    in 2004.'
  - 'The increasing sophistication of brain-reading technologies has led many to investigate
    their potential applications for lie detection. Legally required brain scans arguably
    violate “the guarantee against self-incrimination” because they differ from acceptable
    forms of bodily evidence, such as fingerprints or blood samples, in an important
    way: they are not simply physical, hard evidence, but evidence that is intimately
    linked to the defendant''s mind. Under US law, brain-scanning technologies might
    also raise implications for the Fourth Amendment, calling into question whether
    they constitute an unreasonable search and seizure.'
  - Military activity affects the physical geology. This was first noted through the
    intensive shelling on the Western Front during World War I, which caused the shattering
    of the bedrock and changed the rocks' permeability. New minerals, rocks, and land-forms
    are also a byproduct of nuclear testing.
- source_sentence: Right after a bombing in Moscow on September 6, 1999, several anti-nuclear
    activists were detained under suspicion. Vladimir Slivyak was one of the three
    arrested under suspicion. He was an activist in the anti-nuclear movement and
    a Voronezh action camp organizer. After the bombing Slivyak was pushed into a
    car by several men who claimed to be Moscow police. The police interrogated and
    threatened Slivyak for around ninety minutes before letting him go. The Moscow
    police thought environmentalists from the anti-nuclear movement were associated
    with the bombing since an earlier bombing occurred on August 31 at Manezh Palace
    in Moscow . After the incident, on August 31, several more bombings occurred which
    agitated many people, leading to the racially profiled arrest of dark-skinned
    Muscovites and visitors to the Russian capital.
  sentences:
  - The technique works backwards from the target to identify a precursor molecule
    and an enzyme that converts it into the target, and then a second precursor that
    can produce the first and so on until a simple, inexpensive molecule becomes the
    beginning of the series. For each precursor, the enzyme is evolved using induced
    mutations and natural selection to produce a more productive version. The evolutionary
    process can be repeated over multiple generations until acceptable productivity
    is achieved. The process does not require high temperature, high pressure, the
    use of exotic catalysts or other elements that can increase costs. The enzyme
    "optimizations" that increase the production of one precursor from another are
    cumulative in that the same precursor productivity improvements can potentially
    be leveraged across multiple target molecules.
  - Right after a bombing in Moscow on September 6, 1999, several anti-nuclear activists
    were detained under suspicion. Vladimir Slivyak was one of the three arrested
    under suspicion. He was an activist in the anti-nuclear movement and a Voronezh
    action camp organizer. After the bombing Slivyak was pushed into a car by several
    men who claimed to be Moscow police. The police interrogated and threatened Slivyak
    for around ninety minutes before letting him go. The Moscow police thought environmentalists
    from the anti-nuclear movement were associated with the bombing since an earlier
    bombing occurred on August 31 at Manezh Palace in Moscow . After the incident,
    on August 31, several more bombings occurred which agitated many people, leading
    to the racially profiled arrest of dark-skinned Muscovites and visitors to the
    Russian capital.
  - One of the main sources of information about the Earth's composition comes from
    understanding the relationship between peridotite and basalt melting. Peridotite
    makes up most of Earth's mantle. Basalt, which is highly concentrated in the Earth's
    oceanic crust, is formed when magma reaches the Earth's surface and cools down
    at a very fast rate. When magma cools, different minerals crystallize at different
    times depending on the cooling temperature of that respective mineral. This ultimately
    changes the chemical composition of the melt as different minerals begin to crystallize.
    Fractional crystallization of elements in basaltic liquids has also been studied
    to observe the composition of lava in the upper mantle. This concept can be applied
    by scientists to give insight on the evolution of Earth's mantle and how concentrations
    of lithophile trace elements have varied over the last 3.5 billion years.
- source_sentence: 'The group designs numerous structural concepts such as frameworks
    and floors like Dalle O''Portune and D-Dalle.

    The timber design office of excellence is an entity specializing in the design
    and optimization of wood construction projects. It stands out for its ability
    to meet the highest demands in terms of performance, durability and aesthetics,
    and is thus recognized for its contribution to the realization of ambitious projects
    in the field of timber construction.'
  sentences:
  - 'The group designs numerous structural concepts such as frameworks and floors
    like Dalle O''Portune and D-Dalle.

    The timber design office of excellence is an entity specializing in the design
    and optimization of wood construction projects. It stands out for its ability
    to meet the highest demands in terms of performance, durability and aesthetics,
    and is thus recognized for its contribution to the realization of ambitious projects
    in the field of timber construction.'
  - 'In waterways, the term bridge strike may be used when a water vessel collides
    with a bridge. This may include a collision to the bridge span or a collision
    to the bridge support structure such as a pier. Bridge protection systems are
    used to mitigate the effects of a ship strike.

    In 2014, the United States Coast Guard published statistics that it investigated
    205 bridge strikes in the eleven years prior to the publication. All of those
    collisions involved involved a fixed, swing, lift or draw bridge. That number
    was 1.2% of all vessel collision incidents investigated by the Coast Guard. The
    primary causal factor was the lack of accurate air draft data, the distance between
    water surface to the top most part of the vessel.'
  - 'Post, Stephen Garrard. Encyclopedia of bioethics. Third edition. Macmillan Reference
    USA, 2003. ISBN 0028657748. ISSN 0950-4125; DOI:10.1108/09504120510573477. (5-Volume
    Set; 3062 pages).

    Reich, Warren Thomas Encyclopedia of Bioethics. First edition. New York: Free
    Press, 1978. ISBN 0029261805. ISBN 978-0029261804. (4-Volume Set; 1933 pages)

    Reich, Warren Thomas Encyclopedia of Bioethics. Second edition. New York: Free
    Press, 1982. (5-Volume Set; 2950 pages)

    Reich, Warren Thomas Encyclopedia of Bioethics. Third edition. New York: Simon
    & Schuster Macmillan, 1995; London: Simon and Schuster and Prentice Hall International,
    c1995. Rev. ed. (5-Volume Set; 2950 pages; 464 articles) ISBN 0028973550. ISBN
    978-0028973555.'
- source_sentence: 'Regression is used to make predictions based on the retrieved
    data through statistical trends and statistical modeling. Different uses of this
    technique are used for fetching Photometric redshifts and measurements of physical
    parameters of stars. The approaches are listed below:


    Artificial neural network (ANN)

    Support vector regression (SVR)

    Decision tree

    Random forest

    k-nearest neighbors regression

    Kernel regression

    Principal component regression (PCR)

    Gaussian process

    Least squared regression (LSR)

    Partial least squares regression'
  sentences:
  - 'Regression is used to make predictions based on the retrieved data through statistical
    trends and statistical modeling. Different uses of this technique are used for
    fetching Photometric redshifts and measurements of physical parameters of stars.
    The approaches are listed below:


    Artificial neural network (ANN)

    Support vector regression (SVR)

    Decision tree

    Random forest

    k-nearest neighbors regression

    Kernel regression

    Principal component regression (PCR)

    Gaussian process

    Least squared regression (LSR)

    Partial least squares regression'
  - 'Clandestine chemistry is not limited to drugs; it is also associated with explosives,
    and other illegal chemicals. Of the explosives manufactured illegally, nitroglycerin
    and acetone peroxide are easiest to produce due to the ease with which the precursors
    can be acquired.

    Uncle Fester is a writer who commonly writes about different aspects of clandestine
    chemistry. Secrets of Methamphetamine Manufacture is among his most popular books,
    and is considered required reading for DEA agents. More of his books deal with
    other aspects of clandestine chemistry, including explosives, and poisons. Fester
    is, however, considered by many to be a faulty and unreliable source for information
    in regard to the clandestine manufacture of chemicals.'
  - A novel input representation has been developed consisting of a combination of
    sparse encoding, Blosum encoding, and input derived from hidden Markov models.
    this method predicts T-cell epitopes for the genome of hepatitis C virus and discuss
    possible applications of the prediction method to guide the process of rational
    vaccine design.
- source_sentence: 'Burray and The Barriers

    Undiscovered Scotland: The Churchill Barriers

    Our Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback
    Machine

    Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine'
  sentences:
  - "For a neuron, in the limit of \n \n \n \n b\n =\n \
    \ 0\n \n \n {\\displaystyle b=0}\n \n, the map becomes 1D, since\
    \ \n \n \n \n y\n \n \n {\\displaystyle y}\n \n converges\
    \ to a constant. If the parameter \n \n \n \n b\n \n \n\
    \ {\\displaystyle b}\n \n is scanned in a range, different orbits will be\
    \ seen, some periodic, others chaotic, that appear between two fixed points, one\
    \ at \n \n \n \n x\n =\n 1\n \n \n {\\\
    displaystyle x=1}\n \n ; \n \n \n \n y\n =\n 1\n\
    \ \n \n {\\displaystyle y=1}\n \n and the other close to the value\
    \ of \n \n \n \n k\n \n \n {\\displaystyle k}\n \n\
    \ (which would be the regime excitable).\n\n\n== References =="
  - 'Cerebellar Purkinje neurons have been proposed to have two distinct bursting
    modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven,
    wherein the persistent Na+ current is the burst initiator and the SK K+ current
    is the burst terminator. Purkinje neurons may utilise these bursting forms in
    information coding to the deep cerebellar nuclei.'
  - 'Burray and The Barriers

    Undiscovered Scotland: The Churchill Barriers

    Our Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback
    Machine

    Okneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on zacbrld/MNLP_M2_document_encoder

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [zacbrld/MNLP_M2_document_encoder](https://huggingface.co/zacbrld/MNLP_M2_document_encoder) <!-- at revision 0256ba97b154a34e25bfdf236061c0fdb0c5d146 -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
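
To make the `Pooling` (mean-token mode) and `Normalize` modules concrete, here is a minimal sketch of what they compute, written in plain Python on toy numbers. The helper names `mean_pool` and `l2_normalize` and the 3-dimensional toy vectors are illustrative assumptions; the real model does this on 384-dimensional token embeddings inside the library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy input: three token vectors, the last one is padding.
tokens = [[1.0, 2.0, 2.0], [3.0, 0.0, 2.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # average of the two real tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(pooled, sentence_embedding)
```

Because every output vector has unit length, cosine similarity between two embeddings is the same as their dot product.
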

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("zacbrld/MNLP_M2_document_encoder")
# Run inference
sentences = [
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Burray and The Barriers\nUndiscovered Scotland: The Churchill Barriers\nOur Past History: The Churchill Barriers Archived 17 December 2006 at the Wayback Machine\nOkneypics.com: photos of the barrier Archived 15 May 2008 at the Wayback Machine',
    'Cerebellar Purkinje neurons have been proposed to have two distinct bursting modes: dendritically driven, by dendritic Ca2+ spikes, and somatically driven, wherein the persistent Na+ current is the burst initiator and the SK K+ current is the burst terminator. Purkinje neurons may utilise these bursting forms in information coding to the deep cerebellar nuclei.',
]
embeddings = model.encode(sentences)  # NumPy array, one row per input
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)  # pairwise cosine similarity
print(similarities.shape)
# torch.Size([3, 3])
```
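
For semantic search, the similarity scores are typically used to rank candidate texts against a query. The following is a minimal sketch of that ranking step in plain Python. The `rank_by_dot` helper and the hand-made 2-dimensional unit vectors are assumptions that stand in for real 384-dimensional model outputs; the dot product is used because the embeddings are L2-normalized, so it equals cosine similarity.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def rank_by_dot(query, docs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy unit-length vectors standing in for model embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]

ranking = rank_by_dot(query, docs)
for index, score in ranking:
    print(index, round(score, 3))
```

With real embeddings, `query` would be one row of `model.encode(...)` and `docs` the remaining rows.
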

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 5,489 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                           | sentence_1                                                                           |
  |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                               | string                                                                               |
  | details | <ul><li>min: 34 tokens</li><li>mean: 144.23 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 34 tokens</li><li>mean: 144.23 tokens</li><li>max: 256 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy. <br>They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359.</code> | <code>In related work, Smoller, Temple, and Vogler propose that this shockwave may have resulted in our part of the universe having a lower density than that surrounding it, causing the accelerated expansion normally attributed to dark energy. <br>They also propose that this related theory could be tested: a universe with dark energy should give a figure for the cubic correction to redshift versus luminosity C = −0.180 at a = a whereas for Smoller, Temple, and Vogler's alternative C should be positive rather than negative. They give a more precise calculation for their wave model alternative as: the cubic correction to redshift versus luminosity at a = a is C = 0.359.</code> |
  | <code>Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.<br> Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment.</code> | <code>Evolution is a central organizing concept in biology. It is the change in heritable characteristics of populations over successive generations. In artificial selection, animals were selectively bred for specific traits.<br> Given that traits are inherited, populations contain a varied mix of traits, and reproduction is able to increase any population, Darwin argued that in the natural world, it was nature that played the role of humans in selecting for specific traits. Darwin inferred that individuals who possessed heritable traits better adapted to their environments are more likely to survive and produce more offspring than other individuals. He further inferred that this would lead to the accumulation of favorable traits over successive generations, thereby increasing the match between the organisms and their environment.</code> |
  | <code>The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree.</code> | <code>The total number of engineers employed in the U.S. in 2015 was roughly 1.6 million. Of these, 278,340 were mechanical engineers (17.28%), the largest discipline by size. In 2012, the median annual income of mechanical engineers in the U.S. workforce was $80,580. The median income was highest when working for the government ($92,030), and lowest in education ($57,090). In 2014, the total number of mechanical engineering jobs was projected to grow 5% over the next decade. As of 2009, the average starting salary was $58,800 with a bachelor's degree.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
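
As a rough illustration of how the `scale` and `similarity_fct` parameters enter the loss, the sketch below computes a multiple-negatives ranking loss in plain Python: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy is taken with the matching index as the target. The function names and the toy 2-dimensional batch are assumptions for illustration, not the library's implementation.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    Every other positive in the batch acts as an in-batch negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
identical = mnr_loss(anchors, anchors)                      # each pair matches itself
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])       # each pair matches the wrong item
print(identical, swapped)
```

When the two texts in a pair are identical, as in the samples listed above, the correct match always has similarity 1, so a loss of this form is already close to zero; that would be consistent with the very small values in the training logs, although the card itself does not state this.
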

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 5
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 1.4535 | 500  | 0.0002        |
| 2.9070 | 1000 | 0.0           |
| 4.3605 | 1500 | 0.0007        |
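
The fractional epoch values in the table follow from the dataset size and batch size reported in this card. A small sketch of the arithmetic, assuming a single training device (the card lists `gradient_accumulation_steps`: 1 and `dataloader_drop_last`: False, but does not state the device count); the `epoch_at` helper is a name introduced here for illustration:

```python
import math

dataset_size = 5489   # training samples, from the card
batch_size = 16       # per_device_train_batch_size, from the card

# The last partial batch is kept (dataloader_drop_last is False), so round up.
steps_per_epoch = math.ceil(dataset_size / batch_size)

def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return step / steps_per_epoch

for step in (500, 1000, 1500):
    print(step, f"{epoch_at(step):.4f}")
```

Under that assumption, the five configured epochs correspond to `5 * steps_per_epoch` optimizer steps in total.
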

### Framework Versions
- Python: 3.10.11
- Sentence Transformers: 3.4.1
- Transformers: 4.51.3
- PyTorch: 2.6.0
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```