Sentence Similarity
sentence-transformers
Safetensors
roberta
feature-extraction
Generated from Trainer
dataset_size:8522
loss:DenoisingAutoEncoderLoss
text-embeddings-inference
Add new SentenceTransformer model
- 1_Pooling/config.json +10 -0
- README.md +386 -0
- config.json +28 -0
- config_sentence_transformers.json +10 -0
- merges.txt +0 -0
- model.safetensors +3 -0
- modules.json +14 -0
- sentence_bert_config.json +4 -0
- special_tokens_map.json +51 -0
- tokenizer.json +0 -0
- tokenizer_config.json +65 -0
- vocab.json +0 -0
1_Pooling/config.json
ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
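With only `pooling_mode_cls_token` enabled, the sentence embedding is the hidden state of the first token (`<s>` for this tokenizer) rather than an average over all tokens. The following is a minimal pure-Python sketch of that idea, with mean pooling shown for contrast. The function names and the toy batch are assumptions for illustration; this is not the library's implementation.

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[List[Vector]]) -> List[Vector]:
    """CLS pooling: keep only the first token's vector of each sequence."""
    return [sequence[0] for sequence in token_embeddings]


def mean_pool(token_embeddings: List[List[Vector]],
              attention_mask: List[List[int]]) -> List[Vector]:
    """Mean pooling over non-padding tokens (disabled in this config)."""
    pooled = []
    for sequence, mask in zip(token_embeddings, attention_mask):
        kept = [vec for vec, m in zip(sequence, mask) if m == 1]
        dim = len(kept[0])
        pooled.append([sum(vec[i] for vec in kept) / len(kept) for i in range(dim)])
    return pooled


# Toy batch: 1 sequence, 3 tokens (the last one is padding), hidden size 2.
batch = [[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]]
mask = [[1, 1, 0]]
cls_vectors = cls_pool(batch)
mean_vectors = mean_pool(batch, mask)
```

In the real model the hidden size is 1024, so each pooled vector would have 1024 entries.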
README.md
ADDED
@@ -0,0 +1,386 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:8522
- loss:DenoisingAutoEncoderLoss
base_model: sentence-transformers/all-roberta-large-v1
widget:
- source_sentence: This . A engineer and go a trip walking the when a The physicist
    the distance of the the drop bullet his rifle fires the deer to . engineer his
    . to account for he rifle licks finger the speed and of fires deer 5 right . statistician
    "got!"
  sentences:
  - 'This is a mean joke.

    A physicist, an engineer, and a statistician go on a hunting trip, they are walking
    through the woods when they spot a deer in a clearing. The physicist calculates
    the distance of the target, the velocity and drop of the bullet, adjusts his rifle
    and fires, missing the deer 5 feet to the left. The engineer rolls his eyes. ''You
    forgot to account for wind. Give it here'', he snatches the rifle, licks his finger
    and estimates the speed and direction of the wind and fires, missing the deer
    5 feet to the right. Suddenly, the statistician claps his hands and yells "We
    got him!"'
  - 'While driving to work, robbers jumped into my car and stole everything.

    They were pirates of the car I be in.'
  - Driving and trying to read twitter, I just ran over a poodle. Unfortunately I
    drive a Yaris. My car got a dent and the poodle got annoyed.
- source_sentence: ': the love?? They.'
  sentences:
  - I have a super hero joke Fantastic four
  - 'Monroe: What did the trailer and the truck do after they fell in love?

    Amanda: What?

    Monroe: They got hitched.'
  - 'JOSIAH: What is a lawn mower’s favorite kind of music?

    TIM: I’m not sure.

    JOSIAH: Bluegrass.'
- source_sentence: 'JAYDEN What panda ’ s: JAYDEN: Bam-BOO!'
  sentences:
  - BlackBerry and Apple have come together to create a something for ladies who have
    trouble listening. It's been called the Black-i.
  - Where do you put the Duke? In the duke box!
  - 'JAYDEN: What is a panda’s favorite Halloween food?

    CAYDEN: What?

    JAYDEN: Bam-BOO!'
- source_sentence: we should be the time expand language, not it instead of 'probababably
  sentences:
  - '"Don''t dip your pen in company ink." - HR training seminar explaining why I
    shouldn''t sleep with the receptionist...I think.'
  - we should be using all the time technology frees up to expand language, not shorten
    it. instead of 'prolly' try 'probababably.'
  - If you like internet jokes, you should see my online bank account.
- source_sentence: yoga What the to when she him Nahimastay
  sentences:
  - 'CRESENCIO: Why do turkeys eat so little?

    MAX: I don’t know.

    CRESENCIO: Because they are always stuffed.'
  - I'm really sick of making my dog a birthday cake every 52 days.
  - Redneck yoga. What did the redneck say to the yoga instructor when she asked him
    to leave the class? Nahimastay
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/all-roberta-large-v1

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-roberta-large-v1](https://huggingface.co/sentence-transformers/all-roberta-large-v1) <!-- at revision cf74d8acd4f198de950bf004b262e6accfed5d2c -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SeppeV/roberta_TSDAE")
# Run inference
sentences = [
    'yoga What the to when she him Nahimastay',
    'Redneck yoga. What did the redneck say to the yoga instructor when she asked him to leave the class? Nahimastay',
    "I'm really sick of making my dog a birthday cake every 52 days.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

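The card lists cosine similarity as the similarity function, so each entry of the matrix returned by `model.similarity` should be the cosine of the angle between two embeddings. Below is a pure-Python sketch of that computation on toy 2-dimensional vectors. The helper name `cosine_similarity_matrix` is hypothetical and not part of the library.

```python
import math


def cosine_similarity_matrix(a, b):
    """Return a len(a) x len(b) matrix of cosine similarities."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    return [
        [sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v)) for v in b]
        for u in a
    ]


# Toy stand-ins for the 1024-dimensional embeddings.
toy = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
scores = cosine_similarity_matrix(toy, toy)
```

The diagonal is 1 because every vector is compared with itself, and orthogonal vectors score 0 regardless of their length.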
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 8,522 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 |
  |:--------|:-----------|:-----------|
  | type    | string | string |
  | details | <ul><li>min: 3 tokens</li><li>mean: 13.95 tokens</li><li>max: 83 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 33.15 tokens</li><li>max: 231 tokens</li></ul> |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>.... recently changed sound of my clock to Justin Bieber Baby" I wake up 5 earlier do to to it.</code> | <code>Justin Bieber.... I have recently changed the sound of my alarm clock to "Justin Bieber - Baby". Now I wake up 5 minutes earlier every day, so I don't have to listen to it.</code> |
  | <code>A got yesterday . joke be funny it had a tit</code> | <code>A woman got breast implants made of wood yesterday.<br>This joke would be funny if it had a punchline<br><br>Wooden tit</code> |
  | <code>TIL unvaccinated children are less likely autistic Because they more</code> | <code>TIL unvaccinated children are less likely to be autistic<br>Because they are more likely to be dead</code> |
* Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)

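In the samples above, `sentence_0` reads like `sentence_1` with a large share of its words removed, which matches the deletion noise that TSDAE-style training relies on. The mean lengths (13.95 versus 33.15 tokens) are roughly consistent with deleting about 60% of the words. Here is a minimal sketch of such a noising step, assuming whitespace tokenization and a deletion ratio of 0.6. Both are assumptions for illustration, since the card does not state the preprocessing actually used.

```python
import random


def delete_words(text: str, del_ratio: float = 0.6, seed: int = 0) -> str:
    """Randomly drop about `del_ratio` of the whitespace-separated words."""
    rng = random.Random(seed)
    words = text.split()
    if not words:
        return text
    # Each word survives independently with probability 1 - del_ratio.
    kept = [w for w in words if rng.random() > del_ratio]
    if not kept:
        kept = [rng.choice(words)]  # never return an empty string
    return " ".join(kept)


original = "Where do you put the Duke? In the duke box!"
noisy = delete_words(original)
```

During training the encoder sees the noisy text and a decoder tries to reconstruct the original from the single pooled sentence vector.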
### Training Hyperparameters
#### Non-Default Hyperparameters

- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.4690 | 500  | 7.4675        |
| 0.9381 | 1000 | 6.8434        |


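The fractional epochs in the log follow from the dataset size and batch size listed above. As a quick check, assuming a single device (the device count is not stated; `gradient_accumulation_steps` is 1 and `dataloader_drop_last` is False per the hyperparameters):

```python
import math

dataset_size = 8522   # training samples, from the card
batch_size = 8        # per_device_train_batch_size

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(dataset_size / batch_size)


def epoch_at(step: int) -> float:
    """Fraction of the epoch completed after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


epochs = [epoch_at(500), epoch_at(1000)]
```

Under these assumptions one epoch is 1066 steps, which reproduces the logged values 0.4690 and 0.9381.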
### Framework Versions
- Python: 3.10.16
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### DenoisingAutoEncoderLoss
```bibtex
@inproceedings{wang-2021-TSDAE,
    title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
    author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    pages = "671--688",
    url = "https://arxiv.org/abs/2104.06979",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json
ADDED
@@ -0,0 +1,28 @@
{
  "_name_or_path": "sentence-transformers/all-roberta-large-v1",
  "architectures": [
    "RobertaModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
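`max_position_embeddings` is 514, while the model card gives a maximum sequence length of 512. In the Hugging Face RoBERTa implementation, position ids start at `pad_token_id + 1`, so two slots of the position table are not available for real tokens. A quick arithmetic check using the values above (the variable names are only for illustration):

```python
max_position_embeddings = 514
pad_token_id = 1
hidden_size = 1024
num_attention_heads = 16

# Positions 0..pad_token_id are reserved, leaving the rest for tokens.
usable_positions = max_position_embeddings - (pad_token_id + 1)

# Each attention head works on an equal slice of the hidden state.
head_dim = hidden_size // num_attention_heads
```

This is why `max_seq_length` is set to 512 rather than 514.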
config_sentence_transformers.json
ADDED
@@ -0,0 +1,10 @@
{
  "__version__": {
    "sentence_transformers": "3.4.1",
    "transformers": "4.49.0",
    "pytorch": "2.6.0"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
merges.txt
ADDED
The diff for this file is too large to render.
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4ab4d620bc68056d8a1deefb3dcee6e14157c0f66010546064b3175c10e9c3a
size 1421483904
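This Git LFS pointer records only the hash and byte size of the real weights file. Since `config.json` gives `torch_dtype` as `float32`, the size allows a rough parameter count. This is an estimate that ignores the small safetensors header:

```python
size_bytes = 1421483904   # "size" field of the LFS pointer
bytes_per_param = 4       # float32

approx_params = size_bytes // bytes_per_param
approx_millions = round(approx_params / 1e6)
```

The result is about 355 million parameters, which is in line with a RoBERTa-large-sized encoder (24 layers, hidden size 1024).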
modules.json
ADDED
@@ -0,0 +1,14 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  }
]
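`modules.json` lists the pipeline stages in order: the Transformer stored at the repository root (empty `path`), followed by the pooling layer in `1_Pooling`. The sketch below reads such a file and derives the stage order. It only parses JSON and does not load any model, and the inline string stands in for the file on disk.

```python
import json

modules_json = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""

# Sort by idx so the stages come out in execution order.
modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])

# (class name, directory) pairs; "." denotes the repository root.
pipeline = [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]
```

Here `pipeline` mirrors the two entries shown under "Full Model Architecture" in the model card.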
sentence_bert_config.json
ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 512,
  "do_lower_case": false
}
special_tokens_map.json
ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,65 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "max_length": 128,
  "model_max_length": 512,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "RobertaTokenizer",
  "trim_offsets": true,
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "<unk>"
}
vocab.json
ADDED
The diff for this file is too large to render.