Add new SentenceTransformer model
README.md
CHANGED
@@ -12,50 +12,8 @@ tags:
 - retrieval
 - reranking
 - generated_from_trainer
-- dataset_size:49642
 - loss:ArcFaceInBatchLoss
-base_model:
-widget:
-- source_sentence: '"How much would I need to narrate a ""Let''s Play"" video in order
-    to make money from it on YouTube?"'
-  sentences:
-  - How much money do people make from YouTube videos with 1 million views?
-  - '"How much would I need to narrate a ""Let''s Play"" video in order to make money
-    from it on YouTube?"'
-  - '"Does the sentence, ""I expect to be disappointed,"" make sense?"'
-- source_sentence: '"I appreciate that.'
-  sentences:
-  - '"How is the Mariner rewarded in ""The Rime of the Ancient Mariner"" by Samuel
-    Taylor Coleridge?"'
-  - '"I appreciate that.'
-  - I can appreciate that.
-- source_sentence: '"""It is very easy to defeat someone, but too hard to win some
-    one"". What does the previous sentence mean?"'
-  sentences:
-  - '"How can you use the word ""visceral"" in a sentence?"'
-  - '"""It is very easy to defeat someone, but too hard to win some one"". What does
-    the previous sentence mean?"'
-  - '"What does ""The loudest one in the room is the weakest one in the room."" Mean?"'
-- source_sentence: '" We condemn this raid which is in our view illegal and morally
-    and politically unjustifiable , " London-based NCRI official Ali Safavi told Reuters
-    by telephone .'
-  sentences:
-  - 'London-based NCRI official Ali Safavi told Reuters : " We condemn this raid ,
-    which is in our view illegal and morally and politically unjustifiable . "'
-  - The social awkwardness is complicated by the fact that Marianne is a white girl
-    living with a black family .
-  - art's cause, this in my opinion
-- source_sentence: '"If you click ""like"" on an old post that someone made on your
-    wall yet you''re no longer Facebook friends, will they still receive a notification?"'
-  sentences:
-  - '"Is there is any two wheeler having a gear box which has the feature ""automatic
-    neutral"" when the engine is off while it is in gear?"'
-  - '"If you click ""like"" on an old post that someone made on your wall yet you''re
-    no longer Facebook friends, will they still receive a notification?"'
-  - '"If your teenage son posted ""La commedia e finita"" on his Facebook wall, would
-    you be concerned?"'
-datasets:
-- redis/langcache-sentencepairs-v2
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ -78,45 +36,44 @@ model-index:
       type: test
     metrics:
     - type: cosine_accuracy@1
-      value: 0.
       name: Cosine Accuracy@1
     - type: cosine_precision@1
-      value: 0.
       name: Cosine Precision@1
     - type: cosine_recall@1
-      value: 0.
       name: Cosine Recall@1
     - type: cosine_ndcg@10
-      value: 0.
       name: Cosine Ndcg@10
     - type: cosine_mrr@1
-      value: 0.
       name: Cosine Mrr@1
     - type: cosine_map@100
-      value: 0.
       name: Cosine Map@100
     - type: cosine_auc_precision_cache_hit_ratio
-      value: 0.
       name: Cosine Auc Precision Cache Hit Ratio
    - type: cosine_auc_similarity_distribution
-      value: 0.
       name: Cosine Auc Similarity Distribution
 ---

 # Redis fine-tuned BiEncoder model for semantic caching on LangCache

-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [
-- **Maximum Sequence Length:**
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
-- **Training Dataset:**
-  - [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
 - **Language:** en
 - **License:** apache-2.0
@@ -130,7 +87,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s

 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length':
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
   (2): Normalize()
 )
@@ -154,9 +111,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("redis/langcache-embed-v3-mini")
 # Run inference
 sentences = [
-    '
-
-    '
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -165,9 +122,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[
-# [
-# [0.
 ```

 <!--
@@ -205,14 +162,14 @@ You can finetune this model on your own dataset.

 | Metric                               | Value      |
 |:-------------------------------------|:-----------|
-| cosine_accuracy@1                    | 0.
-| cosine_precision@1                   | 0.
-| cosine_recall@1                      | 0.
-| **cosine_ndcg@10**                   | **0.
-| cosine_mrr@1                         | 0.
-| cosine_map@100                       | 0.
-| cosine_auc_precision_cache_hit_ratio | 0.
-| cosine_auc_similarity_distribution   | 0.

 <!--
 ## Bias, Risks and Limitations
@@ -228,76 +185,19 @@ You can finetune this model on your own dataset.

 ## Training Details

-### Training Dataset
-
-#### LangCache Sentence Pairs (all)
-
-* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
-* Size: 132,354 training samples
-* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | anchor | positive | negative |
-  |:--------|:-------|:---------|:---------|
-  | type    | string | string   | string   |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
-* Samples:
-  | anchor | positive | negative |
-  |:-------|:---------|:---------|
-  | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-  | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-  | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
-* Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-  ```json
-  {
-      "scale": 20.0,
-      "similarity_fct": "cos_sim",
-      "gather_across_devices": false
-  }
-  ```
-
-### Evaluation Dataset
-
-#### LangCache Sentence Pairs (all)
-
-* Dataset: [LangCache Sentence Pairs (all)](https://huggingface.co/datasets/redis/langcache-sentencepairs-v2)
-* Size: 132,354 evaluation samples
-* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
-* Approximate statistics based on the first 1000 samples:
-  |         | anchor | positive | negative |
-  |:--------|:-------|:---------|:---------|
-  | type    | string | string   | string   |
-  | details | <ul><li>min: 4 tokens</li><li>mean: 27.17 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 26.61 tokens</li><li>max: 120 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 19.39 tokens</li><li>max: 64 tokens</li></ul> |
-* Samples:
-  | anchor | positive | negative |
-  |:-------|:---------|:---------|
-  | <code>What high potential jobs are there other than computer science?</code> | <code>What high potential jobs are there other than computer science?</code> | <code>Why IT or Computer Science jobs are being over rated than other Engineering jobs?</code> |
-  | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Would India ever be able to develop a missile system like S300 or S400 missile?</code> | <code>Should India buy the Russian S400 air defence missile system?</code> |
-  | <code>water from the faucet is being drunk by a yellow dog</code> | <code>A yellow dog is drinking water from the faucet</code> | <code>Childlessness is low in Eastern European countries.</code> |
-* Loss: <code>losses.ArcFaceInBatchLoss</code> with these parameters:
-  ```json
-  {
-      "scale": 20.0,
-      "similarity_fct": "cos_sim",
-      "gather_across_devices": false
-  }
-  ```
-
 ### Training Hyperparameters
 #### Non-Default Hyperparameters

 - `eval_strategy`: steps
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `weight_decay`: 0.001
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
-- `
 - `warmup_ratio`: 0.05
 - `bf16`: True
-- `dataloader_num_workers`: 4
-- `dataloader_prefetch_factor`: 2
 - `load_best_model_at_end`: True
-- `optim`: stable_adamw
 - `ddp_find_unused_parameters`: False
 - `push_to_hub`: True
 - `hub_model_id`: redis/langcache-embed-v3-mini
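For intuition about the loss named in the removed section (`scale: 20.0`, `similarity_fct: cos_sim`, in-batch negatives), here is a rough NumPy sketch of an ArcFace-style in-batch loss. This is an illustrative assumption, not the `losses.ArcFaceInBatchLoss` implementation: each anchor treats its paired positive as the target class and the rest of the batch as negatives, with an angular margin added to the positive pair.

```python
import numpy as np

def arcface_in_batch_loss(anchors, positives, scale=20.0, margin=0.1):
    """Sketch (assumption, not the library code) of an ArcFace-style
    in-batch loss with cosine similarity and in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    cos = a @ p.T                                  # (batch, batch) cosine matrix
    idx = np.arange(cos.shape[0])
    # ArcFace adds an angular margin to each anchor's own positive pair,
    # which makes the task harder than plain in-batch softmax.
    theta = np.arccos(np.clip(cos[idx, idx], -1 + 1e-7, 1 - 1e-7))
    cos[idx, idx] = np.cos(theta + margin)
    # Softmax cross-entropy over in-batch candidates; the target is the diagonal.
    logits = scale * cos
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()
```

The `scale` factor sharpens the softmax over cosine similarities, which live in [-1, 1]; a larger margin strictly increases the loss by shrinking the positive logit.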
@@ -311,8 +211,8 @@ You can finetune this model on your own dataset.
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
-- `per_device_train_batch_size`:
-- `per_device_eval_batch_size`:
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1
@@ -324,8 +224,8 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
 - `max_grad_norm`: 1.0
-- `num_train_epochs`:
-- `max_steps`:
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 - `warmup_ratio`: 0.05
@@ -357,8 +257,8 @@ You can finetune this model on your own dataset.
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
-- `dataloader_num_workers`:
-- `dataloader_prefetch_factor`:
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True
@@ -373,7 +273,7 @@ You can finetune this model on your own dataset.
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
-- `optim`:
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
@@ -430,9 +330,9 @@ You can finetune this model on your own dataset.
 </details>

 ### Training Logs
-| Epoch | Step |
-
-| 0 | 0 | 0.


 ### Framework Versions
@@ +12,8 @@ tags:
 - retrieval
 - reranking
 - generated_from_trainer
 - loss:ArcFaceInBatchLoss
+base_model: thenlper/gte-small
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 metrics:
@@ +36,44 @@ model-index:
       type: test
     metrics:
     - type: cosine_accuracy@1
+      value: 0.548650317572336
       name: Cosine Accuracy@1
     - type: cosine_precision@1
+      value: 0.548650317572336
       name: Cosine Precision@1
     - type: cosine_recall@1
+      value: 0.529780177773297
       name: Cosine Recall@1
     - type: cosine_ndcg@10
+      value: 0.7467559051152127
       name: Cosine Ndcg@10
     - type: cosine_mrr@1
+      value: 0.548650317572336
       name: Cosine Mrr@1
     - type: cosine_map@100
+      value: 0.691192638604471
       name: Cosine Map@100
     - type: cosine_auc_precision_cache_hit_ratio
+      value: 0.31983377806645374
       name: Cosine Auc Precision Cache Hit Ratio
     - type: cosine_auc_similarity_distribution
+      value: 0.15293509382911363
       name: Cosine Auc Similarity Distribution
 ---

 # Redis fine-tuned BiEncoder model for semantic caching on LangCache

+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [thenlper/gte-small](https://huggingface.co/thenlper/gte-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for sentence pair similarity.

 ## Model Details

 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [thenlper/gte-small](https://huggingface.co/thenlper/gte-small) <!-- at revision 17e1f347d17fe144873b1201da91788898c639cd -->
+- **Maximum Sequence Length:** 64 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
 - **Language:** en
 - **License:** apache-2.0
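For context on how such a model is used for semantic caching, the idea is to reuse a stored answer whenever a new query's embedding is close enough (by cosine similarity) to a previously cached query's embedding. A minimal NumPy sketch; the `cache_lookup` helper and the 0.9 threshold are hypothetical illustrations, not LangCache's actual API:

```python
import numpy as np

def cache_lookup(query_emb, cached_embs, cached_answers, threshold=0.9):
    """Illustrative semantic-cache lookup: return the stored answer for the
    most similar cached query if it clears the similarity threshold."""
    q = query_emb / np.linalg.norm(query_emb)
    c = cached_embs / np.linalg.norm(cached_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity to every cached query
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return cached_answers[best]   # cache hit: reuse the stored answer
    return None                       # cache miss: call the LLM, then store
```

The threshold trades precision for hit rate, which is what the `cosine_auc_precision_cache_hit_ratio` metric below summarizes across thresholds.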
@@ +87,7 @@

 ```
 SentenceTransformer(
+  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False, 'architecture': 'BertModel'})
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
   (2): Normalize()
 )

@@ +111,9 @@
 model = SentenceTransformer("redis/langcache-embed-v3-mini")
 # Run inference
 sentences = [
+    'The weather is lovely today.',
+    "It's so sunny outside!",
+    'He drove to the stadium.',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)

@@ +122,9 @@
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
+# tensor([[0.9999, 0.9036, 0.7702],
+#         [0.9036, 1.0000, 0.7837],
+#         [0.7702, 0.7837, 1.0000]])
 ```

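Because the architecture ends with a `Normalize()` module, the cosine scores that `model.similarity` reports reduce to plain dot products between unit-length embeddings. A minimal NumPy sketch; the toy 4-dimensional vectors are stand-ins for the real 384-dimensional embeddings:

```python
import numpy as np

# Toy embeddings (3 sentences x 4 dims), illustrative values only.
emb = np.array([
    [0.20, 0.80, 0.10, 0.40],
    [0.25, 0.75, 0.05, 0.45],
    [0.90, 0.10, 0.30, 0.20],
])

# What the model's Normalize() module does: L2-normalize each row, so
# cosine similarity becomes a dot product.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
similarities = emb @ emb.T   # diagonal is 1.0; rows 0 and 1 are most similar

print(np.round(similarities, 4))
```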
 <!--
@@ +162,14 @@ You can finetune this model on your own dataset.

 | Metric                               | Value      |
 |:-------------------------------------|:-----------|
+| cosine_accuracy@1                    | 0.5487     |
+| cosine_precision@1                   | 0.5487     |
+| cosine_recall@1                      | 0.5298     |
+| **cosine_ndcg@10**                   | **0.7468** |
+| cosine_mrr@1                         | 0.5487     |
+| cosine_map@100                       | 0.6912     |
+| cosine_auc_precision_cache_hit_ratio | 0.3198     |
+| cosine_auc_similarity_distribution   | 0.1529     |

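As a rough illustration of the rank-1 metrics in the table, a query counts as a hit when its top-ranked candidate is relevant; with a single relevant item per query, accuracy@1, precision@1, and mrr@1 coincide, which is why those rows agree. The helper below is a hypothetical illustration, not a sentence-transformers function:

```python
def accuracy_at_1(ranked_ids, relevant_ids):
    """Fraction of queries whose top-ranked candidate is relevant."""
    hits = sum(1 for ranked, relevant in zip(ranked_ids, relevant_ids)
               if ranked and ranked[0] in relevant)
    return hits / len(ranked_ids)

# Two toy queries: the first retrieves a relevant doc at rank 1, the second misses.
print(accuracy_at_1([["d1", "d2"], ["d9", "d3"]], [{"d1"}, {"d3"}]))  # 0.5
```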
 <!--
 ## Bias, Risks and Limitations
@@ +185,19 @@

 ## Training Details

 ### Training Hyperparameters
 #### Non-Default Hyperparameters

 - `eval_strategy`: steps
+- `per_device_train_batch_size`: 512
+- `per_device_eval_batch_size`: 512
 - `weight_decay`: 0.001
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
+- `max_steps`: 100000
 - `warmup_ratio`: 0.05
 - `bf16`: True
 - `load_best_model_at_end`: True
 - `ddp_find_unused_parameters`: False
 - `push_to_hub`: True
 - `hub_model_id`: redis/langcache-embed-v3-mini
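With `warmup_ratio: 0.05` and `max_steps: 100000`, the linear scheduler warms up for 5,000 optimizer steps. A small sketch of the implied schedule, assuming the usual Hugging Face `linear` scheduler semantics (linear warmup, then linear decay to zero):

```python
# Values taken from the non-default hyperparameters above.
max_steps = 100_000
warmup_ratio = 0.05
warmup_steps = int(max_steps * warmup_ratio)   # 5000 optimizer steps

def lr_at(step, peak_lr=1.0):
    """Relative learning rate at a given optimizer step."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(warmup_steps)   # 5000
```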
@@ +211,8 @@
 - `do_predict`: False
 - `eval_strategy`: steps
 - `prediction_loss_only`: True
+- `per_device_train_batch_size`: 512
+- `per_device_eval_batch_size`: 512
 - `per_gpu_train_batch_size`: None
 - `per_gpu_eval_batch_size`: None
 - `gradient_accumulation_steps`: 1

@@ +224,8 @@
 - `adam_beta2`: 0.98
 - `adam_epsilon`: 1e-06
 - `max_grad_norm`: 1.0
+- `num_train_epochs`: 3.0
+- `max_steps`: 100000
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
 - `warmup_ratio`: 0.05

@@ +257,8 @@
 - `tpu_metrics_debug`: False
 - `debug`: []
 - `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
 - `past_index`: -1
 - `disable_tqdm`: False
 - `remove_unused_columns`: True

@@ +273,7 @@
 - `parallelism_config`: None
 - `deepspeed`: None
 - `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
 - `optim_args`: None
 - `adafactor`: False
 - `group_by_length`: False
@@ +330,9 @@
 </details>

 ### Training Logs
+| Epoch | Step | test_cosine_ndcg@10 |
+|:-----:|:----:|:-------------------:|
+| 0     | 0    | 0.7468              |


 ### Framework Versions