LamaDiab committed on
Commit 1315e2e (verified) · 1 Parent(s): 851b436

Training in progress, epoch 4, checkpoint
last-checkpoint/README.md CHANGED
@@ -7,7 +7,6 @@ tags:
 - generated_from_trainer
 - dataset_size:291522
 - loss:MultipleNegativesSymmetricRankingLoss
-- base_model: sentence-transformers/all-MiniLM-L6-v2
 widget:
 - source_sentence: cream 21 baby oil with almond oil
   sentences:
@@ -41,7 +40,7 @@ library_name: sentence-transformers
 metrics:
 - cosine_accuracy
 model-index:
-- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+- name: SentenceTransformer
   results:
   - task:
       type: triplet
@@ -51,19 +50,19 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy
-      value: 0.9412940740585327
+      value: 0.9375065565109253
       name: Cosine Accuracy
 ---
 
-# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+# SentenceTransformer
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
+<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** 256 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -116,9 +115,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.7198, 0.3823],
-#         [0.7198, 1.0000, 0.3737],
-#         [0.3823, 0.3737, 1.0000]])
+# tensor([[1.0000, 0.6993, 0.3841],
+#         [0.6993, 1.0000, 0.3711],
+#         [0.3841, 0.3711, 1.0000]])
 ```
 
 <!--
@@ -155,7 +154,7 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| **cosine_accuracy** | **0.9413** |
+| **cosine_accuracy** | **0.9375** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -226,10 +225,11 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 
-- `eval_strategy`: steps
+- `eval_strategy`: epoch
 - `per_device_train_batch_size`: 128
 - `per_device_eval_batch_size`: 128
 - `weight_decay`: 0.001
+- `num_train_epochs`: 6
 - `warmup_steps`: 2733
 - `fp16`: True
 - `dataloader_num_workers`: 2
@@ -245,7 +245,7 @@ You can finetune this model on your own dataset.
 
 - `overwrite_output_dir`: False
 - `do_predict`: False
-- `eval_strategy`: steps
+- `eval_strategy`: epoch
 - `prediction_loss_only`: True
 - `per_device_train_batch_size`: 128
 - `per_device_eval_batch_size`: 128
@@ -260,7 +260,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
-- `num_train_epochs`: 3
+- `num_train_epochs`: 6
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -364,10 +364,9 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch  | Step | Training Loss | Validation Loss | cosine_accuracy |
-|:------:|:----:|:-------------:|:---------------:|:---------------:|
-| 0.0004 | 1    | 5.3655        | -               | -               |
-| 2.1949 | 5000 | 2.1423        | 0.7694          | 0.9413          |
+| Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
+|:-----:|:----:|:-------------:|:---------------:|:---------------:|
+| 4.0   | 9112 | 1.4316        | 0.7736          | 0.9375          |
 
 
 ### Framework Versions
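The README diff above swaps in a new example output for `model.similarity`, which for this model is plain cosine similarity between embedding rows. A minimal sketch of that computation in pure Python, so the shape of the printed matrix is clear without downloading the checkpoint (the toy 4-dimensional vectors are made up for illustration; the real model emits 384-dimensional embeddings):

```python
import math

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity, mirroring model.similarity(emb, emb)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    return [[cos(a, b) for b in embeddings] for a in embeddings]

# Toy 4-dimensional embeddings standing in for the model's 384-dim output
emb = [
    [0.1, 0.3, 0.5, 0.1],
    [0.2, 0.3, 0.4, 0.1],
    [0.9, 0.1, 0.0, 0.2],
]
sims = cosine_similarity_matrix(emb)
# The diagonal is always 1.0 (each vector vs. itself), and the matrix is
# symmetric -- exactly the structure of the tensor shown in the README.
```

This explains why only the off-diagonal values changed between the two README revisions: further training moved the embeddings, but the diagonal of a cosine-similarity matrix is 1.0 by construction.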
last-checkpoint/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9e46149fd09a9867b9acad65acdb71570057411c6a87b5b28cc4922225edf94c
+oid sha256:49d47e67fd64444d1bef9079ac3e87fe40f99c1e431014e043dadc9c1c6fcdd1
 size 90864192
last-checkpoint/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f309c0f49859e92f45e91d15d010c986b3039d5aee5aa13a7a6a8b652636cbd3
+oid sha256:8371d259eab4397e20808c5f3707bcb677999ede71ca90832bb56e58cfdb3428
 size 180607738
last-checkpoint/rng_state.pth CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d11ae26ad0553937353377362dcdfdfc64b495a56e520ee9d5cafa528daa8602
+oid sha256:ae9a3cbcca6bf743673d6e3a369dedc99ea1f47c1765d50c994934bd3af201c9
 size 14244
last-checkpoint/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:38e75fca916f8178bf9cd33054df9c31b71689bd5bddb2e11917964dcae00b45
+oid sha256:5428823afa033ffc8f182c048fb98e8b38691e01883f6e183389a94595d29dfd
 size 988
last-checkpoint/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6b8ed2557d72b721bbe933588bb84b4e8fd67437924faa2318d545f860f51f41
+oid sha256:17fc7dcbf4e82e93b77a6ea394c88d4c3b907333ba1aa74d5f235a8d4390a6b1
 size 1064
last-checkpoint/trainer_state.json CHANGED
@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 3.0,
+  "epoch": 4.0,
   "eval_steps": 5000,
-  "global_step": 6834,
+  "global_step": 9112,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -31,12 +31,28 @@
       "eval_samples_per_second": 292.451,
       "eval_steps_per_second": 2.308,
       "step": 5000
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 10.984454154968262,
+      "learning_rate": 2.085048010973937e-05,
+      "loss": 1.4316,
+      "step": 9112
+    },
+    {
+      "epoch": 4.0,
+      "eval_cosine_accuracy": 0.9375065565109253,
+      "eval_loss": 0.7735732793807983,
+      "eval_runtime": 32.1541,
+      "eval_samples_per_second": 295.608,
+      "eval_steps_per_second": 2.333,
+      "step": 9112
     }
   ],
   "logging_steps": 5000,
-  "max_steps": 6834,
+  "max_steps": 13668,
   "num_input_tokens_seen": 0,
-  "num_train_epochs": 3,
+  "num_train_epochs": 6,
   "save_steps": 500,
   "stateful_callbacks": {
     "TrainerControl": {
@@ -45,7 +61,7 @@
       "should_evaluate": false,
       "should_log": false,
       "should_save": true,
-      "should_training_stop": true
+      "should_training_stop": false
     },
     "attributes": {}
   }
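The `trainer_state.json` diff above appends two entries to `log_history` for step 9112: one training log (loss, grad norm, learning rate) and one eval log (`eval_*` keys). A minimal sketch of pulling the latest eval metrics back out of such a file, using only the standard library (the inline JSON is abbreviated from the diff above):

```python
import json

# Abbreviated trainer_state.json as updated by this commit
state = json.loads("""{
  "epoch": 4.0,
  "global_step": 9112,
  "log_history": [
    {"epoch": 4.0, "loss": 1.4316, "step": 9112},
    {"epoch": 4.0, "eval_cosine_accuracy": 0.9375065565109253,
     "eval_loss": 0.7735732793807983, "step": 9112}
  ],
  "num_train_epochs": 6
}""")

# Eval entries are the ones carrying eval_* keys; take the most recent one
evals = [e for e in state["log_history"] if "eval_cosine_accuracy" in e]
latest = evals[-1]
# latest["eval_cosine_accuracy"] is the 0.9375 value written into the README
```

This also makes the `should_training_stop: true -> false` change legible: with `num_train_epochs` raised from 3 to 6, `global_step` 9112 is now below the new `max_steps` of 13668, so training continues.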
last-checkpoint/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8f7649e73631f57a3718416585c15049b225d54f1e7ef6d27a16fe580479258d
+oid sha256:daf7bfc66086ded6020bb06775e66282df8536a53ff24f583e60602a29fa87f3
 size 5752