Training in progress, epoch 4, checkpoint
last-checkpoint/README.md
CHANGED

@@ -7,7 +7,6 @@ tags:
 - generated_from_trainer
 - dataset_size:291522
 - loss:MultipleNegativesSymmetricRankingLoss
-base_model: sentence-transformers/all-MiniLM-L6-v2
 widget:
 - source_sentence: cream 21 baby oil with almond oil
   sentences:
@@ -41,7 +40,7 @@ library_name: sentence-transformers
 metrics:
 - cosine_accuracy
 model-index:
-- name: SentenceTransformer
+- name: SentenceTransformer
   results:
   - task:
       type: triplet
@@ -51,19 +50,19 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy
-      value: 0.
+      value: 0.9375065565109253
       name: Cosine Accuracy
 ---
 
-# SentenceTransformer
+# SentenceTransformer
 
-This is a [sentence-transformers](https://www.SBERT.net) model
+This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [
+<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
 - **Maximum Sequence Length:** 256 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -116,9 +115,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.
-# [0.
-# [0.
+# tensor([[1.0000, 0.6993, 0.3841],
+#         [0.6993, 1.0000, 0.3711],
+#         [0.3841, 0.3711, 1.0000]])
 ```
 
 <!--
@@ -155,7 +154,7 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| **cosine_accuracy** | **0.
+| **cosine_accuracy** | **0.9375** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -226,10 +225,11 @@ You can finetune this model on your own dataset.
 ### Training Hyperparameters
 #### Non-Default Hyperparameters
 
-- `eval_strategy`:
+- `eval_strategy`: epoch
 - `per_device_train_batch_size`: 128
 - `per_device_eval_batch_size`: 128
 - `weight_decay`: 0.001
+- `num_train_epochs`: 6
 - `warmup_steps`: 2733
 - `fp16`: True
 - `dataloader_num_workers`: 2
@@ -245,7 +245,7 @@ You can finetune this model on your own dataset.
 
 - `overwrite_output_dir`: False
 - `do_predict`: False
-- `eval_strategy`:
+- `eval_strategy`: epoch
 - `prediction_loss_only`: True
 - `per_device_train_batch_size`: 128
 - `per_device_eval_batch_size`: 128
@@ -260,7 +260,7 @@ You can finetune this model on your own dataset.
 - `adam_beta2`: 0.999
 - `adam_epsilon`: 1e-08
 - `max_grad_norm`: 1.0
-- `num_train_epochs`:
+- `num_train_epochs`: 6
 - `max_steps`: -1
 - `lr_scheduler_type`: linear
 - `lr_scheduler_kwargs`: {}
@@ -364,10 +364,9 @@ You can finetune this model on your own dataset.
 </details>
 
 ### Training Logs
-| Epoch
-|
-| 0
-| 2.1949 | 5000 | 2.1423 | 0.7694 | 0.9413 |
+| Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
+|:-----:|:----:|:-------------:|:---------------:|:---------------:|
+| 4.0   | 9112 | 1.4316        | 0.7736          | 0.9375          |
 
 
 ### Framework Versions

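The similarity matrix in the card's updated snippet is plain pairwise cosine similarity over the encoder's 384-dimensional embeddings. A minimal sketch of what `model.similarity` computes with the default cosine function, using toy 2-dimensional vectors in place of real embeddings (`cosine_similarity_matrix` is a hypothetical helper written here for illustration, not part of the sentence-transformers API):

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize each row, then take a matrix product."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy stand-ins for three sentence embeddings
emb = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
sim = cosine_similarity_matrix(emb)
# The diagonal is 1.0 (each vector compared with itself),
# matching the tensor printed in the model card
print(np.round(sim, 4))
```

The matrix is symmetric with a unit diagonal, which is exactly the shape of the `tensor([[1.0000, ...]])` output in the card.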
last-checkpoint/model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:49d47e67fd64444d1bef9079ac3e87fe40f99c1e431014e043dadc9c1c6fcdd1
 size 90864192

last-checkpoint/optimizer.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8371d259eab4397e20808c5f3707bcb677999ede71ca90832bb56e58cfdb3428
 size 180607738

last-checkpoint/rng_state.pth
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ae9a3cbcca6bf743673d6e3a369dedc99ea1f47c1765d50c994934bd3af201c9
 size 14244

last-checkpoint/scaler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:5428823afa033ffc8f182c048fb98e8b38691e01883f6e183389a94595d29dfd
 size 988

last-checkpoint/scheduler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:17fc7dcbf4e82e93b77a6ea394c88d4c3b907333ba1aa74d5f235a8d4390a6b1
 size 1064

last-checkpoint/trainer_state.json
CHANGED

@@ -2,9 +2,9 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch":
+  "epoch": 4.0,
   "eval_steps": 5000,
-  "global_step":
+  "global_step": 9112,
   "is_hyper_param_search": false,
   "is_local_process_zero": true,
   "is_world_process_zero": true,
@@ -31,12 +31,28 @@
       "eval_samples_per_second": 292.451,
       "eval_steps_per_second": 2.308,
       "step": 5000
+    },
+    {
+      "epoch": 4.0,
+      "grad_norm": 10.984454154968262,
+      "learning_rate": 2.085048010973937e-05,
+      "loss": 1.4316,
+      "step": 9112
+    },
+    {
+      "epoch": 4.0,
+      "eval_cosine_accuracy": 0.9375065565109253,
+      "eval_loss": 0.7735732793807983,
+      "eval_runtime": 32.1541,
+      "eval_samples_per_second": 295.608,
+      "eval_steps_per_second": 2.333,
+      "step": 9112
     }
   ],
   "logging_steps": 5000,
-  "max_steps":
+  "max_steps": 13668,
   "num_input_tokens_seen": 0,
-  "num_train_epochs":
+  "num_train_epochs": 6,
   "save_steps": 500,
   "stateful_callbacks": {
     "TrainerControl": {
@@ -45,7 +61,7 @@
       "should_evaluate": false,
       "should_log": false,
       "should_save": true,
-      "should_training_stop":
+      "should_training_stop": false
     },
     "attributes": {}
   }

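The step counts written to trainer_state.json line up with the hyperparameters in the model card. A quick sanity check (a single training device is assumed, since the card only lists per-device batch sizes):

```python
import math

# Figures taken from the card and trainer_state.json in this commit
dataset_size = 291522     # from the dataset_size tag
batch_size = 128          # per_device_train_batch_size
num_train_epochs = 6

steps_per_epoch = math.ceil(dataset_size / batch_size)
max_steps = steps_per_epoch * num_train_epochs
global_step_at_epoch_4 = steps_per_epoch * 4

print(steps_per_epoch)          # 2278
print(max_steps)                # 13668, matching "max_steps" in trainer_state.json
print(global_step_at_epoch_4)   # 9112, matching "global_step" at this checkpoint
```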
last-checkpoint/training_args.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:daf7bfc66086ded6020bb06775e66282df8536a53ff24f583e60602a29fa87f3
 size 5752

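Every binary file in this commit is tracked through Git LFS, so the diff shows pointer files (a version line, an `oid`, and a `size`) rather than the blobs themselves. A minimal sketch of reading such a pointer (`parse_lfs_pointer` is a hypothetical helper written here, not part of any git tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The training_args.bin pointer from this commit
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:daf7bfc66086ded6020bb06775e66282df8536a53ff24f583e60602a29fa87f3
size 5752"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 5752
print(info["oid"])   # sha256:daf7bfc6...
```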