Training in progress, epoch 1, checkpoint

Files changed:
- checkpoint-2363/README.md (+17 -17)
- checkpoint-2363/model.safetensors (+1 -1)
- checkpoint-2363/modules.json (+0 -6)
- checkpoint-2363/optimizer.pt (+1 -1)
- checkpoint-2363/scaler.pt (+1 -1)
- checkpoint-2363/scheduler.pt (+1 -1)
- checkpoint-2363/sentence_bert_config.json (+1 -1)
- checkpoint-2363/tokenizer.json (+1 -1)
- checkpoint-2363/tokenizer_config.json (+1 -8)
- checkpoint-2363/trainer_state.json (+18 -18)
- checkpoint-2363/training_args.bin (+1 -1)
checkpoint-2363/README.md
CHANGED

@@ -7,7 +7,7 @@ tags:
 - generated_from_trainer
 - dataset_size:604740
 - loss:MultipleNegativesSymmetricRankingLoss
-base_model: sentence-transformers/
+base_model: sentence-transformers/msmarco-MiniLM-L6-v3
 widget:
 - source_sentence: casa chandelier
   sentences:
@@ -39,7 +39,7 @@ library_name: sentence-transformers
 metrics:
 - cosine_accuracy
 model-index:
-- name: SentenceTransformer based on sentence-transformers/
+- name: SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
   results:
   - task:
       type: triplet
@@ -49,20 +49,20 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy
-      value: 0.
+      value: 0.9670838117599487
       name: Cosine Accuracy
 ---
 
-# SentenceTransformer based on sentence-transformers/
+# SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [sentence-transformers/
+- **Base model:** [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3) <!-- at revision fea93b3df3924e5649a4e322c345f951239d2c13 -->
-- **Maximum Sequence Length:**
+- **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
@@ -79,9 +79,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
 
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length':
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
 )
 ```
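For reference, the module list in the hunk above can be assembled by hand with the public sentence-transformers API. A minimal sketch, assuming a local `sentence-transformers` install; it builds the pipeline from the base model's fresh weights rather than this checkpoint's fine-tuned ones:

```python
# Sketch: rebuild the two-stage Transformer -> Pooling pipeline manually.
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling

# Token encoder: MiniLM backbone, truncating inputs at 512 tokens
transformer = Transformer(
    "sentence-transformers/msmarco-MiniLM-L6-v3", max_seq_length=512
)
# Mean pooling over token embeddings -> one 384-d vector per input
pooling = Pooling(
    transformer.get_word_embedding_dimension(), pooling_mode="mean"
)
model = SentenceTransformer(modules=[transformer, pooling])
print(model)  # mirrors the architecture string shown in the README
```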
@@ -114,9 +113,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.
-# [0.
-# [0.
+# tensor([[1.0000, 0.6259, 0.2012],
+#         [0.6259, 1.0000, 0.3276],
+#         [0.2012, 0.3276, 1.0000]])
 ```
 
 <!--
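Worth noting for downstream code: this checkpoint no longer ends in a `Normalize()` module (see the `modules.json` change below), so raw `encode()` outputs are not unit-length, but `model.similarity` still returns the cosine scores shown above because cosine similarity normalizes internally. A minimal sketch of the equivalent computation, with a random tensor standing in for real embeddings:

```python
# Sketch: model.similarity defaults to cosine similarity, which equals
# normalizing the embeddings and taking a dot product.
import torch
import torch.nn.functional as F

embeddings = torch.randn(3, 384)            # stand-in for model.encode(...)
unit = F.normalize(embeddings, p=2, dim=1)  # project rows to unit length
similarities = unit @ unit.T                # cosine similarity matrix
print(similarities.diagonal())              # ones on the diagonal
```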
@@ -153,7 +152,7 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| **cosine_accuracy** | **0.
+| **cosine_accuracy** | **0.9671** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -227,6 +226,7 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 256
 - `per_device_eval_batch_size`: 256
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `warmup_ratio`: 0.1
 - `fp16`: True
@@ -251,7 +251,7 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`:
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
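The non-default hyperparameters the card lists can be collected into a `SentenceTransformerTrainingArguments` object. A sketch assuming sentence-transformers v3+ with a recent `transformers`; the `output_dir` is a hypothetical placeholder:

```python
# Sketch: the non-default hyperparameters above, expressed as training args.
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/msmarco-minilm-finetune",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=3e-05,
    weight_decay=0.01,
    warmup_ratio=0.1,  # first 10% of steps ramp the LR up from 0
    fp16=True,
)
```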
@@ -363,9 +363,9 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch  | Step | Training Loss | Validation Loss | cosine_accuracy |
 |:------:|:----:|:-------------:|:---------------:|:---------------:|
-| 0.0004 | 1    | 3.
-| 0.4232 | 1000 | 2.
-| 0.8464 | 2000 | 1.
+| 0.0004 | 1    | 3.9633        | -               | -               |
+| 0.4232 | 1000 | 2.8713        | 1.4648          | 0.9557          |
+| 0.8464 | 2000 | 1.9927        | 1.3537          | 0.9671          |
 
 
 ### Framework Versions
checkpoint-2363/model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ef37ee5f36232d507045bb9c1c0ac1259ddd6f4aef5331faf8e4cf01d0389b50
 size 90864192
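The binary files in this commit are stored as Git LFS pointers: a three-line stub recording the spec version, the SHA-256 of the real object, and its size in bytes. A minimal sketch of verifying a downloaded object against its pointer, assuming the file has already been resolved to disk:

```python
# Sketch: check a downloaded LFS object against the hash in its pointer file.
import hashlib

EXPECTED = "ef37ee5f36232d507045bb9c1c0ac1259ddd6f4aef5331faf8e4cf01d0389b50"

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# "model.safetensors" is assumed to be the resolved (downloaded) file.
assert sha256_of("model.safetensors") == EXPECTED, "hash mismatch"
```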
checkpoint-2363/modules.json
CHANGED

@@ -10,11 +10,5 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
-  },
-  {
-    "idx": 2,
-    "name": "2",
-    "path": "2_Normalize",
-    "type": "sentence_transformers.models.Normalize"
   }
 ]
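Besides the new weights, this is the substantive change of the commit: the `2_Normalize` module is dropped, so the checkpoint now emits un-normalized mean-pooled embeddings. Callers that relied on unit-length outputs can restore them at encode time; a minimal sketch, where the model path is a hypothetical local checkpoint directory:

```python
# Sketch: re-apply the normalization the removed 2_Normalize module provided.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("checkpoint-2363")  # hypothetical local path
embeddings = model.encode(
    ["casa chandelier", "modern ceiling light"],
    normalize_embeddings=True,  # unit-length vectors, as before the change
)
print((embeddings ** 2).sum(axis=1))  # ~1.0 per row
```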
checkpoint-2363/optimizer.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1a717c6db455cabd70202220e3073dbfcd5aea9d07808176297b1c36e1dccd43
 size 180607738
checkpoint-2363/scaler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7ad395fbb9b78c24afd03f9d7c78851c2c0b1c7e115626d0420813c72da60efd
 size 988
checkpoint-2363/scheduler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fe8a14a9a9ed42c3cf9c03994730fc6588e84252cf8181264e6dfeb40adff13d
 size 1064
checkpoint-2363/sentence_bert_config.json
CHANGED

@@ -1,4 +1,4 @@
 {
-    "max_seq_length":
+    "max_seq_length": 512,
     "do_lower_case": false
 }
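The checkpoint's maximum sequence length moves to 512 tokens here, and the tokenizer files below are updated to match. The effective limit is also adjustable at runtime without editing config files; a minimal sketch, again with a hypothetical local checkpoint path:

```python
# Sketch: inspect or cap the sequence length at inference time.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("checkpoint-2363")  # hypothetical local path
print(model.max_seq_length)  # 512 for this checkpoint
model.max_seq_length = 256   # e.g. trade context length for speed
```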
checkpoint-2363/tokenizer.json
CHANGED

@@ -2,7 +2,7 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length":
+    "max_length": 512,
     "strategy": "LongestFirst",
     "stride": 0
   },
checkpoint-2363/tokenizer_config.json
CHANGED

@@ -47,19 +47,12 @@
   "do_lower_case": true,
   "extra_special_tokens": {},
   "mask_token": "[MASK]",
-  "
-  "model_max_length": 256,
+  "model_max_length": 512,
   "never_split": null,
-  "pad_to_multiple_of": null,
   "pad_token": "[PAD]",
-  "pad_token_type_id": 0,
-  "padding_side": "right",
   "sep_token": "[SEP]",
-  "stride": 0,
   "strip_accents": null,
   "tokenize_chinese_chars": true,
   "tokenizer_class": "BertTokenizer",
-  "truncation_side": "right",
-  "truncation_strategy": "longest_first",
   "unk_token": "[UNK]"
 }
checkpoint-2363/trainer_state.json
CHANGED

@@ -11,41 +11,41 @@
   "log_history": [
     {
       "epoch": 0.00042319085907744394,
-      "grad_norm": 7.
+      "grad_norm": 7.138582706451416,
       "learning_rate": 0.0,
-      "loss": 3.
+      "loss": 3.9633,
       "step": 1
     },
     {
       "epoch": 0.4231908590774439,
-      "grad_norm": 4.
-      "learning_rate":
-      "loss": 2.
+      "grad_norm": 4.7261762619018555,
+      "learning_rate": 2.8636363636363637e-05,
+      "loss": 2.8713,
       "step": 1000
     },
     {
       "epoch": 0.4231908590774439,
-      "eval_cosine_accuracy": 0.
-      "eval_loss": 1.
-      "eval_runtime":
-      "eval_samples_per_second":
-      "eval_steps_per_second": 1.
+      "eval_cosine_accuracy": 0.955726146697998,
+      "eval_loss": 1.4648357629776,
+      "eval_runtime": 21.9623,
+      "eval_samples_per_second": 432.968,
+      "eval_steps_per_second": 1.73,
       "step": 1000
     },
     {
       "epoch": 0.8463817181548878,
-      "grad_norm":
-      "learning_rate":
-      "loss": 1.
+      "grad_norm": 4.730334281921387,
+      "learning_rate": 2.3934169278996865e-05,
+      "loss": 1.9927,
       "step": 2000
     },
     {
       "epoch": 0.8463817181548878,
-      "eval_cosine_accuracy": 0.
-      "eval_loss": 1.
-      "eval_runtime": 22.
-      "eval_samples_per_second":
-      "eval_steps_per_second": 1.
+      "eval_cosine_accuracy": 0.9670838117599487,
+      "eval_loss": 1.3537352085113525,
+      "eval_runtime": 22.1966,
+      "eval_samples_per_second": 428.399,
+      "eval_steps_per_second": 1.712,
       "step": 2000
     }
   ],
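`trainer_state.json` is plain JSON, so the loss and eval history recorded above is easy to pull out programmatically; a minimal sketch:

```python
# Sketch: print the training/eval history recorded in trainer_state.json.
import json

with open("checkpoint-2363/trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "loss" in entry:  # training log entries
        print(f"step {entry['step']:>5}: loss={entry['loss']:.4f} "
              f"lr={entry['learning_rate']:.3e}")
    elif "eval_loss" in entry:  # evaluation entries
        print(f"step {entry['step']:>5}: eval_loss={entry['eval_loss']:.4f} "
              f"cosine_acc={entry['eval_cosine_accuracy']:.4f}")
```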
checkpoint-2363/training_args.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:205941f36c3b4c9d679bfe2bc0c478b9f96e84c22521706b4a131ca81a09243f
 size 5752
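`training_args.bin` is a `TrainingArguments` object serialized with `torch.save`, so it can be inspected directly. A sketch, assuming `transformers` and `sentence-transformers` are importable; recent PyTorch needs `weights_only=False` because this is a pickled Python object, so only load files you trust:

```python
# Sketch: inspect the serialized training arguments.
import torch

args = torch.load("checkpoint-2363/training_args.bin", weights_only=False)
print(args.learning_rate, args.warmup_ratio, args.per_device_train_batch_size)
```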