LamaDiab committed on
Commit e370b31 · verified · 1 Parent(s): e9e89fc

Training in progress, epoch 1, checkpoint

checkpoint-2363/README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - generated_from_trainer
 - dataset_size:604740
 - loss:MultipleNegativesSymmetricRankingLoss
-base_model: sentence-transformers/all-MiniLM-L6-v2
+base_model: sentence-transformers/msmarco-MiniLM-L6-v3
 widget:
 - source_sentence: casa chandelier
   sentences:
@@ -39,7 +39,7 @@ library_name: sentence-transformers
 metrics:
 - cosine_accuracy
 model-index:
-- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+- name: SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
   results:
   - task:
       type: triplet
@@ -49,20 +49,20 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy
-      value: 0.9763382077217102
+      value: 0.9670838117599487
       name: Cosine Accuracy
 ---
 
-# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+# SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
-- **Maximum Sequence Length:** 256 tokens
+- **Base model:** [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3) <!-- at revision fea93b3df3924e5649a4e322c345f951239d2c13 -->
+- **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
@@ -79,9 +79,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
 
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
 )
 ```
@@ -114,9 +113,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.6405, 0.2428],
-#         [0.6405, 1.0000, 0.2613],
-#         [0.2428, 0.2613, 1.0000]])
+# tensor([[1.0000, 0.6259, 0.2012],
+#         [0.6259, 1.0000, 0.3276],
+#         [0.2012, 0.3276, 1.0000]])
 ```
 
 <!--
@@ -153,7 +152,7 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| **cosine_accuracy** | **0.9763** |
+| **cosine_accuracy** | **0.9671** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -227,6 +226,7 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 256
 - `per_device_eval_batch_size`: 256
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `warmup_ratio`: 0.1
 - `fp16`: True
@@ -251,7 +251,7 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`: 5e-05
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
@@ -363,9 +363,9 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch  | Step | Training Loss | Validation Loss | cosine_accuracy |
 |:------:|:----:|:-------------:|:---------------:|:---------------:|
-| 0.0004 | 1    | 3.5779        | -               | -               |
-| 0.4232 | 1000 | 2.3665        | 1.3055          | 0.9702          |
-| 0.8464 | 2000 | 1.6303        | 1.2463          | 0.9763          |
+| 0.0004 | 1    | 3.9633        | -               | -               |
+| 0.4232 | 1000 | 2.8713        | 1.4648          | 0.9557          |
+| 0.8464 | 2000 | 1.9927        | 1.3537          | 0.9671          |
 
 
 ### Framework Versions
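The card's `cosine_accuracy` metric above belongs to a triplet task: it is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is more similar to the positive than to the negative, as in sentence-transformers' `TripletEvaluator`. A minimal NumPy sketch with synthetic 384-dim embeddings standing in for model outputs (the data below is illustrative, not from this run):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(anchors, positives, negatives):
    # Fraction of triplets where the anchor is more similar (by cosine)
    # to its positive than to its negative.
    correct = sum(
        cosine_sim(a, p) > cosine_sim(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return correct / len(anchors)

# Synthetic embeddings: positives are small perturbations of the anchors,
# negatives are unrelated random vectors.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 384))
positives = anchors + 0.1 * rng.normal(size=(4, 384))
negatives = rng.normal(size=(4, 384))
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 1.0
```

On this easy synthetic data the accuracy is 1.0; the 0.9671 reported in the table reflects the same computation over the real evaluation triplets.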
checkpoint-2363/model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0bfd81d3a99358b96086f8c4c08e0a612fe6fa4b2a852053806dfbb70e36327b
+oid sha256:ef37ee5f36232d507045bb9c1c0ac1259ddd6f4aef5331faf8e4cf01d0389b50
 size 90864192
checkpoint-2363/modules.json CHANGED
@@ -10,11 +10,5 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
-  },
-  {
-    "idx": 2,
-    "name": "2",
-    "path": "2_Normalize",
-    "type": "sentence_transformers.models.Normalize"
   }
 ]
checkpoint-2363/optimizer.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7710f7a38919249222975ea38d3eaf510d44114e9986d2265947c0ff1df64b51
+oid sha256:1a717c6db455cabd70202220e3073dbfcd5aea9d07808176297b1c36e1dccd43
 size 180607738
checkpoint-2363/scaler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:cd257ee7a28cefc72dc197f54c65ecb78ed1a580549188a704acf026882e53cd
+oid sha256:7ad395fbb9b78c24afd03f9d7c78851c2c0b1c7e115626d0420813c72da60efd
 size 988
checkpoint-2363/scheduler.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:02702bc659d0a8eb2b27aa12a195144ffda7a1f0c1231777e3c9f2716ff76df9
+oid sha256:fe8a14a9a9ed42c3cf9c03994730fc6588e84252cf8181264e6dfeb40adff13d
 size 1064
checkpoint-2363/sentence_bert_config.json CHANGED
@@ -1,4 +1,4 @@
 {
-    "max_seq_length": 256,
+    "max_seq_length": 512,
     "do_lower_case": false
 }
checkpoint-2363/tokenizer.json CHANGED
@@ -2,7 +2,7 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length": 256,
+    "max_length": 512,
     "strategy": "LongestFirst",
     "stride": 0
   },
checkpoint-2363/tokenizer_config.json CHANGED
@@ -47,19 +47,12 @@
   "do_lower_case": true,
   "extra_special_tokens": {},
   "mask_token": "[MASK]",
-  "max_length": 128,
-  "model_max_length": 256,
+  "model_max_length": 512,
   "never_split": null,
-  "pad_to_multiple_of": null,
   "pad_token": "[PAD]",
-  "pad_token_type_id": 0,
-  "padding_side": "right",
   "sep_token": "[SEP]",
-  "stride": 0,
   "strip_accents": null,
   "tokenize_chinese_chars": true,
   "tokenizer_class": "BertTokenizer",
-  "truncation_side": "right",
-  "truncation_strategy": "longest_first",
   "unk_token": "[UNK]"
 }
checkpoint-2363/trainer_state.json CHANGED
@@ -11,41 +11,41 @@
   "log_history": [
     {
       "epoch": 0.00042319085907744394,
-      "grad_norm": 7.325957775115967,
+      "grad_norm": 7.138582706451416,
       "learning_rate": 0.0,
-      "loss": 3.5779,
+      "loss": 3.9633,
       "step": 1
     },
     {
       "epoch": 0.4231908590774439,
-      "grad_norm": 4.34982442855835,
-      "learning_rate": 4.772727272727273e-05,
-      "loss": 2.3665,
+      "grad_norm": 4.7261762619018555,
+      "learning_rate": 2.8636363636363637e-05,
+      "loss": 2.8713,
       "step": 1000
     },
     {
       "epoch": 0.4231908590774439,
-      "eval_cosine_accuracy": 0.9702387452125549,
-      "eval_loss": 1.3054696321487427,
-      "eval_runtime": 22.3048,
-      "eval_samples_per_second": 426.321,
-      "eval_steps_per_second": 1.704,
+      "eval_cosine_accuracy": 0.955726146697998,
+      "eval_loss": 1.4648357629776,
+      "eval_runtime": 21.9623,
+      "eval_samples_per_second": 432.968,
+      "eval_steps_per_second": 1.73,
       "step": 1000
     },
     {
       "epoch": 0.8463817181548878,
-      "grad_norm": 3.8841800689697266,
-      "learning_rate": 3.989028213166144e-05,
-      "loss": 1.6303,
+      "grad_norm": 4.730334281921387,
+      "learning_rate": 2.3934169278996865e-05,
+      "loss": 1.9927,
       "step": 2000
     },
     {
       "epoch": 0.8463817181548878,
-      "eval_cosine_accuracy": 0.9763382077217102,
-      "eval_loss": 1.2462551593780518,
-      "eval_runtime": 22.6647,
-      "eval_samples_per_second": 419.551,
-      "eval_steps_per_second": 1.677,
+      "eval_cosine_accuracy": 0.9670838117599487,
+      "eval_loss": 1.3537352085113525,
+      "eval_runtime": 22.1966,
+      "eval_samples_per_second": 428.399,
+      "eval_steps_per_second": 1.712,
       "step": 2000
     }
   ],
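The `learning_rate` values in the new `log_history` are consistent with the Trainer's default linear warmup/decay schedule at `learning_rate: 3e-05` and `warmup_ratio: 0.1`. A sketch under assumptions not stated in this diff (a 3-epoch run of 2363 steps per epoch, i.e. 7089 total optimizer steps and 709 warmup steps, with the logged rate at step s being the scheduler value before that update):

```python
def linear_schedule(step, peak_lr, warmup_steps, total_steps):
    # Linear warmup from 0 to peak_lr, then linear decay back to 0 --
    # the shape of the default "linear" scheduler in transformers.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

PEAK, WARMUP, TOTAL = 3e-05, 709, 7089  # assumed run geometry, not in the diff
for logged_step in (1, 1000, 2000):
    # The rate logged at step s is the one in effect before that update.
    lr = linear_schedule(logged_step - 1, PEAK, WARMUP, TOTAL)
    print(logged_step, lr)  # ~0.0, ~2.8636e-05, ~2.3934e-05
```

Under these assumptions the computed rates reproduce the logged 0.0, 2.8636363636363637e-05 and 2.3934169278996865e-05 above.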
checkpoint-2363/training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f5d1bf9d84ad9960e265661047cdf67e46211355d49fa94ba9b701478c6da4ae
+oid sha256:205941f36c3b4c9d679bfe2bc0c478b9f96e84c22521706b4a131ca81a09243f
 size 5752