Training in progress, epoch 1, checkpoint

Files changed:
- checkpoint-2363/README.md (+17 -17)
- checkpoint-2363/model.safetensors (+1 -1)
- checkpoint-2363/modules.json (+0 -6)
- checkpoint-2363/optimizer.pt (+1 -1)
- checkpoint-2363/scaler.pt (+1 -1)
- checkpoint-2363/scheduler.pt (+1 -1)
- checkpoint-2363/sentence_bert_config.json (+1 -1)
- checkpoint-2363/tokenizer.json (+1 -1)
- checkpoint-2363/tokenizer_config.json (+1 -8)
- checkpoint-2363/trainer_state.json (+18 -18)
- checkpoint-2363/training_args.bin (+1 -1)
checkpoint-2363/README.md
CHANGED

@@ -7,7 +7,7 @@ tags:
 - generated_from_trainer
 - dataset_size:604740
 - loss:MultipleNegativesSymmetricRankingLoss
-base_model: sentence-transformers/
+base_model: sentence-transformers/msmarco-MiniLM-L6-v3
 widget:
 - source_sentence: casa chandelier
   sentences:
@@ -39,7 +39,7 @@ library_name: sentence-transformers
 metrics:
 - cosine_accuracy
 model-index:
-- name: SentenceTransformer based on sentence-transformers/
+- name: SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
   results:
   - task:
       type: triplet
@@ -49,20 +49,20 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy
-      value: 0.
+      value: 0.9670838117599487
       name: Cosine Accuracy
 ---
 
-# SentenceTransformer based on sentence-transformers/
+# SentenceTransformer based on sentence-transformers/msmarco-MiniLM-L6-v3
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [sentence-transformers/
+- **Base model:** [sentence-transformers/msmarco-MiniLM-L6-v3](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-v3) <!-- at revision fea93b3df3924e5649a4e322c345f951239d2c13 -->
-- **Maximum Sequence Length:**
+- **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 384 dimensions
 - **Similarity Function:** Cosine Similarity
 <!-- - **Training Dataset:** Unknown -->
@@ -79,9 +79,8 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
 
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length':
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
 )
 ```
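For reference, the module list in the hunk above can be assembled by hand with the public sentence-transformers API. A minimal sketch, assuming a local `sentence-transformers` install; it builds the pipeline from the base model's fresh weights rather than this checkpoint's fine-tuned ones:

```python
# Sketch: rebuild the two-stage Transformer -> Pooling pipeline manually.
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Transformer, Pooling

# Token encoder: MiniLM backbone, truncating inputs at 512 tokens
transformer = Transformer(
    "sentence-transformers/msmarco-MiniLM-L6-v3", max_seq_length=512
)
# Mean pooling over token embeddings -> one 384-d vector per input
pooling = Pooling(
    transformer.get_word_embedding_dimension(), pooling_mode="mean"
)
model = SentenceTransformer(modules=[transformer, pooling])
print(model)  # mirrors the architecture string shown in the README
```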
@@ -114,9 +113,9 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[1.0000, 0.
-# [0.
-# [0.
+# tensor([[1.0000, 0.6259, 0.2012],
+#         [0.6259, 1.0000, 0.3276],
+#         [0.2012, 0.3276, 1.0000]])
 ```
 
 <!--
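Worth noting for downstream code: this checkpoint no longer ends in a `Normalize()` module (see the `modules.json` change below), so raw `encode()` outputs are not unit-length, but `model.similarity` still returns the cosine scores shown above because cosine similarity normalizes internally. A minimal sketch of the equivalent computation, with a random tensor standing in for real embeddings:

```python
# Sketch: model.similarity defaults to cosine similarity, which equals
# normalizing the embeddings and taking a dot product.
import torch
import torch.nn.functional as F

embeddings = torch.randn(3, 384)            # stand-in for model.encode(...)
unit = F.normalize(embeddings, p=2, dim=1)  # project rows to unit length
similarities = unit @ unit.T                # cosine similarity matrix
print(similarities.diagonal())              # ones on the diagonal
```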
@@ -153,7 +152,7 @@ You can finetune this model on your own dataset.
 
 | Metric              | Value      |
 |:--------------------|:-----------|
-| **cosine_accuracy** | **0.
+| **cosine_accuracy** | **0.9671** |
 
 <!--
 ## Bias, Risks and Limitations
@@ -227,6 +226,7 @@ You can finetune this model on your own dataset.
 - `eval_strategy`: steps
 - `per_device_train_batch_size`: 256
 - `per_device_eval_batch_size`: 256
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `warmup_ratio`: 0.1
 - `fp16`: True
@@ -251,7 +251,7 @@ You can finetune this model on your own dataset.
 - `gradient_accumulation_steps`: 1
 - `eval_accumulation_steps`: None
 - `torch_empty_cache_steps`: None
-- `learning_rate`:
+- `learning_rate`: 3e-05
 - `weight_decay`: 0.01
 - `adam_beta1`: 0.9
 - `adam_beta2`: 0.999
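The non-default hyperparameters the card lists can be collected into a `SentenceTransformerTrainingArguments` object. A sketch assuming sentence-transformers v3+ with a recent `transformers`; the `output_dir` is a hypothetical placeholder:

```python
# Sketch: the non-default hyperparameters above, expressed as training args.
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output/msmarco-minilm-finetune",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    learning_rate=3e-05,
    weight_decay=0.01,
    warmup_ratio=0.1,  # first 10% of steps ramp the LR up from 0
    fp16=True,
)
```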
@@ -363,9 +363,9 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch  | Step | Training Loss | Validation Loss | cosine_accuracy |
 |:------:|:----:|:-------------:|:---------------:|:---------------:|
-| 0.0004 | 1    | 3.
-| 0.4232 | 1000 | 2.
-| 0.8464 | 2000 | 1.
+| 0.0004 | 1    | 3.9633        | -               | -               |
+| 0.4232 | 1000 | 2.8713        | 1.4648          | 0.9557          |
+| 0.8464 | 2000 | 1.9927        | 1.3537          | 0.9671          |
 
 
 ### Framework Versions
checkpoint-2363/model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:ef37ee5f36232d507045bb9c1c0ac1259ddd6f4aef5331faf8e4cf01d0389b50
 size 90864192
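The binary files in this commit are stored as Git LFS pointers: a three-line stub recording the spec version, the SHA-256 of the real object, and its size in bytes. A minimal sketch of verifying a downloaded object against its pointer, assuming the file has already been resolved to disk:

```python
# Sketch: check a downloaded LFS object against the hash in its pointer file.
import hashlib

EXPECTED = "ef37ee5f36232d507045bb9c1c0ac1259ddd6f4aef5331faf8e4cf01d0389b50"

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

# "model.safetensors" is assumed to be the resolved (downloaded) file.
assert sha256_of("model.safetensors") == EXPECTED, "hash mismatch"
```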
checkpoint-2363/modules.json
CHANGED

@@ -10,11 +10,5 @@
     "name": "1",
     "path": "1_Pooling",
     "type": "sentence_transformers.models.Pooling"
-  },
-  {
-    "idx": 2,
-    "name": "2",
-    "path": "2_Normalize",
-    "type": "sentence_transformers.models.Normalize"
   }
 ]
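Besides the new weights, this is the substantive change of the commit: the `2_Normalize` module is dropped, so the checkpoint now emits un-normalized mean-pooled embeddings. Callers that relied on unit-length outputs can restore them at encode time; a minimal sketch, where the model path is a hypothetical local checkpoint directory:

```python
# Sketch: re-apply the normalization the removed 2_Normalize module provided.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("checkpoint-2363")  # hypothetical local path
embeddings = model.encode(
    ["casa chandelier", "modern ceiling light"],
    normalize_embeddings=True,  # unit-length vectors, as before the change
)
print((embeddings ** 2).sum(axis=1))  # ~1.0 per row
```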
checkpoint-2363/optimizer.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1a717c6db455cabd70202220e3073dbfcd5aea9d07808176297b1c36e1dccd43
 size 180607738
checkpoint-2363/scaler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7ad395fbb9b78c24afd03f9d7c78851c2c0b1c7e115626d0420813c72da60efd
 size 988
checkpoint-2363/scheduler.pt
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fe8a14a9a9ed42c3cf9c03994730fc6588e84252cf8181264e6dfeb40adff13d
 size 1064
checkpoint-2363/sentence_bert_config.json
CHANGED

@@ -1,4 +1,4 @@
 {
-    "max_seq_length":
+    "max_seq_length": 512,
     "do_lower_case": false
 }
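The checkpoint's maximum sequence length moves to 512 tokens here, and the tokenizer files below are updated to match. The effective limit is also adjustable at runtime without editing config files; a minimal sketch, again with a hypothetical local checkpoint path:

```python
# Sketch: inspect or cap the sequence length at inference time.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("checkpoint-2363")  # hypothetical local path
print(model.max_seq_length)  # 512 for this checkpoint
model.max_seq_length = 256   # e.g. trade context length for speed
```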
checkpoint-2363/tokenizer.json
CHANGED

@@ -2,7 +2,7 @@
   "version": "1.0",
   "truncation": {
     "direction": "Right",
-    "max_length":
+    "max_length": 512,
     "strategy": "LongestFirst",
     "stride": 0
   },
checkpoint-2363/tokenizer_config.json
CHANGED

@@ -47,19 +47,12 @@
   "do_lower_case": true,
   "extra_special_tokens": {},
   "mask_token": "[MASK]",
-  "
-  "model_max_length": 256,
+  "model_max_length": 512,
   "never_split": null,
-  "pad_to_multiple_of": null,
   "pad_token": "[PAD]",
-  "pad_token_type_id": 0,
-  "padding_side": "right",
   "sep_token": "[SEP]",
-  "stride": 0,
   "strip_accents": null,
   "tokenize_chinese_chars": true,
   "tokenizer_class": "BertTokenizer",
-  "truncation_side": "right",
-  "truncation_strategy": "longest_first",
   "unk_token": "[UNK]"
 }
checkpoint-2363/trainer_state.json
CHANGED

@@ -11,41 +11,41 @@
   "log_history": [
     {
       "epoch": 0.00042319085907744394,
-      "grad_norm": 7.
+      "grad_norm": 7.138582706451416,
       "learning_rate": 0.0,
-      "loss": 3.
+      "loss": 3.9633,
       "step": 1
     },
     {
       "epoch": 0.4231908590774439,
-      "grad_norm": 4.
-      "learning_rate":
-      "loss": 2.
+      "grad_norm": 4.7261762619018555,
+      "learning_rate": 2.8636363636363637e-05,
+      "loss": 2.8713,
       "step": 1000
     },
     {
       "epoch": 0.4231908590774439,
-      "eval_cosine_accuracy": 0.
-      "eval_loss": 1.
-      "eval_runtime":
-      "eval_samples_per_second":
-      "eval_steps_per_second": 1.
+      "eval_cosine_accuracy": 0.955726146697998,
+      "eval_loss": 1.4648357629776,
+      "eval_runtime": 21.9623,
+      "eval_samples_per_second": 432.968,
+      "eval_steps_per_second": 1.73,
       "step": 1000
     },
     {
       "epoch": 0.8463817181548878,
-      "grad_norm":
-      "learning_rate":
-      "loss": 1.
+      "grad_norm": 4.730334281921387,
+      "learning_rate": 2.3934169278996865e-05,
+      "loss": 1.9927,
       "step": 2000
     },
     {
       "epoch": 0.8463817181548878,
-      "eval_cosine_accuracy": 0.
-      "eval_loss": 1.
-      "eval_runtime": 22.
-      "eval_samples_per_second":
-      "eval_steps_per_second": 1.
+      "eval_cosine_accuracy": 0.9670838117599487,
+      "eval_loss": 1.3537352085113525,
+      "eval_runtime": 22.1966,
+      "eval_samples_per_second": 428.399,
+      "eval_steps_per_second": 1.712,
       "step": 2000
     }
   ],
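`trainer_state.json` is plain JSON, so the loss and eval history recorded above is easy to pull out programmatically; a minimal sketch:

```python
# Sketch: print the training/eval history recorded in trainer_state.json.
import json

with open("checkpoint-2363/trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "loss" in entry:  # training log entries
        print(f"step {entry['step']:>5}: loss={entry['loss']:.4f} "
              f"lr={entry['learning_rate']:.3e}")
    elif "eval_loss" in entry:  # evaluation entries
        print(f"step {entry['step']:>5}: eval_loss={entry['eval_loss']:.4f} "
              f"cosine_acc={entry['eval_cosine_accuracy']:.4f}")
```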
checkpoint-2363/training_args.bin
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:205941f36c3b4c9d679bfe2bc0c478b9f96e84c22521706b4a131ca81a09243f
 size 5752
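`training_args.bin` is a `TrainingArguments` object serialized with `torch.save`, so it can be inspected directly. A sketch, assuming `transformers` and `sentence-transformers` are importable; recent PyTorch needs `weights_only=False` because this is a pickled Python object, so only load files you trust:

```python
# Sketch: inspect the serialized training arguments.
import torch

args = torch.load("checkpoint-2363/training_args.bin", weights_only=False)
print(args.learning_rate, args.warmup_ratio, args.per_device_train_batch_size)
```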