HariishHafiiz committed · Commit 04ab559 · verified · 1 Parent(s): 965e618

SBERT FT - 3 epoch

1_Pooling/config.json CHANGED
@@ -1,10 +1,5 @@
 {
-  "word_embedding_dimension": 768,
-  "pooling_mode_cls_token": false,
-  "pooling_mode_mean_tokens": true,
-  "pooling_mode_max_tokens": false,
-  "pooling_mode_mean_sqrt_len_tokens": false,
-  "pooling_mode_weightedmean_tokens": false,
-  "pooling_mode_lasttoken": false,
+  "embedding_dimension": 768,
+  "pooling_mode": "mean",
   "include_prompt": true
 }
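The pooling config in this diff replaces the six boolean `pooling_mode_*` flags with a single `"pooling_mode": "mean"` entry. As a rough illustration of what mean pooling computes, here is a minimal pure-Python sketch. The name `mean_pool` and its arguments are hypothetical and are not the library's API; the real module operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask selects no tokens")
    return [value / count for value in total]


# Three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padded token is excluded from both the sum and the divisor, which is why the large values in the third vector do not affect the result.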
README.md CHANGED
@@ -3,7 +3,6 @@ tags:
 - sentence-transformers
 - sentence-similarity
 - feature-extraction
-- dense
 - generated_from_trainer
 - dataset_size:5920
 - loss:MultipleNegativesRankingLoss
@@ -255,7 +254,7 @@ library_name: sentence-transformers
 
 # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.
 
 ## Model Details
 
@@ -265,6 +264,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
 - **Maximum Sequence Length:** 256 tokens
 - **Output Dimensionality:** 768 dimensions
 - **Similarity Function:** Cosine Similarity
+- **Supported Modality:** Text
 <!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
@@ -279,9 +279,9 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [s
 
 ```
 SentenceTransformer(
-  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'MPNetModel'})
-  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
-  (2): Normalize()
+  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'MPNetModel'})
+  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})
+  (2): Normalize({})
 )
 ```
 
@@ -294,7 +294,6 @@ First install the Sentence Transformers library:
 ```bash
 pip install -U sentence-transformers
 ```
-
 Then you can load this model and run inference.
 ```python
 from sentence_transformers import SentenceTransformer
@@ -314,11 +313,10 @@ print(embeddings.shape)
 # Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
 print(similarities)
-# tensor([[ 1.0000, 0.8299, -0.1454],
-# [ 0.8299, 1.0000, -0.0903],
-# [-0.1454, -0.0903, 1.0000]])
+# tensor([[ 1.0000, 0.7935, -0.1523],
+# [ 0.7935, 1.0000, -0.1127],
+# [-0.1523, -0.1127, 1.0000]])
 ```
-
 <!--
 ### Direct Usage (Transformers)
 
@@ -379,7 +377,13 @@ You can finetune this model on your own dataset.
 {
     "scale": 20.0,
     "similarity_fct": "cos_sim",
-    "gather_across_devices": false
+    "gather_across_devices": false,
+    "directions": [
+        "query_to_doc"
+    ],
+    "partition_mode": "joint",
+    "hardness_mode": null,
+    "hardness_strength": 0.0
 }
 ```
 
@@ -496,36 +500,39 @@ You can finetune this model on your own dataset.
 | Epoch | Step | Training Loss |
 |:------:|:----:|:-------------:|
 | 0.1351 | 50 | 0.3131 |
-| 0.2703 | 100 | 0.2310 |
-| 0.4054 | 150 | 0.3020 |
-| 0.5405 | 200 | 0.2301 |
-| 0.6757 | 250 | 0.2584 |
-| 0.8108 | 300 | 0.2278 |
-| 0.9459 | 350 | 0.2608 |
-| 1.0811 | 400 | 0.1618 |
-| 1.2162 | 450 | 0.0894 |
-| 1.3514 | 500 | 0.0825 |
-| 1.4865 | 550 | 0.1157 |
-| 1.6216 | 600 | 0.1062 |
-| 1.7568 | 650 | 0.0873 |
-| 1.8919 | 700 | 0.0677 |
-| 2.0270 | 750 | 0.0996 |
-| 2.1622 | 800 | 0.0387 |
-| 2.2973 | 850 | 0.0389 |
-| 2.4324 | 900 | 0.0469 |
-| 2.5676 | 950 | 0.0345 |
-| 2.7027 | 1000 | 0.0407 |
-| 2.8378 | 1050 | 0.0403 |
-| 2.9730 | 1100 | 0.0385 |
-
+| 0.2703 | 100 | 0.2313 |
+| 0.4054 | 150 | 0.3017 |
+| 0.5405 | 200 | 0.2300 |
+| 0.6757 | 250 | 0.2556 |
+| 0.8108 | 300 | 0.2238 |
+| 0.9459 | 350 | 0.2625 |
+| 1.0811 | 400 | 0.1785 |
+| 1.2162 | 450 | 0.0880 |
+| 1.3514 | 500 | 0.0839 |
+| 1.4865 | 550 | 0.1220 |
+| 1.6216 | 600 | 0.0988 |
+| 1.7568 | 650 | 0.0837 |
+| 1.8919 | 700 | 0.0685 |
+| 2.0270 | 750 | 0.0924 |
+| 2.1622 | 800 | 0.0381 |
+| 2.2973 | 850 | 0.0368 |
+| 2.4324 | 900 | 0.0475 |
+| 2.5676 | 950 | 0.0324 |
+| 2.7027 | 1000 | 0.0439 |
+| 2.8378 | 1050 | 0.0421 |
+| 2.9730 | 1100 | 0.0403 |
+
+
+### Training Time
+- **Training**: 16.8 minutes
 
 ### Framework Versions
-- Python: 3.12.12
-- Sentence Transformers: 5.2.3
+- Python: 3.12.13
+- Sentence Transformers: 5.4.0
 - Transformers: 5.0.0
 - PyTorch: 2.10.0+cu128
-- Accelerate: 1.12.0
-- Datasets: 4.8.3
+- Accelerate: 1.13.0
+- Datasets: 4.8.5
 - Tokenizers: 0.22.2
 
 ## Citation
@@ -547,13 +554,14 @@ You can finetune this model on your own dataset.
 
 #### MultipleNegativesRankingLoss
 ```bibtex
-@misc{henderson2017efficient,
-    title={Efficient Natural Language Response Suggestion for Smart Reply},
-    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
-    year={2017},
-    eprint={1705.00652},
-    archivePrefix={arXiv},
-    primaryClass={cs.CL}
+@misc{oord2019representationlearningcontrastivepredictive,
+    title={Representation Learning with Contrastive Predictive Coding},
+    author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
+    year={2019},
+    eprint={1807.03748},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG},
+    url={https://arxiv.org/abs/1807.03748},
 }
 ```
 
config_sentence_transformers.json CHANGED
@@ -1,14 +1,14 @@
 {
   "__version__": {
-    "sentence_transformers": "5.2.3",
-    "transformers": "5.0.0",
-    "pytorch": "2.10.0+cu128"
+    "pytorch": "2.10.0+cu128",
+    "sentence_transformers": "5.4.0",
+    "transformers": "5.0.0"
   },
+  "default_prompt_name": null,
   "model_type": "SentenceTransformer",
   "prompts": {
-    "query": "",
-    "document": ""
+    "document": "",
+    "query": ""
   },
-  "default_prompt_name": null,
   "similarity_fn_name": "cosine"
 }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9a3f6b54fac9c258f46cc46422c22b4047b304b34d1da086c41b6bad4d916e51
+oid sha256:887a4612dd0e340d62ec36181ebec7bd101bec723ceec585be1fa7c7868ee2c0
 size 437967648
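The `model.safetensors` entry is a Git LFS pointer: only the `oid` changes while `size` stays the same, so the weights differ but the file length does not. A minimal sketch of reading such a pointer and checking a downloaded blob against it might look like this. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the self-check uses a tiny stand-in payload rather than the real weights.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:887a4612dd0e340d62ec36181ebec7bd101bec723ceec585be1fa7c7868ee2c0\n"
    "size 437967648\n"
)
print(new_pointer["algo"], new_pointer["size"])

# Self-check on a tiny stand-in payload (not the real weights file).
payload = b"hello"
toy = {"size": len(payload), "digest": hashlib.sha256(payload).hexdigest()}
print(matches_pointer(payload, toy))
```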
modules.json CHANGED
@@ -3,18 +3,18 @@
     "idx": 0,
     "name": "0",
     "path": "",
-    "type": "sentence_transformers.models.Transformer"
+    "type": "sentence_transformers.base.modules.transformer.Transformer"
   },
   {
     "idx": 1,
     "name": "1",
     "path": "1_Pooling",
-    "type": "sentence_transformers.models.Pooling"
+    "type": "sentence_transformers.sentence_transformer.modules.pooling.Pooling"
   },
   {
     "idx": 2,
     "name": "2",
     "path": "2_Normalize",
-    "type": "sentence_transformers.models.Normalize"
+    "type": "sentence_transformers.sentence_transformer.modules.normalize.Normalize"
   }
 ]
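Each `type` value in `modules.json` is a dotted import path, and this commit moves all three to new module locations. One plausible way a loader could turn such a string into a class is sketched below. `resolve_dotted_path` is a hypothetical helper, and whether the library resolves paths exactly this way is an assumption; the demo resolves a standard-library class so it runs without Sentence Transformers installed.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of 'pkg.module.ClassName' and return the class."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demo with a standard-library class.
cls = resolve_dotted_path("collections.OrderedDict")
print(cls.__name__)

# The new Pooling path from the diff splits into module and class the same way.
new_type = "sentence_transformers.sentence_transformer.modules.pooling.Pooling"
module_part, _, class_part = new_type.rpartition(".")
print(module_part, class_part)
```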
sentence_bert_config.json CHANGED
@@ -1,4 +1,10 @@
 {
-  "max_seq_length": 256,
-  "do_lower_case": false
+  "transformer_task": "feature-extraction",
+  "modality_config": {
+    "text": {
+      "method": "forward",
+      "method_output_name": "last_hidden_state"
+    }
+  },
+  "module_output_name": "token_embeddings"
 }
tokenizer_config.json CHANGED
@@ -6,7 +6,7 @@
   "eos_token": "</s>",
   "is_local": false,
   "mask_token": "<mask>",
-  "model_max_length": 384,
+  "model_max_length": 256,
   "pad_token": "<pad>",
   "sep_token": "</s>",
   "strip_accents": null,