veton-berisha committed
Commit 9cf3a86 · verified · 1 Parent(s): 84ff9db

mse=0.1016
README.md CHANGED
@@ -3,20 +3,69 @@ tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
  ---

- # SentenceTransformer

- This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
  - **Model Type:** Sentence Transformer
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- - **Maximum Sequence Length:** 128 tokens
  - **Output Dimensionality:** 768 dimensions
  - **Similarity Function:** Cosine Similarity
  <!-- - **Training Dataset:** Unknown -->
@@ -33,7 +82,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model trained. It maps

  ```
  SentenceTransformer(
- (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  )
  ```
@@ -56,9 +105,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
- 'The weather is lovely today.',
- "It's so sunny outside!",
- 'He drove to the stadium.',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
@@ -94,6 +143,20 @@ You can finetune this model on your own dataset.
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
  -->

  <!--
  ## Bias, Risks and Limitations

@@ -108,6 +171,166 @@ You can finetune this model on your own dataset.

  ## Training Details

  ### Framework Versions
  - Python: 3.12.9
  - Sentence Transformers: 4.1.0
@@ -121,6 +344,31 @@ You can finetune this model on your own dataset.

  ### BibTeX

  <!--
  ## Glossary

  - sentence-transformers
  - sentence-similarity
  - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1621
+ - loss:MultipleNegativesRankingLoss
+ base_model: sentence-transformers/all-mpnet-base-v2
+ widget:
+ - source_sentence: Liveblocks, real-time collaboration infrastructure
+   sentences:
+   - Serverless routing patterns
+   - Socket.io for basic real-time features
+   - Neutral platform development only
+ - source_sentence: Positive attitude and team spirit
+   sentences:
+   - 6 years Android development, Java and Kotlin, Google Play publications
+   - Maintains team morale during challenging projects
+   - Lucky platforms only
+ - source_sentence: Experience with .NET Core and C# development required
+   sentences:
+   - Organized team building activities and fostered inclusive environment
+   - iptables, firewall rule management
+   - 10 years C# development with .NET Framework and .NET Core 3.1+
+ - source_sentence: Onion Routing, Tor support
+   sentences:
+   - Privacy-focused architecture design
+   - Led global teams across 6 countries effectively
+   - Business aware, context driven, strategic thinker
+ - source_sentence: Must have expertise in Angular and TypeScript
+   sentences:
+   - React developer with JavaScript ES6+ experience
+   - Mobile app developer with no AR/VR experience
+   - Owns errors, learns from mistakes, transparent
  pipeline_tag: sentence-similarity
  library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: val
+       type: val
+     metrics:
+     - type: pearson_cosine
+       value: 0.33261488496356484
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.3462323228018911
+       name: Spearman Cosine
  ---

+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2

+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
  - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 -->
+ - **Maximum Sequence Length:** 256 tokens
  - **Output Dimensionality:** 768 dimensions
  - **Similarity Function:** Cosine Similarity
  <!-- - **Training Dataset:** Unknown -->

  ```
  SentenceTransformer(
+ (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  )
  ```
 
  model = SentenceTransformer("sentence_transformers_model_id")
  # Run inference
  sentences = [
+ 'Must have expertise in Angular and TypeScript',
+ 'React developer with JavaScript ES6+ experience',
+ 'Mobile app developer with no AR/VR experience',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
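Once `model.encode` has produced the embedding matrix, pairwise cosine similarity is just row normalization followed by a dot product. A minimal NumPy sketch of that computation, using toy 3-dimensional vectors in place of the model's real 768-dimensional embeddings:

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Normalize each row to unit length; the dot product is then cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy stand-ins for model.encode(sentences) output (3-dim instead of 768-dim).
emb = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.0, 0.0, 1.0]])
sims = cos_sim(emb, emb)
print(sims.shape)  # (3, 3)
```

With real embeddings, `model.similarity(embeddings, embeddings)` in recent sentence-transformers releases returns the same kind of square similarity matrix.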
 
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
  -->

+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+
+ * Dataset: `val`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.3326     |
+ | **spearman_cosine** | **0.3462** |
+
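The `pearson_cosine` and `spearman_cosine` numbers in the table are ordinary Pearson and Spearman correlations between the model's cosine similarities and the gold labels. A hedged sketch of that computation with `scipy.stats`, using made-up toy scores (not this model's actual predictions):

```python
from scipy.stats import pearsonr, spearmanr

# Toy cosine similarities vs. gold similarity labels (illustrative values only).
predicted = [0.91, 0.40, 0.88, 0.15, 0.55]
gold      = [0.90, 0.40, 0.90, 0.10, 0.60]

pearson_cosine = pearsonr(predicted, gold)[0]    # linear correlation
spearman_cosine = spearmanr(predicted, gold)[0]  # rank correlation
print(round(pearson_cosine, 4), round(spearman_cosine, 4))
```

Spearman is the highlighted metric because it only depends on ranking, which is usually what matters when similarity scores are used for retrieval.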
  <!--
  ## Bias, Risks and Limitations

  ## Training Details

+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 1,621 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0                                                                       | sentence_1                                                                       | label                                                          |
+   |:--------|:---------------------------------------------------------------------------------|:---------------------------------------------------------------------------------|:----------------------------------------------------------------|
+   | type    | string                                                                           | string                                                                           | float                                                          |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 8.46 tokens</li><li>max: 21 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 9.85 tokens</li><li>max: 24 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.59</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0                                          | sentence_1                                                            | label            |
+   |:----------------------------------------------------|:----------------------------------------------------------------------|:-----------------|
+   | <code>Authenticity in team relationships</code>     | <code>Genuine connections, real person, authentic leader</code>       | <code>0.9</code> |
+   | <code>Keyless SSL, private key security</code>      | <code>HSM integration, key management</code>                          | <code>0.4</code> |
+   | <code>Need expertise in database replication</code> | <code>Set up master-slave replication with automatic failover</code>  | <code>0.9</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
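MultipleNegativesRankingLoss with `scale=20.0` and `cos_sim` treats every other positive in the batch as a negative: it scales the anchor-to-positive cosine similarity matrix and applies cross-entropy with the diagonal as the label. A minimal NumPy sketch of that objective (illustrative only, not the library's implementation):

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    # Cosine similarity between every anchor and every positive in the batch.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch)
    # Cross-entropy where each anchor's own positive (the diagonal) is the
    # correct class and every other in-batch positive acts as a negative.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly aligned toy batch: each anchor matches its own positive,
# so the loss should be near zero.
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = mnr_loss(anchors, positives)
print(loss)
```

Note that this loss ignores the float `label` column above; only the pairing of `sentence_0` with `sentence_1` inside each batch matters.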
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `num_train_epochs`: 5
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
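The non-default hyperparameters above map directly onto `SentenceTransformerTrainingArguments`. A sketch of reproducing them (the `output_dir` path is a hypothetical placeholder, not from this repository):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="./out",  # hypothetical path, choose your own
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    multi_dataset_batch_sampler="round_robin",
)
```

All remaining values in the expandable list are the transformers/sentence-transformers defaults.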
+
+ ### Training Logs
+ | Epoch  | Step | val_spearman_cosine |
+ |:------:|:----:|:-------------------:|
+ | 0.9804 | 50   | 0.3462              |
+
  ### Framework Versions
  - Python: 3.12.9
  - Sentence Transformers: 4.1.0

  ### BibTeX

+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
  <!--
  ## Glossary

eval/similarity_evaluation_val_results.csv ADDED
@@ -0,0 +1,6 @@
+ epoch,steps,cosine_pearson,cosine_spearman
+ 1.0,51,0.333061348383918,0.34606382932875346
+ 2.0,102,0.2896842112210425,0.29871199430927403
+ 3.0,153,0.31861828044212254,0.32684568868246433
+ 4.0,204,0.298435297570077,0.3068966237124457
+ 5.0,255,0.28717771168468886,0.2960869240364453
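The added CSV is worth inspecting: the spearman score peaks after the first epoch and declines thereafter. A small sketch that parses the evaluator's CSV (rows inlined here for illustration) and picks the best epoch by `cosine_spearman`:

```python
import csv
import io

# The evaluator's CSV rows from this commit, inlined for the sketch.
raw = """epoch,steps,cosine_pearson,cosine_spearman
1.0,51,0.333061348383918,0.34606382932875346
2.0,102,0.2896842112210425,0.29871199430927403
3.0,153,0.31861828044212254,0.32684568868246433
4.0,204,0.298435297570077,0.3068966237124457
5.0,255,0.28717771168468886,0.2960869240364453
"""

rows = list(csv.DictReader(io.StringIO(raw)))
best = max(rows, key=lambda r: float(r["cosine_spearman"]))
print(best["epoch"], best["cosine_spearman"])  # the first epoch scores highest
```

Since `load_best_model_at_end` was False, the uploaded weights come from the end of training rather than this best checkpoint.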
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a6172635c9d5c46dc7779a4fe3e442816a1214be169ebea95d5ee46a9f4581dc
- size 1112197096
+ oid sha256:f8ee34bf80e7a842dc955d3be4f15bac3990a4f92341572bfbf67713c2903c61
+ size 437967672