Yash911 committed on
Commit
4b02610
·
1 Parent(s): 3cc49ba

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 1024,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
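The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence vector is the average of its token embeddings over non-padding positions. A minimal pure-Python sketch of that averaging follows. The helper name `mean_pool` and the toy 2-dimensional vectors are assumptions for illustration; the library itself works on batched tensors of width 1024.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, skipping positions whose mask value is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input so we never divide by zero.
    count = max(count, 1)
    return [total / count for total in summed]


# Hypothetical toy input: two real tokens and one padding row.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

The padding row is ignored because its mask entry is 0, which is why the mask has to travel with the embeddings into the pooling step.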
README.md CHANGED
@@ -1,3 +1,470 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:50000
8
+ - loss:MultipleNegativesRankingLoss
9
+ base_model: intfloat/e5-large-v2
10
+ widget:
11
+ - source_sentence: AVS Video Editor AVS Video Editor is a video editing software published
12
+ by Online Media Technologies Ltd. It is a part of AVS4YOU software suite which
13
+ includes video, audio, image editing and conversion, disc editing and burning,
14
+ document conversion and registry cleaner programs. It offers the opportunity to
15
+ create and edit videos with a vast variety of video and audio effects, text and
16
+ transitions; capture video from screen, web or DV cameras and VHS tape; record
17
+ voice; create menus for discs, as well as to save them to plenty of video file
18
+ formats, burn to discs or publish on Facebook, YouTube, Flickr, etc. 0
19
+ sentences:
20
+ - Adobe Premiere Rush is a video editing software that allows users to create and
21
+ edit videos for social media platforms such as YouTube, Instagram, and Facebook.
22
+ It is designed for quick and easy editing and can be accessed across multiple
23
+ devices. The software provides basic editing tools like trimming, cutting, transitions,
24
+ animations, and sound optimization. Its user-friendly interface makes it suitable
25
+ for beginners and requires minimal experience with video editing.
26
+ - H.245 is a protocol used for control signaling in videoconferencing systems. It
27
+ is responsible for negotiating the capabilities and settings of the system during
28
+ a call, such as video quality, audio codecs, and call setup. Understanding H.245
29
+ requires specialized knowledge of videoconferencing technologies and protocols.
30
+ - Coeliac disease is a long-term autoimmune disorder, primarily affecting the small
31
+ intestine, where individuals develop intolerance to gluten, present in foods such
32
+ as wheat, rye and barley. Classic symptoms include gastrointestinal problems such
33
+ as chronic diarrhoea, abdominal distention, malabsorption, loss of appetite, and
34
+ among children failure to grow normally. This often begins between six months
35
+ and two years of age. Non-classic symptoms are more common, especially in people
36
+ older than two years. There may be mild or absent gastrointestinal symptoms, a
37
+ wide number of symptoms involving any part of the body, or no obvious symptoms.
38
+ Coeliac disease was first described in childhood; however, it may develop at any
39
+ age. It is associated with other autoimmune diseases, such as Type 1 diabetes
40
+ mellitus and Hashimoto's thyroiditis, among others.
41
+ - source_sentence: AWS App Mesh AWS App Mesh is a managed service that provides application-level
42
+ networking for microservices deployed on AWS or on-premises environments. It allows
43
+ for centrally managed traffic routing, service discovery, and observability across
44
+ multiple microservices. This software can help simplify the complexity of microservices
45
+ architectures and improve application resiliency, scalability, and performance.
46
+ 0
47
+ sentences:
48
+ - A protocol analyzer is a tool used to capture and analyze signals and data traffic
49
+ over a communication channel. Such a channel varies from a local computer bus
50
+ to a satellite link, that provides a means of communication using a standard communication
51
+ protocol. Each type of communication protocol has a different tool to collect
52
+ and analyze signals and data.
53
+ - AWS App Mesh is a managed service that provides application-level networking for
54
+ microservices deployed on AWS or on-premises environments. It allows for centrally
55
+ managed traffic routing, service discovery, and observability across multiple
56
+ microservices. This software can help simplify the complexity of microservices
57
+ architectures and improve application resiliency, scalability, and performance.
58
+ - French is a Romance language of the Indo-European family. It descended from the
59
+ Vulgar Latin of the Roman Empire, as did all Romance languages. French evolved
60
+ from Gallo-Romance, the Latin spoken in Gaul, and more specifically in Northern
61
+ Gaul. Its closest relatives are the other langues d'oïl—languages historically
62
+ spoken in northern France and in southern Belgium, which French (Francien) largely
63
+ supplanted. French was also influenced by native Celtic languages of Northern
64
+ Roman Gaul like Gallia Belgica and by the (Germanic) Frankish language of the
65
+ post-Roman Frankish invaders. Today, owing to France's past overseas expansion,
66
+ there are numerous French-based creole languages, most notably Haitian Creole.
67
+ A French-speaking person or nation may be referred to as Francophone in both English
68
+ and French.
69
+ - source_sentence: AVS Video Editor AVS Video Editor is a video editing software published
70
+ by Online Media Technologies Ltd. It is a part of AVS4YOU software suite which
71
+ includes video, audio, image editing and conversion, disc editing and burning,
72
+ document conversion and registry cleaner programs. It offers the opportunity to
73
+ create and edit videos with a vast variety of video and audio effects, text and
74
+ transitions; capture video from screen, web or DV cameras and VHS tape; record
75
+ voice; create menus for discs, as well as to save them to plenty of video file
76
+ formats, burn to discs or publish on Facebook, YouTube, Flickr, etc. 0
77
+ sentences:
78
+ - Neuropsychiatry or Organic Psychiatry is a branch of medicine that deals with
79
+ mental disorders attributable to diseases of the nervous system. It preceded the
80
+ current disciplines of psychiatry and neurology, which had common training, however,
81
+ psychiatry and neurology have subsequently split apart and are typically practiced
82
+ separately. Nevertheless, neuropsychiatry has become a growing subspecialty of
83
+ psychiatry and it is also closely related to the fields of neuropsychology and
84
+ behavioral neurology.
85
+ - 'Capital program management software (CPMS) refers to the systems that are currently
86
+ available that help building owner/operators, program managers, and construction
87
+ managers, control and manage the vast amount of information that capital construction
88
+ projects create. A collection, or portfolio of projects only makes this a bigger
89
+ challenge. These systems go by different names: capital project management software,
90
+ construction management software, project management information systems.'
91
+ - Video editing is the manipulation and arrangement of video shots. Video editing
92
+ is used to structure and present all video information, including films and television
93
+ shows, video advertisements and video essays. Video editing has been dramatically
94
+ democratized in recent years by editing software available for personal computers.
95
+ Editing video can be difficult and tedious, so several technologies have been
96
+ produced to aid people in this task. Pen based video editing software was developed
97
+ in order to give people a more intuitive and fast way to edit video.
98
+ - source_sentence: AVEVA Plant SCADA Citect is now a group of industrial software
99
+ products sold by Aveva, but started as a software development company specialising
100
+ in the Automation and Control industry. The main software products developed by
101
+ Citect included CitectSCADA, CitectSCADA Reports, and Ampla. 0
102
+ sentences:
103
+ - A Bluetooth stack is software that refers to an implementation of the Bluetooth
104
+ protocol stack.
105
+ - Semikhah traditionally refers to the ordination of a rabbi within Judaism.
106
+ - Automation Studio is a circuit design, simulation and project documentation software
107
+ for fluid power systems and electrical projects conceived by Famic Technologies
108
+ Inc.. It is used for CAD, maintenance, and training purposes. Mainly used by engineers,
109
+ trainers, and service and maintenance personnel. Automation Studio can be applied
110
+ in the design, training and troubleshooting of hydraulics, pneumatics, HMI, and
111
+ electrical control systems.
112
+ - source_sentence: AWS User Pools AWS User Pools is a fully managed user directory
113
+ service that allows application developers to easily add registration and login
114
+ functionality to their apps. It provides features such as multi-factor authentication,
115
+ password policies, social sign-in, and customizable email templates. AWS User
116
+ Pools allows developers to focus on building their applications while providing
117
+ secure and scalable user authentication and authorization. 0
118
+ sentences:
119
+ - AVS Video Editor is a video editing software published by Online Media Technologies
120
+ Ltd. It is a part of AVS4YOU software suite which includes video, audio, image
121
+ editing and conversion, disc editing and burning, document conversion and registry
122
+ cleaner programs. It offers the opportunity to create and edit videos with a vast
123
+ variety of video and audio effects, text and transitions; capture video from screen,
124
+ web or DV cameras and VHS tape; record voice; create menus for discs, as well
125
+ as to save them to plenty of video file formats, burn to discs or publish on Facebook,
126
+ YouTube, Flickr, etc.
127
+ - Oracle Waveset is an identity management system developed by Oracle Corporation.
128
+ It provides a centralized platform for managing user identities and access to
129
+ resources within an organization. It helps to streamline and automate the process
130
+ of user provisioning, de-provisioning, and managing access privileges. Oracle
131
+ Waveset also supports password management, authentication, and integration with
132
+ various authentication systems, such as LDAP and Active Directory. It is commonly
133
+ used in large-scale enterprises and organizations that require strict access control
134
+ and compliance with regulatory requirements.
135
+ - Gastritis is inflammation of the lining of the stomach. It may occur as a short
136
+ episode or may be of a long duration. There may be no symptoms but, when symptoms
137
+ are present, the most common is upper abdominal pain. Other possible symptoms
138
+ include nausea and vomiting, bloating, loss of appetite and heartburn. Complications
139
+ may include stomach bleeding, stomach ulcers, and stomach tumors. When due to
140
+ autoimmune problems, low red blood cells due to not enough vitamin B12 may occur,
141
+ a condition known as pernicious anemia.
142
+ pipeline_tag: sentence-similarity
143
+ library_name: sentence-transformers
144
+ ---
145
+
146
+ # SentenceTransformer based on intfloat/e5-large-v2
147
+
148
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
149
+
150
+ ## Model Details
151
+
152
+ ### Model Description
153
+ - **Model Type:** Sentence Transformer
154
+ - **Base model:** [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) <!-- at revision f169b11e22de13617baa190a028a32f3493550b6 -->
155
+ - **Maximum Sequence Length:** 512 tokens
156
+ - **Output Dimensionality:** 1024 dimensions
157
+ - **Similarity Function:** Cosine Similarity
158
+ <!-- - **Training Dataset:** Unknown -->
159
+ <!-- - **Language:** Unknown -->
160
+ <!-- - **License:** Unknown -->
161
+
162
+ ### Model Sources
163
+
164
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
165
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
166
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
167
+
168
+ ### Full Model Architecture
169
+
170
+ ```
171
+ SentenceTransformer(
172
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
173
+ (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
174
+ (2): Normalize()
175
+ )
176
+ ```
177
+
178
+ ## Usage
179
+
180
+ ### Direct Usage (Sentence Transformers)
181
+
182
+ First install the Sentence Transformers library:
183
+
184
+ ```bash
185
+ pip install -U sentence-transformers
186
+ ```
187
+
188
+ Then you can load this model and run inference.
189
+ ```python
190
+ from sentence_transformers import SentenceTransformer
191
+
192
+ # Download from the 🤗 Hub
193
+ model = SentenceTransformer("sentence_transformers_model_id")
194
+ # Run inference
195
+ sentences = [
196
+ 'AWS User Pools AWS User Pools is a fully managed user directory service that allows application developers to easily add registration and login functionality to their apps. It provides features such as multi-factor authentication, password policies, social sign-in, and customizable email templates. AWS User Pools allows developers to focus on building their applications while providing secure and scalable user authentication and authorization. 0',
197
+ 'Oracle Waveset is an identity management system developed by Oracle Corporation. It provides a centralized platform for managing user identities and access to resources within an organization. It helps to streamline and automate the process of user provisioning, de-provisioning, and managing access privileges. Oracle Waveset also supports password management, authentication, and integration with various authentication systems, such as LDAP and Active Directory. It is commonly used in large-scale enterprises and organizations that require strict access control and compliance with regulatory requirements.',
198
+ 'Gastritis is inflammation of the lining of the stomach. It may occur as a short episode or may be of a long duration. There may be no symptoms but, when symptoms are present, the most common is upper abdominal pain. Other possible symptoms include nausea and vomiting, bloating, loss of appetite and heartburn. Complications may include stomach bleeding, stomach ulcers, and stomach tumors. When due to autoimmune problems, low red blood cells due to not enough vitamin B12 may occur, a condition known as pernicious anemia.',
199
+ ]
200
+ embeddings = model.encode(sentences)
201
+ print(embeddings.shape)
202
+ # [3, 1024]
203
+
204
+ # Get the similarity scores for the embeddings
205
+ similarities = model.similarity(embeddings, embeddings)
206
+ print(similarities.shape)
207
+ # [3, 3]
208
+ ```
209
+
210
+ <!--
211
+ ### Direct Usage (Transformers)
212
+
213
+ <details><summary>Click to see the direct usage in Transformers</summary>
214
+
215
+ </details>
216
+ -->
217
+
218
+ <!--
219
+ ### Downstream Usage (Sentence Transformers)
220
+
221
+ You can finetune this model on your own dataset.
222
+
223
+ <details><summary>Click to expand</summary>
224
+
225
+ </details>
226
+ -->
227
+
228
+ <!--
229
+ ### Out-of-Scope Use
230
+
231
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
232
+ -->
233
+
234
+ <!--
235
+ ## Bias, Risks and Limitations
236
+
237
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
238
+ -->
239
+
240
+ <!--
241
+ ### Recommendations
242
+
243
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
244
+ -->
245
+
246
+ ## Training Details
247
+
248
+ ### Training Dataset
249
+
250
+ #### Unnamed Dataset
251
+
252
+ * Size: 50,000 training samples
253
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, <code>sentence_2</code>, and <code>label</code>
254
+ * Approximate statistics based on the first 1000 samples:
255
+ | | sentence_0 | sentence_1 | sentence_2 | label |
256
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------|
257
+ | type | string | string | string | int |
258
+ | details | <ul><li>min: 36 tokens</li><li>mean: 90.1 tokens</li><li>max: 145 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 81.16 tokens</li><li>max: 202 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 83.93 tokens</li><li>max: 214 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
259
+ * Samples:
260
+ | sentence_0 | sentence_1 | sentence_2 | label |
261
+ |:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
262
+ | <code>Burroughs MCP The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems. 0</code> | <code>The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems.</code> | <code>Yellow fever is a viral disease of typically short duration. In most cases, symptoms include fever, chills, loss of appetite, nausea, muscle pains particularly in the back, and headaches. Symptoms typically improve within five days. In about 15% of people, within a day of improving the fever comes back, abdominal pain occurs, and liver damage begins causing yellow skin. If this occurs, the risk of bleeding and kidney problems is increased.</code> | <code>1</code> |
263
+ | <code>Burroughs MCP The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems. 0</code> | <code>The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems.</code> | <code>A wax sculpture is a depiction made using a waxy substance. Often these are effigies, usually of a notable individual, but there are also death masks and scenes with many figures, mostly in relief.</code> | <code>1</code> |
264
+ | <code>Burroughs MCP The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems. 0</code> | <code>The MCP is the proprietary operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems.</code> | <code>Hyperemesis gravidarum (HG) is a pregnancy complication that is characterized by severe nausea, vomiting, weight loss, and possibly dehydration. Feeling faint may also occur. It is considered more severe than morning sickness. Symptoms often get better after the 20th week of pregnancy but may last the entire pregnancy duration.</code> | <code>1</code> |
265
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
266
+ ```json
267
+ {
268
+ "scale": 20.0,
269
+ "similarity_fct": "cos_sim"
270
+ }
271
+ ```
272
+
273
+ ### Training Hyperparameters
274
+ #### Non-Default Hyperparameters
275
+
276
+ - `num_train_epochs`: 1
277
+ - `fp16`: True
278
+ - `multi_dataset_batch_sampler`: round_robin
279
+
280
+ #### All Hyperparameters
281
+ <details><summary>Click to expand</summary>
282
+
283
+ - `overwrite_output_dir`: False
284
+ - `do_predict`: False
285
+ - `eval_strategy`: no
286
+ - `prediction_loss_only`: True
287
+ - `per_device_train_batch_size`: 8
288
+ - `per_device_eval_batch_size`: 8
289
+ - `per_gpu_train_batch_size`: None
290
+ - `per_gpu_eval_batch_size`: None
291
+ - `gradient_accumulation_steps`: 1
292
+ - `eval_accumulation_steps`: None
293
+ - `torch_empty_cache_steps`: None
294
+ - `learning_rate`: 5e-05
295
+ - `weight_decay`: 0.0
296
+ - `adam_beta1`: 0.9
297
+ - `adam_beta2`: 0.999
298
+ - `adam_epsilon`: 1e-08
299
+ - `max_grad_norm`: 1
300
+ - `num_train_epochs`: 1
301
+ - `max_steps`: -1
302
+ - `lr_scheduler_type`: linear
303
+ - `lr_scheduler_kwargs`: {}
304
+ - `warmup_ratio`: 0.0
305
+ - `warmup_steps`: 0
306
+ - `log_level`: passive
307
+ - `log_level_replica`: warning
308
+ - `log_on_each_node`: True
309
+ - `logging_nan_inf_filter`: True
310
+ - `save_safetensors`: True
311
+ - `save_on_each_node`: False
312
+ - `save_only_model`: False
313
+ - `restore_callback_states_from_checkpoint`: False
314
+ - `no_cuda`: False
315
+ - `use_cpu`: False
316
+ - `use_mps_device`: False
317
+ - `seed`: 42
318
+ - `data_seed`: None
319
+ - `jit_mode_eval`: False
320
+ - `use_ipex`: False
321
+ - `bf16`: False
322
+ - `fp16`: True
323
+ - `fp16_opt_level`: O1
324
+ - `half_precision_backend`: auto
325
+ - `bf16_full_eval`: False
326
+ - `fp16_full_eval`: False
327
+ - `tf32`: None
328
+ - `local_rank`: 0
329
+ - `ddp_backend`: None
330
+ - `tpu_num_cores`: None
331
+ - `tpu_metrics_debug`: False
332
+ - `debug`: []
333
+ - `dataloader_drop_last`: False
334
+ - `dataloader_num_workers`: 0
335
+ - `dataloader_prefetch_factor`: None
336
+ - `past_index`: -1
337
+ - `disable_tqdm`: False
338
+ - `remove_unused_columns`: True
339
+ - `label_names`: None
340
+ - `load_best_model_at_end`: False
341
+ - `ignore_data_skip`: False
342
+ - `fsdp`: []
343
+ - `fsdp_min_num_params`: 0
344
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
345
+ - `fsdp_transformer_layer_cls_to_wrap`: None
346
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
347
+ - `deepspeed`: None
348
+ - `label_smoothing_factor`: 0.0
349
+ - `optim`: adamw_torch
350
+ - `optim_args`: None
351
+ - `adafactor`: False
352
+ - `group_by_length`: False
353
+ - `length_column_name`: length
354
+ - `ddp_find_unused_parameters`: None
355
+ - `ddp_bucket_cap_mb`: None
356
+ - `ddp_broadcast_buffers`: False
357
+ - `dataloader_pin_memory`: True
358
+ - `dataloader_persistent_workers`: False
359
+ - `skip_memory_metrics`: True
360
+ - `use_legacy_prediction_loop`: False
361
+ - `push_to_hub`: False
362
+ - `resume_from_checkpoint`: None
363
+ - `hub_model_id`: None
364
+ - `hub_strategy`: every_save
365
+ - `hub_private_repo`: None
366
+ - `hub_always_push`: False
367
+ - `gradient_checkpointing`: False
368
+ - `gradient_checkpointing_kwargs`: None
369
+ - `include_inputs_for_metrics`: False
370
+ - `include_for_metrics`: []
371
+ - `eval_do_concat_batches`: True
372
+ - `fp16_backend`: auto
373
+ - `push_to_hub_model_id`: None
374
+ - `push_to_hub_organization`: None
375
+ - `mp_parameters`:
376
+ - `auto_find_batch_size`: False
377
+ - `full_determinism`: False
378
+ - `torchdynamo`: None
379
+ - `ray_scope`: last
380
+ - `ddp_timeout`: 1800
381
+ - `torch_compile`: False
382
+ - `torch_compile_backend`: None
383
+ - `torch_compile_mode`: None
384
+ - `include_tokens_per_second`: False
385
+ - `include_num_input_tokens_seen`: False
386
+ - `neftune_noise_alpha`: None
387
+ - `optim_target_modules`: None
388
+ - `batch_eval_metrics`: False
389
+ - `eval_on_start`: False
390
+ - `use_liger_kernel`: False
391
+ - `eval_use_gather_object`: False
392
+ - `average_tokens_across_devices`: False
393
+ - `prompts`: None
394
+ - `batch_sampler`: batch_sampler
395
+ - `multi_dataset_batch_sampler`: round_robin
396
+
397
+ </details>
398
+
399
+ ### Training Logs
400
+ | Epoch | Step | Training Loss |
401
+ |:-----:|:----:|:-------------:|
402
+ | 0.08 | 500 | 0.3751 |
403
+ | 0.16 | 1000 | 0.1414 |
404
+ | 0.24 | 1500 | 0.1219 |
405
+ | 0.32 | 2000 | 0.0979 |
406
+ | 0.4 | 2500 | 0.083 |
407
+ | 0.48 | 3000 | 0.067 |
408
+ | 0.56 | 3500 | 0.0645 |
409
+ | 0.64 | 4000 | 0.0578 |
410
+ | 0.72 | 4500 | 0.0454 |
411
+ | 0.8 | 5000 | 0.0404 |
412
+ | 0.88 | 5500 | 0.0419 |
413
+ | 0.96 | 6000 | 0.0402 |
414
+
415
+
416
+ ### Framework Versions
417
+ - Python: 3.11.13
418
+ - Sentence Transformers: 4.1.0
419
+ - Transformers: 4.52.4
420
+ - PyTorch: 2.6.0+cu124
421
+ - Accelerate: 1.8.1
422
+ - Datasets: 3.6.0
423
+ - Tokenizers: 0.21.2
424
+
425
+ ## Citation
426
+
427
+ ### BibTeX
428
+
429
+ #### Sentence Transformers
430
+ ```bibtex
431
+ @inproceedings{reimers-2019-sentence-bert,
432
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
433
+ author = "Reimers, Nils and Gurevych, Iryna",
434
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
435
+ month = "11",
436
+ year = "2019",
437
+ publisher = "Association for Computational Linguistics",
438
+ url = "https://arxiv.org/abs/1908.10084",
439
+ }
440
+ ```
441
+
442
+ #### MultipleNegativesRankingLoss
443
+ ```bibtex
444
+ @misc{henderson2017efficient,
445
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
446
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
447
+ year={2017},
448
+ eprint={1705.00652},
449
+ archivePrefix={arXiv},
450
+ primaryClass={cs.CL}
451
+ }
452
+ ```
453
+
454
+ <!--
455
+ ## Glossary
456
+
457
+ *Clearly define terms in order to be accessible across audiences.*
458
+ -->
459
+
460
+ <!--
461
+ ## Model Card Authors
462
+
463
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
464
+ -->
465
+
466
+ <!--
467
+ ## Model Card Contact
468
+
469
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
470
+ -->
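The README above lists `MultipleNegativesRankingLoss` with `scale: 20.0` and `similarity_fct: cos_sim`. As a rough illustration of what those two parameters do, here is a pure-Python sketch under stated assumptions: the function names `cos_sim` and `mnr_loss` are hypothetical, inputs are non-zero plain lists rather than tensors, and the optional `negatives` argument stands in for the `sentence_2` column. It is not the library's implementation.

```python
import math
from typing import List, Optional, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    negatives: Optional[List[Sequence[float]]] = None,
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where row i must rank positives[i] above every other candidate."""
    candidates = list(positives) + list(negatives or [])
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in candidates]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(value - top) for value in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy batch: each anchor matches its own positive exactly.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With aligned pairs the loss is close to zero, and with the positives swapped it is close to the scale value, which shows how `scale` sharpens the penalty for ranking the wrong candidate first.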
config.json ADDED
@@ -0,0 +1,24 @@
+{
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.52.4",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
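The sizes in this config are enough to estimate how many weights the checkpoint holds. The sketch below counts parameters for a standard BERT encoder layout (embeddings, attention, feed-forward, layer norms, and optionally the pooler head). The function name `bert_param_count` is hypothetical, and the assumption that the pooler head is present in the saved weights is not confirmed by the diff.

```python
config = {
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "num_hidden_layers": 24,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}


def bert_param_count(cfg: dict, with_pooler: bool = True) -> int:
    """Count weights and biases for a standard BERT encoder described by cfg."""
    h = cfg["hidden_size"]
    ff = cfg["intermediate_size"]
    # Word, position and token-type embedding tables plus one LayerNorm (weight + bias).
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + 2 * h
    # Query, key, value and output projections, then a LayerNorm.
    attention = 4 * (h * h + h) + 2 * h
    # Up-projection, down-projection, then a LayerNorm.
    feed_forward = (h * ff + ff) + (ff * h + h) + 2 * h
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    # Dense layer applied to the [CLS] position; assumed present here.
    pooler = (h * h + h) if with_pooler else 0
    return embeddings + encoder + pooler


total = bert_param_count(config)
print(total, total * 4)  # parameter count, and bytes at float32
```

At `float32` (4 bytes per weight) this works out to roughly 1.3 GB of weights, which is a useful sanity check against the size of the uploaded model file.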
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+{
+  "__version__": {
+    "sentence_transformers": "4.1.0",
+    "transformers": "4.52.4",
+    "pytorch": "2.6.0+cu124"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
modules.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
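The module list ends with a `Normalize` step, and the model's similarity function is cosine. One consequence worth spelling out is that once vectors are scaled to unit length, cosine similarity and the plain dot product give the same number. A small sketch, with hypothetical helper names and toy 2-dimensional vectors in place of real embeddings:

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical toy embeddings.
a, b = [3.0, 4.0], [4.0, 3.0]
raw_cosine = cosine(a, b)
normalized_dot = dot(l2_normalize(a), l2_normalize(b))
print(raw_cosine, normalized_dot)
```

This is why an index built over these embeddings can use inner-product search and still rank results by cosine similarity.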
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff