danthepol committed
Commit aabceb1 · verified · 1 Parent(s): d033aac

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
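
The pooling configuration enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first (`[CLS]`) token's vector rather than from an average over all tokens. A minimal sketch of the difference in plain Python, using made-up 3-dimensional token vectors (the model itself uses 768 dimensions) and hypothetical helper names `cls_pool` and `mean_pool`; it illustrates the idea and is not the Sentence Transformers implementation:

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """Return the first token's vector ([CLS] sits at position 0 in BERT inputs)."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector], attention_mask: List[int]) -> Vector:
    """Average the vectors of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy sequence: [CLS], two word pieces, one padding position.
tokens = [
    [1.0, 0.0, 2.0],   # [CLS]
    [3.0, 4.0, 0.0],
    [5.0, 2.0, 4.0],
    [9.0, 9.0, 9.0],   # padding, ignored by mean pooling
]
mask = [1, 1, 1, 0]

cls_vector = cls_pool(tokens)
mean_vector = mean_pool(tokens, mask)
```

With CLS pooling the other token vectors influence the result only indirectly, through the encoder's attention layers, whereas mean pooling mixes them in explicitly.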
README.md ADDED
@@ -0,0 +1,410 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:28050
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: What helps an insectivorous plant attract and digest insects?
+   sentences:
+   - This investigation examined the accuracy of several generalizable anthropometric
+     (ANTHRO) and bioelectrical impedance (BIA) regression equations to estimate %
+     body fat (%BF) in women with either upper body (UB) or lower body (LB) fat distribution
+     patterns.
+   - Bacteria can also be chemotrophs. Chemosynthetic bacteria, or chemotrophs , obtain
+     energy by breaking down chemical compounds in their environment. An example of
+     one of these chemicals broken down by bacteria is nitrogen-containing ammonia.
+     These bacteria are important because they help cycle nitrogen through the environment
+     for other living things to use. Nitrogen cannot be made by living organisms, so
+     it must be continually recycled. Organisms need nitrogen to make organic compounds,
+     such as DNA.
+   - Insectivorous Plants An insectivorous plant has specialized leaves to attract
+     and digest insects. The Venus flytrap is popularly known for its insectivorous
+     mode of nutrition, and has leaves that work as traps (Figure 31.16). The minerals
+     it obtains from prey compensate for those lacking in the boggy (low pH) soil of
+     its native North Carolina coastal plains. There are three sensitive hairs in the
+     center of each half of each leaf. The edges of each leaf are covered with long
+     spines. Nectar secreted by the plant attracts flies to the leaf. When a fly touches
+     the sensory hairs, the leaf immediately closes. Next, fluids and enzymes break
+     down the prey and minerals are absorbed by the leaf. Since this plant is popular
+     in the horticultural trade, it is threatened in its original habitat.
+ - source_sentence: When carbon atoms are not bonded to as many hydrogen atoms as possible,
+     what kind of hydrocarbon results?
+   sentences:
+   - Unsaturated hydrocarbons have at least one double or triple bond between carbon
+     atoms, so the carbon atoms are not bonded to as many hydrogen atoms as possible.
+     In other words, they are unsaturated with hydrogen atoms.
+   - Endoscopic radiofrequency ablation (RFA) is a promising new treatment of Barrett's
+     esophagus (BE). Adjunctive intra-esophageal pH control with proton pump inhibitors
+     and/or anti-reflux surgery is generally recommended to optimize squamous re-epithelialization
+     after ablation.
+   - The cell wall is located outside the cell membrane. It consists mainly of cellulose
+     and may also contain lignin, which makes it more rigid. The cell wall shapes,
+     supports, and protects the cell. It prevents the cell from absorbing too much
+     water and bursting. It also keeps large, damaging molecules out of the cell.
+ - source_sentence: Do comparison of ambulance dispatch protocols for nontraumatic
+     abdominal pain?
+   sentences:
+   - KIOM-79, a combination of four plant extracts, has a preventive effect on diabetic
+     nephropathy and retinopathy in diabetic animal models. In this study, we have
+     investigated the inhibitory effects of KIOM-79 on diabetic cataractogenesis.
+   - To compare rates of undertriage and overtriage of six ambulance dispatch protocols
+     for the presenting complaint of nontraumatic abdominal pain, and to identify the
+     optimal protocol.
+   - a flower is a source of nectar
+ - source_sentence: Does altered fractalkine cleavage potentially promote local inflammation
+     in NOD salivary gland?
+   sentences:
+   - In France, when physicians in ambulances take care of patients, they report medical
+     status to the dispatch centre. Then the dispatching physician search for the available
+     and appropriate hospital service to agree in directly receiving the patient. We
+     attempted to evaluate this direct admission dispatch, in a urban area, with many
+     health care facilities.
+   - Despite the high prevalence of cannabis use in schizophrenia, few studies have
+     examined the potential relationship between cannabis exposure and brain structural
+     abnormalities in schizophrenia.
+   - In the nonobese diabetic (NOD) mouse model of Sjögren's syndrome, lymphocytic
+     infiltration is preceded by an accumulation of dendritic cells in the submandibular
+     glands (SMGs). NOD mice also exhibit an increased frequency of mature, fractalkine
+     receptor (CX3C chemokine receptor [CX3CR]1) expressing monocytes, which are considered
+     to be precursors for tissue dendritic cells. To unravel further the role played
+     by fractalkine-CX3CR1 interactions in the salivary gland inflammation, we studied
+     the expression of fractalkine in NOD SMGs.
+ - source_sentence: The smallest cyclic ether is called what?
+   sentences:
+   - Most human traits have more complex modes of inheritance than simple Mendelian
+     inheritance. For example, the traits may be controlled by multiple alleles or
+     multiple genes.
+   - Neonatal stress impairs postnatal bone mineralization. Evidence suggests that
+     mechanical tactile stimulation (MTS) in early life decreases stress hormones and
+     improves bone mineralization. Insulin-like growth factor (IGF1) is impacted by
+     stress and essential to bone development. We hypothesized that MTS administered
+     during neonatal stress would improve bone phenotype in later life. We also predicted
+     an increase in bone specific mRNA expression of IGF1 related pathways.
+   - The smallest cyclic ether is called an epoxide. Draw its structure.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("danthepol/mcqa_embedder_v2")
+ # Run inference
+ sentences = [
+     'The smallest cyclic ether is called what?',
+     'The smallest cyclic ether is called an epoxide. Draw its structure.',
+     'Neonatal stress impairs postnatal bone mineralization. Evidence suggests that mechanical tactile stimulation (MTS) in early life decreases stress hormones and improves bone mineralization. Insulin-like growth factor (IGF1) is impacted by stress and essential to bone development. We hypothesized that MTS administered during neonatal stress would improve bone phenotype in later life. We also predicted an increase in bone specific mRNA expression of IGF1 related pathways.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 28,050 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 |
+   |:--------|:-----------|:-----------|
+   | type    | string | string |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 23.02 tokens</li><li>max: 63 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 81.53 tokens</li><li>max: 512 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>Ectotherms undergo a variety of changes at the cellular level to acclimatize to shifts in what?</code> | <code>There are 44 autosomes and 2 sex chromosomes in the human genome, for a total of 46 chromosomes (23 pairs). Sex chromosomes specify an organism's genetic sex. Humans can have two different sex chromosomes, one called X and the other Y. Normal females possess two X chromosomes and normal males one X and one Y. An autosome is any chromosome other than a sex chromosome. The Figure below shows a representation of the 24 different human chromosomes. Figure below shows a karyotype of the human genome. A karyotype depicts, usually in a photograph, the chromosomal complement of an individual, including the number of chromosomes and any large chromosomal abnormalities. Karyotypes use chromosomes from the metaphase stage of mitosis.</code> |
+   | <code>All polar compounds contain what type of bonds?</code> | <code>Polar compounds, such as water, are compounds that have a partial negative charge on one side of each molecule and a partial positive charge on the other side. All polar compounds contain polar bonds (although not all compounds that contain polar bonds are polar. ) In a polar bond, two atoms share electrons unequally. One atom attracts the shared electrons more strongly, so it has a partial negative charge. The other atom attracts the shared electrons less strongly, so it is has a partial positive charge. In a water molecule, the oxygen atom attracts the shared electrons more strongly than the hydrogen atoms do. This explains why the oxygen side of the water molecule has a partial negative charge and the hydrogen side of the molecule has a partial positive charge.</code> |
+   | <code>Do lateral cephalometric radiograph for the planning of maxillary implant reconstruction?</code> | <code>To present a simple and objective method for the planning of maxillary implant reconstruction with autogenous bone graft in maxilla atrophy.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss |
+ |:------:|:----:|:-------------:|
+ | 0.5701 | 500  | 0.064         |
+ | 1.1403 | 1000 | 0.0455        |
+ | 1.7104 | 1500 | 0.0254        |
+ | 2.2805 | 2000 | 0.0189        |
+ | 2.8506 | 2500 | 0.0155        |
+
+
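
The Epoch column in the training log can be reproduced from the dataset size and batch size stated in the README. A small sketch, assuming a single training device (the device count is not stated in the commit) and no gradient accumulation, which matches `gradient_accumulation_steps`: 1; the helper name `epoch_at_step` is illustrative:

```python
import math

DATASET_SIZE = 28_050   # training samples, from the README
BATCH_SIZE = 32         # per_device_train_batch_size
# dataloader_drop_last is False, so the final partial batch counts as a step.
STEPS_PER_EPOCH = math.ceil(DATASET_SIZE / BATCH_SIZE)


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / STEPS_PER_EPOCH, 4)


logged_steps = [500, 1000, 1500, 2000, 2500]
epochs = [epoch_at_step(s) for s in logged_steps]
total_steps = STEPS_PER_EPOCH * 3  # num_train_epochs = 3
```

Under these assumptions there are 877 steps per epoch and 2,631 steps in total, so the last logged step (2,500) falls shortly before the end of the third epoch.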
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.3
+ - PyTorch: 2.3.0+cu121
+ - Accelerate: 1.3.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
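
The README names `MultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim` as the similarity function. As a rough illustration of what that objective computes: for a batch of (question, passage) pairs, each question is scored against every passage in the batch, and cross-entropy is applied with the matching passage as the target. The sketch below uses plain Python and toy 2-dimensional vectors; the function names are illustrative, and this is not the library's implementation:

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(anchors: List[Vector], positives: List[Vector],
                          scale: float = 20.0) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the scaled similarities.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each anchor points the same way as its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.1], [0.1, 2.0]]
swapped = [aligned[1], aligned[0]]

loss_aligned = in_batch_ranking_loss(anchors, aligned)
loss_swapped = in_batch_ranking_loss(anchors, swapped)
```

The loss is near zero when each anchor is closest to its own positive and large when the pairs are swapped, which is the behaviour the training log's falling loss values would reflect.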
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.51.3",
+     "pytorch": "2.3.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
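
`similarity_fn_name` is set to `cosine`, and the README's architecture listing ends with a `Normalize()` module. For unit-length vectors, cosine similarity reduces to a plain dot product, which the following sketch checks on toy vectors (plain Python; the helper names are illustrative):

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


u, v = [3.0, 4.0], [4.0, 3.0]
u_hat, v_hat = l2_normalize(u), l2_normalize(v)

cosine_raw = cosine(u, v)            # computed on the original vectors
dot_normalized = dot(u_hat, v_hat)   # computed after normalization
```

Both values agree up to floating-point error, so downstream code that stores this model's embeddings could, in principle, rank by dot product and obtain the same ordering as cosine similarity.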
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7b90c40c2e43a3c54129cc9371442f91f8f82f5ad9d8a0464f66ac8c9c94d695
+ size 437951328
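
The LFS pointer records a size of 437,951,328 bytes. As a sanity check, the parameter count implied by `config.json` can be estimated by hand. The sketch below assumes the standard BERT encoder layout (embeddings, 12 identical layers, and a pooler head) stored as float32; that layout is an assumption about the checkpoint, not something the diff itself shows:

```python
# Values copied from config.json in this commit.
VOCAB_SIZE = 30_522
HIDDEN = 768
INTERMEDIATE = 3_072
LAYERS = 12
MAX_POSITIONS = 512
TYPE_VOCAB = 2
BYTES_PER_PARAM = 4  # torch_dtype is float32


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors

embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm
per_layer = (
    4 * linear(HIDDEN, HIDDEN)        # query, key, value, attention output
    + layer_norm
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward expansion
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward projection
    + layer_norm
)
pooler = linear(HIDDEN, HIDDEN)

total_params = embeddings + LAYERS * per_layer + pooler
tensor_bytes = total_params * BYTES_PER_PARAM
file_size = 437_951_328  # from the LFS pointer
overhead = file_size - tensor_bytes
```

Under these assumptions the model has 109,482,240 parameters occupying 437,928,960 bytes, which leaves roughly 22 KB of the file unaccounted for; that remainder would be consistent with a header holding tensor metadata.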
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
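
The tokenizer settings imply inputs of at most 512 positions, framed by `[CLS]` (id 101) and `[SEP]` (id 102), and truncated and padded on the right with `[PAD]` (id 0). A simplified sketch of that framing in plain Python; `build_input` is a hypothetical helper that operates on already-tokenized ids with arbitrary example values, and it is not `BertTokenizer`'s actual code:

```python
from typing import Dict, List

PAD_ID, CLS_ID, SEP_ID = 0, 101, 102   # ids from added_tokens_decoder
MODEL_MAX_LENGTH = 512                 # model_max_length


def build_input(token_ids: List[int], pad_to: int,
                max_length: int = MODEL_MAX_LENGTH) -> Dict[str, List[int]]:
    """Frame word-piece ids with [CLS]/[SEP], truncating and padding on the right."""
    room = max_length - 2                       # reserve space for [CLS] and [SEP]
    kept = token_ids[:room]                     # truncation_side = "right"
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max(0, min(pad_to, max_length) - len(input_ids))
    input_ids += [PAD_ID] * padding             # padding_side = "right"
    attention_mask += [0] * padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# A short input padded to length 6, and a 600-id input that must be truncated.
short_input = build_input([2054, 2003], pad_to=6)
long_input = build_input(list(range(1000, 1600)), pad_to=512)
```

The attention mask marks padding positions with 0 so that a pooling or attention step can ignore them; with CLS pooling, the vector at position 0 is the one that becomes the sentence embedding.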
vocab.txt ADDED
The diff for this file is too large to render. See raw diff