dpshade22 committed · verified
Commit ffd4367 · 1 Parent(s): 5019a32

Upload e5-base-john embedding model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
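
This pooling config selects mean pooling only: token embeddings are averaged under the attention mask so that padding tokens contribute nothing. A minimal sketch of that operation in plain PyTorch (function name ours, for illustration):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the 768-d token embeddings over real (non-padding) tokens."""
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # sum real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # tokens per example
    return summed / counts                                          # (batch, 768)
```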
README.md ADDED
@@ -0,0 +1,389 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:2633
+ - loss:CosineSimilarityLoss
+ base_model: intfloat/e5-base-v2
+ widget:
+ - source_sentence: Many therefore of his disciples, when they had heard this, said,
+     This is an hard saying; who can hear it?
+   sentences:
+   - If ye keep my commandments, ye shall abide in my love; even as I have kept my
+     Father's commandments, and abide in his love.
+   - When Jesus knew in himself that his disciples murmured at it, he said unto them,
+     Doth this offend you?
+   - He said, I am the voice of one crying in the wilderness, Make straight the way
+     of the Lord, as said the prophet Esaias.
+ - source_sentence: 'Jesus and Nicodemus | participants: jesus_905, nicodemus_2204'
+   sentences:
+   - 'And as Moses lifted up the serpent in the wilderness, even so must the Son of
+     man be lifted up:'
+   - Then when Mary was come where Jesus was, and saw him, she fell down at his feet,
+     saying unto him, Lord, if thou hadst been here, my brother had not died.
+   - They answered him, Jesus of Nazareth. Jesus saith unto them, I am he. And Judas
+     also, which betrayed him, stood with them.
+ - source_sentence: 'For he whom God hath sent speaketh the words of God: for God giveth
+     not the Spirit by measure unto him.'
+   sentences:
+   - Then said Jesus unto the twelve, Will ye also go away?
+   - The Father loveth the Son, and hath given all things into his hand.
+   - 'Why askest thou me? ask them which heard me, what I have said unto them: behold,
+     they know what I said.'
+ - source_sentence: 'Lazarus Raised from the Dead | participants: jesus_905, mary_1939,
+     lazarus_1812'
+   sentences:
+   - But he saith unto them, It is I; be not afraid.
+   - But some of them went their ways to the Pharisees, and told them what things Jesus
+     had done.
+   - Jesus answered and said unto them, Destroy this temple, and in three days I will
+     raise it up.
+ - source_sentence: 'God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the
+     Divine Being. It is the rendering (1) of the Hebrew <i> ''El</i> , from a word
+     meaning to be strong; (2) of <i> ''Eloah_, plural _''Elohim</i> . The singular
+     form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly
+     used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other
+     word generally employed to denote the Supreme Being, is uniformly rendered in
+     the Authorized Version by "LORD," printed in small capitals. The existence of
+     God is taken for granted in the Bible. There is nowhere any argument to prove
+     it. He who disbelieves this truth is spoken of as one devoid of understanding
+     ( Psalms 14:1 ). The arguments generally adduced by theologians in proof
+     of the being of God are: <li> The a priori argument, which is the testimony
+     afforded by reason. <li> The a posteriori argument, by which we proceed logically
+     from the facts of experience to causes. These arguments are, (a) The cosmological,
+     by which it is proved that there must be a First Cause of all things, for every
+     effect must have a cause. (b) The teleological, or the argument from design.
+     We see everywhere the operations of an intelligent Cause in nature. (c) The
+     moral argument, called also the anthropological argument, based on the moral consciousness
+     and the history of mankind, which exhibits a moral order and purpose which can
+     only be explained on the supposition of the existence of God. Conscience and human
+     history testify that "verily there is a God that judgeth in the earth." The
+     attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 .
+     (see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah
+     44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically
+     classified in Revelation 5:12 and 7:12 . God''s attributes are spoken
+     of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.;
+     and relative, i.e., such as are ascribed to him with relation to his creatures.
+     Others distinguish them into communicable, i.e., those which can be imparted in
+     degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable,
+     which cannot be so imparted: independence, immutability, immensity, and eternity.
+     They are by some also divided into natural attributes, eternity, immensity, etc.;
+     and moral, holiness, goodness, etc.'
+   sentences:
+   - As he spake these words, many believed on him.
+   - 'Jesus said unto them, If God were your Father, ye would love me: for I proceeded
+     forth and came from God; neither came I of myself, but he sent me.'
+   - 'Jesus answered them, I told you, and ye believed not: the works that I do in
+     my Father''s name, they bear witness of me.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on intfloat/e5-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
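+
+ Equivalently, the pipeline is: BERT encoder → mask-aware mean pooling → L2 normalization. A rough hand-rolled sketch with plain `transformers` (shown against the base checkpoint for illustration; the fine-tuned weights live in this repo):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base-v2")
+ encoder = AutoModel.from_pretrained("intfloat/e5-base-v2")
+
+ batch = tokenizer(["But he saith unto them, It is I; be not afraid."],
+                   truncation=True, max_length=256, padding=True, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = encoder(**batch).last_hidden_state      # (1, seq_len, 768)
+ mask = batch["attention_mask"].unsqueeze(-1).float()
+ embedding = (token_embeddings * mask).sum(1) / mask.sum(1)     # mean pooling
+ embedding = F.normalize(embedding, p=2, dim=1)                 # unit length: dot product = cosine
+ ```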
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dpshade22/e5-base-john")  # repo id inferred from this upload
+ # Run inference
+ sentences = [
+     'God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> \'El</i> , from a word meaning to be strong; (2) of <i> \'Eloah_, plural _\'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) The cosmological, by which it is proved that there must be a First Cause of all things, for every effect must have a cause. (b) The teleological, or the argument from design. We see everywhere the operations of an intelligent Cause in nature. (c) The moral argument, called also the anthropological argument, based on the moral consciousness and the history of mankind, which exhibits a moral order and purpose which can only be explained on the supposition of the existence of God. Conscience and human history testify that "verily there is a God that judgeth in the earth." The attributes of God are set forth in order by Moses in Exodus 34:6 Exodus 34:7 . (see also Deuteronomy 6:4 ; 10:17 ; Numbers 16:22 ; Exodus 15:11 ; 33:19 ; Isaiah 44:6 ; Habakkuk 3:6 ; Psalms 102:26 ; Job 34:12 .) They are also systematically classified in Revelation 5:12 and 7:12 . God\'s attributes are spoken of by some as absolute, i.e., such as belong to his essence as Jehovah, Jah, etc.; and relative, i.e., such as are ascribed to him with relation to his creatures. Others distinguish them into communicable, i.e., those which can be imparted in degree to his creatures: goodness, holiness, wisdom, etc.; and incommunicable, which cannot be so imparted: independence, immutability, immensity, and eternity. They are by some also divided into natural attributes, eternity, immensity, etc.; and moral, holiness, goodness, etc.',
+     'Jesus said unto them, If God were your Father, ye would love me: for I proceeded forth and came from God; neither came I of myself, but he sent me.',
+     'As he spake these words, many believed on him.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.7557, 0.7462],
+ #         [0.7557, 1.0000, 0.7852],
+ #         [0.7462, 0.7852, 1.0000]])
+ ```
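+
+ For retrieval-style use, note that stock e5 checkpoints are usually queried with "query: " / "passage: " prefixes, while this upload ships empty prompt strings (see `config_sentence_transformers.json` below), so text is embedded as-is. A small hypothetical search sketch (corpus is illustrative, not from the training data):
+
+ ```python
+ corpus = [
+     "Jesus wept.",
+     "In the beginning was the Word, and the Word was with God.",
+     "Then said Jesus unto the twelve, Will ye also go away?",
+ ]
+ corpus_embeddings = model.encode(corpus)
+ query_embedding = model.encode(["Who asked the disciples whether they would leave?"])
+
+ # Cosine similarity (the configured similarity function); highest score wins
+ scores = model.similarity(query_embedding, corpus_embeddings)  # shape (1, 3)
+ print(corpus[scores.argmax().item()])
+ ```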
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 2,633 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 | label |
+   |:--------|:-----------|:-----------|:------|
+   | type    | string | string | float |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 81.92 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 30.06 tokens</li><li>max: 73 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:-----------|:-----------|:------|
+   | <code>God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> 'El</i> , from a word meaning to be strong; (2) of <i> 'Eloah_, plural _'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T...</code> | <code>For as the Father hath life in himself; so hath he given to the Son to have life in himself;</code> | <code>1.0</code> |
+   | <code>Bread of Life Sermon \| participants: jesus_905, peter_2745</code> | <code>Jesus therefore answered and said unto them, Murmur not among yourselves.</code> | <code>1.0</code> |
+   | <code>Verily, verily, I say unto thee, When thou wast young, thou girdest thyself, and walkedst whither thou wouldest: but when thou shalt be old, thou shalt stretch forth thy hands, and another shall gird thee, and carry thee whither thou wouldest not.</code> | <code>This spake he, signifying by what death he should glorify God. And when he had spoken this, he saith unto him, Follow me.</code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
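+
+ In other words, each (sentence_0, sentence_1) pair is pushed toward its target cosine similarity (here always 1.0) under an MSE objective. A minimal sketch of how such a run could be reproduced with the trainer API (the one-row dataset is a hypothetical stand-in for the 2,633 pairs):
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
+                                    SentenceTransformerTrainingArguments)
+ from sentence_transformers.losses import CosineSimilarityLoss
+
+ model = SentenceTransformer("intfloat/e5-base-v2")
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["Bread of Life Sermon | participants: jesus_905, peter_2745"],
+     "sentence_1": ["Jesus therefore answered and said unto them, Murmur not among yourselves."],
+     "label": [1.0],
+ })
+ args = SentenceTransformerTrainingArguments(
+     output_dir="e5-base-john",
+     per_device_train_batch_size=16,   # matches the hyperparameters below
+     num_train_epochs=1,
+     max_steps=5,
+ )
+ trainer = SentenceTransformerTrainer(model=model, args=args,
+                                      train_dataset=train_dataset,
+                                      loss=CosineSimilarityLoss(model))
+ trainer.train()
+ ```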
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 1
+ - `max_steps`: 5
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: 5
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: None
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.13.11
+ - Sentence Transformers: 5.2.0
+ - Transformers: 4.57.6
+ - PyTorch: 2.10.0+cpu
+ - Accelerate: 1.12.0
+ - Datasets: 4.5.0
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.57.6",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.2.0",
+     "transformers": "4.57.6",
+     "pytorch": "2.10.0+cpu"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
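
Both stored prompts are empty strings, so `encode` embeds text verbatim by default, unlike stock e5 usage, which prepends "query: " / "passage: ". If prefixes are wanted, they can still be supplied per call via the `prompt` argument; a sketch (repo id inferred from this upload, and whether prefixes help this fine-tune is untested here):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dpshade22/e5-base-john")  # assumed repo id
# e5-style prefixes applied explicitly at encode time
query_emb = model.encode(["who raised Lazarus?"], prompt="query: ")
doc_emb = model.encode(["Lazarus, come forth."], prompt="passage: ")
```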
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0c5c6fe3d2c0a353c6994006c4ef77e6723cba4120e2bb49b58acd34b3614af0
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff