dpshade22 committed
Commit 36de1e5 · verified · 1 Parent(s): e0f0899

Upload e5-base-bible embedding model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
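The pooling config above enables only mean-token pooling: the token vectors produced by the transformer are averaged (over non-padding positions) into one sentence vector. As a rough illustration only — plain Python with made-up toy vectors, not the library's actual tensor implementation:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention_mask entry is 1 (real tokens)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                summed[i] += x
    return [s / count for s in summed]

# Toy example: three 2-d token vectors; the last position is padding
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The padding vector is excluded by the mask, so only the first two tokens contribute to the average.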
README.md ADDED
@@ -0,0 +1,396 @@
+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- dense
+- generated_from_trainer
+- dataset_size:70323
+- loss:CosineSimilarityLoss
+base_model: intfloat/e5-base-v2
+widget:
+- source_sentence: 'Prophecies of Ezekiel | participants: ezekiel_1237'
+  sentences:
+  - 'I would seek unto God, and unto God would I commit my cause:'
+  - 'And he shall deliver their kings into thine hand, and thou shalt destroy their
+    name from under heaven: there shall no man be able to stand before thee, until
+    thou have destroyed them.'
+  - 'And I will set my jealousy against thee, and they shall deal furiously with thee:
+    they shall take away thy nose and thine ears; and thy remnant shall fall by the
+    sword: they shall take thy sons and thy daughters; and thy residue shall be devoured
+    by the fire.'
+- source_sentence: 'And the men which were expressed by name rose up, and took the
+    captives, and with the spoil clothed all that were naked among them, and arrayed
+    them, and shod them, and gave them to eat and to drink, and anointed them, and
+    carried all the feeble of them upon asses, and brought them to Jericho, the city
+    of palm trees, to their brethren: then they returned to Samaria.'
+  sentences:
+  - That every man should let his manservant, and every man his maidservant, being
+    an Hebrew or an Hebrewess, go free; that none should serve himself of them, to
+    wit, of a Jew his brother.
+  - At that time did king Ahaz send unto the kings of Assyria to help him.
+  - Woe unto thee, Chorazin! woe unto thee, Bethsaida! for if the mighty works had
+    been done in Tyre and Sidon, which have been done in you, they had a great while
+    ago repented, sitting in sackcloth and ashes.
+- source_sentence: 'The Transfiguation | participants: jesus_905, peter_2745, moses_2108,
+    elijah_1131, john_1677, james_717'
+  sentences:
+  - 'The waters saw thee, O God, the waters saw thee; they were afraid: the depths
+    also were troubled.'
+  - And all these blessings shall come on thee, and overtake thee, if thou shalt hearken
+    unto the voice of the Lord thy God.
+  - 'And it came to pass, as they departed from him, Peter said unto Jesus, Master,
+    it is good for us to be here: and let us make three tabernacles; one for thee,
+    and one for Moses, and one for Elias: not knowing what he said.'
+- source_sentence: 'Jesus Christ: anointed, the Greek translation of the Hebrew word
+    rendered "Messiah" (q.v.), the official title of our Lord, occurring five hundred
+    and fourteen times in the New Testament. It denotes that he was anointed or consecrated
+    to his great redemptive work as Prophet, Priest, and King of his people. He is
+    Jesus the Christ ( Acts 17:3 ; 18:5 ; Matthew 22:42 ), the Anointed One.
+    He is thus spoken of by ( Isaiah 61:1 ), and by ( Daniel 9:24-26 ), who styles
+    him "Messiah the Prince." The Messiah is the same person as "the seed of the
+    woman" ( Genesis 3:15 ), "the seed of Abraham" ( Genesis 22:18 ), the "Prophet
+    like unto Moses" ( Deuteronomy 18:15 ), "the priest after the order of Melchizedek"
+    ( Psalms 110:4 ), "the rod out of the stem of Jesse" ( Isaiah 11:1 Isaiah
+    11:10 ), the "Immanuel," the virgin''s son ( Isaiah 7:14 ), "the branch of
+    Jehovah" ( Isaiah 4:2 ), and "the messenger of the covenant" ( Malachi 3:1 ).
+    This is he "of whom Moses in the law and the prophets did write." The Old Testament
+    Scripture is full of prophetic declarations regarding the Great Deliverer and
+    the work he was to accomplish. Jesus the Christ is Jesus the Great Deliverer,
+    the Anointed One, the Saviour of men. This name denotes that Jesus was divinely
+    appointed, commissioned, and accredited as the Saviour of men ( Hebrews 5:4 ; Isaiah
+    11:2-4 ; 49:6 ; John 5:37 ; Acts 2:22 ). To believe that "Jesus is
+    the Christ" is to believe that he is the Anointed, the Messiah of the prophets,
+    the Saviour sent of God, that he was, in a word, what he claimed to be. This is
+    to believe the gospel, by the faith of which alone men can be brought unto God.
+    That Jesus is the Christ is the testimony of God, and the faith of this constitutes
+    a Christian ( 1 Corinthians 12:3 ; 1 John 5:1 ).'
+  sentences:
+  - Then if any man shall say unto you, Lo, here is Christ, or there; believe it not.
+  - 'But was rebuked for his iniquity: the dumb ass speaking with man''s voice forbad
+    the madness of the prophet.'
+  - 'Doubtless thou art our father, though Abraham be ignorant of us, and Israel acknowledge
+    us not: thou, O Lord, art our father, our redeemer; thy name is from everlasting.'
+- source_sentence: But he answered and said, It is not meet to take the children's
+    bread, and to cast it to dogs.
+  sentences:
+  - 'And she said, Truth, Lord: yet the dogs eat of the crumbs which fall from their
+    masters'' table.'
+  - 'And he delivered them into the hands of the Gibeonites, and they hanged them
+    in the hill before the Lord: and they fell all seven together, and were put to
+    death in the days of harvest, in the first days, in the beginning of barley harvest.'
+  - And they were all amazed, and were in doubt, saying one to another, What meaneth
+    this?
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+
+# SentenceTransformer based on intfloat/e5-base-v2
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
+- **Maximum Sequence Length:** 128 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("sentence_transformers_model_id")
+# Run inference
+sentences = [
+    "But he answered and said, It is not meet to take the children's bread, and to cast it to dogs.",
+    "And she said, Truth, Lord: yet the dogs eat of the crumbs which fall from their masters' table.",
+    'And he delivered them into the hands of the Gibeonites, and they hanged them in the hill before the Lord: and they fell all seven together, and were put to death in the days of harvest, in the first days, in the beginning of barley harvest.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 768]
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities)
+# tensor([[1.0000, 0.9936, 0.9925],
+#         [0.9936, 1.0000, 0.9940],
+#         [0.9925, 0.9940, 1.0000]])
+```
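Because the architecture ends with a `Normalize()` module, the embeddings come out unit-length, so the cosine similarity that `model.similarity` computes reduces to a plain dot product. A minimal sketch with toy 2-d vectors (not the model's actual 768-d outputs):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as the Normalize module does."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

u = l2_normalize([3.0, 4.0])  # -> [0.6, 0.8]
v = l2_normalize([4.0, 3.0])  # -> [0.8, 0.6]

# On unit vectors, the dot product IS the cosine similarity
dot = sum(a * b for a, b in zip(u, v))
print(round(dot, 2))  # 0.96
```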
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Dataset
+
+#### Unnamed Dataset
+
+* Size: 70,323 training samples
+* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+* Approximate statistics based on the first 1000 samples:
+  |         | sentence_0 | sentence_1 | label |
+  |:--------|:-----------|:-----------|:------|
+  | type    | string | string | float |
+  | details | <ul><li>min: 3 tokens</li><li>mean: 49.18 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 35.56 tokens</li><li>max: 88 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.99</li><li>max: 1.0</li></ul> |
+* Samples:
+  | sentence_0 | sentence_1 | label |
+  |:-----------|:-----------|:------|
+  | <code>And Herod said, John have I beheaded: but who is this, of whom I hear such things? And he desired to see him.</code> | <code>And the apostles, when they were returned, told him all that they had done. And he took them, and went aside privately into a desert place belonging to the city called Bethsaida.</code> | <code>1.0</code> |
+  | <code>Egypt: the land of the Nile and the pyramids, the oldest kingdom of which we have any record, holds a place of great significance in Scripture. The Egyptians belonged to the white race, and their original home is still a matter of dispute. Many scholars believe that it was in Southern Arabia, and recent excavations have shown that the valley of the Nile was originally inhabited by a low-class population, perhaps belonging to the Nigritian stock, before the Egyptians of history entered it. The ancient Egyptian language, of which the latest form is Coptic, is distantly connected with the Semitic family of speech. Egypt consists geographically of two halves, the northern being the Delta, and the southern Upper Egypt, between Cairo and the First Cataract. In the Old Testament, Northern or Lower Egypt is called Mazor, "the fortified land" ( Isaiah 19:6 ; 37: : 25 , where the A.V. mistranslates "defence" and "besieged places"); while Southern or Upper Egypt is Pathros, the Egyptian...</code> | <code>And they did so; for Aaron stretched out his hand with his rod, and smote the dust of the earth, and it became lice in man, and in beast; all the dust of the land became lice throughout all the land of Egypt.</code> | <code>1.0</code> |
+  | <code>Prophecies of Ezekiel \| participants: ezekiel_1237</code> | <code>By thy great wisdom and by thy traffick hast thou increased thy riches, and thine heart is lifted up because of thy riches:</code> | <code>1.0</code> |
+* Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+  ```json
+  {
+      "loss_fct": "torch.nn.modules.loss.MSELoss"
+  }
+  ```
+
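CosineSimilarityLoss with an MSE inner loss fits the cosine similarity of each embedded pair to its float label. A hedged sketch of that objective in plain Python with toy 2-d vectors (the real loss runs on batched torch tensors):

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cosine_similarity_mse_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the target label for each pair."""
    errs = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errs) / len(errs)

# Identical pair labeled 1.0, orthogonal pair labeled 0.0: both predictions
# already match their labels, so the loss is zero.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.0]
print(cosine_similarity_mse_loss(pairs, labels))  # 0.0
```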
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `num_train_epochs`: 1
+- `max_steps`: 500
+- `multi_dataset_batch_sampler`: round_robin
+
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 8
+- `per_device_eval_batch_size`: 8
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 1
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 5e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1
+- `num_train_epochs`: 1
+- `max_steps`: 500
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: None
+- `warmup_ratio`: 0.0
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `bf16`: False
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `parallelism_config`: None
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `project`: huggingface
+- `trackio_space_id`: trackio
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: None
+- `hub_always_push`: False
+- `hub_revision`: None
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `include_for_metrics`: []
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`:
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: no
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `use_liger_kernel`: False
+- `liger_kernel_config`: None
+- `eval_use_gather_object`: False
+- `average_tokens_across_devices`: True
+- `prompts`: None
+- `batch_sampler`: batch_sampler
+- `multi_dataset_batch_sampler`: round_robin
+- `router_mapping`: {}
+- `learning_rate_mapping`: {}
+
+</details>
+
+### Training Logs
+| Epoch  | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 0.0569 | 500  | 0.0116        |
+
+
+### Framework Versions
+- Python: 3.11.14
+- Sentence Transformers: 5.2.0
+- Transformers: 4.57.6
+- PyTorch: 2.10.0+cpu
+- Accelerate: 1.12.0
+- Datasets: 4.5.0
+- Tokenizers: 0.22.2
+
+## Citation
+
+### BibTeX
+
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+
+<!--
+## Glossary
+
+*Clearly define terms in order to be accessible across audiences.*
+-->
+
+<!--
+## Model Card Authors
+
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+
+<!--
+## Model Card Contact
+
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
config.json ADDED
@@ -0,0 +1,25 @@
+{
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "dtype": "float32",
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "transformers_version": "4.57.6",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+{
+  "model_type": "SentenceTransformer",
+  "__version__": {
+    "sentence_transformers": "5.2.0",
+    "transformers": "4.57.6",
+    "pytorch": "2.10.0+cpu"
+  },
+  "prompts": {
+    "query": "",
+    "document": ""
+  },
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a19557163e7d3f6eb49d7cc95d9a8f8002bde07cd514d6e74f084b8d549b996d
+size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
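modules.json wires the three modules into a sequential pipeline: the Transformer produces per-token vectors, Pooling averages them, and Normalize scales the result to unit length. A toy sketch of that data flow in plain Python, with made-up 2-d vectors standing in for the BERT output:

```python
import math

def mean_pool(token_vecs):
    """Module 1 stand-in: average per-token vectors into one sentence vector."""
    dim = len(token_vecs[0])
    return [sum(v[i] for v in token_vecs) / len(token_vecs) for i in range(dim)]

def l2_normalize(v):
    """Module 2 stand-in: scale the pooled vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Stand-in for module 0 (Transformer): pretend these are BERT's token vectors
token_vecs = [[3.0, 0.0], [3.0, 8.0]]
pooled = mean_pool(token_vecs)     # [3.0, 4.0]
embedding = l2_normalize(pooled)   # [0.6, 0.8]
print(embedding)
```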
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 128,
+  "do_lower_case": false
+}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff