dpshade22 committed on
Commit
c63a053
·
verified ·
1 Parent(s): 6a5904d

Upload e5-base-john-10 embedding model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
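The pooling config above enables only `pooling_mode_mean_tokens`: the token vectors are averaged, counting only positions the attention mask marks as real (so padding does not dilute the mean). A minimal pure-Python sketch of that operation (the real module works on torch tensors of shape `(batch, seq_len, 768)`):

```python
def mean_pool(token_embeddings, attention_mask):
    """Masked mean pooling: average only the token vectors whose
    attention-mask entry is 1, ignoring padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [x / count for x in total]

# The third "token" is padding (mask 0) and is ignored:
print(mean_pool([[2.0, 4.0], [4.0, 8.0], [9.0, 9.0]], [1, 1, 0]))
# [3.0, 6.0]
```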
README.md ADDED
@@ -0,0 +1,386 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:70323
+ - loss:CosineSimilarityLoss
+ base_model: intfloat/e5-base-v2
+ widget:
+ - source_sentence: Suffer me that I may speak; and after that I have spoken, mock
+     on.
+   sentences:
+   - And Peleg lived after he begat Reu two hundred and nine years, and begat sons
+     and daughters.
+   - And to offer a sacrifice according to that which is said in the law of the Lord,
+     A pair of turtledoves, or two young pigeons.
+   - As for me, is my complaint to man? and if it were so, why should not my spirit
+     be troubled?
+ - source_sentence: 'Jesus Christ: anointed, the Greek translation of the Hebrew word
+     rendered "Messiah" (q.v.), the official title of our Lord, occurring five hundred
+     and fourteen times in the New Testament. It denotes that he was anointed or consecrated
+     to his great redemptive work as Prophet, Priest, and King of his people. He is
+     Jesus the Christ ( Acts 17:3 ; 18:5 ; Matthew 22:42 ), the Anointed One.
+     He is thus spoken of by ( Isaiah 61:1 ), and by ( Daniel 9:24-26 ), who styles
+     him "Messiah the Prince." The Messiah is the same person as "the seed of the
+     woman" ( Genesis 3:15 ), "the seed of Abraham" ( Genesis 22:18 ), the "Prophet
+     like unto Moses" ( Deuteronomy 18:15 ), "the priest after the order of Melchizedek"
+     ( Psalms 110:4 ), "the rod out of the stem of Jesse" ( Isaiah 11:1 Isaiah
+     11:10 ), the "Immanuel," the virgin''s son ( Isaiah 7:14 ), "the branch of
+     Jehovah" ( Isaiah 4:2 ), and "the messenger of the covenant" ( Malachi 3:1 ).
+     This is he "of whom Moses in the law and the prophets did write." The Old Testament
+     Scripture is full of prophetic declarations regarding the Great Deliverer and
+     the work he was to accomplish. Jesus the Christ is Jesus the Great Deliverer,
+     the Anointed One, the Saviour of men. This name denotes that Jesus was divinely
+     appointed, commissioned, and accredited as the Saviour of men ( Hebrews 5:4 ; Isaiah
+     11:2-4 ; 49:6 ; John 5:37 ; Acts 2:22 ). To believe that "Jesus is
+     the Christ" is to believe that he is the Anointed, the Messiah of the prophets,
+     the Saviour sent of God, that he was, in a word, what he claimed to be. This is
+     to believe the gospel, by the faith of which alone men can be brought unto God.
+     That Jesus is the Christ is the testimony of God, and the faith of this constitutes
+     a Christian ( 1 Corinthians 12:3 ; 1 John 5:1 ).'
+   sentences:
+   - 'And he took thereof in his hands, and went on eating, and came to his father
+     and mother, and he gave them, and they did eat: but he told not them that he had
+     taken the honey out of the carcase of the lion.'
+   - 'And Jesus said unto him, Forbid him not: for he that is not against us is for
+     us.'
+   - And thou shalt put it under the compass of the altar beneath, that the net may
+     be even to the midst of the altar.
+ - source_sentence: And, behold, seven thin ears and blasted with the east wind sprung
+     up after them.
+   sentences:
+   - When they were but a few men in number; yea, very few, and strangers in it.
+   - Till the Lord look down, and behold from heaven.
+   - And the seven thin ears devoured the seven rank and full ears. And Pharaoh awoke,
+     and, behold, it was a dream.
+ - source_sentence: 'And he shall dwell in that city, until he stand before the congregation
+     for judgment, and until the death of the high priest that shall be in those days:
+     then shall the slayer return, and come unto his own city, and unto his own house,
+     unto the city from whence he fled.'
+   sentences:
+   - And they appointed Kedesh in Galilee in mount Naphtali, and Shechem in mount Ephraim,
+     and Kirjatharba, which is Hebron, in the mountain of Judah.
+   - 'For the time past of our life may suffice us to have wrought the will of the
+     Gentiles, when we walked in lasciviousness, lusts, excess of wine, revellings,
+     banquetings, and abominable idolatries:'
+   - Where are the gods of Hamath and Arphad? where are the gods of Sepharvaim? and
+     have they delivered Samaria out of my hand?
+ - source_sentence: Gath
+   sentences:
+   - And the cities which the Philistines had taken from Israel were restored to Israel,
+     from Ekron even unto Gath; and the coasts thereof did Israel deliver out of the
+     hands of the Philistines. And there was peace between Israel and the Amorites.
+   - And as we tarried there many days, there came down from Judaea a certain prophet,
+     named Agabus.
+   - And the priests consented to receive no more money of the people, neither to repair
+     the breaches of the house.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on intfloat/e5-base-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'BertModel'})
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
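The final `Normalize()` module in the architecture L2-normalizes each pooled vector, which is why the card lists cosine similarity as the similarity function: for unit-length vectors, a plain dot product *is* the cosine similarity. A small sketch of that step with made-up 2-d vectors:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
# After normalization, dot product == cosine similarity:
dot = sum(x * y for x, y in zip(a, b))
print(round(dot, 4))
# 0.96
```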
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'Gath',
+     'And the cities which the Philistines had taken from Israel were restored to Israel, from Ekron even unto Gath; and the coasts thereof did Israel deliver out of the hands of the Philistines. And there was peace between Israel and the Amorites.',
+     'And as we tarried there many days, there came down from Judaea a certain prophet, named Agabus.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.7385, 0.7175],
+ #         [0.7385, 1.0000, 0.7856],
+ #         [0.7175, 0.7856, 1.0000]])
+ ```
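The `model.similarity(...)` call in the usage example returns a pairwise cosine-similarity matrix. Because this model's embeddings come out of the `Normalize()` module already unit-length, that matrix is just all pairwise dot products; a toy sketch with made-up unit vectors (not real model output):

```python
def similarity_matrix(embs):
    """Pairwise dot products; equals cosine similarity when the
    input vectors are already L2-normalized."""
    return [[sum(x * y for x, y in zip(u, v)) for v in embs] for u in embs]

embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy unit vectors
sims = similarity_matrix(embs)
print(sims[0][2], sims[1][2])
# 0.6 0.8
```

The diagonal of the matrix is 1.0 for unit vectors, matching the tensor shown in the usage block above.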
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 70,323 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 | label |
+   |:--------|:-----------|:-----------|:------|
+   | type    | string     | string     | float |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 55.11 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 35.91 tokens</li><li>max: 91 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.99</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:-----------|:-----------|:------|
+   | <code>The family of the house of Levi apart, and their wives apart; the family of Shimei apart, and their wives apart;</code> | <code>All the families that remain, every family apart, and their wives apart.</code> | <code>1.0</code> |
+   | <code>And I will make thee to pass with thine enemies into a land which thou knowest not: for a fire is kindled in mine anger, which shall burn upon you.</code> | <code>O Lord, thou knowest: remember me, and visit me, and revenge me of my persecutors; take me not away in thy longsuffering: know that for thy sake I have suffered rebuke.</code> | <code>1.0</code> |
+   | <code>God: (A.S. and Dutch God; Dan. Gud; Ger. Gott), the name of the Divine Being. It is the rendering (1) of the Hebrew <i> 'El</i> , from a word meaning to be strong; (2) of <i> 'Eloah_, plural _'Elohim</i> . The singular form, <i> Eloah</i> , is used only in poetry. The plural form is more commonly used in all parts of the Bible, The Hebrew word Jehovah (q.v.), the only other word generally employed to denote the Supreme Being, is uniformly rendered in the Authorized Version by "LORD," printed in small capitals. The existence of God is taken for granted in the Bible. There is nowhere any argument to prove it. He who disbelieves this truth is spoken of as one devoid of understanding ( Psalms 14:1 ). The arguments generally adduced by theologians in proof of the being of God are: <li> The a priori argument, which is the testimony afforded by reason. <li> The a posteriori argument, by which we proceed logically from the facts of experience to causes. These arguments are, (a) T...</code> | <code>Thou hast forsaken me, saith the Lord, thou art gone backward: therefore will I stretch out my hand against thee, and destroy thee; I am weary with repenting.</code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
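`CosineSimilarityLoss` with an `MSELoss` inner criterion, as configured above, embeds both sentences of a pair and penalizes the squared error between their cosine similarity and the gold label. A dependency-free sketch of the per-pair objective (toy vectors, not the real torch implementation):

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def cosine_similarity_mse_loss(emb1, emb2, label):
    """Per-pair CosineSimilarityLoss with MSE: (cos(u, v) - label)^2."""
    return (cosine(emb1, emb2) - label) ** 2

# A perfectly aligned pair labeled 1.0 incurs zero loss:
print(cosine_similarity_mse_loss([1.0, 0.0], [2.0, 0.0], 1.0))
# 0.0
```

Since nearly all labels in this dataset are 1.0 (mean 0.99 per the statistics table), training mostly pulls paired embeddings toward cosine similarity 1.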
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `num_train_epochs`: 1
+ - `max_steps`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 1
+ - `max_steps`: 10
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: None
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.13.11
+ - Sentence Transformers: 5.2.0
+ - Transformers: 4.57.6
+ - PyTorch: 2.10.0+cpu
+ - Accelerate: 1.12.0
+ - Datasets: 4.5.0
+ - Tokenizers: 0.22.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.57.6",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
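The dimensions in this config (standard BERT-base shape) imply roughly 109.5M float32 parameters, which is a useful sanity check against the 437,951,328-byte `model.safetensors` below. A back-of-the-envelope count (assuming the usual BertModel layout with biases, LayerNorms, and pooler; the small leftover in the file is the safetensors header):

```python
# Dimensions from config.json
hidden, layers, inter = 768, 12, 3072
vocab, pos, typ = 30522, 512, 2

# Embeddings: word + position + token-type tables, plus one LayerNorm
emb = (vocab + pos + typ) * hidden + 2 * hidden
# Per encoder layer: Q/K/V/O projections, FFN up/down, two LayerNorms
per_layer = (4 * (hidden * hidden + hidden)        # attention Q, K, V, O
             + 2 * hidden * inter + inter + hidden  # FFN up + down
             + 4 * hidden)                          # 2 LayerNorms
pooler = hidden * hidden + hidden                   # BertModel pooler head

total = emb + layers * per_layer + pooler
print(total)      # 109482240 parameters
print(total * 4)  # 437928960 bytes as float32, vs. 437,951,328 on disk
```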
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.2.0",
+     "transformers": "4.57.6",
+     "pytorch": "2.10.0+cpu"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7e88ef30926d092005c35686915d4e9450c9292a5bb2c4d5f10ccae7374de5eb
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
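`modules.json` wires the model as an ordered pipeline: Transformer → Pooling → Normalize, each module consuming the previous one's output. A toy stand-in showing the same composition, with a fake per-token encoder in place of the real BertModel (the character arithmetic is purely illustrative):

```python
import math

def transformer(text):
    """Stand-in for module 0 (BertModel): one fake 2-d vector per token."""
    return [[float(ord(c) % 5 + 1)] * 2 for c in text]

def pooling(token_vecs):
    """Module 1: mean over token vectors."""
    n = len(token_vecs)
    return [sum(col) / n for col in zip(*token_vecs)]

def normalize(vec):
    """Module 2: L2-normalize the pooled vector."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def encode(text):
    return normalize(pooling(transformer(text)))

emb = encode("abc")
print(round(sum(x * x for x in emb), 6))  # 1.0 (unit length, as expected)
```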
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
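The `max_seq_length` of 128 here means inputs longer than 128 tokens are truncated before encoding (the model card's token statistics top out at 128 for the same reason). A trivial sketch of the effect, with a plain token list standing in for real WordPiece output:

```python
MAX_SEQ_LENGTH = 128  # from sentence_bert_config.json

def truncate_tokens(tokens, max_len=MAX_SEQ_LENGTH):
    """Keep only the first max_len tokens; shorter inputs pass through."""
    return tokens[:max_len]

print(len(truncate_tokens(["tok"] * 200)))
# 128
```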
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff