dpshade22 committed

Commit 30cee0f · verified · Parent: 35425b2

Upload hf-e5-bible-500 embedding model
1_Pooling/config.json ADDED
```json
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
```
README.md ADDED
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:262023
- loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
- source_sentence: 'query: Heir meaning'
  sentences:
  - 'passage: This is what the Lord commands for Zelophehad’s daughters: They may
    marry anyone they please as long as they marry within their father’s tribal clan.'
  - 'passage: The second one married the widow, but he also died, leaving no child.
    It was the same with the third.'
  - 'passage: and because the Lord loved him, he sent word through Nathan the prophet
    to name him Jedidiah.'
- source_sentence: 'query: story of wilderness wanderings'
  sentences:
  - 'passage: So Moses said to Aaron, “Take a jar and put an omer of manna in it.
    Then place it before the Lord to be kept for the generations to come.”'
  - 'passage: Sheba and Dedan and the merchants of Tarshish and all her villages will
    say to you, “Have you come to plunder? Have you gathered your hordes to loot,
    to carry off silver and gold, to take away livestock and goods and to seize much
    plunder?”’'
  - 'passage: “It was because your hearts were hard that Moses wrote you this law,”
    Jesus replied.'
- source_sentence: 'query: Alexandria in the Bible'
  sentences:
  - 'passage: And if the Spirit of him who raised Jesus from the dead is living in
    you, he who raised Christ from the dead will also give life to your mortal bodies
    because of his Spirit who lives in you.'
  - 'passage: After three months we put out to sea in a ship that had wintered in
    the island—it was an Alexandrian ship with the figurehead of the twin gods Castor
    and Pollux.'
  - 'passage: They should collect all the food of these good years that are coming
    and store up the grain under the authority of Pharaoh, to be kept in the cities
    for food.'
- source_sentence: 'query: Dragon: Heb. tannim, plural of tan. The name of some unknown
    creature inhabiting desert places and ruins (Job 30:29; Ps. 44:19; Isa. 13:22;
    34:13; 43:20; Jer. 10:22; Micah 1:8; Mal. 1:3); probably, as translated in the
    Revised Version, the jackal (q.v.).'
  sentences:
  - "passage: “But as a mountain erodes and crumbles\n and as a rock is moved from\
    \ its place,"
  - "passage: Speak to him and say: ‘This is what the Sovereign Lord says:\n“‘I am\
    \ against you, Pharaoh king of Egypt,\n you great monster lying among your\
    \ streams.\nYou say, “The Nile belongs to me;\n I made it for myself.”"
  - "passage: But you crushed us and made us a haunt for jackals;\n you covered\
    \ us over with deep darkness."
- source_sentence: 'query: Jacob (Israel): the name conferred on Jacob after the
    great prayer-struggle at Peniel ( Genesis 32:28 ), because "as a prince he had
    power with God and prevailed." (See JACOB .) This is the common name given to
    Jacob''s descendants. The whole people of the twelve tribes are called "Israelites,"
    the "children of Israel" ( Joshua 3:17 ; 7:25 ; Judges 8:27 ; Jeremiah
    3:21 ), and the "house of Israel" ( Exodus 16:31 ; 40:38 ). This name
    Israel is sometimes used emphatically for the true Israel ( Psalms 73:1 : Isaiah
    45:17 ; 49:3 ; John 1:47 ; Romans 9:6 ; 11:26 ). After the death
    of Saul the ten tribes arrogated to themselves this name, as if they were the
    whole nation ( 2 Samuel 2:9 2 Samuel 2:10 2 Samuel 2:17 2 Samuel 2:28 ; 2
    Samuel 3:10 2 Samuel 3:17 ; 19:40-43 ), and the kings of the ten tribes
    were called "kings of Israel," while the kings of the two tribes were called "kings
    of Judah." After the Exile the name Israel was assumed as designating the entire
    nation.'
  sentences:
  - 'passage: Greet Ampliatus, my dear friend in the Lord.'
  - 'passage: Jeremiah had written on a scroll about all the disasters that would
    come upon Babylon—all that had been recorded concerning Babylon.'
  - 'passage: then I will reject the descendants of Jacob and David my servant and
    will not choose one of his sons to rule over the descendants of Abraham, Isaac
    and Jacob. For I will restore their fortunes and have compassion on them.’”'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on intfloat/e5-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
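The `Pooling` and `Normalize` modules above amount to masked mean pooling over the token embeddings followed by L2 normalization. A minimal sketch in plain PyTorch, under the assumption that `pooling_mode_mean_tokens` averages only non-padding positions (the `mean_pool` helper is illustrative, not part of this repository):

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """pooling_mode_mean_tokens: average token embeddings, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # avoid division by zero
    return summed / counts

# Toy batch: 1 sequence, 3 token positions (the last is padding), hidden size 4.
hidden = torch.tensor([[[1.0, 2.0, 3.0, 4.0],
                        [3.0, 2.0, 1.0, 0.0],
                        [99.0, 99.0, 99.0, 99.0]]])  # padded position must be ignored
mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(hidden, mask)        # module (1): Pooling
embedding = F.normalize(pooled, dim=1)  # module (2): Normalize -> unit length
print(pooled)                # tensor([[2., 2., 2., 2.]])
print(embedding.norm(dim=1)) # tensor([1.])
```

Because of the final `Normalize()`, dot products between output embeddings equal cosine similarities.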

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'query: Jacob (Israel): the name conferred on Jacob after the great prayer-struggle at Peniel ( Genesis 32:28 ), because "as a prince he had power with God and prevailed." (See JACOB .) This is the common name given to Jacob\'s descendants. The whole people of the twelve tribes are called "Israelites," the "children of Israel" ( Joshua 3:17 ; 7:25 ; Judges 8:27 ; Jeremiah 3:21 ), and the "house of Israel" ( Exodus 16:31 ; 40:38 ). This name Israel is sometimes used emphatically for the true Israel ( Psalms 73:1 : Isaiah 45:17 ; 49:3 ; John 1:47 ; Romans 9:6 ; 11:26 ). After the death of Saul the ten tribes arrogated to themselves this name, as if they were the whole nation ( 2 Samuel 2:9 2 Samuel 2:10 2 Samuel 2:17 2 Samuel 2:28 ; 2 Samuel 3:10 2 Samuel 3:17 ; 19:40-43 ), and the kings of the ten tribes were called "kings of Israel," while the kings of the two tribes were called "kings of Judah." After the Exile the name Israel was assumed as designating the entire nation.',
    'passage: then I will reject the descendants of Jacob and David my servant and will not choose one of his sons to rule over the descendants of Abraham, Isaac and Jacob. For I will restore their fortunes and have compassion on them.’”',
    'passage: Greet Ampliatus, my dear friend in the Lord.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4831, 0.1291],
#         [0.4831, 1.0000, 0.2341],
#         [0.1291, 0.2341, 1.0000]])
```
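Since the model L2-normalizes its outputs, the cosine similarities returned by `model.similarity` reduce to a plain matrix product, and semantic search is an argsort over that product. A sketch with toy, already-normalized vectors (the `rank_passages` helper is illustrative; in practice the embeddings would come from `model.encode` with the `query: ` / `passage: ` prefixes shown in the examples above):

```python
import torch
import torch.nn.functional as F

def rank_passages(query_emb: torch.Tensor, passage_embs: torch.Tensor):
    """Cosine ranking for L2-normalized embeddings: a matrix product,
    highest-scoring passage first."""
    scores = passage_embs @ query_emb
    order = torch.argsort(scores, descending=True)
    return order, scores

# Toy stand-ins for encoded vectors (normalized, as the model's outputs are).
q = F.normalize(torch.tensor([1.0, 0.0, 0.0]), dim=0)
p = F.normalize(torch.tensor([[0.9, 0.1, 0.0],
                              [0.0, 1.0, 0.0]]), dim=1)
order, scores = rank_passages(q, p)
print(order.tolist())  # [0, 1] -> passage 0 is the better match
```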

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 262,023 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                         | sentence_1                                                                        | label                                                         |
  |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:--------------------------------------------------------------|
  | type    | string                                                                              | string                                                                             | float                                                          |
  | details | <ul><li>min: 5 tokens</li><li>mean: 28.18 tokens</li><li>max: 256 tokens</li></ul>  | <ul><li>min: 8 tokens</li><li>mean: 36.17 tokens</li><li>max: 86 tokens</li></ul>  | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul>  |
* Samples:
  | sentence_0                                                   | sentence_1                                                                                                                                                                | label            |
  |:-------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
  | <code>query: Holy Week in the Bible</code>                   | <code>passage: The master of that servant will come on a day when he does not expect him and at an hour he is not aware of.</code>                                        | <code>1.0</code> |
  | <code>query: what happened at prophecies of jeremiah</code>  | <code>passage: They go up the hill to Luhith,<br> weeping bitterly as they go;<br>on the road down to Horonaim<br> anguished cries over the destruction are heard.</code> | <code>1.0</code> |
  | <code>query: Holy Week</code>                                | <code>passage: How dreadful it will be in those days for pregnant women and nursing mothers!</code>                                                                       | <code>1.0</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```
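With these parameters, `MultipleNegativesRankingLoss` uses in-batch negatives: for query *i*, the paired passage *i* is the positive and every other passage in the batch is a negative; the pairwise cosine-similarity matrix is scaled by 20 and passed to cross-entropy with the diagonal as the target. A minimal sketch of that idea (the `mnr_loss` function is illustrative, not the library implementation):

```python
import torch
import torch.nn.functional as F

def mnr_loss(query_embs: torch.Tensor, passage_embs: torch.Tensor,
             scale: float = 20.0) -> torch.Tensor:
    """In-batch negatives: passage i is the positive for query i,
    all other passages in the batch are negatives."""
    a = F.normalize(query_embs, dim=1)
    b = F.normalize(passage_embs, dim=1)
    scores = scale * (a @ b.T)             # similarity_fct=cos_sim, scale=20.0
    labels = torch.arange(scores.size(0))  # positives sit on the diagonal
    return F.cross_entropy(scores, labels)

# Perfectly aligned pairs give near-zero loss; mismatched pairs give high loss.
aligned = mnr_loss(torch.eye(4), torch.eye(4))
shuffled = mnr_loss(torch.eye(4), torch.eye(4)[[1, 2, 3, 0]])
print(aligned.item() < shuffled.item())  # True
```

This is why the loss works with positive-only `(query, passage)` pairs (label always 1.0 in the samples above): the negatives come for free from the rest of the batch.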

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 1
- `max_steps`: 500
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: 500
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.0611 | 500 | 1.9442 |
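The fractional epoch in the log row follows from the hyperparameters above: 500 steps at a batch size of 32 covers 16,000 of the 262,023 training samples. A quick check:

```python
dataset_size = 262_023  # dataset_size from the card metadata
batch_size = 32         # per_device_train_batch_size
steps = 500             # max_steps

samples_seen = steps * batch_size
epoch = samples_seen / dataset_size
print(samples_seen, round(epoch, 4))  # 16000 0.0611
```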


### Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
```json
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.6",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
config_sentence_transformers.json ADDED
```json
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.2.0",
    "transformers": "4.57.6",
    "pytorch": "2.10.0+cpu"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
```
model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:740012c04137e189510bb4acd291f468a80e64336dc14c61d12d2440ba570696
size 437951328
```
modules.json ADDED
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```
sentence_bert_config.json ADDED
```json
{
  "max_seq_length": 256,
  "do_lower_case": false
}
```
special_tokens_map.json ADDED
```json
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
vocab.txt ADDED
The diff for this file is too large to render. See raw diff