dpshade22 committed
Commit 254f1e4 · verified · 1 Parent(s): 36e74ce

Upload hf-e5-bible-150 embedding model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
README.md ADDED
@@ -0,0 +1,383 @@
---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- dense
- generated_from_trainer
- dataset_size:262023
- loss:MultipleNegativesRankingLoss
base_model: intfloat/e5-base-v2
widget:
- source_sentence: 'query: Ezekiel Prophecies of Ezekiel'
  sentences:
  - 'passage: Then he went to the east gate. He climbed its steps and measured the
    threshold of the gate; it was one rod deep.'
  - 'passage: But if you do not obey the Lord, and if you rebel against his commands,
    his hand will be against you, as it was against your ancestors.'
  - 'passage: When you were dead in your sins and in the uncircumcision of your flesh,
    God made you alive with Christ. He forgave us all our sins,'
- source_sentence: 'query: The event ''Prophecies of Nahum'' as recorded in Scripture,
    involving Nahum.'
  sentences:
  - "passage: Nothing can heal you;\n your wound is fatal.\nAll who hear the news\
    \ about you\n clap their hands at your fall,\nfor who has not felt\n your\
    \ endless cruelty?"
  - 'passage: When David was told of this, he gathered all Israel and crossed the
    Jordan; he advanced against them and formed his battle lines opposite them. David
    formed his lines to meet the Arameans in battle, and they fought against him.'
  - 'passage: Then the king of Assyria sent his field commander with a large army
    from Lachish to King Hezekiah at Jerusalem. When the commander stopped at the
    aqueduct of the Upper Pool, on the road to the Launderer’s Field,'
- source_sentence: 'query: what happened to Job'
  sentences:
  - "passage: If I hold my head high, you stalk me like a lion\n and again display\
    \ your awesome power against me."
  - "passage: But Job has not marshaled his words against me,\n and I will not\
    \ answer him with your arguments."
  - "passage: I will pronounce my judgments on my people\n because of their wickedness\
    \ in forsaking me,\nin burning incense to other gods\n and in worshiping what\
    \ their hands have made."
- source_sentence: 'query: what happened at peter meets cornelius'
  sentences:
  - 'passage: From the descendants of Bani:

    Maadai, Amram, Uel,'
  - 'passage: until I come and take you to a land like your own—a land of grain and
    new wine, a land of bread and vineyards.'
  - 'passage: So get up and go downstairs. Do not hesitate to go with them, for I
    have sent them.”'
- source_sentence: 'query: Ahaz'
  sentences:
  - 'passage: We boarded a ship from Adramyttium about to sail for ports along the
    coast of the province of Asia, and we put out to sea. Aristarchus, a Macedonian
    from Thessalonica, was with us.'
  - 'passage: This is what the Lord says: “If those who do not deserve to drink the
    cup must drink it, why should you go unpunished? You will not go unpunished, but
    must drink it.'
  - 'passage: Ahaz sent messengers to say to Tiglath-Pileser king of Assyria, “I am
    your servant and vassal. Come up and save me out of the hand of the king of Aram
    and of the king of Israel, who are attacking me.”'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on intfloat/e5-base-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/e5-base-v2](https://huggingface.co/intfloat/e5-base-v2) <!-- at revision f52bf8ec8c7124536f0efb74aca902b2995e5bcd -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
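
The Pooling and Normalize steps in the module stack above are simple to sketch: a mean over the non-padding token embeddings (matching `pooling_mode_mean_tokens: True`) followed by L2 normalization. The NumPy sketch below is illustrative only, not the sentence-transformers implementation; the helper name and toy tensors are made up.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Hypothetical helper mirroring modules (1) Pooling and (2) Normalize:
    masked mean over tokens, then L2-normalize each sentence vector."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # guard empty mask
    mean = summed / counts
    return mean / np.linalg.norm(mean, axis=1, keepdims=True)

# Toy example: batch of 1, three tokens (the last is padding), dim 4.
tok = np.array([[[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [9.0, 9.0, 9.0, 9.0]]])  # padding row is ignored via the mask
mask = np.array([[1, 1, 0]])
vec = mean_pool_and_normalize(tok, mask)
print(vec.shape)  # (1, 4)
```

Because the stack ends with Normalize, every output vector has unit length, which is why cosine similarity later reduces to a dot product.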

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'query: Ahaz',
    'passage: Ahaz sent messengers to say to Tiglath-Pileser king of Assyria, “I am your servant and vassal. Come up and save me out of the hand of the king of Aram and of the king of Israel, who are attacking me.”',
    'passage: We boarded a ship from Adramyttium about to sail for ports along the coast of the province of Asia, and we put out to sea. Aristarchus, a Macedonian from Thessalonica, was with us.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5851, 0.2630],
#         [0.5851, 1.0000, 0.3747],
#         [0.2630, 0.3747, 1.0000]])
```
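
Since the model ends with a Normalize module, the cosine similarities above are plain dot products between unit vectors, so ranking passages (prefixed `passage: `) against a query (prefixed `query: `) needs only a matrix product and a sort. The toy 4-d vectors below stand in for real `model.encode` output (which is 768-d), and `rank_passages` is a hypothetical helper, not a library function.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length (what the Normalize module does)."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def rank_passages(query_emb, passage_embs):
    """Score passages against a query; cosine == dot for unit vectors."""
    scores = passage_embs @ query_emb
    order = np.argsort(-scores)  # best match first
    return order, scores

query = unit([1.0, 0.2, 0.0, 0.0])
passages = np.stack([
    unit([0.9, 0.1, 0.1, 0.0]),  # near the query direction → relevant
    unit([0.0, 0.0, 1.0, 1.0]),  # orthogonal → unrelated
])
order, scores = rank_passages(query, passages)
print(order[0])  # 0  (the relevant passage ranks first)
```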

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 262,023 training samples
* Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string | string | float |
  | details | <ul><li>min: 5 tokens</li><li>mean: 26.46 tokens</li><li>max: 256 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 34.73 tokens</li><li>max: 82 tokens</li></ul> | <ul><li>min: 1.0</li><li>mean: 1.0</li><li>max: 1.0</li></ul> |
* Samples:
  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | <code>query: Gilead</code> | <code>passage: Now Elijah the Tishbite, from Tishbe in Gilead, said to Ahab, “As the Lord, the God of Israel, lives, whom I serve, there will be neither dew nor rain in the next few years except at my word.”</code> | <code>1.0</code> |
  | <code>query: Canaanites: The descendants of Canaan, the son of Ham. Migrating from their original home, they seem to have reached the Persian Gulf, and to have there sojourned for some time. They thence “spread to the west, across the mountain chain of Lebanon to the very edge of the Mediterranean Sea, occupying all the land which later became Palestine, also to the north-west as far as the mountain chain of Taurus.</code> | <code>passage: She makes linen garments and sells them,<br> and supplies the merchants with sashes.</code> | <code>1.0</code> |
  | <code>query: who is God</code> | <code>passage: “‘Observe my Sabbaths and have reverence for my sanctuary. I am the Lord.</code> | <code>1.0</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false
  }
  ```
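
The loss has a compact definition: for a batch of (query, passage) pairs, each passage paired with a *different* query serves as an in-batch negative, and training minimizes cross-entropy over the scaled similarity matrix so that query *i* scores highest against passage *i*. A minimal NumPy sketch under those assumptions (unit-norm embeddings so `cos_sim` is a dot product, `scale=20.0` as configured above); this illustrates the idea, it is not the sentence-transformers implementation.

```python
import numpy as np

def mnrl_loss(query_embs, passage_embs, scale=20.0):
    """Sketch of MultipleNegativesRankingLoss: cross-entropy over the
    scaled cosine-similarity matrix, with passage i as the "label" for
    query i and all other in-batch passages as negatives."""
    sims = scale * (query_embs @ passage_embs.T)       # (batch, batch)
    sims = sims - sims.max(axis=1, keepdims=True)      # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -log_probs.diagonal().mean()

# Toy batch of 2 unit vectors: each query is closest to its own passage,
# so the loss is small; mismatched pairs would drive it up.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.8, 0.6], [0.6, 0.8]])
loss = mnrl_loss(q, p)
print(loss > 0)  # True
```

Swapping the passage rows (so each query's positive becomes another query's passage) raises the loss sharply, which is exactly the gradient signal that pulls matched pairs together.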

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 1
- `max_steps`: 150
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: 150
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Framework Versions
- Python: 3.11.14
- Sentence Transformers: 5.2.0
- Transformers: 4.57.6
- PyTorch: 2.10.0+cpu
- Accelerate: 1.12.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,25 @@
{
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "dtype": "float32",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.57.6",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
{
  "model_type": "SentenceTransformer",
  "__version__": {
    "sentence_transformers": "5.2.0",
    "transformers": "4.57.6",
    "pytorch": "2.10.0+cpu"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "cosine"
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f9ec5c4f312865ed4c01a04a0180dd2565e03ad4278626e206baccecfdb7346c
size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
{
  "max_seq_length": 256,
  "do_lower_case": false
}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render.