minhleduc committed a105062 (verified)
1 parent: 586613c

Add new SentenceTransformer model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "word_embedding_dimension": 768,
3
+ "pooling_mode_cls_token": false,
4
+ "pooling_mode_mean_tokens": true,
5
+ "pooling_mode_max_tokens": false,
6
+ "pooling_mode_mean_sqrt_len_tokens": false,
7
+ "pooling_mode_weightedmean_tokens": false,
8
+ "pooling_mode_lasttoken": false,
9
+ "include_prompt": true
10
+ }
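The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token embeddings. Below is a minimal NumPy sketch of masked mean pooling. It assumes token embeddings of shape `(batch, seq_len, dim)` and a 0/1 attention mask that marks padding. The function name `mean_pool` is hypothetical and is not part of the sentence-transformers API.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) float array
    attention_mask:   (batch, seq_len) array of 1 (real token) / 0 (padding)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts


# Two "sentences" of up to 3 tokens with dim=4; the second ends with one padding token.
emb = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(emb, mask)
print(pooled.shape)  # (2, 4)
```

In this example, the padded third token of the second sentence does not contribute to that sentence's mean.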
README.md ADDED
@@ -0,0 +1,490 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - dense
7
+ - generated_from_trainer
8
+ - dataset_size:28990
9
+ - loss:CoSENTLoss
10
+ base_model: jhu-clsp/mmBERT-base
11
+ widget:
12
+ - source_sentence: Bộ Y tế Trung Quốc cho biết thêm 5 người đã chết vì Sars và thêm
13
+ 159 người bị nhiễm bệnh.
14
+ sentences:
15
+ - Hôm thứ Hai, Trung Quốc cho biết thêm 9 người đã chết vì SARS và 160 người nữa
16
+ bị nhiễm virus này.
17
+ - Cộng hòa Hồi giáo là nước áp dụng nhiều hình phạt tử hình nhất trên thế giới sau
18
+ Trung Quốc.
19
+ - Chỉ số Dow Jones Industrial Average tăng 194,14 điểm, hoặc 2,09%, lên 9.469,20
20
+ sau khi giảm hơn 1% một ngày trước đó.
21
+ - source_sentence: Thừa kế là một khái niệm cơ bản trong lập trình hướng đối tượng.
22
+ sentences:
23
+ - Thừa kế trong lập trình hướng đối tượng là một cách để tạo các lớp mới bằng cách
24
+ sử dụng các lớp đã được định nghĩa.
25
+ - Một con chó màu nâu và trắng đang cầm một quả bóng chày trong miệng
26
+ - Tôi cũng đang gặp đúng vấn đề tương tự.
27
+ - source_sentence: hành động bảo tồn hoặc bảo vệ một cái gì đó
28
+ sentences:
29
+ - 'Tôi sẽ nói thẳng ở đây: Bạn không.'
30
+ - Nhưng lợi nhuận khiến thị trường chứng khoán Mỹ trở nên hấp dẫn phản ánh sự cân
31
+ bằng chính trị mong manh.
32
+ - hành động khẳng định hoặc khẳng định hoặc tuyên bố một cái gì đó.
33
+ - source_sentence: Một cậu bé mặc áo thun trắng đang phun nước trong nước nông
34
+ sentences:
35
+ - Một cậu bé đang chạy qua biển ở bãi biển
36
+ - Hai người đang đi xe đạp.
37
+ - Người đàn ông đang chơi với con chó nhỏ lông xù.
38
+ - source_sentence: Một người đàn ông đang ngồi gần một chiếc xe đạp và đang viết một
39
+ ghi chú
40
+ sentences:
41
+ - 'Mức thuế 50 pence của Đảng Lao động: Những điều bạn cần biết.'
42
+ - Các vận động viên khuyết tật chuẩn bị sẵn sàng, làm dấy lên những câu hỏi về hậu
43
+ cần và sự công bằng.
44
+ - Một người đàn ông mặc quần áo được phủ sơn và đang ngồi bên ngoài trong một khu
45
+ vực đông đúc để viết một cái gì đó
46
+ pipeline_tag: sentence-similarity
47
+ library_name: sentence-transformers
48
+ metrics:
49
+ - pearson_cosine
50
+ - spearman_cosine
51
+ model-index:
52
+ - name: SentenceTransformer based on jhu-clsp/mmBERT-base
53
+ results:
54
+ - task:
55
+ type: semantic-similarity
56
+ name: Semantic Similarity
57
+ dataset:
58
+ name: 8Opt sts dev 0001
59
+ type: 8Opt-sts-dev-0001
60
+ metrics:
61
+ - type: pearson_cosine
62
+ value: 0.7112661088179411
63
+ name: Pearson Cosine
64
+ - type: spearman_cosine
65
+ value: 0.7317986017657648
66
+ name: Spearman Cosine
67
+ - task:
68
+ type: semantic-similarity
69
+ name: Semantic Similarity
70
+ dataset:
71
+ name: 8Opt sts test 0002
72
+ type: 8Opt-sts-test-0002
73
+ metrics:
74
+ - type: pearson_cosine
75
+ value: 0.7113078452424898
76
+ name: Pearson Cosine
77
+ - type: spearman_cosine
78
+ value: 0.7318436946978517
79
+ name: Spearman Cosine
80
+ ---
81
+
82
+ # SentenceTransformer based on jhu-clsp/mmBERT-base
83
+
84
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
85
+
86
+ ## Model Details
87
+
88
+ ### Model Description
89
+ - **Model Type:** Sentence Transformer
90
+ - **Base model:** [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base) <!-- at revision c5955035435e2bf121cde7f3c8863ef52ff35d82 -->
91
+ - **Maximum Sequence Length:** 8192 tokens
92
+ - **Output Dimensionality:** 768 dimensions
93
+ - **Similarity Function:** Cosine Similarity
94
+ <!-- - **Training Dataset:** Unknown -->
95
+ <!-- - **Language:** Unknown -->
96
+ <!-- - **License:** Unknown -->
97
+
98
+ ### Model Sources
99
+
100
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
101
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
102
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
103
+
104
+ ### Full Model Architecture
105
+
106
+ ```
107
+ SentenceTransformer(
108
+ (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
109
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
110
+ )
111
+ ```
112
+
113
+ ## Usage
114
+
115
+ ### Direct Usage (Sentence Transformers)
116
+
117
+ First install the Sentence Transformers library:
118
+
119
+ ```bash
120
+ pip install -U sentence-transformers
121
+ ```
122
+
123
+ Then you can load this model and run inference.
124
+ ```python
125
+ from sentence_transformers import SentenceTransformer
126
+
127
+ # Download from the 🤗 Hub
128
+ model = SentenceTransformer("8Opt/mmbert-base-vn-sts-001")
129
+ # Run inference
130
+ sentences = [
131
+ 'Một người đàn ông đang ngồi gần một chiếc xe đạp và đang viết một ghi chú',
132
+ 'Một người đàn ông mặc quần áo được phủ sơn và đang ngồi bên ngoài trong một khu vực đông đúc để viết một cái gì đó',
133
+ 'Các vận động viên khuyết tật chuẩn bị sẵn sàng, làm dấy lên những câu hỏi về hậu cần và sự công bằng.',
134
+ ]
135
+ embeddings = model.encode(sentences)
136
+ print(embeddings.shape)
137
+ # (3, 768)
138
+
139
+ # Get the similarity scores for the embeddings
140
+ similarities = model.similarity(embeddings, embeddings)
141
+ print(similarities)
142
+ # tensor([[1.0000, 0.8503, 0.7796],
143
+ # [0.8503, 1.0000, 0.7599],
144
+ # [0.7796, 0.7599, 1.0000]])
145
+ ```
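`model.similarity` applies the model's configured similarity function, which is cosine here (`"similarity_fn_name": "cosine"` in `config_sentence_transformers.json`). To show roughly what that computes, the NumPy-only sketch below builds a pairwise cosine-similarity matrix from an embedding array. The random vectors are stand-ins for real `model.encode` output, so the values will not match the tensor printed above. The helper name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the (len(a), len(b)) matrix of cosine similarities between rows."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T


# Stand-in embeddings: 3 vectors of the model's output size (768).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768)).astype(np.float32)
sims = cosine_similarity_matrix(embeddings, embeddings)
print(sims.shape)  # (3, 3); the diagonal is 1.0 up to float error
```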
146
+
147
+ <!--
148
+ ### Direct Usage (Transformers)
149
+
150
+ <details><summary>Click to see the direct usage in Transformers</summary>
151
+
152
+ </details>
153
+ -->
154
+
155
+ <!--
156
+ ### Downstream Usage (Sentence Transformers)
157
+
158
+ You can finetune this model on your own dataset.
159
+
160
+ <details><summary>Click to expand</summary>
161
+
162
+ </details>
163
+ -->
164
+
165
+ <!--
166
+ ### Out-of-Scope Use
167
+
168
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
169
+ -->
170
+
171
+ ## Evaluation
172
+
173
+ ### Metrics
174
+
175
+ #### Semantic Similarity
176
+
177
+ * Datasets: `8Opt-sts-dev-0001` and `8Opt-sts-test-0002`
178
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
179
+
180
+ | Metric | 8Opt-sts-dev-0001 | 8Opt-sts-test-0002 |
181
+ |:--------------------|:------------------|:-------------------|
182
+ | pearson_cosine | 0.7113 | 0.7113 |
183
+ | **spearman_cosine** | **0.7318** | **0.7318** |
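Both metrics compare the model's cosine similarities with the gold similarity scores. `pearson_cosine` is the Pearson correlation of the raw values, and `spearman_cosine` is the Pearson correlation of their ranks. The NumPy-only sketch below uses made-up numbers and assumes there are no tied values. The evaluator itself relies on SciPy's `pearsonr` and `spearmanr`, and `spearmanr` handles ties by averaging ranks.

```python
import numpy as np


def ranks(x: np.ndarray) -> np.ndarray:
    """0-based ranks of x; assumes no ties (SciPy's spearmanr averages tied ranks)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(len(x))
    return r


def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold scores (0-5 scale) and model cosine similarities for 5 pairs.
gold = np.array([0.4, 1.8, 2.6, 3.7, 4.0])
cosine = np.array([0.55, 0.62, 0.80, 0.71, 0.90])
print(round(pearson(gold, cosine), 4), round(spearman(gold, cosine), 4))
```

Spearman depends only on ordering. Rescaling the 0-5 gold scores therefore leaves `spearman_cosine` unchanged, which is one reason it is the headline metric in the table.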
184
+
185
+ <!--
186
+ ## Bias, Risks and Limitations
187
+
188
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
189
+ -->
190
+
191
+ <!--
192
+ ### Recommendations
193
+
194
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
195
+ -->
196
+
197
+ ## Training Details
198
+
199
+ ### Training Dataset
200
+
201
+ #### Unnamed Dataset
202
+
203
+ * Size: 28,990 training samples
204
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
205
+ * Approximate statistics based on the first 1000 samples:
206
+ | | sentence1 | sentence2 | score |
207
+ |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
208
+ | type | string | string | float |
209
+ | details | <ul><li>min: 6 tokens</li><li>mean: 18.02 tokens</li><li>max: 74 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 17.77 tokens</li><li>max: 81 tokens</li></ul> | <ul><li>min: 0.04</li><li>mean: 2.6</li><li>max: 5.0</li></ul> |
210
+ * Samples:
211
+ | sentence1 | sentence2 | score |
212
+ |:------------------------------------------------------------|:-------------------------------------------------------------------------------|:-----------------|
213
+ | <code>Một con chó đen đang chạy trong tuyết.</code> | <code>Một con chó đen đang chạy trên bãi biển.</code> | <code>1.8</code> |
214
+ | <code>bóng đèn a tạo ra một khoảng trống</code> | <code>nếu bóng đèn a cháy ra, cả b và c đều không ở trong một đường kín</code> | <code>1.8</code> |
215
+ | <code>Sự phát triển an ninh tại Iraq, ngày 1 tháng 2</code> | <code>Sự phát triển an ninh tại Pakistan, ngày 13 tháng 3</code> | <code>1.6</code> |
216
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
217
+ ```json
218
+ {
219
+ "scale": 20.0,
220
+ "similarity_fct": "pairwise_cos_sim"
221
+ }
222
+ ```
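CoSENT uses only the ordering of the gold scores within a batch. Consider every pair of examples where example *i* has a lower gold score than example *j*. For each such pair, the loss penalizes the model when cos_i is not below cos_j, via `loss = log(1 + Σ exp(scale · (cos_i − cos_j)))`. The NumPy sketch below follows that formula with `scale=20.0`, as configured above. It is an illustrative re-implementation based on our reading of the loss, not the library's code. It assumes the cosine similarity of each `(sentence1, sentence2)` pair has already been computed. The name `cosent_loss` is hypothetical.

```python
import numpy as np


def cosent_loss(cos_sims: np.ndarray, labels: np.ndarray, scale: float = 20.0) -> float:
    """log(1 + sum over pairs (i, j) with labels[i] < labels[j]
    of exp(scale * (cos_sims[i] - cos_sims[j])))."""
    diffs = scale * (cos_sims[:, None] - cos_sims[None, :])  # diffs[i, j] = scale * (cos_i - cos_j)
    pair_mask = labels[:, None] < labels[None, :]            # pairs where i should score lower
    terms = np.concatenate([[0.0], diffs[pair_mask]])        # the 0.0 term supplies the "1 +"
    m = terms.max()                                          # numerically stable log-sum-exp
    return float(m + np.log(np.exp(terms - m).sum()))


labels = np.array([1.8, 4.0, 0.4])   # gold scores on the 0-5 scale
good = np.array([0.50, 0.90, 0.10])  # cosine similarities ordered like the labels
bad = np.array([0.50, 0.10, 0.90])   # ordering reversed
good_loss, bad_loss = cosent_loss(good, labels), cosent_loss(bad, labels)
print(good_loss, bad_loss)
```

Because only the relative order of the labels matters, the 0-5 scores in this dataset can be used without rescaling.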
223
+
224
+ ### Evaluation Dataset
225
+
226
+ #### Unnamed Dataset
227
+
228
+ * Size: 4,141 evaluation samples
229
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
230
+ * Approximate statistics based on the first 1000 samples:
231
+ | | sentence1 | sentence2 | score |
232
+ |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------|
233
+ | type | string | string | float |
234
+ | details | <ul><li>min: 3 tokens</li><li>mean: 18.71 tokens</li><li>max: 107 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 17.54 tokens</li><li>max: 91 tokens</li></ul> | <ul><li>min: 0.04</li><li>mean: 2.56</li><li>max: 5.0</li></ul> |
235
+ * Samples:
236
+ | sentence1 | sentence2 | score |
237
+ |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|:-----------------|
238
+ | <code>một đơn vị đáp ứng các điều kiện tiên quyết để tham gia vào một sự kiện. một mức độ sửa đổi có thể được bao gồm để chỉ ra bao nhiêu đơn vị vượt quá hoặc không đạt được các yêu cầu tối thiểu.</code> | <code>(thường theo sau là `to ') có phương tiện cần thiết hoặc kỹ năng hoặc bí quyết hoặc thẩm quyền để làm một cái gì đó;</code> | <code>0.4</code> |
239
+ | <code>Tôi sẽ không đưa nó vào hồ sơ của mình.</code> | <code>Tôi sẽ không đưa công việc này vào hồ sơ của mình.</code> | <code>4.0</code> |
240
+ | <code>Một cậu bé trẻ với một chiếc áo khoác chứa tim đang nâng tay lên khi anh ta trượt</code> | <code>Một đứa trẻ tóc vàng đang đi xuống một slide và ném lên tay của mình</code> | <code>3.7</code> |
241
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
242
+ ```json
243
+ {
244
+ "scale": 20.0,
245
+ "similarity_fct": "pairwise_cos_sim"
246
+ }
247
+ ```
248
+
249
+ ### Training Hyperparameters
250
+ #### Non-Default Hyperparameters
251
+
252
+ - `eval_strategy`: steps
253
+ - `per_device_train_batch_size`: 32
254
+ - `per_device_eval_batch_size`: 32
255
+ - `learning_rate`: 2e-05
256
+ - `num_train_epochs`: 5
257
+ - `warmup_ratio`: 0.1
258
+ - `fp16`: True
259
+ - `batch_sampler`: no_duplicates
260
+
261
+ #### All Hyperparameters
262
+ <details><summary>Click to expand</summary>
263
+
264
+ - `overwrite_output_dir`: False
265
+ - `do_predict`: False
266
+ - `eval_strategy`: steps
267
+ - `prediction_loss_only`: True
268
+ - `per_device_train_batch_size`: 32
269
+ - `per_device_eval_batch_size`: 32
270
+ - `per_gpu_train_batch_size`: None
271
+ - `per_gpu_eval_batch_size`: None
272
+ - `gradient_accumulation_steps`: 1
273
+ - `eval_accumulation_steps`: None
274
+ - `torch_empty_cache_steps`: None
275
+ - `learning_rate`: 2e-05
276
+ - `weight_decay`: 0.0
277
+ - `adam_beta1`: 0.9
278
+ - `adam_beta2`: 0.999
279
+ - `adam_epsilon`: 1e-08
280
+ - `max_grad_norm`: 1.0
281
+ - `num_train_epochs`: 5
282
+ - `max_steps`: -1
283
+ - `lr_scheduler_type`: linear
284
+ - `lr_scheduler_kwargs`: {}
285
+ - `warmup_ratio`: 0.1
286
+ - `warmup_steps`: 0
287
+ - `log_level`: passive
288
+ - `log_level_replica`: warning
289
+ - `log_on_each_node`: True
290
+ - `logging_nan_inf_filter`: True
291
+ - `save_safetensors`: True
292
+ - `save_on_each_node`: False
293
+ - `save_only_model`: False
294
+ - `restore_callback_states_from_checkpoint`: False
295
+ - `no_cuda`: False
296
+ - `use_cpu`: False
297
+ - `use_mps_device`: False
298
+ - `seed`: 42
299
+ - `data_seed`: None
300
+ - `jit_mode_eval`: False
301
+ - `bf16`: False
302
+ - `fp16`: True
303
+ - `fp16_opt_level`: O1
304
+ - `half_precision_backend`: auto
305
+ - `bf16_full_eval`: False
306
+ - `fp16_full_eval`: False
307
+ - `tf32`: None
308
+ - `local_rank`: 0
309
+ - `ddp_backend`: None
310
+ - `tpu_num_cores`: None
311
+ - `tpu_metrics_debug`: False
312
+ - `debug`: []
313
+ - `dataloader_drop_last`: False
314
+ - `dataloader_num_workers`: 0
315
+ - `dataloader_prefetch_factor`: None
316
+ - `past_index`: -1
317
+ - `disable_tqdm`: False
318
+ - `remove_unused_columns`: True
319
+ - `label_names`: None
320
+ - `load_best_model_at_end`: False
321
+ - `ignore_data_skip`: False
322
+ - `fsdp`: []
323
+ - `fsdp_min_num_params`: 0
324
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
325
+ - `fsdp_transformer_layer_cls_to_wrap`: None
326
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
327
+ - `parallelism_config`: None
328
+ - `deepspeed`: None
329
+ - `label_smoothing_factor`: 0.0
330
+ - `optim`: adamw_torch_fused
331
+ - `optim_args`: None
332
+ - `adafactor`: False
333
+ - `group_by_length`: False
334
+ - `length_column_name`: length
335
+ - `project`: huggingface
336
+ - `trackio_space_id`: trackio
337
+ - `ddp_find_unused_parameters`: None
338
+ - `ddp_bucket_cap_mb`: None
339
+ - `ddp_broadcast_buffers`: False
340
+ - `dataloader_pin_memory`: True
341
+ - `dataloader_persistent_workers`: False
342
+ - `skip_memory_metrics`: True
343
+ - `use_legacy_prediction_loop`: False
344
+ - `push_to_hub`: False
345
+ - `resume_from_checkpoint`: None
346
+ - `hub_model_id`: None
347
+ - `hub_strategy`: every_save
348
+ - `hub_private_repo`: None
349
+ - `hub_always_push`: False
350
+ - `hub_revision`: None
351
+ - `gradient_checkpointing`: False
352
+ - `gradient_checkpointing_kwargs`: None
353
+ - `include_inputs_for_metrics`: False
354
+ - `include_for_metrics`: []
355
+ - `eval_do_concat_batches`: True
356
+ - `fp16_backend`: auto
357
+ - `push_to_hub_model_id`: None
358
+ - `push_to_hub_organization`: None
359
+ - `mp_parameters`:
360
+ - `auto_find_batch_size`: False
361
+ - `full_determinism`: False
362
+ - `torchdynamo`: None
363
+ - `ray_scope`: last
364
+ - `ddp_timeout`: 1800
365
+ - `torch_compile`: False
366
+ - `torch_compile_backend`: None
367
+ - `torch_compile_mode`: None
368
+ - `include_tokens_per_second`: False
369
+ - `include_num_input_tokens_seen`: no
370
+ - `neftune_noise_alpha`: None
371
+ - `optim_target_modules`: None
372
+ - `batch_eval_metrics`: False
373
+ - `eval_on_start`: False
374
+ - `use_liger_kernel`: False
375
+ - `liger_kernel_config`: None
376
+ - `eval_use_gather_object`: False
377
+ - `average_tokens_across_devices`: True
378
+ - `prompts`: None
379
+ - `batch_sampler`: no_duplicates
380
+ - `multi_dataset_batch_sampler`: proportional
381
+ - `router_mapping`: {}
382
+ - `learning_rate_mapping`: {}
383
+
384
+ </details>
385
+
386
+ ### Training Logs
387
+ | Epoch | Step | Training Loss | Validation Loss | 8Opt-sts-dev-0001_spearman_cosine | 8Opt-sts-test-0002_spearman_cosine |
388
+ |:------:|:----:|:-------------:|:---------------:|:---------------------------------:|:----------------------------------:|
389
+ | 0.1104 | 100 | 6.2285 | 6.1894 | 0.4639 | - |
390
+ | 0.2208 | 200 | 6.0912 | 6.0358 | 0.5950 | - |
391
+ | 0.3311 | 300 | 6.0572 | 6.0373 | 0.6327 | - |
392
+ | 0.4415 | 400 | 5.9895 | 5.9931 | 0.6654 | - |
393
+ | 0.5519 | 500 | 5.9872 | 5.9771 | 0.6707 | - |
394
+ | 0.6623 | 600 | 5.9583 | 5.9619 | 0.6785 | - |
395
+ | 0.7726 | 700 | 5.9517 | 5.9831 | 0.6930 | - |
396
+ | 0.8830 | 800 | 5.9866 | 5.9433 | 0.6980 | - |
397
+ | 0.9934 | 900 | 5.9541 | 5.9460 | 0.6964 | - |
398
+ | 1.1038 | 1000 | 5.8652 | 5.9386 | 0.7174 | - |
399
+ | 1.2141 | 1100 | 5.8577 | 5.9361 | 0.7077 | - |
400
+ | 1.3245 | 1200 | 5.8518 | 5.9412 | 0.7201 | - |
401
+ | 1.4349 | 1300 | 5.8636 | 5.9255 | 0.7257 | - |
402
+ | 1.5453 | 1400 | 5.8342 | 5.9430 | 0.7134 | - |
403
+ | 1.6556 | 1500 | 5.8309 | 5.9765 | 0.7177 | - |
404
+ | 1.7660 | 1600 | 5.853 | 5.9125 | 0.7289 | - |
405
+ | 1.8764 | 1700 | 5.8369 | 5.9112 | 0.7324 | - |
406
+ | 1.9868 | 1800 | 5.8504 | 5.9032 | 0.7335 | - |
407
+ | 2.0971 | 1900 | 5.7031 | 5.9822 | 0.7343 | - |
408
+ | 2.2075 | 2000 | 5.6934 | 6.0005 | 0.7328 | - |
409
+ | 2.3179 | 2100 | 5.6574 | 6.0913 | 0.7277 | - |
410
+ | 2.4283 | 2200 | 5.6671 | 6.0513 | 0.7255 | - |
411
+ | 2.5386 | 2300 | 5.6632 | 5.9786 | 0.7325 | - |
412
+ | 2.6490 | 2400 | 5.6746 | 6.0000 | 0.7342 | - |
413
+ | 2.7594 | 2500 | 5.6995 | 5.9492 | 0.7366 | - |
414
+ | 2.8698 | 2600 | 5.6814 | 6.0296 | 0.7315 | - |
415
+ | 2.9801 | 2700 | 5.6689 | 6.0508 | 0.7310 | - |
416
+ | 3.0905 | 2800 | 5.4825 | 6.2192 | 0.7296 | - |
417
+ | 3.2009 | 2900 | 5.4686 | 6.2524 | 0.7295 | - |
418
+ | 3.3113 | 3000 | 5.4698 | 6.1861 | 0.7294 | - |
419
+ | 3.4216 | 3100 | 5.4957 | 6.2815 | 0.7296 | - |
420
+ | 3.5320 | 3200 | 5.4993 | 6.2204 | 0.7309 | - |
421
+ | 3.6424 | 3300 | 5.5112 | 6.1372 | 0.7334 | - |
422
+ | 3.7528 | 3400 | 5.5259 | 6.1005 | 0.7337 | - |
423
+ | 3.8631 | 3500 | 5.5144 | 6.2305 | 0.7329 | - |
424
+ | 3.9735 | 3600 | 5.4785 | 6.1930 | 0.7354 | - |
425
+ | 4.0839 | 3700 | 5.367 | 6.5986 | 0.7276 | - |
426
+ | 4.1943 | 3800 | 5.2908 | 6.6695 | 0.7259 | - |
427
+ | 4.3046 | 3900 | 5.3125 | 6.6357 | 0.7264 | - |
428
+ | 4.4150 | 4000 | 5.2967 | 6.6588 | 0.7296 | - |
429
+ | 4.5254 | 4100 | 5.3019 | 6.6631 | 0.7313 | - |
430
+ | 4.6358 | 4200 | 5.2951 | 6.7149 | 0.7327 | - |
431
+ | 4.7461 | 4300 | 5.2609 | 6.7235 | 0.7323 | - |
432
+ | 4.8565 | 4400 | 5.2969 | 6.6987 | 0.7319 | - |
433
+ | 4.9669 | 4500 | 5.2938 | 6.7005 | 0.7318 | - |
434
+ | -1 | -1 | - | - | - | 0.7318 |
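The Step column fits 906 optimizer steps per epoch (for example, step 1000 at epoch 1.1038). That equals ceil(28,990 / 32) with `dataloader_drop_last: False` and `gradient_accumulation_steps: 1`. Under that assumption, the sketch below works out the linear warmup and decay schedule implied by `learning_rate: 2e-05`, `warmup_ratio: 0.1`, `lr_scheduler_type: linear` and `num_train_epochs: 5`. The `ceil` rounding of warmup steps reflects our understanding of how Hugging Face `TrainingArguments` converts a warmup ratio into steps, so treat the boundary values as approximate.

```python
import math

# Values from the model card; assumes one optimizer step per batch.
train_samples = 28_990
batch_size = 32
epochs = 5
peak_lr = 2e-5
warmup_ratio = 0.1

steps_per_epoch = math.ceil(train_samples / batch_size)  # 906
total_steps = steps_per_epoch * epochs                   # 4530
warmup_steps = math.ceil(total_steps * warmup_ratio)     # 453


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


for step in (0, 100, warmup_steps, 2000, 4500, total_steps):
    print(step, f"{linear_lr(step):.3e}")
```

Under this schedule, the last logged row (step 4500) falls close to the end of the decay, where the learning rate is near zero.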
435
+
436
+
437
+ ### Framework Versions
438
+ - Python: 3.12.12
439
+ - Sentence Transformers: 5.1.2
440
+ - Transformers: 4.57.1
441
+ - PyTorch: 2.8.0+cu126
442
+ - Accelerate: 1.11.0
443
+ - Datasets: 4.0.0
444
+ - Tokenizers: 0.22.1
445
+
446
+ ## Citation
447
+
448
+ ### BibTeX
449
+
450
+ #### Sentence Transformers
451
+ ```bibtex
452
+ @inproceedings{reimers-2019-sentence-bert,
453
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
454
+ author = "Reimers, Nils and Gurevych, Iryna",
455
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
456
+ month = "11",
457
+ year = "2019",
458
+ publisher = "Association for Computational Linguistics",
459
+ url = "https://arxiv.org/abs/1908.10084",
460
+ }
461
+ ```
462
+
463
+ #### CoSENTLoss
464
+ ```bibtex
465
+ @article{10531646,
466
+ author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
467
+ journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
468
+ title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
469
+ year={2024},
470
+ doi={10.1109/TASLP.2024.3402087}
471
+ }
472
+ ```
473
+
474
+ <!--
475
+ ## Glossary
476
+
477
+ *Clearly define terms in order to be accessible across audiences.*
478
+ -->
479
+
480
+ <!--
481
+ ## Model Card Authors
482
+
483
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
484
+ -->
485
+
486
+ <!--
487
+ ## Model Card Contact
488
+
489
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
490
+ -->
config.json ADDED
@@ -0,0 +1,46 @@
1
+ {
2
+ "architectures": [
3
+ "ModernBertModel"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 2,
8
+ "classifier_activation": "gelu",
9
+ "classifier_bias": false,
10
+ "classifier_dropout": 0.0,
11
+ "classifier_pooling": "mean",
12
+ "cls_token_id": 1,
13
+ "decoder_bias": true,
14
+ "deterministic_flash_attn": false,
15
+ "dtype": "float32",
16
+ "embedding_dropout": 0.0,
17
+ "eos_token_id": 1,
18
+ "global_attn_every_n_layers": 3,
19
+ "global_rope_theta": 160000,
20
+ "gradient_checkpointing": false,
21
+ "hidden_activation": "gelu",
22
+ "hidden_size": 768,
23
+ "initializer_cutoff_factor": 2.0,
24
+ "initializer_range": 0.02,
25
+ "intermediate_size": 1152,
26
+ "layer_norm_eps": 1e-05,
27
+ "local_attention": 128,
28
+ "local_rope_theta": 160000,
29
+ "mask_token_id": 4,
30
+ "max_position_embeddings": 8192,
31
+ "mlp_bias": false,
32
+ "mlp_dropout": 0.0,
33
+ "model_type": "modernbert",
34
+ "norm_bias": false,
35
+ "norm_eps": 1e-05,
36
+ "num_attention_heads": 12,
37
+ "num_hidden_layers": 22,
38
+ "pad_token_id": 0,
39
+ "position_embedding_type": "sans_pos",
40
+ "repad_logits_with_grad": false,
41
+ "sep_token_id": 1,
42
+ "sparse_pred_ignore_index": -100,
43
+ "sparse_prediction": false,
44
+ "transformers_version": "4.57.1",
45
+ "vocab_size": 256000
46
+ }
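With `global_attn_every_n_layers: 3` and `local_attention: 128`, most layers attend only within a sliding window, and every third layer attends globally. The sketch below lists which of the 22 layers are global and builds a local-window mask. It reflects our reading of the Hugging Face `transformers` ModernBERT implementation: a layer is global when `layer_id % global_attn_every_n_layers == 0`, and a local layer sees keys within `local_attention // 2` positions on either side. Check it against the library source before relying on it.

```python
import numpy as np

num_hidden_layers = 22
global_attn_every_n_layers = 3
local_attention = 128  # total window width; half on each side

global_layers = [i for i in range(num_hidden_layers) if i % global_attn_every_n_layers == 0]
local_layers = [i for i in range(num_hidden_layers) if i % global_attn_every_n_layers != 0]
print("global:", global_layers)
print("local count:", len(local_layers))


def sliding_window_mask(seq_len: int, window: int = local_attention) -> np.ndarray:
    """Boolean (seq_len, seq_len) mask: True where query i may attend to key j."""
    pos = np.arange(seq_len)
    distance = np.abs(pos[:, None] - pos[None, :])
    return distance <= window // 2


mask = sliding_window_mask(300)
print(mask[150].sum(), mask[0].sum())  # keys visible mid-sequence vs. at the start
```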
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SentenceTransformer",
3
+ "__version__": {
4
+ "sentence_transformers": "5.1.2",
5
+ "transformers": "4.57.1",
6
+ "pytorch": "2.8.0+cu126"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "cosine"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2333ff1434287564fe6111ab39569dbd89f5182526cfcaacb5a1f118240ab335
3
+ size 1227771776
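The LFS pointer records the size of `model.safetensors` in bytes. `config.json` lists `"dtype": "float32"`, which is 4 bytes per weight, so dividing the size by 4 gives a rough upper bound on the parameter count. It is only an upper bound because a safetensors file also contains a small JSON header. The token-embedding matrix alone accounts for `vocab_size × hidden_size` of those parameters. The following is a worked estimate:

```python
file_size_bytes = 1_227_771_776   # from the LFS pointer above
bytes_per_param = 4               # float32, per "dtype" in config.json
vocab_size, hidden_size = 256_000, 768

approx_params = file_size_bytes // bytes_per_param  # upper bound: header bytes included
embedding_params = vocab_size * hidden_size
print(f"~{approx_params / 1e6:.1f}M parameters in total")
print(f"{embedding_params / 1e6:.1f}M in the token embeddings")
print(f"~{(approx_params - embedding_params) / 1e6:.1f}M elsewhere")
```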
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 8192,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<start_of_turn>",
4
+ "<end_of_turn>"
5
+ ],
6
+ "bos_token": {
7
+ "content": "<bos>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false
12
+ },
13
+ "cls_token": {
14
+ "content": "<bos>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false
19
+ },
20
+ "eos_token": {
21
+ "content": "<eos>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false
26
+ },
27
+ "mask_token": {
28
+ "content": "<mask>",
29
+ "lstrip": true,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false
33
+ },
34
+ "pad_token": {
35
+ "content": "<pad>",
36
+ "lstrip": false,
37
+ "normalized": false,
38
+ "rstrip": false,
39
+ "single_word": false
40
+ },
41
+ "sep_token": {
42
+ "content": "<eos>",
43
+ "lstrip": false,
44
+ "normalized": false,
45
+ "rstrip": false,
46
+ "single_word": false
47
+ },
48
+ "unk_token": {
49
+ "content": "<unk>",
50
+ "lstrip": false,
51
+ "normalized": false,
52
+ "rstrip": false,
53
+ "single_word": false
54
+ }
55
+ }
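`special_tokens_map.json` identifies each special token by its string, while `config.json` refers to token ids. The sketch below cross-checks the two. It uses the id-to-string table from `added_tokens_decoder` in `tokenizer_config.json`, and all values are copied from the files in this commit. In these files the two sources agree for every token except `cls_token`.

```python
# Values copied from tokenizer_config.json (added_tokens_decoder),
# special_tokens_map.json and config.json in this commit.
id_to_token = {0: "<pad>", 1: "<eos>", 2: "<bos>", 3: "<unk>", 4: "<mask>"}
special_tokens_map = {
    "bos_token": "<bos>", "cls_token": "<bos>", "eos_token": "<eos>",
    "mask_token": "<mask>", "pad_token": "<pad>", "sep_token": "<eos>", "unk_token": "<unk>",
}
config_ids = {
    "bos_token": 2, "cls_token": 1, "eos_token": 1,
    "mask_token": 4, "pad_token": 0, "sep_token": 1,
}

token_to_id = {tok: idx for idx, tok in id_to_token.items()}
# name -> (token string, id from the tokenizer, id from config.json), only where they differ
mismatches = {
    name: (special_tokens_map[name], token_to_id[special_tokens_map[name]], cfg_id)
    for name, cfg_id in config_ids.items()
    if token_to_id[special_tokens_map[name]] != cfg_id
}
print(mismatches)
```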
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bd47075711f75a143d1b78e01a41cc65c1c591b00d3cfeffc23db07adce1392
3
+ size 34363442
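`tokenizer.json` is stored as a Git LFS pointer. This follows from the `tokenizer.json filter=lfs` rule that this commit adds to `.gitattributes`. The repository tracks only the pointer's `oid` and `size`, and the file contents live in LFS storage. The minimal standard-library sketch below checks a downloaded file against such a pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the demo uses a temporary file rather than the real tokenizer.

```python
import hashlib
import os
import tempfile


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict) -> bool:
    """True if the file at path has the pointer's size and sha256 digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


# Demo with a throwaway file instead of the real 34 MB tokenizer.json.
data = b'{"demo": true}'
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(data).hexdigest()}\n"
    f"size {len(data)}\n"
)
ok = matches_pointer(tmp.name, pointer)
os.remove(tmp.name)
print(ok)
```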
tokenizer_config.json ADDED
@@ -0,0 +1,2018 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<eos>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<bos>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<2mass>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "6": {
53
+ "content": "[@BOS@]",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "7": {
61
+ "content": "<unused0>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": false
67
+ },
68
+ "8": {
69
+ "content": "<unused1>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": false
75
+ },
76
+ "9": {
77
+ "content": "<unused2>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": false
83
+ },
84
+ "10": {
85
+ "content": "<unused3>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": false
91
+ },
92
+ "11": {
93
+ "content": "<unused4>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": false
99
+ },
100
+ "12": {
101
+ "content": "<unused5>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": false
107
+ },
108
+ "13": {
109
+ "content": "<unused6>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": false
115
+ },
116
+ "14": {
117
+ "content": "<unused7>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
124
+ "15": {
125
+ "content": "<unused8>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": false
131
+ },
132
+ "16": {
133
+ "content": "<unused9>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": false
139
+ },
140
+ "17": {
141
+ "content": "<unused10>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": false
147
+ },
148
+ "18": {
149
+ "content": "<unused11>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": false
155
+ },
156
+ "19": {
157
+ "content": "<unused12>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": false
163
+ },
164
+ "20": {
165
+ "content": "<unused13>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": false
171
+ },
172
+ "21": {
173
+ "content": "<unused14>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": false
179
+ },
180
+ "22": {
181
+ "content": "<unused15>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": false
187
+ },
188
+ "23": {
189
+ "content": "<unused16>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": false
195
+ },
196
+ "24": {
197
+ "content": "<unused17>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": false
203
+ },
204
+ "25": {
205
+ "content": "<unused18>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": false
211
+ },
212
+ "26": {
213
+ "content": "<unused19>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": false
219
+ },
220
+ "27": {
221
+ "content": "<unused20>",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": false
227
+ },
228
+ "28": {
229
+ "content": "<unused21>",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": false
235
+ },
236
+ "29": {
237
+ "content": "<unused22>",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": false
243
+ },
244
+ "30": {
245
+ "content": "<unused23>",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": false
251
+ },
252
+ "31": {
253
+ "content": "<unused24>",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": false
259
+ },
260
+ "32": {
261
+ "content": "<unused25>",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": false
267
+ },
268
+ "33": {
269
+ "content": "<unused26>",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": false
275
+ },
276
+ "34": {
277
+ "content": "<unused27>",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": false
283
+ },
284
+ "35": {
285
+ "content": "<unused28>",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": false
291
+ },
292
+ "36": {
293
+ "content": "<unused29>",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": false
299
+ },
300
+ "37": {
301
+ "content": "<unused30>",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": false
307
+ },
308
+ "38": {
309
+ "content": "<unused31>",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": false
315
+ },
316
+ "39": {
317
+ "content": "<unused32>",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": false
323
+ },
324
+ "40": {
325
+ "content": "<unused33>",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": false
331
+ },
332
+ "41": {
333
+ "content": "<unused34>",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": false
339
+ },
340
+ "42": {
341
+ "content": "<unused35>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": false
347
+ },
348
+ "43": {
349
+ "content": "<unused36>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": false
355
+ },
356
+ "44": {
357
+ "content": "<unused37>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": false
363
+ },
364
+ "45": {
365
+ "content": "<unused38>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": false
371
+ },
372
+ "46": {
373
+ "content": "<unused39>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": false
379
+ },
380
+ "47": {
381
+ "content": "<unused40>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": false
387
+ },
388
+ "48": {
389
+ "content": "<unused41>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": false
395
+ },
396
+ "49": {
397
+ "content": "<unused42>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": false
403
+ },
404
+ "50": {
405
+ "content": "<unused43>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": false
411
+ },
412
+ "51": {
413
+ "content": "<unused44>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": false
419
+ },
420
+ "52": {
421
+ "content": "<unused45>",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": false
427
+ },
428
+ "53": {
429
+ "content": "<unused46>",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": false
435
+ },
436
+ "54": {
437
+ "content": "<unused47>",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": false
443
+ },
444
+ "55": {
445
+ "content": "<unused48>",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": false
451
+ },
452
+ "56": {
453
+ "content": "<unused49>",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": false
459
+ },
460
+ "57": {
461
+ "content": "<unused50>",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": false
467
+ },
468
+ "58": {
469
+ "content": "<unused51>",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": false
475
+ },
476
+ "59": {
477
+ "content": "<unused52>",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": false
483
+ },
484
+ "60": {
485
+ "content": "<unused53>",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": false
491
+ },
492
+ "61": {
493
+ "content": "<unused54>",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": false
499
+ },
500
+ "62": {
501
+ "content": "<unused55>",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": false
507
+ },
508
+ "63": {
509
+ "content": "<unused56>",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": false
515
+ },
516
+ "64": {
517
+ "content": "<unused57>",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": false
523
+ },
524
+ "65": {
525
+ "content": "<unused58>",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": false
531
+ },
532
+ "66": {
533
+ "content": "<unused59>",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": false
539
+ },
540
+ "67": {
541
+ "content": "<unused60>",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": false
547
+ },
548
+ "68": {
549
+ "content": "<unused61>",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": false
555
+ },
556
+ "69": {
557
+ "content": "<unused62>",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": false
563
+ },
564
+ "70": {
565
+ "content": "<unused63>",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": false
571
+ },
572
+ "71": {
573
+ "content": "<unused64>",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": false
579
+ },
580
+ "72": {
581
+ "content": "<unused65>",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": false
587
+ },
588
+ "73": {
589
+ "content": "<unused66>",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": false
595
+ },
596
+ "74": {
597
+ "content": "<unused67>",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": false
603
+ },
604
+ "75": {
605
+ "content": "<unused68>",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": false
611
+ },
612
+ "76": {
613
+ "content": "<unused69>",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": false
619
+ },
620
+ "77": {
621
+ "content": "<unused70>",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": false
627
+ },
628
+ "78": {
629
+ "content": "<unused71>",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": false
635
+ },
636
+ "79": {
637
+ "content": "<unused72>",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": false
643
+ },
644
+ "80": {
645
+ "content": "<unused73>",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": false
651
+ },
652
+ "81": {
653
+ "content": "<unused74>",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": false
659
+ },
660
+ "82": {
661
+ "content": "<unused75>",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": false
667
+ },
668
+ "83": {
669
+ "content": "<unused76>",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": false
675
+ },
676
+ "84": {
677
+ "content": "<unused77>",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": false
683
+ },
684
+ "85": {
685
+ "content": "<unused78>",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": false
691
+ },
692
+ "86": {
693
+ "content": "<unused79>",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": false
699
+ },
700
+ "87": {
701
+ "content": "<unused80>",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": false
707
+ },
708
+ "88": {
709
+ "content": "<unused81>",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": false
715
+ },
716
+ "89": {
717
+ "content": "<unused82>",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": false
723
+ },
724
+ "90": {
725
+ "content": "<unused83>",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": false
731
+ },
732
+ "91": {
733
+ "content": "<unused84>",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": false
739
+ },
740
+ "92": {
741
+ "content": "<unused85>",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": false
747
+ },
748
+ "93": {
749
+ "content": "<unused86>",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": false
755
+ },
756
+ "94": {
757
+ "content": "<unused87>",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": false
763
+ },
764
+ "95": {
765
+ "content": "<unused88>",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": false
771
+ },
772
+ "96": {
773
+ "content": "<unused89>",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": false
779
+ },
780
+ "97": {
781
+ "content": "<unused90>",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": false
787
+ },
788
+ "98": {
789
+ "content": "<unused91>",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": false
795
+ },
796
+ "99": {
797
+ "content": "<unused92>",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": false
803
+ },
804
+ "100": {
805
+ "content": "<unused93>",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": false
811
+ },
812
+ "101": {
813
+ "content": "<unused94>",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": false
819
+ },
820
+ "102": {
821
+ "content": "<unused95>",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": false
827
+ },
828
+ "103": {
829
+ "content": "<unused96>",
830
+ "lstrip": false,
831
+ "normalized": false,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": false
835
+ },
836
+ "104": {
837
+ "content": "<unused97>",
838
+ "lstrip": false,
839
+ "normalized": false,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": false
843
+ },
844
+ "105": {
845
+ "content": "<unused98>",
846
+ "lstrip": false,
847
+ "normalized": false,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": false
851
+ },
852
+ "106": {
853
+ "content": "<start_of_turn>",
854
+ "lstrip": false,
855
+ "normalized": false,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": true
859
+ },
860
+ "107": {
861
+ "content": "<end_of_turn>",
862
+ "lstrip": false,
863
+ "normalized": false,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": true
867
+ },
868
+ "108": {
869
+ "content": "\n",
870
+ "lstrip": false,
871
+ "normalized": false,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": false
875
+ },
876
+ "109": {
877
+ "content": "\n\n",
878
+ "lstrip": false,
879
+ "normalized": false,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": false
883
+ },
884
+ "110": {
885
+ "content": "\n\n\n",
886
+ "lstrip": false,
887
+ "normalized": false,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": false
891
+ },
892
+ "111": {
893
+ "content": "\n\n\n\n",
894
+ "lstrip": false,
895
+ "normalized": false,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": false
899
+ },
900
+ "112": {
901
+ "content": "\n\n\n\n\n",
902
+ "lstrip": false,
903
+ "normalized": false,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": false
907
+ },
908
+ "113": {
909
+ "content": "\n\n\n\n\n\n",
910
+ "lstrip": false,
911
+ "normalized": false,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": false
915
+ },
916
+ "114": {
917
+ "content": "\n\n\n\n\n\n\n",
918
+ "lstrip": false,
919
+ "normalized": false,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": false
923
+ },
924
+ "115": {
925
+ "content": "\n\n\n\n\n\n\n\n",
926
+ "lstrip": false,
927
+ "normalized": false,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": false
931
+ },
932
+ "116": {
933
+ "content": "\n\n\n\n\n\n\n\n\n",
934
+ "lstrip": false,
935
+ "normalized": false,
936
+ "rstrip": false,
937
+ "single_word": false,
938
+ "special": false
939
+ },
940
+ "117": {
941
+ "content": "\n\n\n\n\n\n\n\n\n\n",
942
+ "lstrip": false,
943
+ "normalized": false,
944
+ "rstrip": false,
945
+ "single_word": false,
946
+ "special": false
947
+ },
948
+ "118": {
949
+ "content": "\n\n\n\n\n\n\n\n\n\n\n",
950
+ "lstrip": false,
951
+ "normalized": false,
952
+ "rstrip": false,
953
+ "single_word": false,
954
+ "special": false
955
+ },
956
+ "119": {
957
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n",
958
+ "lstrip": false,
959
+ "normalized": false,
960
+ "rstrip": false,
961
+ "single_word": false,
962
+ "special": false
963
+ },
964
+ "120": {
965
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n",
966
+ "lstrip": false,
967
+ "normalized": false,
968
+ "rstrip": false,
969
+ "single_word": false,
970
+ "special": false
971
+ },
972
+ "121": {
973
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
974
+ "lstrip": false,
975
+ "normalized": false,
976
+ "rstrip": false,
977
+ "single_word": false,
978
+ "special": false
979
+ },
980
+ "122": {
981
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
982
+ "lstrip": false,
983
+ "normalized": false,
984
+ "rstrip": false,
985
+ "single_word": false,
986
+ "special": false
987
+ },
988
+ "123": {
989
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
990
+ "lstrip": false,
991
+ "normalized": false,
992
+ "rstrip": false,
993
+ "single_word": false,
994
+ "special": false
995
+ },
996
+ "124": {
997
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
998
+ "lstrip": false,
999
+ "normalized": false,
1000
+ "rstrip": false,
1001
+ "single_word": false,
1002
+ "special": false
1003
+ },
1004
+ "125": {
1005
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1006
+ "lstrip": false,
1007
+ "normalized": false,
1008
+ "rstrip": false,
1009
+ "single_word": false,
1010
+ "special": false
1011
+ },
1012
+ "126": {
1013
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1014
+ "lstrip": false,
1015
+ "normalized": false,
1016
+ "rstrip": false,
1017
+ "single_word": false,
1018
+ "special": false
1019
+ },
1020
+ "127": {
1021
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1022
+ "lstrip": false,
1023
+ "normalized": false,
1024
+ "rstrip": false,
1025
+ "single_word": false,
1026
+ "special": false
1027
+ },
1028
+ "128": {
1029
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1030
+ "lstrip": false,
1031
+ "normalized": false,
1032
+ "rstrip": false,
1033
+ "single_word": false,
1034
+ "special": false
1035
+ },
1036
+ "129": {
1037
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1038
+ "lstrip": false,
1039
+ "normalized": false,
1040
+ "rstrip": false,
1041
+ "single_word": false,
1042
+ "special": false
1043
+ },
1044
+ "130": {
1045
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1046
+ "lstrip": false,
1047
+ "normalized": false,
1048
+ "rstrip": false,
1049
+ "single_word": false,
1050
+ "special": false
1051
+ },
1052
+ "131": {
1053
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1054
+ "lstrip": false,
1055
+ "normalized": false,
1056
+ "rstrip": false,
1057
+ "single_word": false,
1058
+ "special": false
1059
+ },
1060
+ "132": {
1061
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1062
+ "lstrip": false,
1063
+ "normalized": false,
1064
+ "rstrip": false,
1065
+ "single_word": false,
1066
+ "special": false
1067
+ },
1068
+ "133": {
1069
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1070
+ "lstrip": false,
1071
+ "normalized": false,
1072
+ "rstrip": false,
1073
+ "single_word": false,
1074
+ "special": false
1075
+ },
1076
+ "134": {
1077
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1078
+ "lstrip": false,
1079
+ "normalized": false,
1080
+ "rstrip": false,
1081
+ "single_word": false,
1082
+ "special": false
1083
+ },
1084
+ "135": {
1085
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1086
+ "lstrip": false,
1087
+ "normalized": false,
1088
+ "rstrip": false,
1089
+ "single_word": false,
1090
+ "special": false
1091
+ },
1092
+ "136": {
1093
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1094
+ "lstrip": false,
1095
+ "normalized": false,
1096
+ "rstrip": false,
1097
+ "single_word": false,
1098
+ "special": false
1099
+ },
1100
+ "137": {
1101
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1102
+ "lstrip": false,
1103
+ "normalized": false,
1104
+ "rstrip": false,
1105
+ "single_word": false,
1106
+ "special": false
1107
+ },
1108
+ "138": {
1109
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
1110
+ "lstrip": false,
1111
+ "normalized": false,
1112
+ "rstrip": false,
1113
+ "single_word": false,
1114
+ "special": false
1115
+ },
1116
+ "139": {
1117
+ "content": "▁▁",
1118
+ "lstrip": false,
1119
+ "normalized": false,
1120
+ "rstrip": false,
1121
+ "single_word": false,
1122
+ "special": false
1123
+ },
1124
+ "140": {
1125
+ "content": "▁▁▁",
1126
+ "lstrip": false,
1127
+ "normalized": false,
1128
+ "rstrip": false,
1129
+ "single_word": false,
1130
+ "special": false
1131
+ },
1132
+ "141": {
1133
+ "content": "▁▁▁▁",
1134
+ "lstrip": false,
1135
+ "normalized": false,
1136
+ "rstrip": false,
1137
+ "single_word": false,
1138
+ "special": false
1139
+ },
1140
+ "142": {
1141
+ "content": "▁▁▁▁▁",
1142
+ "lstrip": false,
1143
+ "normalized": false,
1144
+ "rstrip": false,
1145
+ "single_word": false,
1146
+ "special": false
1147
+ },
1148
+ "143": {
1149
+ "content": "▁▁▁▁▁▁",
1150
+ "lstrip": false,
1151
+ "normalized": false,
1152
+ "rstrip": false,
1153
+ "single_word": false,
1154
+ "special": false
1155
+ },
1156
+ "144": {
1157
+ "content": "▁▁▁▁▁▁▁",
1158
+ "lstrip": false,
1159
+ "normalized": false,
1160
+ "rstrip": false,
1161
+ "single_word": false,
1162
+ "special": false
1163
+ },
1164
+ "145": {
1165
+ "content": "▁▁▁▁▁▁▁▁",
1166
+ "lstrip": false,
1167
+ "normalized": false,
1168
+ "rstrip": false,
1169
+ "single_word": false,
1170
+ "special": false
1171
+ },
1172
+ "146": {
1173
+ "content": "▁▁▁▁▁▁▁▁▁",
1174
+ "lstrip": false,
1175
+ "normalized": false,
1176
+ "rstrip": false,
1177
+ "single_word": false,
1178
+ "special": false
1179
+ },
1180
+ "147": {
1181
+ "content": "▁▁▁▁▁▁▁▁▁▁",
1182
+ "lstrip": false,
1183
+ "normalized": false,
1184
+ "rstrip": false,
1185
+ "single_word": false,
1186
+ "special": false
1187
+ },
1188
+ "148": {
1189
+ "content": "▁▁▁▁▁▁▁▁▁▁▁",
1190
+ "lstrip": false,
1191
+ "normalized": false,
1192
+ "rstrip": false,
1193
+ "single_word": false,
1194
+ "special": false
1195
+ },
1196
+ "149": {
1197
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁",
1198
+ "lstrip": false,
1199
+ "normalized": false,
1200
+ "rstrip": false,
1201
+ "single_word": false,
1202
+ "special": false
1203
+ },
1204
+ "150": {
1205
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁",
1206
+ "lstrip": false,
1207
+ "normalized": false,
1208
+ "rstrip": false,
1209
+ "single_word": false,
1210
+ "special": false
1211
+ },
1212
+ "151": {
1213
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1214
+ "lstrip": false,
1215
+ "normalized": false,
1216
+ "rstrip": false,
1217
+ "single_word": false,
1218
+ "special": false
1219
+ },
1220
+ "152": {
1221
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1222
+ "lstrip": false,
1223
+ "normalized": false,
1224
+ "rstrip": false,
1225
+ "single_word": false,
1226
+ "special": false
1227
+ },
1228
+ "153": {
1229
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1230
+ "lstrip": false,
1231
+ "normalized": false,
1232
+ "rstrip": false,
1233
+ "single_word": false,
1234
+ "special": false
1235
+ },
1236
+ "154": {
1237
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1238
+ "lstrip": false,
1239
+ "normalized": false,
1240
+ "rstrip": false,
1241
+ "single_word": false,
1242
+ "special": false
1243
+ },
1244
+ "155": {
1245
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1246
+ "lstrip": false,
1247
+ "normalized": false,
1248
+ "rstrip": false,
1249
+ "single_word": false,
1250
+ "special": false
1251
+ },
1252
+ "156": {
1253
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1254
+ "lstrip": false,
1255
+ "normalized": false,
1256
+ "rstrip": false,
1257
+ "single_word": false,
1258
+ "special": false
1259
+ },
1260
+ "157": {
1261
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1262
+ "lstrip": false,
1263
+ "normalized": false,
1264
+ "rstrip": false,
1265
+ "single_word": false,
1266
+ "special": false
1267
+ },
1268
+ "158": {
1269
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1270
+ "lstrip": false,
1271
+ "normalized": false,
1272
+ "rstrip": false,
1273
+ "single_word": false,
1274
+ "special": false
1275
+ },
1276
+ "159": {
1277
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1278
+ "lstrip": false,
1279
+ "normalized": false,
1280
+ "rstrip": false,
1281
+ "single_word": false,
1282
+ "special": false
1283
+ },
1284
+ "160": {
1285
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1286
+ "lstrip": false,
1287
+ "normalized": false,
1288
+ "rstrip": false,
1289
+ "single_word": false,
1290
+ "special": false
1291
+ },
1292
+ "161": {
1293
+ "content": "▁▁▁���▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1294
+ "lstrip": false,
1295
+ "normalized": false,
1296
+ "rstrip": false,
1297
+ "single_word": false,
1298
+ "special": false
1299
+ },
1300
+ "162": {
1301
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1302
+ "lstrip": false,
1303
+ "normalized": false,
1304
+ "rstrip": false,
1305
+ "single_word": false,
1306
+ "special": false
1307
+ },
1308
+ "163": {
1309
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1310
+ "lstrip": false,
1311
+ "normalized": false,
1312
+ "rstrip": false,
1313
+ "single_word": false,
1314
+ "special": false
1315
+ },
1316
+ "164": {
1317
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1318
+ "lstrip": false,
1319
+ "normalized": false,
1320
+ "rstrip": false,
1321
+ "single_word": false,
1322
+ "special": false
1323
+ },
1324
+ "165": {
1325
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1326
+ "lstrip": false,
1327
+ "normalized": false,
1328
+ "rstrip": false,
1329
+ "single_word": false,
1330
+ "special": false
1331
+ },
1332
+ "166": {
1333
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1334
+ "lstrip": false,
1335
+ "normalized": false,
1336
+ "rstrip": false,
1337
+ "single_word": false,
1338
+ "special": false
1339
+ },
1340
+ "167": {
1341
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
1342
+ "lstrip": false,
1343
+ "normalized": false,
1344
+ "rstrip": false,
1345
+ "single_word": false,
1346
+ "special": false
1347
+ },
1348
+ "168": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "169": {
+ "content": "<table>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "170": {
+ "content": "<caption>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "171": {
+ "content": "<thead>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "172": {
+ "content": "<tbody>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "173": {
+ "content": "<tfoot>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "174": {
+ "content": "<tr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "175": {
+ "content": "<th>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "176": {
+ "content": "<td>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "177": {
+ "content": "</table>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "178": {
+ "content": "</caption>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "179": {
+ "content": "</thead>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "180": {
+ "content": "</tbody>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "181": {
+ "content": "</tfoot>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "182": {
+ "content": "</tr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "183": {
+ "content": "</th>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "184": {
+ "content": "</td>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "185": {
+ "content": "<h1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "186": {
+ "content": "<h2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "187": {
+ "content": "<h3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "188": {
+ "content": "<h4>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "189": {
+ "content": "<h5>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "190": {
+ "content": "<h6>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "191": {
+ "content": "<blockquote>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "192": {
+ "content": "</h1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "193": {
+ "content": "</h2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "194": {
+ "content": "</h3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "195": {
+ "content": "</h4>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "196": {
+ "content": "</h5>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "197": {
+ "content": "</h6>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "198": {
+ "content": "</blockquote>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "199": {
+ "content": "<strong>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "200": {
+ "content": "<em>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "201": {
+ "content": "<b>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "202": {
+ "content": "<i>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "203": {
+ "content": "<u>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "204": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "205": {
+ "content": "<sub>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "206": {
+ "content": "<sup>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "207": {
+ "content": "<code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "208": {
+ "content": "</strong>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "209": {
+ "content": "</em>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "210": {
+ "content": "</b>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "211": {
+ "content": "</i>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "212": {
+ "content": "</u>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "213": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "214": {
+ "content": "</sub>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "215": {
+ "content": "</sup>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "216": {
+ "content": "</code>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255968": {
+ "content": "[toxicity=0]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255969": {
+ "content": "\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255970": {
+ "content": "\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255971": {
+ "content": "\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255972": {
+ "content": "\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255973": {
+ "content": "\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255974": {
+ "content": "\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255975": {
+ "content": "\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255976": {
+ "content": "\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255977": {
+ "content": "\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255978": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255979": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255980": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255981": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255982": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255983": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255984": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255985": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255986": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255987": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255988": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255989": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255990": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255991": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255992": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255993": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255994": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255995": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255996": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255997": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255998": {
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "255999": {
+ "content": "<unused99>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<start_of_turn>",
+ "<end_of_turn>"
+ ],
+ "bos_token": "<bos>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<bos>",
+ "eos_token": "<eos>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_input_names": [
+ "input_ids",
+ "attention_mask"
+ ],
+ "model_max_length": 8192,
+ "pad_token": "<pad>",
+ "padding_side": "right",
+ "sep_token": "<eos>",
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "PreTrainedTokenizerFast",
+ "unk_token": "<unk>"
+ }