minhleduc committed 7b445cb (verified) · 1 parent: 416a937

Add new SentenceTransformer model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
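The pooling config enables only `pooling_mode_mean_tokens`, so each 384-dimensional sentence vector is the average of the token embeddings produced by the transformer, with padding positions excluded via the attention mask. Below is a minimal pure-Python sketch of that averaging, not the library's implementation (Sentence Transformers does this with PyTorch tensors); the function name `mean_pool` and the toy numbers are illustrative assumptions.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of real tokens (mask == 1), ignoring padding positions."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("expected one attention-mask entry per token")
    dim = len(token_embeddings[0])
    sums = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                sums[i] += value
            count += 1
    # Avoid dividing by zero if every position is padding.
    count = max(count, 1)
    return [s / count for s in sums]


# Toy sequence: three tokens with 2-dim embeddings (the real model uses 384);
# the last position is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters here because batched inputs are padded to a common length; without the mask, the padding row above would drag the average toward 100.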
README.md ADDED
@@ -0,0 +1,463 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:28990
+ - loss:CoSENTLoss
+ base_model: jhu-clsp/mmBERT-small
+ widget:
+ - source_sentence: Bộ Y tế Trung Quốc cho biết thêm 5 người đã chết vì Sars và thêm
+     159 người bị nhiễm bệnh.
+   sentences:
+   - Hôm thứ Hai, Trung Quốc cho biết thêm 9 người đã chết vì SARS và 160 người nữa
+     bị nhiễm virus này.
+   - Cộng hòa Hồi giáo là nước áp dụng nhiều hình phạt tử hình nhất trên thế giới sau
+     Trung Quốc.
+   - Chỉ số Dow Jones Industrial Average tăng 194,14 điểm, hoặc 2,09%, lên 9.469,20
+     sau khi giảm hơn 1% một ngày trước đó.
+ - source_sentence: Thừa kế là một khái niệm cơ bản trong lập trình hướng đối tượng.
+   sentences:
+   - Thừa kế trong lập trình hướng đối tượng là một cách để tạo các lớp mới bằng cách
+     sử dụng các lớp đã được định nghĩa.
+   - Một con chó màu nâu và trắng đang cầm một quả bóng chày trong miệng
+   - Tôi cũng đang gặp đúng vấn đề tương tự.
+ - source_sentence: hành động bảo tồn hoặc bảo vệ một cái gì đó
+   sentences:
+   - 'Tôi sẽ nói thẳng ở đây: Bạn không.'
+   - Nhưng lợi nhuận khiến thị trường chứng khoán Mỹ trở nên hấp dẫn phản ánh sự cân
+     bằng chính trị mong manh.
+   - hành động khẳng định hoặc khẳng định hoặc tuyên bố một cái gì đó.
+ - source_sentence: Một cậu bé mặc áo thun trắng đang phun nước trong nước nông
+   sentences:
+   - Một cậu bé đang chạy qua biển ở bãi biển
+   - Hai người đang đi xe đạp.
+   - Người đàn ông đang chơi với con chó nhỏ lông xù.
+ - source_sentence: Một người đàn ông đang ngồi gần một chiếc xe đạp và đang viết một
+     ghi chú
+   sentences:
+   - 'Mức thuế 50 pence của Đảng Lao động: Những điều bạn cần biết.'
+   - Các vận động viên khuyết tật chuẩn bị sẵn sàng, làm dấy lên những câu hỏi về hậu
+     cần và sự công bằng.
+   - Một người đàn ông mặc quần áo được phủ sơn và đang ngồi bên ngoài trong một khu
+     vực đông đúc để viết một cái gì đó
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ model-index:
+ - name: SentenceTransformer based on jhu-clsp/mmBERT-small
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: 8Opt sts dev 0002
+       type: 8Opt-sts-dev-0002
+     metrics:
+     - type: pearson_cosine
+       value: 0.6930301743749289
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.7196305705541503
+       name: Spearman Cosine
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: 8Opt sts test 0002
+       type: 8Opt-sts-test-0002
+     metrics:
+     - type: pearson_cosine
+       value: 0.6929781201615477
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.7196122481384816
+       name: Spearman Cosine
+ ---
+
+ # SentenceTransformer based on jhu-clsp/mmBERT-small
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) <!-- at revision abc32620dd4f6ab06f5fbe905dc25f310618e09f -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("8Opt/mmbert-small-vn-sts-001")
+ # Run inference
+ sentences = [
+     'Một người đàn ông đang ngồi gần một chiếc xe đạp và đang viết một ghi chú',
+     'Một người đàn ông mặc quần áo được phủ sơn và đang ngồi bên ngoài trong một khu vực đông đúc để viết một cái gì đó',
+     'Các vận động viên khuyết tật chuẩn bị sẵn sàng, làm dấy lên những câu hỏi về hậu cần và sự công bằng.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.9369, 0.9048],
+ #         [0.9369, 1.0000, 0.8964],
+ #         [0.9048, 0.8964, 1.0000]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+
+ * Datasets: `8Opt-sts-dev-0002` and `8Opt-sts-test-0002`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric | 8Opt-sts-dev-0002 | 8Opt-sts-test-0002 |
+ |:--------------------|:------------------|:-------------------|
+ | pearson_cosine | 0.693 | 0.693 |
+ | **spearman_cosine** | **0.7196** | **0.7196** |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 28,990 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | sentence1 | sentence2 | score |
+   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:---------------------------------------------------------------|
+   | type | string | string | float |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 18.02 tokens</li><li>max: 74 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 17.77 tokens</li><li>max: 81 tokens</li></ul> | <ul><li>min: 0.04</li><li>mean: 2.6</li><li>max: 5.0</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:------------------------------------------------------------|:-------------------------------------------------------------------------------|:-----------------|
+   | <code>Một con chó đen đang chạy trong tuyết.</code> | <code>Một con chó đen đang chạy trên bãi biển.</code> | <code>1.8</code> |
+   | <code>bóng đèn a tạo ra một khoảng trống</code> | <code>nếu bóng đèn a cháy ra, cả b và c đều không ở trong một đường kín</code> | <code>1.8</code> |
+   | <code>Sự phát triển an ninh tại Iraq, ngày 1 tháng 2</code> | <code>Sự phát triển an ninh tại Pakistan, ngày 13 tháng 3</code> | <code>1.6</code> |
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "pairwise_cos_sim"
+   }
+   ```
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 4,141 evaluation samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   | | sentence1 | sentence2 | score |
+   |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------|
+   | type | string | string | float |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 18.71 tokens</li><li>max: 107 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 17.54 tokens</li><li>max: 91 tokens</li></ul> | <ul><li>min: 0.04</li><li>mean: 2.56</li><li>max: 5.0</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|:-----------------|
+   | <code>một đơn vị đáp ứng các điều kiện tiên quyết để tham gia vào một sự kiện. một mức độ sửa đổi có thể được bao gồm để chỉ ra bao nhiêu đơn vị vượt quá hoặc không đạt được các yêu cầu tối thiểu.</code> | <code>(thường theo sau là `to ') có phương tiện cần thiết hoặc kỹ năng hoặc bí quyết hoặc thẩm quyền để làm một cái gì đó;</code> | <code>0.4</code> |
+   | <code>Tôi sẽ không đưa nó vào hồ sơ của mình.</code> | <code>Tôi sẽ không đưa công việc này vào hồ sơ của mình.</code> | <code>4.0</code> |
+   | <code>Một cậu bé trẻ với một chiếc áo khoác chứa tim đang nâng tay lên khi anh ta trượt</code> | <code>Một đứa trẻ tóc vàng đang đi xuống một slide và ném lên tay của mình</code> | <code>3.7</code> |
+ * Loss: [<code>CoSENTLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosentloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "pairwise_cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 45
+ - `per_device_eval_batch_size`: 45
+ - `learning_rate`: 2e-05
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 45
+ - `per_device_eval_batch_size`: 45
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `parallelism_config`: None
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `project`: huggingface
+ - `trackio_space_id`: trackio
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: no
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: True
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | 8Opt-sts-dev-0002_spearman_cosine | 8Opt-sts-test-0002_spearman_cosine |
+ |:------:|:----:|:-------------:|:---------------:|:---------------------------------:|:----------------------------------:|
+ | 0.1550 | 100 | 6.8187 | 6.7929 | 0.5004 | - |
+ | 0.3101 | 200 | 6.7419 | 6.7168 | 0.6143 | - |
+ | 0.4651 | 300 | 6.6975 | 6.7319 | 0.6434 | - |
+ | 0.6202 | 400 | 6.6737 | 6.6706 | 0.6739 | - |
+ | 0.7752 | 500 | 6.6784 | 6.7054 | 0.6772 | - |
+ | 0.9302 | 600 | 6.6712 | 6.6551 | 0.6898 | - |
+ | 1.0853 | 700 | 6.5671 | 6.6602 | 0.6959 | - |
+ | 1.2403 | 800 | 6.5571 | 6.6496 | 0.6945 | - |
+ | 1.3953 | 900 | 6.5368 | 6.6535 | 0.7046 | - |
+ | 1.5504 | 1000 | 6.5372 | 6.6903 | 0.7055 | - |
+ | 1.7054 | 1100 | 6.5123 | 6.6377 | 0.7162 | - |
+ | 1.8605 | 1200 | 6.5288 | 6.6298 | 0.7181 | - |
+ | 2.0155 | 1300 | 6.4976 | 6.6644 | 0.7204 | - |
+ | 2.1705 | 1400 | 6.3824 | 6.7222 | 0.7172 | - |
+ | 2.3256 | 1500 | 6.3512 | 6.7714 | 0.7165 | - |
+ | 2.4806 | 1600 | 6.3392 | 6.7757 | 0.7161 | - |
+ | 2.6357 | 1700 | 6.3487 | 6.7514 | 0.7205 | - |
+ | 2.7907 | 1800 | 6.3476 | 6.7294 | 0.7195 | - |
+ | 2.9457 | 1900 | 6.3519 | 6.7533 | 0.7196 | - |
+ | -1 | -1 | - | - | - | 0.7196 |
+
+
+ ### Framework Versions
+ - Python: 3.12.12
+ - Sentence Transformers: 5.1.2
+ - Transformers: 4.57.1
+ - PyTorch: 2.8.0+cu126
+ - Accelerate: 1.11.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### CoSENTLoss
+ ```bibtex
+ @article{10531646,
+     author={Huang, Xiang and Peng, Hao and Zou, Dongcheng and Liu, Zhiwei and Li, Jianxin and Liu, Kay and Wu, Jia and Su, Jianlin and Yu, Philip S.},
+     journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
+     title={CoSENT: Consistent Sentence Embedding via Similarity Ranking},
+     year={2024},
+     doi={10.1109/TASLP.2024.3402087}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
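The README lists `CoSENTLoss` with `scale: 20.0` and `pairwise_cos_sim` for both training and evaluation. Assuming the formulation from the CoSENT paper cited above, the loss adds exp(scale · (cos_i - cos_j)) for every pair of examples whose gold scores satisfy label_i < label_j, and returns log(1 + sum of those terms). The following is a hypothetical pure-Python illustration for a tiny batch (`cosent_loss` is not a library function); Sentence Transformers vectorises the same idea in PyTorch.

```python
import math
from typing import Sequence


def cosent_loss(cos_scores: Sequence[float], labels: Sequence[float], scale: float = 20.0) -> float:
    """log(1 + sum of exp(scale * (cos_i - cos_j))) over pairs where labels[i] < labels[j].

    Each term penalises a pair whose predicted similarity ordering disagrees
    with the gold ordering; correctly ordered pairs contribute almost nothing.
    """
    exponents = [0.0]  # exp(0) == 1 supplies the "1 +" inside the log
    for cos_i, label_i in zip(cos_scores, labels):
        for cos_j, label_j in zip(cos_scores, labels):
            if label_i < label_j:
                exponents.append(scale * (cos_i - cos_j))
    # Numerically stable log-sum-exp.
    peak = max(exponents)
    return peak + math.log(sum(math.exp(e - peak) for e in exponents))


# Two sentence pairs with gold scores 1.0 and 4.0 (the README's scores span roughly 0 to 5).
gold = [1.0, 4.0]
print(cosent_loss([0.2, 0.9], gold))  # near 0: the higher-scored pair also has the higher cosine
print(cosent_loss([0.9, 0.2], gold))  # near 14: the ordering is reversed
```

Because only the relative order of the gold labels enters the loss, the raw 0 to 5 scores shown in the dataset tables can be used without rescaling.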
config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "architectures": [
+     "ModernBertModel"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 2,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 1,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "dtype": "float32",
+   "embedding_dropout": 0.0,
+   "eos_token_id": 1,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 384,
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 1152,
+   "layer_norm_eps": 1e-05,
+   "local_attention": 128,
+   "local_rope_theta": 160000,
+   "mask_token_id": 4,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 6,
+   "num_hidden_layers": 22,
+   "pad_token_id": 0,
+   "position_embedding_type": "sans_pos",
+   "repad_logits_with_grad": false,
+   "sep_token_id": 1,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "transformers_version": "4.57.1",
+   "vocab_size": 256000
+ }
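Three of the settings above (`num_hidden_layers: 22`, `global_attn_every_n_layers: 3`, `local_attention: 128`) determine how ModernBERT mixes full-sequence and sliding-window attention. The sketch below assumes the rule the `transformers` ModernBERT implementation appears to use: a layer attends globally when its index is a multiple of `global_attn_every_n_layers`, and otherwise sees a window of `local_attention` tokens split evenly on both sides. Treat that rule as an assumption to verify against the library source; the helper `attention_layout` is hypothetical.

```python
import json
from typing import List, Optional, Tuple

# Values copied from the config.json above.
CONFIG = json.loads(
    '{"num_hidden_layers": 22, "global_attn_every_n_layers": 3, "local_attention": 128}'
)


def attention_layout(config: dict) -> List[Tuple[int, str, Optional[int]]]:
    """(layer index, attention kind, tokens visible on each side) for every layer.

    Assumed rule: a layer is global when index % global_attn_every_n_layers == 0,
    otherwise it uses a sliding window of local_attention tokens centred on each token.
    """
    every_n = config["global_attn_every_n_layers"]
    half_window = config["local_attention"] // 2
    layout = []
    for layer in range(config["num_hidden_layers"]):
        if layer % every_n == 0:
            layout.append((layer, "global", None))  # attends to the whole sequence
        else:
            layout.append((layer, "local", half_window))
    return layout


layout = attention_layout(CONFIG)
global_layers = [idx for idx, kind, _ in layout if kind == "global"]
print(global_layers)  # [0, 3, 6, 9, 12, 15, 18, 21]
print(len(layout) - len(global_layers), "local layers")  # 14 local layers
```

Under this assumption only 8 of the 22 layers pay the full quadratic cost of attention, which is part of what makes the 8192-token `max_position_embeddings` practical for a model of this size.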
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.1.2",
+     "transformers": "4.57.1",
+     "pytorch": "2.8.0+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
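`similarity_fn_name: "cosine"` selects the function behind `model.similarity` in the README's usage example, and the README's `pearson_cosine` and `spearman_cosine` metrics correlate those cosine scores with the gold labels. Below is a dependency-free sketch of both steps under that reading; `EmbeddingSimilarityEvaluator` computes the correlations with library routines, so `cosine`, `average_ranks`, `pearson`, and `spearman` here are hypothetical stand-ins rather than its code.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy 2-dim "embeddings" for three sentence pairs plus their gold scores.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [5.0, 2.5, 0.0]
predicted = [cosine(u, v) for u, v in pairs]
print("pearson_cosine :", round(pearson(predicted, gold), 4))
print("spearman_cosine:", round(spearman(predicted, gold), 4))  # 1.0: identical ordering
```

Spearman only rewards getting the order of pairs right, which is why it can reach 1.0 here even though the cosine scores are not a linear function of the gold scores and Pearson stays below 1.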
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f128ee4e4737996cb3e97d2e88123cd2042e7962401354dca7bc335a381fc3c
+ size 561987832
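`model.safetensors` (like `tokenizer.json`) is committed as a Git LFS pointer: a small text file whose `oid` is the SHA-256 of the real object and whose `size` is its length in bytes (561,987,832 here). A sketch for parsing such a pointer and checking a downloaded blob against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of any Git LFS tooling.

```python
import hashlib
from typing import Dict

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:1f128ee4e4737996cb3e97d2e88123cd2042e7962401354dca7bc335a381fc3c
size 561987832
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(blob: bytes, fields: Dict[str, str]) -> bool:
    """Check a downloaded object against the pointer's size and SHA-256 oid."""
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(blob) == int(fields["size"]) and hashlib.sha256(blob).hexdigest() == expected_hex


fields = parse_lfs_pointer(POINTER)
print(fields["oid"])
print(round(int(fields["size"]) / 2**20, 1), "MiB")
```

For a file of this size, feeding the data to `hashlib.sha256()` in fixed-size chunks via `.update()` avoids holding the whole blob in memory.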
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
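`modules.json` is the manifest Sentence Transformers reads to rebuild the pipeline: entries run in `idx` order, `type` names the module class, and `path` is the subfolder holding that module's files (empty for the repository root, `1_Pooling` for the pooling layer). The sketch only reads the manifest and reports the order; it does not instantiate anything, and `pipeline_steps` is a hypothetical helper.

```python
import json
from typing import List, Tuple

# Contents of the modules.json above, compacted onto one line per module.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""


def pipeline_steps(raw: str) -> List[Tuple[str, str]]:
    """Return (class name, folder) pairs in the order the modules are applied."""
    modules = sorted(json.loads(raw), key=lambda m: m["idx"])
    steps = []
    for module in modules:
        class_name = module["type"].rsplit(".", 1)[-1]
        # An empty path means the module's files sit at the repository root.
        steps.append((class_name, module["path"] or "."))
    return steps


for class_name, folder in pipeline_steps(MODULES_JSON):
    print(f"{class_name:<12} loaded from {folder}")
```

Read this way, the manifest matches the "Full Model Architecture" printout in the README: the `ModernBertModel` transformer produces token embeddings, and the pooling module configured in `1_Pooling/config.json` turns them into one vector per sentence.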
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "additional_special_tokens": [
+     "<start_of_turn>",
+     "<end_of_turn>"
+   ],
+   "bos_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<bos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "<eos>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8bd47075711f75a143d1b78e01a41cc65c1c591b00d3cfeffc23db07adce1392
+ size 34363442
tokenizer_config.json ADDED
@@ -0,0 +1,2018 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "<pad>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "<eos>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "<bos>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "<unk>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": "<mask>",
38
+ "lstrip": true,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "5": {
45
+ "content": "<2mass>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "6": {
53
+ "content": "[@BOS@]",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "7": {
61
+ "content": "<unused0>",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": false
67
+ },
68
+ "8": {
69
+ "content": "<unused1>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": false
75
+ },
76
+ "9": {
77
+ "content": "<unused2>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": false
83
+ },
84
+ "10": {
85
+ "content": "<unused3>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": false
91
+ },
92
+ "11": {
93
+ "content": "<unused4>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": false
99
+ },
100
+ "12": {
101
+ "content": "<unused5>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": false
107
+ },
108
+ "13": {
109
+ "content": "<unused6>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": false
115
+ },
116
+ "14": {
117
+ "content": "<unused7>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
+ "15": {
+ "content": "<unused8>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "16": {
+ "content": "<unused9>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "17": {
+ "content": "<unused10>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "18": {
+ "content": "<unused11>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "19": {
+ "content": "<unused12>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "20": {
+ "content": "<unused13>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "21": {
+ "content": "<unused14>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "22": {
+ "content": "<unused15>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "23": {
+ "content": "<unused16>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "24": {
+ "content": "<unused17>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "25": {
+ "content": "<unused18>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "26": {
+ "content": "<unused19>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "27": {
+ "content": "<unused20>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "28": {
+ "content": "<unused21>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "29": {
+ "content": "<unused22>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "30": {
+ "content": "<unused23>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "31": {
+ "content": "<unused24>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "32": {
+ "content": "<unused25>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "33": {
+ "content": "<unused26>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "34": {
+ "content": "<unused27>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "35": {
+ "content": "<unused28>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "36": {
+ "content": "<unused29>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "37": {
+ "content": "<unused30>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "38": {
+ "content": "<unused31>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "39": {
+ "content": "<unused32>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "40": {
+ "content": "<unused33>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "41": {
+ "content": "<unused34>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "42": {
+ "content": "<unused35>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "43": {
+ "content": "<unused36>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "44": {
+ "content": "<unused37>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "45": {
+ "content": "<unused38>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "46": {
+ "content": "<unused39>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "47": {
+ "content": "<unused40>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "48": {
+ "content": "<unused41>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "49": {
+ "content": "<unused42>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "50": {
+ "content": "<unused43>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "51": {
+ "content": "<unused44>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "52": {
+ "content": "<unused45>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "53": {
+ "content": "<unused46>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "54": {
+ "content": "<unused47>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "55": {
+ "content": "<unused48>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "56": {
+ "content": "<unused49>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "57": {
+ "content": "<unused50>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "58": {
+ "content": "<unused51>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "59": {
+ "content": "<unused52>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "60": {
+ "content": "<unused53>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "61": {
+ "content": "<unused54>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "62": {
+ "content": "<unused55>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "63": {
+ "content": "<unused56>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "64": {
+ "content": "<unused57>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "65": {
+ "content": "<unused58>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "66": {
+ "content": "<unused59>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "67": {
+ "content": "<unused60>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "68": {
+ "content": "<unused61>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "69": {
+ "content": "<unused62>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "70": {
+ "content": "<unused63>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "71": {
+ "content": "<unused64>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "72": {
+ "content": "<unused65>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "73": {
+ "content": "<unused66>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "74": {
+ "content": "<unused67>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "75": {
+ "content": "<unused68>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "76": {
+ "content": "<unused69>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "77": {
+ "content": "<unused70>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "78": {
+ "content": "<unused71>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "79": {
+ "content": "<unused72>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "80": {
+ "content": "<unused73>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "81": {
+ "content": "<unused74>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "82": {
+ "content": "<unused75>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "83": {
+ "content": "<unused76>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "84": {
+ "content": "<unused77>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "85": {
+ "content": "<unused78>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "86": {
+ "content": "<unused79>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "87": {
+ "content": "<unused80>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "88": {
+ "content": "<unused81>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "89": {
+ "content": "<unused82>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "90": {
+ "content": "<unused83>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "91": {
+ "content": "<unused84>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "92": {
+ "content": "<unused85>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "93": {
+ "content": "<unused86>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "94": {
+ "content": "<unused87>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "95": {
+ "content": "<unused88>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "96": {
+ "content": "<unused89>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "97": {
+ "content": "<unused90>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "98": {
+ "content": "<unused91>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "99": {
+ "content": "<unused92>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "100": {
+ "content": "<unused93>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "101": {
+ "content": "<unused94>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "102": {
+ "content": "<unused95>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "103": {
+ "content": "<unused96>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "104": {
+ "content": "<unused97>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "105": {
+ "content": "<unused98>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "106": {
+ "content": "<start_of_turn>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "107": {
+ "content": "<end_of_turn>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "108": {
+ "content": "\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "109": {
+ "content": "\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "110": {
+ "content": "\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "111": {
+ "content": "\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "112": {
+ "content": "\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "113": {
+ "content": "\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "114": {
+ "content": "\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "115": {
+ "content": "\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "116": {
+ "content": "\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "117": {
+ "content": "\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "118": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "119": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "120": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "121": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "122": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "123": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "124": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "125": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "126": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "127": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "128": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "129": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "130": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "131": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "132": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "133": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "134": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "135": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "136": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "137": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "138": {
+ "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "139": {
+ "content": "▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "140": {
+ "content": "▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "141": {
+ "content": "▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "142": {
+ "content": "▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "143": {
+ "content": "▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "144": {
+ "content": "▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "145": {
+ "content": "▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "146": {
+ "content": "▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "147": {
+ "content": "▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "148": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "149": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "150": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "152": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "153": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "154": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "155": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "156": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "157": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "158": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "159": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "160": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "161": {
+ "content": "▁▁▁���▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "162": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "163": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "164": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "165": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "166": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "167": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "168": {
+ "content": "▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "169": {
+ "content": "<table>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "170": {
+ "content": "<caption>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "171": {
+ "content": "<thead>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "172": {
+ "content": "<tbody>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "173": {
+ "content": "<tfoot>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "174": {
+ "content": "<tr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "175": {
+ "content": "<th>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "176": {
+ "content": "<td>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "177": {
+ "content": "</table>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "178": {
+ "content": "</caption>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "179": {
+ "content": "</thead>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "180": {
+ "content": "</tbody>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "181": {
+ "content": "</tfoot>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "182": {
+ "content": "</tr>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "183": {
+ "content": "</th>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "184": {
+ "content": "</td>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "185": {
+ "content": "<h1>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "186": {
+ "content": "<h2>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "187": {
+ "content": "<h3>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
1508
+ "188": {
1509
+ "content": "<h4>",
1510
+ "lstrip": false,
1511
+ "normalized": false,
1512
+ "rstrip": false,
1513
+ "single_word": false,
1514
+ "special": false
1515
+ },
1516
+ "189": {
1517
+ "content": "<h5>",
1518
+ "lstrip": false,
1519
+ "normalized": false,
1520
+ "rstrip": false,
1521
+ "single_word": false,
1522
+ "special": false
1523
+ },
1524
+ "190": {
1525
+ "content": "<h6>",
1526
+ "lstrip": false,
1527
+ "normalized": false,
1528
+ "rstrip": false,
1529
+ "single_word": false,
1530
+ "special": false
1531
+ },
1532
+ "191": {
1533
+ "content": "<blockquote>",
1534
+ "lstrip": false,
1535
+ "normalized": false,
1536
+ "rstrip": false,
1537
+ "single_word": false,
1538
+ "special": false
1539
+ },
1540
+ "192": {
1541
+ "content": "</h1>",
1542
+ "lstrip": false,
1543
+ "normalized": false,
1544
+ "rstrip": false,
1545
+ "single_word": false,
1546
+ "special": false
1547
+ },
1548
+ "193": {
1549
+ "content": "</h2>",
1550
+ "lstrip": false,
1551
+ "normalized": false,
1552
+ "rstrip": false,
1553
+ "single_word": false,
1554
+ "special": false
1555
+ },
1556
+ "194": {
1557
+ "content": "</h3>",
1558
+ "lstrip": false,
1559
+ "normalized": false,
1560
+ "rstrip": false,
1561
+ "single_word": false,
1562
+ "special": false
1563
+ },
1564
+ "195": {
1565
+ "content": "</h4>",
1566
+ "lstrip": false,
1567
+ "normalized": false,
1568
+ "rstrip": false,
1569
+ "single_word": false,
1570
+ "special": false
1571
+ },
1572
+ "196": {
1573
+ "content": "</h5>",
1574
+ "lstrip": false,
1575
+ "normalized": false,
1576
+ "rstrip": false,
1577
+ "single_word": false,
1578
+ "special": false
1579
+ },
1580
+ "197": {
1581
+ "content": "</h6>",
1582
+ "lstrip": false,
1583
+ "normalized": false,
1584
+ "rstrip": false,
1585
+ "single_word": false,
1586
+ "special": false
1587
+ },
1588
+ "198": {
1589
+ "content": "</blockquote>",
1590
+ "lstrip": false,
1591
+ "normalized": false,
1592
+ "rstrip": false,
1593
+ "single_word": false,
1594
+ "special": false
1595
+ },
1596
+ "199": {
1597
+ "content": "<strong>",
1598
+ "lstrip": false,
1599
+ "normalized": false,
1600
+ "rstrip": false,
1601
+ "single_word": false,
1602
+ "special": false
1603
+ },
1604
+ "200": {
1605
+ "content": "<em>",
1606
+ "lstrip": false,
1607
+ "normalized": false,
1608
+ "rstrip": false,
1609
+ "single_word": false,
1610
+ "special": false
1611
+ },
1612
+ "201": {
1613
+ "content": "<b>",
1614
+ "lstrip": false,
1615
+ "normalized": false,
1616
+ "rstrip": false,
1617
+ "single_word": false,
1618
+ "special": false
1619
+ },
1620
+ "202": {
1621
+ "content": "<i>",
1622
+ "lstrip": false,
1623
+ "normalized": false,
1624
+ "rstrip": false,
1625
+ "single_word": false,
1626
+ "special": false
1627
+ },
1628
+ "203": {
1629
+ "content": "<u>",
1630
+ "lstrip": false,
1631
+ "normalized": false,
1632
+ "rstrip": false,
1633
+ "single_word": false,
1634
+ "special": false
1635
+ },
1636
+ "204": {
1637
+ "content": "<s>",
1638
+ "lstrip": false,
1639
+ "normalized": false,
1640
+ "rstrip": false,
1641
+ "single_word": false,
1642
+ "special": false
1643
+ },
1644
+ "205": {
1645
+ "content": "<sub>",
1646
+ "lstrip": false,
1647
+ "normalized": false,
1648
+ "rstrip": false,
1649
+ "single_word": false,
1650
+ "special": false
1651
+ },
1652
+ "206": {
1653
+ "content": "<sup>",
1654
+ "lstrip": false,
1655
+ "normalized": false,
1656
+ "rstrip": false,
1657
+ "single_word": false,
1658
+ "special": false
1659
+ },
1660
+ "207": {
1661
+ "content": "<code>",
1662
+ "lstrip": false,
1663
+ "normalized": false,
1664
+ "rstrip": false,
1665
+ "single_word": false,
1666
+ "special": false
1667
+ },
1668
+ "208": {
1669
+ "content": "</strong>",
1670
+ "lstrip": false,
1671
+ "normalized": false,
1672
+ "rstrip": false,
1673
+ "single_word": false,
1674
+ "special": false
1675
+ },
1676
+ "209": {
1677
+ "content": "</em>",
1678
+ "lstrip": false,
1679
+ "normalized": false,
1680
+ "rstrip": false,
1681
+ "single_word": false,
1682
+ "special": false
1683
+ },
1684
+ "210": {
1685
+ "content": "</b>",
1686
+ "lstrip": false,
1687
+ "normalized": false,
1688
+ "rstrip": false,
1689
+ "single_word": false,
1690
+ "special": false
1691
+ },
1692
+ "211": {
1693
+ "content": "</i>",
1694
+ "lstrip": false,
1695
+ "normalized": false,
1696
+ "rstrip": false,
1697
+ "single_word": false,
1698
+ "special": false
1699
+ },
1700
+ "212": {
1701
+ "content": "</u>",
1702
+ "lstrip": false,
1703
+ "normalized": false,
1704
+ "rstrip": false,
1705
+ "single_word": false,
1706
+ "special": false
1707
+ },
1708
+ "213": {
1709
+ "content": "</s>",
1710
+ "lstrip": false,
1711
+ "normalized": false,
1712
+ "rstrip": false,
1713
+ "single_word": false,
1714
+ "special": false
1715
+ },
1716
+ "214": {
1717
+ "content": "</sub>",
1718
+ "lstrip": false,
1719
+ "normalized": false,
1720
+ "rstrip": false,
1721
+ "single_word": false,
1722
+ "special": false
1723
+ },
1724
+ "215": {
1725
+ "content": "</sup>",
1726
+ "lstrip": false,
1727
+ "normalized": false,
1728
+ "rstrip": false,
1729
+ "single_word": false,
1730
+ "special": false
1731
+ },
1732
+ "216": {
1733
+ "content": "</code>",
1734
+ "lstrip": false,
1735
+ "normalized": false,
1736
+ "rstrip": false,
1737
+ "single_word": false,
1738
+ "special": false
1739
+ },
1740
+ "255968": {
1741
+ "content": "[toxicity=0]",
1742
+ "lstrip": false,
1743
+ "normalized": false,
1744
+ "rstrip": false,
1745
+ "single_word": false,
1746
+ "special": false
1747
+ },
1748
+ "255969": {
1749
+ "content": "\t\t",
1750
+ "lstrip": false,
1751
+ "normalized": false,
1752
+ "rstrip": false,
1753
+ "single_word": false,
1754
+ "special": false
1755
+ },
1756
+ "255970": {
1757
+ "content": "\t\t\t",
1758
+ "lstrip": false,
1759
+ "normalized": false,
1760
+ "rstrip": false,
1761
+ "single_word": false,
1762
+ "special": false
1763
+ },
1764
+ "255971": {
1765
+ "content": "\t\t\t\t",
1766
+ "lstrip": false,
1767
+ "normalized": false,
1768
+ "rstrip": false,
1769
+ "single_word": false,
1770
+ "special": false
1771
+ },
1772
+ "255972": {
1773
+ "content": "\t\t\t\t\t",
1774
+ "lstrip": false,
1775
+ "normalized": false,
1776
+ "rstrip": false,
1777
+ "single_word": false,
1778
+ "special": false
1779
+ },
1780
+ "255973": {
1781
+ "content": "\t\t\t\t\t\t",
1782
+ "lstrip": false,
1783
+ "normalized": false,
1784
+ "rstrip": false,
1785
+ "single_word": false,
1786
+ "special": false
1787
+ },
1788
+ "255974": {
1789
+ "content": "\t\t\t\t\t\t\t",
1790
+ "lstrip": false,
1791
+ "normalized": false,
1792
+ "rstrip": false,
1793
+ "single_word": false,
1794
+ "special": false
1795
+ },
1796
+ "255975": {
1797
+ "content": "\t\t\t\t\t\t\t\t",
1798
+ "lstrip": false,
1799
+ "normalized": false,
1800
+ "rstrip": false,
1801
+ "single_word": false,
1802
+ "special": false
1803
+ },
1804
+ "255976": {
1805
+ "content": "\t\t\t\t\t\t\t\t\t",
1806
+ "lstrip": false,
1807
+ "normalized": false,
1808
+ "rstrip": false,
1809
+ "single_word": false,
1810
+ "special": false
1811
+ },
1812
+ "255977": {
1813
+ "content": "\t\t\t\t\t\t\t\t\t\t",
1814
+ "lstrip": false,
1815
+ "normalized": false,
1816
+ "rstrip": false,
1817
+ "single_word": false,
1818
+ "special": false
1819
+ },
1820
+ "255978": {
1821
+ "content": "\t\t\t\t\t\t\t\t\t\t\t",
1822
+ "lstrip": false,
1823
+ "normalized": false,
1824
+ "rstrip": false,
1825
+ "single_word": false,
1826
+ "special": false
1827
+ },
1828
+ "255979": {
1829
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t",
1830
+ "lstrip": false,
1831
+ "normalized": false,
1832
+ "rstrip": false,
1833
+ "single_word": false,
1834
+ "special": false
1835
+ },
1836
+ "255980": {
1837
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t",
1838
+ "lstrip": false,
1839
+ "normalized": false,
1840
+ "rstrip": false,
1841
+ "single_word": false,
1842
+ "special": false
1843
+ },
1844
+ "255981": {
1845
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1846
+ "lstrip": false,
1847
+ "normalized": false,
1848
+ "rstrip": false,
1849
+ "single_word": false,
1850
+ "special": false
1851
+ },
1852
+ "255982": {
1853
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1854
+ "lstrip": false,
1855
+ "normalized": false,
1856
+ "rstrip": false,
1857
+ "single_word": false,
1858
+ "special": false
1859
+ },
1860
+ "255983": {
1861
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1862
+ "lstrip": false,
1863
+ "normalized": false,
1864
+ "rstrip": false,
1865
+ "single_word": false,
1866
+ "special": false
1867
+ },
1868
+ "255984": {
1869
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1870
+ "lstrip": false,
1871
+ "normalized": false,
1872
+ "rstrip": false,
1873
+ "single_word": false,
1874
+ "special": false
1875
+ },
1876
+ "255985": {
1877
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1878
+ "lstrip": false,
1879
+ "normalized": false,
1880
+ "rstrip": false,
1881
+ "single_word": false,
1882
+ "special": false
1883
+ },
1884
+ "255986": {
1885
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1886
+ "lstrip": false,
1887
+ "normalized": false,
1888
+ "rstrip": false,
1889
+ "single_word": false,
1890
+ "special": false
1891
+ },
1892
+ "255987": {
1893
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1894
+ "lstrip": false,
1895
+ "normalized": false,
1896
+ "rstrip": false,
1897
+ "single_word": false,
1898
+ "special": false
1899
+ },
1900
+ "255988": {
1901
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1902
+ "lstrip": false,
1903
+ "normalized": false,
1904
+ "rstrip": false,
1905
+ "single_word": false,
1906
+ "special": false
1907
+ },
1908
+ "255989": {
1909
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1910
+ "lstrip": false,
1911
+ "normalized": false,
1912
+ "rstrip": false,
1913
+ "single_word": false,
1914
+ "special": false
1915
+ },
1916
+ "255990": {
1917
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1918
+ "lstrip": false,
1919
+ "normalized": false,
1920
+ "rstrip": false,
1921
+ "single_word": false,
1922
+ "special": false
1923
+ },
1924
+ "255991": {
1925
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1926
+ "lstrip": false,
1927
+ "normalized": false,
1928
+ "rstrip": false,
1929
+ "single_word": false,
1930
+ "special": false
1931
+ },
1932
+ "255992": {
1933
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1934
+ "lstrip": false,
1935
+ "normalized": false,
1936
+ "rstrip": false,
1937
+ "single_word": false,
1938
+ "special": false
1939
+ },
1940
+ "255993": {
1941
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1942
+ "lstrip": false,
1943
+ "normalized": false,
1944
+ "rstrip": false,
1945
+ "single_word": false,
1946
+ "special": false
1947
+ },
1948
+ "255994": {
1949
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1950
+ "lstrip": false,
1951
+ "normalized": false,
1952
+ "rstrip": false,
1953
+ "single_word": false,
1954
+ "special": false
1955
+ },
1956
+ "255995": {
1957
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1958
+ "lstrip": false,
1959
+ "normalized": false,
1960
+ "rstrip": false,
1961
+ "single_word": false,
1962
+ "special": false
1963
+ },
1964
+ "255996": {
1965
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1966
+ "lstrip": false,
1967
+ "normalized": false,
1968
+ "rstrip": false,
1969
+ "single_word": false,
1970
+ "special": false
1971
+ },
1972
+ "255997": {
1973
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1974
+ "lstrip": false,
1975
+ "normalized": false,
1976
+ "rstrip": false,
1977
+ "single_word": false,
1978
+ "special": false
1979
+ },
1980
+ "255998": {
1981
+ "content": "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t",
1982
+ "lstrip": false,
1983
+ "normalized": false,
1984
+ "rstrip": false,
1985
+ "single_word": false,
1986
+ "special": false
1987
+ },
1988
+ "255999": {
1989
+ "content": "<unused99>",
1990
+ "lstrip": false,
1991
+ "normalized": false,
1992
+ "rstrip": false,
1993
+ "single_word": false,
1994
+ "special": false
1995
+ }
1996
+ },
1997
+ "additional_special_tokens": [
1998
+ "<start_of_turn>",
1999
+ "<end_of_turn>"
2000
+ ],
2001
+ "bos_token": "<bos>",
2002
+ "clean_up_tokenization_spaces": false,
2003
+ "cls_token": "<bos>",
2004
+ "eos_token": "<eos>",
2005
+ "extra_special_tokens": {},
2006
+ "mask_token": "<mask>",
2007
+ "model_input_names": [
2008
+ "input_ids",
2009
+ "attention_mask"
2010
+ ],
2011
+ "model_max_length": 8192,
2012
+ "pad_token": "<pad>",
2013
+ "padding_side": "right",
2014
+ "sep_token": "<eos>",
2015
+ "spaces_between_special_tokens": false,
2016
+ "tokenizer_class": "PreTrainedTokenizerFast",
2017
+ "unk_token": "<unk>"
2018
+ }
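The tail of this config lists added tokens keyed by string id. Each entry carries `content`, `lstrip`, `normalized`, `rstrip`, `single_word` and `special` flags. It also lists the top-level tokenizer settings, such as `model_max_length` 8192 and `padding_side` "right". The helpers below are a minimal sketch for inspecting such a config with only the standard library. They make two assumptions. First, they assume the per-id entries sit under the usual Hugging Face `added_tokens_decoder` key, which is opened outside this excerpt. Second, the function names and the `EXCERPT` dict are hypothetical illustrations, not part of the commit. `EXCERPT` copies four entries from the diff: ids 184, 255969, 255970 and 255999.

```python
import json


def load_config(path: str) -> dict:
    """Read a tokenizer_config.json file (hypothetical helper, not called below)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_added_tokens(config: dict) -> dict[str, list[int]]:
    """Split added-token ids into special and non-special groups.

    Assumes the Hugging Face layout in which `added_tokens_decoder` maps a
    string id to a dict with `content` and `special` fields.
    """
    groups: dict[str, list[int]] = {"special": [], "non_special": []}
    for token_id, entry in config.get("added_tokens_decoder", {}).items():
        key = "special" if entry.get("special", False) else "non_special"
        groups[key].append(int(token_id))
    for ids in groups.values():
        ids.sort()
    return groups


def tab_run_tokens(config: dict) -> dict[int, int]:
    """Map each added token made only of tab characters to its tab count."""
    runs: dict[int, int] = {}
    for token_id, entry in config.get("added_tokens_decoder", {}).items():
        content = entry["content"]
        if content and set(content) == {"\t"}:
            runs[int(token_id)] = len(content)
    return runs


# Hypothetical excerpt: four entries copied from the diff, with most flags omitted.
EXCERPT = {
    "added_tokens_decoder": {
        "184": {"content": "</td>", "special": False},
        "255969": {"content": "\t\t", "special": False},
        "255970": {"content": "\t\t\t", "special": False},
        "255999": {"content": "<unused99>", "special": False},
    },
    "additional_special_tokens": ["<start_of_turn>", "<end_of_turn>"],
    "model_max_length": 8192,
    "padding_side": "right",
}

if __name__ == "__main__":
    print(tab_run_tokens(EXCERPT))  # {255969: 2, 255970: 3}
    print(summarize_added_tokens(EXCERPT)["non_special"])  # [184, 255969, 255970, 255999]
```

On the full file, `tab_run_tokens` reports the run of tab-only tokens starting at id 255969, whose length grows by one tab per id. `summarize_added_tokens` shows which added entries are flagged `special`; every entry in this excerpt is flagged non-special.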