Lakoc committed
Commit 4736a54 · verified · 1 Parent(s): ba07935

Upload tokenizer

Files changed (3)
  1. special_tokens_map.json +7 -0
  2. tokenizer.json +2153 -0
  3. tokenizer_config.json +10 -0
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "bos_token": "([bos])",
+ "eos_token": "([eos])",
+ "mask_token": "([mask])",
+ "pad_token": "([pad])",
+ "unk_token": "([unk])"
+ }
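
The map above can be sanity-checked with the standard library alone. The following is a minimal sketch, not part of the commit: the JSON string is copied from the diff, and the helper name `check_special_tokens` is hypothetical. It assumes each role named `<name>_token` should map to the literal `([<name>])`, which is the pattern the five entries follow.

```python
import json

# Contents of special_tokens_map.json, copied from the diff above.
SPECIAL_TOKENS_MAP = """
{
  "bos_token": "([bos])",
  "eos_token": "([eos])",
  "mask_token": "([mask])",
  "pad_token": "([pad])",
  "unk_token": "([unk])"
}
"""


def check_special_tokens(raw: str) -> dict:
    """Hypothetical helper: parse the map and verify the ([name]) convention."""
    tokens = json.loads(raw)
    for role, token in tokens.items():
        # "bos_token" -> "bos", "mask_token" -> "mask", ...
        name = role.rsplit("_", 1)[0]
        expected = "([" + name + "])"
        if token != expected:
            raise ValueError(f"{role}: expected {expected!r}, got {token!r}")
    # Every role should map to a distinct token string.
    if len(set(tokens.values())) != len(tokens):
        raise ValueError("duplicate special tokens")
    return tokens


tokens = check_special_tokens(SPECIAL_TOKENS_MAP)
print(sorted(tokens))
```

The check raises `ValueError` on the first entry that breaks the convention, so it could run before the files are uploaded.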
tokenizer.json ADDED
@@ -0,0 +1,2153 @@
+ {
+ "version": "1.0",
+ "truncation": null,
+ "padding": null,
+ "added_tokens": [
+ {
+ "id": 0,
+ "content": "([bos])",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 1,
+ "content": "([eos])",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 2,
+ "content": "([unk])",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 3,
+ "content": "([pad])",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 4,
+ "content": "([mask])",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ }
+ ],
+ "normalizer": {
+ "type": "Sequence",
+ "normalizers": [
+ {
+ "type": "Replace",
+ "pattern": {
+ "String": "``"
+ },
+ "content": "\""
+ },
+ {
+ "type": "Replace",
+ "pattern": {
+ "String": "''"
+ },
+ "content": "\""
+ },
+ {
+ "type": "Lowercase"
+ }
+ ]
+ },
+ "pre_tokenizer": {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "add_prefix_space": true
+ },
+ "post_processor": {
+ "type": "TemplateProcessing",
+ "single": [
+ {
+ "Sequence": {
+ "id": "A",
+ "type_id": 0
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "([eos])",
+ "type_id": 0
+ }
+ }
+ ],
+ "pair": [
+ {
+ "Sequence": {
+ "id": "A",
+ "type_id": 0
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "([eos])",
+ "type_id": 0
+ }
+ },
+ {
+ "Sequence": {
+ "id": "B",
+ "type_id": 1
+ }
+ },
+ {
+ "SpecialToken": {
+ "id": "([eos])",
+ "type_id": 1
+ }
+ }
+ ],
+ "special_tokens": {
+ "([bos])": {
+ "id": "([bos])",
+ "ids": [
+ 0
+ ],
+ "tokens": [
+ "([bos])"
+ ]
+ },
+ "([eos])": {
+ "id": "([eos])",
+ "ids": [
+ 1
+ ],
+ "tokens": [
+ "([eos])"
+ ]
+ }
+ }
+ },
+ "decoder": {
+ "type": "Metaspace",
+ "replacement": "▁",
+ "add_prefix_space": true
+ },
+ "model": {
+ "type": "Unigram",
+ "unk_id": 2,
+ "vocab": [
+ [
+ "([bos])",
+ 0.0
+ ],
+ [
+ "([eos])",
+ 0.0
+ ],
+ [
+ "([unk])",
+ 0.0
+ ],
+ [
+ "([pad])",
+ 0.0
+ ],
+ [
+ "([mask])",
+ 0.0
+ ],
+ [
+ "▁",
+ -2.6754905386285905
+ ],
+ [
+ "s",
+ -2.8768613408851564
+ ],
+ [
+ "e",
+ -3.2762470735395457
+ ],
+ [
+ "t",
+ -3.346505232415412
+ ],
+ [
+ "▁the",
+ -3.4613494666908995
+ ],
+ [
+ "d",
+ -3.90460894182274
+ ],
+ [
+ "a",
+ -3.988983277321309
+ ],
+ [
+ "▁a",
+ -4.1043433268201825
+ ],
+ [
+ "m",
+ -4.130835343792769
+ ],
+ [
+ "y",
+ -4.187992643366066
+ ],
+ [
+ "n",
+ -4.217156675408131
+ ],
+ [
+ "i",
+ -4.232281987977052
+ ],
+ [
+ "ed",
+ -4.311072875814168
+ ],
+ [
+ "ing",
+ -4.317765051829164
+ ],
+ [
+ "▁to",
+ -4.331294592787134
+ ],
+ [
+ "o",
+ -4.346337287814527
+ ],
+ [
+ "c",
+ -4.3537456912204755
+ ],
+ [
+ "▁of",
+ -4.408302422120218
+ ],
+ [
+ "▁in",
+ -4.439226546242843
+ ],
+ [
+ "and",
+ -4.447707206682657
+ ],
+ [
+ "p",
+ -4.557523898205941
+ ],
+ [
+ "u",
+ -4.599969020077225
+ ],
+ [
+ "g",
+ -4.644611077920461
+ ],
+ [
+ "r",
+ -4.706481270700311
+ ],
+ [
+ "l",
+ -4.713380466456869
+ ],
+ [
+ "er",
+ -4.74899222065031
+ ],
+ [
+ "al",
+ -4.832948319811666
+ ],
+ [
+ "▁is",
+ -4.870481905541696
+ ],
+ [
+ "in",
+ -4.893132701006505
+ ],
+ [
+ "k",
+ -4.974102278368674
+ ],
+ [
+ "or",
+ -5.0025052704507935
+ ],
+ [
+ "b",
+ -5.035860350244143
+ ],
+ [
+ "▁be",
+ -5.071977878617131
+ ],
+ [
+ "ly",
+ -5.110970671924331
+ ],
+ [
+ "en",
+ -5.176639562119152
+ ],
+ [
+ "it",
+ -5.2223464220674005
+ ],
+ [
+ "v",
+ -5.230571121824163
+ ],
+ [
+ "le",
+ -5.261011559598946
+ ],
+ [
+ "ar",
+ -5.28733580866414
+ ],
+ [
+ "ch",
+ -5.331946977557094
+ ],
+ [
+ "st",
+ -5.34591745748041
+ ],
+ [
+ "an",
+ -5.355807230421707
+ ],
+ [
+ "▁f",
+ -5.391744389389056
+ ],
+ [
+ "ic",
+ -5.423546538328285
+ ],
+ [
+ "▁for",
+ -5.447670532180611
+ ],
+ [
+ "f",
+ -5.48303699228936
+ ],
+ [
+ "w",
+ -5.5198467796805755
+ ],
+ [
+ "ur",
+ -5.530880383304706
+ ],
+ [
+ "es",
+ -5.5501989639084055
+ ],
+ [
+ "on",
+ -5.557399947373246
+ ],
+ [
+ "▁re",
+ -5.567545311958694
+ ],
+ [
+ "▁are",
+ -5.573269832805648
+ ],
+ [
+ "▁on",
+ -5.59021385720431
+ ],
+ [
+ "th",
+ -5.627197447181516
+ ],
+ [
+ "▁h",
+ -5.644403198021651
+ ],
+ [
+ "▁tha",
+ -5.647913682947817
+ ],
+ [
+ "▁c",
+ -5.6522531999839325
+ ],
+ [
+ "h",
+ -5.6671890386742785
+ ],
+ [
+ "▁w",
+ -5.688082766473154
+ ],
+ [
+ "re",
+ -5.722153275438761
+ ],
+ [
+ "ir",
+ -5.7903344504118595
+ ],
+ [
+ "▁b",
+ -5.801750081259955
+ ],
+ [
+ "▁with",
+ -5.819491368215889
+ ],
+ [
+ "ro",
+ -5.822405765770177
+ ],
+ [
+ "ri",
+ -5.854370618582819
+ ],
+ [
+ "ation",
+ -5.855145426489802
+ ],
+ [
+ "0",
+ -5.8872829347974
+ ],
+ [
+ "▁was",
+ -5.894023413092957
+ ],
+ [
+ "ent",
+ -5.913173819895014
+ ],
+ [
+ "▁or",
+ -5.929829545288575
+ ],
+ [
+ "ce",
+ -5.94118778631942
+ ],
+ [
+ "▁an",
+ -5.942275864998134
+ ],
+ [
+ "▁st",
+ -5.959845626706876
+ ],
+ [
+ "▁p",
+ -5.979182853147907
+ ],
+ [
+ "▁co",
+ -5.99464444437195
+ ],
+ [
+ "tion",
+ -6.018005629870164
+ ],
+ [
+ "▁de",
+ -6.0197214334498375
+ ],
+ [
+ "at",
+ -6.023872247567192
+ ],
+ [
+ "▁it",
+ -6.0468006319666525
+ ],
+ [
+ "il",
+ -6.071419632524207
+ ],
+ [
+ "▁su",
+ -6.087265345738892
+ ],
+ [
+ "▁ma",
+ -6.0955072307599565
+ ],
+ [
+ "▁as",
+ -6.096445849090732
+ ],
+ [
+ "▁fro",
+ -6.129354549270313
+ ],
+ [
+ "▁you",
+ -6.131897252589564
+ ],
+ [
+ "ol",
+ -6.1390576793196425
+ ],
+ [
+ "▁con",
+ -6.145755126990158
+ ],
+ [
+ "la",
+ -6.146651426274379
+ ],
+ [
+ "▁can",
+ -6.149043410740569
+ ],
+ [
+ "▁he",
+ -6.152763862900574
+ ],
+ [
+ "▁hav",
+ -6.156435802908428
+ ],
+ [
+ "▁1",
+ -6.160132051411401
+ ],
+ [
+ "is",
+ -6.195569695401377
+ ],
+ [
+ "▁by",
+ -6.212529629983307
+ ],
+ [
+ "lo",
+ -6.220446873243513
+ ],
+ [
+ "ll",
+ -6.228179292983937
+ ],
+ [
+ "ies",
+ -6.245665905271688
+ ],
+ [
+ "id",
+ -6.260508224854381
+ ],
+ [
+ "▁not",
+ -6.267819101117954
+ ],
+ [
+ "if",
+ -6.270644329586759
+ ],
+ [
+ "un",
+ -6.275506263031922
+ ],
+ [
+ "▁thi",
+ -6.2881655764394235
+ ],
+ [
+ "▁di",
+ -6.329588431001757
+ ],
+ [
+ "us",
+ -6.333327209962254
+ ],
+ [
+ "ra",
+ -6.340865501780096
+ ],
+ [
+ "▁at",
+ -6.355182350030107
+ ],
+ [
+ "j",
+ -6.357964711000992
+ ],
+ [
+ "ve",
+ -6.372531519329924
+ ],
+ [
+ "om",
+ -6.37496567318558
+ ],
+ [
+ "ate",
+ -6.3753246003687005
+ ],
+ [
+ "el",
+ -6.390196095937399
+ ],
+ [
+ "ion",
+ -6.396975087246941
+ ],
+ [
+ "▁2",
+ -6.400755836647712
+ ],
+ [
+ "▁we",
+ -6.412240561664114
+ ],
+ [
+ "▁g",
+ -6.416027622473206
+ ],
+ [
+ "z",
+ -6.43633541535936
+ ],
+ [
+ "▁sa",
+ -6.43850552769723
+ ],
+ [
+ "ver",
+ -6.447887731618186
+ ],
+ [
+ "▁whi",
+ -6.457883910972052
+ ],
+ [
+ "x",
+ -6.459583214795332
+ ],
+ [
+ "▁pa",
+ -6.503480316267584
+ ],
+ [
+ "ive",
+ -6.5110947788952975
+ ],
+ [
+ "ight",
+ -6.536365914035072
+ ],
+ [
+ "ers",
+ -6.5428103154111135
+ ],
+ [
+ "▁bo",
+ -6.546150816046826
+ ],
+ [
+ "▁no",
+ -6.547088192255935
+ ],
+ [
+ "ment",
+ -6.562479873766899
+ ],
+ [
+ "▁pro",
+ -6.571064898982521
+ ],
+ [
+ "act",
+ -6.583990559778682
+ ],
+ [
+ "▁has",
+ -6.603654202468725
+ ],
+ [
+ "out",
+ -6.60580960365786
+ ],
+ [
+ "▁mo",
+ -6.612206063842681
+ ],
+ [
+ "▁man",
+ -6.641204427471876
+ ],
+ [
+ "age",
+ -6.641280763657149
+ ],
+ [
+ "▁will",
+ -6.64270247271019
+ ],
+ [
+ "▁me",
+ -6.647724830351693
+ ],
+ [
+ "4",
+ -6.651679234308922
+ ],
+ [
+ "ant",
+ -6.653057213377799
+ ],
+ [
+ "▁sp",
+ -6.653170671243362
+ ],
+ [
+ "▁us",
+ -6.660440017314415
+ ],
+ [
+ "5",
+ -6.669223493954846
+ ],
+ [
+ "▁k",
+ -6.669597778381468
+ ],
+ [
+ "▁some",
+ -6.6818801214084935
+ ],
+ [
+ "were",
+ -6.685358326258509
+ ],
+ [
+ "day",
+ -6.685810790026934
+ ],
+ [
+ "est",
+ -6.692698060575062
+ ],
+ [
+ "ul",
+ -6.710575311022998
+ ],
+ [
+ "ph",
+ -6.73056730249125
+ ],
+ [
+ "▁ex",
+ -6.733336219158555
+ ],
+ [
+ "▁fa",
+ -6.765091874615061
+ ],
+ [
+ "▁so",
+ -6.767286242386988
+ ],
+ [
+ "ard",
+ -6.772605838493793
+ ],
+ [
+ "▁un",
+ -6.787572388877607
+ ],
+ [
+ "▁do",
+ -6.795074557643034
+ ],
+ [
+ "▁other",
+ -6.7956782657022545
+ ],
+ [
+ "▁but",
+ -6.8025729087235485
+ ],
+ [
+ "ther",
+ -6.803231620372917
+ ],
+ [
+ "ian",
+ -6.812902667758519
+ ],
+ [
+ "▁one",
+ -6.821076061857491
+ ],
+ [
+ "▁more",
+ -6.8266493698278925
+ ],
+ [
+ "6",
+ -6.82879636799627
+ ],
+ [
+ "ous",
+ -6.855962306912703
+ ],
+ [
+ "▁whe",
+ -6.859091467805891
+ ],
+ [
+ "able",
+ -6.8654555236076575
+ ],
+ [
+ "▁most",
+ -6.865507428780849
+ ],
+ [
+ "▁all",
+ -6.868332909576947
+ ],
+ [
+ "ca",
+ -6.868732115162404
+ ],
+ [
+ "▁mi",
+ -6.871663607421129
+ ],
+ [
+ "▁car",
+ -6.872038781117334
+ ],
+ [
+ "▁en",
+ -6.877446929350223
+ ],
+ [
+ "per",
+ -6.892167759263063
+ ],
+ [
+ "▁time",
+ -6.8931312430041185
+ ],
+ [
+ "▁vi",
+ -6.895209208386817
+ ],
+ [
+ "had",
+ -6.898533648058522
+ ],
+ [
+ "▁go",
+ -6.902336913332548
+ ],
+ [
+ "lso",
+ -6.913439467216554
+ ],
+ [
+ "▁sh",
+ -6.919449248503678
+ ],
+ [
+ "▁li",
+ -6.922560430255311
+ ],
+ [
+ "▁countr",
+ -6.923075365505835
+ ],
+ [
+ "▁ra",
+ -6.924585801451686
+ ],
+ [
+ "▁br",
+ -6.931064678460284
+ ],
+ [
+ "▁ab",
+ -6.931940367854744
+ ],
+ [
+ "ia",
+ -6.937806621480712
+ ],
+ [
+ "ance",
+ -6.957346174263031
+ ],
+ [
+ "8",
+ -6.961832959098544
+ ],
+ [
+ "▁3",
+ -6.968100343799258
+ ],
+ [
+ "▁there",
+ -6.9733914148350085
+ ],
+ [
+ "ish",
+ -6.986137594465574
+ ],
+ [
+ "▁even",
+ -7.002799148099052
+ ],
+ [
+ "▁people",
+ -7.02716205152031
+ ],
+ [
+ "▁per",
+ -7.0330618605904665
+ ],
+ [
+ "way",
+ -7.03633517131352
+ ],
+ [
+ "ern",
+ -7.042479278911798
+ ],
+ [
+ "▁pre",
+ -7.0523064460866784
+ ],
+ [
+ "ff",
+ -7.053031071467035
+ ],
+ [
+ "▁your",
+ -7.066913374732923
+ ],
+ [
+ "▁new",
+ -7.0793013547195525
+ ],
+ [
+ "9",
+ -7.085883866389088
+ ],
+ [
+ "ill",
+ -7.089598704000823
+ ],
+ [
+ "▁over",
+ -7.098805832816234
+ ],
+ [
+ "▁after",
+ -7.115356643253071
+ ],
+ [
+ "▁tra",
+ -7.120228788920384
+ ],
+ [
+ "low",
+ -7.122220040050001
+ ],
+ [
+ "▁comp",
+ -7.133500815815238
+ ],
+ [
+ "▁ba",
+ -7.141131224024946
+ ],
+ [
+ "ical",
+ -7.1422260237143265
+ ],
+ [
+ "round",
+ -7.1435233350247245
+ ],
+ [
+ "7",
+ -7.152886984152966
+ ],
+ [
+ "▁year",
+ -7.153144193143549
+ ],
+ [
+ "▁may",
+ -7.159125415859334
+ ],
+ [
+ "▁na",
+ -7.161008162022526
+ ],
+ [
+ "one",
+ -7.184878065317987
+ ],
+ [
+ "▁al",
+ -7.195461984655057
+ ],
+ [
+ "um",
+ -7.209006967663773
+ ],
+ [
+ "▁pri",
+ -7.210069850711059
+ ],
+ [
+ "▁world",
+ -7.212023222385909
+ ],
+ [
+ "▁who",
+ -7.223014137024571
+ ],
+ [
+ "ould",
+ -7.226760603979994
+ ],
+ [
+ "▁out",
+ -7.231998550417056
+ ],
+ [
+ "▁its",
+ -7.233405561461712
+ ],
+ [
+ "▁up",
+ -7.285142155426511
+ ],
+ [
+ "▁than",
+ -7.299137896625927
+ ],
+ [
+ "▁state",
+ -7.30645541347183
+ ],
+ [
+ "ence",
+ -7.318807589581895
+ ],
+ [
+ "long",
+ -7.332923092470053
+ ],
+ [
+ "▁his",
+ -7.3354486738370595
+ ],
+ [
+ "ize",
+ -7.335587746438243
+ ],
+ [
+ "ose",
+ -7.339855014717346
+ ],
+ [
+ "▁know",
+ -7.341930003063956
+ ],
+ [
+ "ry",
+ -7.344186782226564
+ ],
+ [
+ "2",
+ -7.344207514829643
+ ],
+ [
+ "▁imp",
+ -7.353859464596262
+ ],
+ [
+ "ary",
+ -7.354825893119824
+ ],
+ [
+ "▁often",
+ -7.358854125859942
+ ],
+ [
+ "▁call",
+ -7.378671801439579
+ ],
+ [
+ "▁into",
+ -7.386238978021677
+ ],
+ [
+ "▁part",
+ -7.392001248006106
+ ],
+ [
+ "ture",
+ -7.395700798454083
+ ],
+ [
+ "▁how",
+ -7.414132628968153
+ ],
+ [
+ "que",
+ -7.4158513488173
+ ],
+ [
+ "where",
+ -7.416644301772499
+ ],
+ [
+ "▁19",
+ -7.416822207519889
+ ],
+ [
+ "1",
+ -7.434230018742243
+ ],
+ [
+ "▁travel",
+ -7.451833886454098
+ ],
+ [
+ "▁dur",
+ -7.487770219521905
+ ],
+ [
+ "▁mon",
+ -7.48847200394581
+ ],
+ [
+ "▁work",
+ -7.491585291065162
+ ],
+ [
+ "cause",
+ -7.4924841199173855
+ ],
+ [
+ "▁war",
+ -7.494757093128933
+ ],
+ [
+ "inter",
+ -7.500819628415845
+ ],
+ [
+ "▁like",
+ -7.505488775076321
+ ],
+ [
+ "thing",
+ -7.510805171237376
+ ],
+ [
+ "▁comm",
+ -7.521731365223375
+ ],
+ [
+ "▁cit",
+ -7.524839744064275
+ ],
+ [
+ "▁sai",
+ -7.525578186853979
+ ],
+ [
+ "read",
+ -7.527557697032419
+ ],
+ [
+ "▁different",
+ -7.5464568591042385
+ ],
+ [
+ "▁get",
+ -7.552169862062854
+ ],
+ [
+ "ock",
+ -7.555306657435667
+ ],
+ [
+ "▁show",
+ -7.575335097438488
+ ],
+ [
+ "▁north",
+ -7.575620423647008
+ ],
+ [
+ "▁make",
+ -7.577043786702754
+ ],
+ [
+ "▁tri",
+ -7.590393027908454
+ ],
+ [
+ "though",
+ -7.593003994539069
+ ],
+ [
+ "▁jo",
+ -7.617850059876274
+ ],
+ [
+ "ign",
+ -7.621768802470928
+ ],
+ [
+ "▁fre",
+ -7.625386647341769
+ ],
+ [
+ "▁see",
+ -7.6336479886677315
+ ],
+ [
+ "▁app",
+ -7.639523684019808
+ ],
+ [
+ "▁place",
+ -7.651093729477778
+ ],
+ [
+ "▁water",
+ -7.651415059507122
+ ],
+ [
+ "▁report",
+ -7.666922795581368
+ ],
+ [
+ "▁just",
+ -7.683062641491382
+ ],
+ [
+ "▁should",
+ -7.684631124020186
+ ],
+ [
+ "▁cha",
+ -7.686782915070514
+ ],
+ [
+ "side",
+ -7.706471569720542
+ ],
+ [
+ "ness",
+ -7.708460850991384
+ ],
+ [
+ "▁south",
+ -7.716338337987308
+ ],
+ [
+ "▁10",
+ -7.7273771012898225
+ ],
+ [
+ "▁includ",
+ -7.733039602026582
+ ],
+ [
+ "▁back",
+ -7.733220730712199
+ ],
+ [
+ "▁qu",
+ -7.755042830237189
+ ],
+ [
+ "▁par",
+ -7.761722791424332
+ ],
+ [
+ "▁small",
+ -7.767843049176307
+ ],
+ [
+ "▁need",
+ -7.769123438158099
+ ],
+ [
+ "3",
+ -7.771541622268048
+ ],
+ [
+ "tter",
+ -7.772352757891329
+ ],
+ [
+ "ough",
+ -7.782401384709099
+ ],
+ [
+ "▁take",
+ -7.803964097405506
+ ],
+ [
+ "▁require",
+ -7.822395117221289
+ ],
+ [
+ "▁base",
+ -7.823680575156837
+ ],
+ [
+ "▁gra",
+ -7.825064277345991
+ ],
+ [
+ "▁through",
+ -7.841251976339118
+ ],
+ [
+ "▁high",
+ -7.841254254031162
+ ],
+ [
+ "▁visit",
+ -7.861379901015909
+ ],
+ [
+ "▁home",
+ -7.862319183680619
+ ],
+ [
+ "cient",
+ -7.8663293686340445
+ ],
+ [
+ "▁ski",
+ -7.872481670538821
+ ],
+ [
+ "▁island",
+ -7.882319522995838
+ ],
+ [
+ "▁large",
+ -7.900124903843213
+ ],
+ [
+ "▁down",
+ -7.900275079303846
+ ],
+ [
+ "▁child",
+ -7.900295796047779
+ ],
+ [
+ "▁near",
+ -7.9006274881876095
+ ],
+ [
+ "each",
+ -7.904823960369406
+ ],
+ [
+ "▁number",
+ -7.92049657311796
+ ],
+ [
+ "ship",
+ -7.923714185081133
+ ],
+ [
+ "special",
+ -7.924246082451086
+ ],
+ [
+ "▁found",
+ -7.941336564660717
+ ],
+ [
+ "▁sea",
+ -7.954838574459147
+ ],
+ [
+ "▁again",
+ -7.962674285284644
+ ],
+ [
+ "▁learn",
+ -7.962691878072893
+ ],
+ [
+ "ember",
+ -7.965884834890247
+ ],
+ [
+ "▁vari",
+ -7.96616597834795
+ ],
+ [
+ "▁animal",
+ -7.984363992669607
+ ],
+ [
+ "▁what",
+ -7.984502416198689
+ ],
+ [
+ "▁europe",
+ -8.006567891721467
+ ],
+ [
+ "▁200",
+ -8.00660766861835
+ ],
+ [
+ "000000",
+ -8.012236019613125
+ ],
+ [
+ "mark",
+ -8.01276901828499
+ ],
+ [
+ "sure",
+ -8.015138086899906
+ ],
+ [
+ "▁adv",
+ -8.023621444584538
+ ],
+ [
+ "▁whil",
+ -8.02780271246084
+ ],
+ [
+ "▁chi",
+ -8.045482944578183
+ ],
+ [
+ "▁buil",
+ -8.052815943522955
+ ],
+ [
+ "▁every",
+ -8.056538487431741
+ ],
+ [
+ "▁va",
+ -8.06777735608203
+ ],
+ [
+ "▁become",
+ -8.076393297026822
+ ],
+ [
+ "▁local",
+ -8.076412720519006
+ ],
+ [
+ "▁stud",
+ -8.07837962249372
+ ],
+ [
+ "▁due",
+ -8.078849866853641
+ ],
+ [
+ "▁gree",
+ -8.082887671546896
+ ],
+ [
+ "▁centur",
+ -8.100942155064343
+ ],
+ [
+ "▁play",
+ -8.101156837441584
+ ],
+ [
+ "cross",
+ -8.102046351520748
+ ],
+ [
+ "▁government",
+ -8.125750652904102
+ ],
+ [
+ "▁system",
+ -8.125751292965452
+ ],
+ [
+ "▁point",
+ -8.126694846256854
+ ],
+ [
+ "▁trans",
+ -8.128116666544686
+ ],
+ [
+ "▁name",
+ -8.13730426679406
+ ],
+ [
+ "▁german",
+ -8.151397109598614
+ ],
+ [
+ "▁culture",
+ -8.151404168932643
+ ],
+ [
+ "▁famil",
+ -8.15169431433106
+ ],
+ [
+ "▁histor",
+ -8.151992571385806
+ ],
+ [
+ "▁help",
+ -8.152379382465845
+ ],
+ [
+ "▁find",
+ -8.15399509736029
+ ],
+ [
+ "▁run",
+ -8.15551819933091
+ ],
+ [
+ "▁medi",
+ -8.175453324820076
+ ],
+ [
+ "▁general",
+ -8.177711046639622
+ ],
+ [
+ "▁usual",
+ -8.178143536295165
+ ],
+ [
+ "▁lead",
+ -8.17819966076631
+ ],
+ [
+ "▁offer",
+ -8.180232661192168
+ ],
+ [
+ "came",
+ -8.183643902583706
+ ],
+ [
+ "qui",
+ -8.192813528327466
+ ],
+ [
+ "imate",
+ -8.204585093979773
+ ],
+ [
+ "▁provide",
+ -8.204769066637121
+ ],
+ [
+ "▁start",
+ -8.206828802738865
+ ],
+ [
+ "▁possibl",
+ -8.23251237309725
+ ],
+ [
+ "▁temple",
+ -8.232512443222566
+ ],
+ [
+ "▁develop",
+ -8.23251253510426
+ ],
+ [
+ "▁before",
+ -8.232542372986714
+ ],
+ [
+ "▁sever",
+ -8.233431940537834
+ ],
+ [
+ "▁game",
+ -8.234672632014709
+ ],
+ [
+ "▁human",
+ -8.261150107977098
+ ],
+ [
+ "▁consider",
+ -8.261160080863537
+ ],
+ [
+ "▁change",
+ -8.261187464522354
+ ],
+ [
+ "▁book",
+ -8.261600372522377
+ ],
+ [
+ "▁1000",
+ -8.289718237006165
+ ],
+ [
+ "▁tourist",
+ -8.29054215262876
+ ],
+ [
+ "▁great",
+ -8.290624738675088
+ ],
+ [
+ "▁region",
+ -8.290771793199044
+ ],
+ [
+ "▁protest",
+ -8.290809821436957
+ ],
+ [
+ "▁limit",
+ -8.290876449378203
+ ],
+ [
+ "▁under",
+ -8.291576583902883
+ ],
+ [
+ "▁driv",
+ -8.291901980259404
+ ],
+ [
+ "▁cru",
+ -8.29526860346621
+ ],
+ [
+ "▁relat",
+ -8.315627957281512
+ ],
+ [
+ "▁language",
+ -8.320798343228995
+ ],
+ [
+ "▁201",
+ -8.320810101005854
+ ],
+ [
+ "▁remain",
+ -8.320835267121115
+ ],
+ [
+ "▁receive",
+ -8.32087919331787
+ ],
+ [
+ "▁team",
+ -8.320905438873691
+ ],
+ [
+ "▁close",
+ -8.321788682484456
+ ],
+ [
+ "▁follow",
+ -8.352087907661254
+ ],
+ [
+ "▁major",
+ -8.352166130913496
+ ],
+ [
+ "▁much",
+ -8.352275564219338
+ ],
+ [
+ "▁wild",
+ -8.352831893439728
+ ],
+ [
+ "ject",
+ -8.352943963542865
+ ],
+ [
+ "▁case",
+ -8.354836760378483
+ ],
+ [
+ "▁official",
+ -8.384306537193822
+ ],
+ [
+ "▁group",
+ -8.384309677325607
+ ],
+ [
+ "▁africa",
+ -8.384310670398985
+ ],
+ [
+ "▁season",
+ -8.38466131549998
+ ],
+ [
+ "▁service",
+ -8.417640029535818
+ ],
+ [
+ "▁believe",
+ -8.417644279258813
+ ],
+ [
+ "▁look",
+ -8.417696460056247
+ ],
+ [
+ "▁camp",
+ -8.4177560776734
+ ],
+ [
+ "▁own",
+ -8.417875270619882
+ ],
+ [
+ "break",
+ -8.417973529893512
+ ],
+ [
+ "▁affect",
+ -8.452123796949778
+ ],
+ [
+ "▁sometime",
+ -8.452127443718524
+ ],
+ [
+ "▁america",
+ -8.452135735357338
+ ],
+ [
+ "▁charge",
+ -8.45219566004275
+ ],
+ [
+ "▁understand",
+ -8.452216747124485
+ ],
+ [
+ "▁type",
+ -8.452232925203674
+ ],
+ [
+ "▁class",
+ -8.452279844492605
+ ],
+ [
+ "▁earth",
+ -8.452354317519077
+ ],
+ [
+ "▁communi",
+ -8.45314714922437
+ ],
+ [
+ "▁locat",
+ -8.458516662039814
+ ],
+ [
+ "▁problem",
+ -8.487840831803812
+ ],
+ [
+ "▁public",
+ -8.52487386512437
+ ],
+ [
+ "▁particular",
+ -8.52487392465013
+ ],
+ [
+ "▁success",
+ -8.52487575044468
+ ],
+ [
+ "▁popular",
+ -8.563335769640975
+ ],
+ [
+ "▁japan",
+ -8.563335928219438
+ ],
+ [
+ "▁transport",
+ -8.56333935408694
+ ],
+ [
+ "▁snow",
+ -8.563350743467298
+ ],
+ [
+ "▁legal",
+ -8.56349422889885
+ ],
+ [
+ "company",
+ -8.567823788313378
+ ],
+ [
+ "▁australia",
+ -8.603335377698876
+ ],
+ [
+ "▁experience",
+ -8.60333543690594
+ ],
+ [
+ "▁effect",
+ -8.603335572481557
+ ],
+ [
+ "▁expect",
+ -8.603336545491683
+ ],
+ [
+ "▁religio",
+ -8.60333719794059
+ ],
+ [
+ "▁airline",
+ -8.603337729717676
+ ],
+ [
+ "▁food",
+ -8.603423880765625
+ ],
+ [
+ "▁process",
+ -8.603434066309912
+ ],
+ [
+ "▁franc",
+ -8.607137440189035
+ ],
+ [
+ "▁according",
+ -8.645002056827511
+ ],
+ [
+ "▁international",
+ -8.645002161206236
+ ],
+ [
+ "▁authorit",
+ -8.645002186529833
+ ],
+ [
+ "▁econom",
+ -8.645003467085711
+ ],
+ [
+ "▁check",
+ -8.645008609181772
+ ],
+ [
+ "mission",
+ -8.652738598036581
+ ],
+ [
+ "▁olympic",
+ -8.688480321150827
+ ],
+ [
+ "▁discover",
+ -8.688481280484586
+ ],
+ [
+ "▁direct",
+ -8.68862007392972
+ ],
+ [
+ "▁arriv",
+ -8.68883024075118
+ ],
+ [
+ "strict",
+ -8.689662128023302
+ ],
+ [
+ "%",
+ -8.733934831176956
+ ],
+ [
+ "▁window",
+ -8.733935012824286
+ ],
+ [
+ "▁support",
+ -8.733935222634948
+ ],
+ [
+ "▁document",
+ -8.733936433415515
+ ],
+ [
+ "▁school",
+ -8.7339383128885
+ ],
+ [
+ "▁univers",
+ -8.733952912966306
+ ],
+ [
+ "▁plann",
+ -8.733959359687297
+ ],
+ [
+ "▁mountain",
+ -8.781553962524354
+ ],
+ [
+ "▁attack",
+ -8.78155483428725
+ ],
+ [
+ "▁addition",
+ -8.781556055460403
+ ],
+ [
+ "▁education",
+ -8.781556487999397
+ ],
+ [
+ "▁crash",
+ -8.781565077101176
+ ],
+ [
+ "abilit",
+ -8.782467496776816
+ ],
+ [
+ "▁similar",
+ -8.831553958948293
+ ],
+ [
+ "▁regular",
+ -8.831555814320183
1982
+ ],
1983
+ [
1984
+ "▁involve",
1985
+ -8.831558534910847
1986
+ ],
1987
+ [
1988
+ "▁infect",
1989
+ -8.831583753401237
1990
+ ],
1991
+ [
1992
+ "▁photo",
1993
+ -8.831590339505494
1994
+ ],
1995
+ [
1996
+ "▁establish",
1997
+ -8.884185458496821
1998
+ ],
1999
+ [
2000
+ "▁independen",
2001
+ -8.884185473898746
2002
+ ],
2003
+ [
2004
+ "▁physical",
2005
+ -8.884185942057826
2006
+ ],
2007
+ [
2008
+ "▁current",
2009
+ -8.884189149406275
2010
+ ],
2011
+ [
2012
+ "00000",
2013
+ -8.92697290848362
2014
+ ],
2015
+ [
2016
+ "▁organization",
2017
+ -8.939741013377946
2018
+ ],
2019
+ [
2020
+ "▁behavior",
2021
+ -8.93974101496351
2022
+ ],
2023
+ [
2024
+ "▁investigat",
2025
+ -8.9397411668559
2026
+ ],
2027
+ [
2028
+ "▁television",
2029
+ -8.998564542859848
2030
+ ],
2031
+ [
2032
+ "▁republic",
2033
+ -8.99856454364473
2034
+ ],
2035
+ [
2036
+ "▁potential",
2037
+ -8.998564656926455
2038
+ ],
2039
+ [
2040
+ "▁example",
2041
+ -8.998564715546378
2042
+ ],
2043
+ [
2044
+ "▁original",
2045
+ -8.99856478139713
2046
+ ],
2047
+ [
2048
+ "▁population",
2049
+ -8.998564898026078
2050
+ ],
2051
+ [
2052
+ "▁happen",
2053
+ -8.998564920034934
2054
+ ],
2055
+ [
2056
+ "▁website",
2057
+ -8.998566502024065
2058
+ ],
2059
+ [
2060
+ "▁individual",
2061
+ -9.061064542741311
2062
+ ],
2063
+ [
2064
+ "▁surviv",
2065
+ -9.061074061068329
2066
+ ],
2067
+ [
2068
+ "▁simpl",
2069
+ -9.06108516492456
2070
+ ],
2071
+ [
2072
+ "▁neighbor",
2073
+ -9.127731209385104
2074
+ ],
2075
+ [
2076
+ "▁observ",
2077
+ -9.12773163177753
2078
+ ],
2079
+ [
2080
+ "▁associat",
2081
+ -9.127733727827064
2082
+ ],
2083
+ [
2084
+ "▁earthquake",
2085
+ -9.199159780805967
2086
+ ],
2087
+ [
2088
+ "$",
2089
+ -9.359416191062518
2090
+ ],
2091
+ [
2092
+ ".",
2093
+ -9.359416191062522
2094
+ ],
2095
+ [
2096
+ "▁network",
2097
+ -9.359416194093198
2098
+ ],
2099
+ [
2100
+ "▁agricultur",
2101
+ -9.359428904038964
2102
+ ],
2103
+ [
2104
+ "▁geograph",
2105
+ -9.45032528230153
2106
+ ],
2107
+ [
2108
+ "▁legislati",
2109
+ -9.450326304956086
2110
+ ],
2111
+ [
2112
+ "]",
2113
+ -10.095960202659114
2114
+ ],
2115
+ [
2116
+ "[",
2117
+ -10.095960202659116
2118
+ ],
2119
+ [
2120
+ "-",
2121
+ -10.295960202619158
2122
+ ],
2123
+ [
2124
+ "¢",
2125
+ -11.37929353599245
2126
+ ],
2127
+ [
2128
+ "​",
2129
+ -11.37929353599245
2130
+ ],
2131
+ [
2132
+ "£",
2133
+ -12.378993535952503
2134
+ ],
2135
+ [
2136
+ "q",
2137
+ -12.379093535952505
2138
+ ],
2139
+ [
2140
+ ")",
2141
+ -12.379193535952504
2142
+ ],
2143
+ [
2144
+ "(",
2145
+ -12.379293535952502
2146
+ ],
2147
+ [
2148
+ "€",
2149
+ -12.379293535952502
2150
+ ]
2151
+ ]
2152
+ }
2153
+ }
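Note on reading the vocab entries above: each one is a `[piece, score]` pair. Assuming the scores are natural-log probabilities, as is conventional for a Unigram model (the diff itself does not state this), a minimal sketch for converting a few of them back to probabilities could look like this. The helper name `score_to_prob` is hypothetical.

```python
import math

# A few [piece, score] pairs copied verbatim from the vocab entries above.
vocab = [
    ["▁group", -8.384309677325607],
    ["▁earthquake", -9.199159780805967],
    ["€", -12.379293535952502],
]


def score_to_prob(score: float) -> float:
    """Convert a score to a probability.

    Assumption: the score is a natural-log probability (Unigram convention).
    """
    return math.exp(score)


for piece, score in vocab:
    # Lower (more negative) scores map to rarer pieces.
    print(f"{piece!r}: {score_to_prob(score):.3e}")
```

Under that assumption, the entries are sorted from more likely to less likely pieces, which matches the steadily decreasing scores in this part of the file.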
tokenizer_config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "bos_token": "([bos])",
+   "clean_up_tokenization_spaces": true,
+   "eos_token": "([eos])",
+   "mask_token": "([mask])",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "([pad])",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "([unk])"
+ }
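Note on `model_max_length`: the value `1000000000000000019884624838656` equals `int(1e30)`, which `transformers` uses as a placeholder when no maximum length has been set, so it should not be read as a real sequence limit. A minimal sketch that checks this using only the standard library; the constant name `NO_LIMIT_SENTINEL` is a hypothetical label for illustration:

```python
import json

# Contents of tokenizer_config.json as added in this commit.
config_text = """
{
  "bos_token": "([bos])",
  "clean_up_tokenization_spaces": true,
  "eos_token": "([eos])",
  "mask_token": "([mask])",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "([pad])",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "([unk])"
}
"""

config = json.loads(config_text)

# Hypothetical name; int(1e30) is the float 1e30 converted exactly to an int.
NO_LIMIT_SENTINEL = int(1e30)

# True only when the config carries a real, user-set maximum length.
has_explicit_limit = config["model_max_length"] != NO_LIMIT_SENTINEL
print("explicit max length set:", has_explicit_limit)
```

If a real limit is needed downstream, it would have to be supplied separately, for example when loading the tokenizer or when calling it with truncation.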