ItsMaxNorm committed
Commit 5c25055 · verified · 1 Parent(s): 9b2a433

Upload folder using huggingface_hub

Files changed (3)
  1. merges.json +1292 -1
  2. tokenizer_config.json +1 -1
  3. vocab.json +132 -1
merges.json CHANGED
@@ -1 +1,1292 @@
- []
1
+ [
2
+ {
3
+ "token": "οΏ½",
4
+ "token_bytes": [
5
+ 226,
6
+ 153
7
+ ],
8
+ "rank": 256
9
+ },
10
+ {
11
+ "token": ".οΏ½",
12
+ "token_bytes": [
13
+ 46,
14
+ 226,
15
+ 153
16
+ ],
17
+ "rank": 257
18
+ },
19
+ {
20
+ "token": "..",
21
+ "token_bytes": [
22
+ 46,
23
+ 46
24
+ ],
25
+ "rank": 258
26
+ },
27
+ {
28
+ "token": " b",
29
+ "token_bytes": [
30
+ 32,
31
+ 98
32
+ ],
33
+ "rank": 259
34
+ },
35
+ {
36
+ "token": " w",
37
+ "token_bytes": [
38
+ 32,
39
+ 119
40
+ ],
41
+ "rank": 260
42
+ },
43
+ {
44
+ "token": ".x",
45
+ "token_bytes": [
46
+ 46,
47
+ 120
48
+ ],
49
+ "rank": 261
50
+ },
51
+ {
52
+ "token": ".β™Ÿ",
53
+ "token_bytes": [
54
+ 46,
55
+ 226,
56
+ 153,
57
+ 159
58
+ ],
59
+ "rank": 262
60
+ },
61
+ {
62
+ "token": ".β™™",
63
+ "token_bytes": [
64
+ 46,
65
+ 226,
66
+ 153,
67
+ 153
68
+ ],
69
+ "rank": 263
70
+ },
71
+ {
72
+ "token": "β™Ÿ",
73
+ "token_bytes": [
74
+ 226,
75
+ 153,
76
+ 159
77
+ ],
78
+ "rank": 264
79
+ },
80
+ {
81
+ "token": "β™™",
82
+ "token_bytes": [
83
+ 226,
84
+ 153,
85
+ 153
86
+ ],
87
+ "rank": 265
88
+ },
89
+ {
90
+ "token": "β™–",
91
+ "token_bytes": [
92
+ 226,
93
+ 153,
94
+ 150
95
+ ],
96
+ "rank": 266
97
+ },
98
+ {
99
+ "token": "β™œ",
100
+ "token_bytes": [
101
+ 226,
102
+ 153,
103
+ 156
104
+ ],
105
+ "rank": 267
106
+ },
107
+ {
108
+ "token": "β™˜",
109
+ "token_bytes": [
110
+ 226,
111
+ 153,
112
+ 152
113
+ ],
114
+ "rank": 268
115
+ },
116
+ {
117
+ "token": ".β™˜",
118
+ "token_bytes": [
119
+ 46,
120
+ 226,
121
+ 153,
122
+ 152
123
+ ],
124
+ "rank": 269
125
+ },
126
+ {
127
+ "token": "β™ž",
128
+ "token_bytes": [
129
+ 226,
130
+ 153,
131
+ 158
132
+ ],
133
+ "rank": 270
134
+ },
135
+ {
136
+ "token": ".β™ž",
137
+ "token_bytes": [
138
+ 46,
139
+ 226,
140
+ 153,
141
+ 158
142
+ ],
143
+ "rank": 271
144
+ },
145
+ {
146
+ "token": ".β™–",
147
+ "token_bytes": [
148
+ 46,
149
+ 226,
150
+ 153,
151
+ 150
152
+ ],
153
+ "rank": 272
154
+ },
155
+ {
156
+ "token": ".β™œ",
157
+ "token_bytes": [
158
+ 46,
159
+ 226,
160
+ 153,
161
+ 156
162
+ ],
163
+ "rank": 273
164
+ },
165
+ {
166
+ "token": "β™—",
167
+ "token_bytes": [
168
+ 226,
169
+ 153,
170
+ 151
171
+ ],
172
+ "rank": 274
173
+ },
174
+ {
175
+ "token": ".β™—",
176
+ "token_bytes": [
177
+ 46,
178
+ 226,
179
+ 153,
180
+ 151
181
+ ],
182
+ "rank": 275
183
+ },
184
+ {
185
+ "token": "♝",
186
+ "token_bytes": [
187
+ 226,
188
+ 153,
189
+ 157
190
+ ],
191
+ "rank": 276
192
+ },
193
+ {
194
+ "token": ".♝",
195
+ "token_bytes": [
196
+ 46,
197
+ 226,
198
+ 153,
199
+ 157
200
+ ],
201
+ "rank": 277
202
+ },
203
+ {
204
+ "token": "β™š",
205
+ "token_bytes": [
206
+ 226,
207
+ 153,
208
+ 154
209
+ ],
210
+ "rank": 278
211
+ },
212
+ {
213
+ "token": ".β™š",
214
+ "token_bytes": [
215
+ 46,
216
+ 226,
217
+ 153,
218
+ 154
219
+ ],
220
+ "rank": 279
221
+ },
222
+ {
223
+ "token": "β™”",
224
+ "token_bytes": [
225
+ 226,
226
+ 153,
227
+ 148
228
+ ],
229
+ "rank": 280
230
+ },
231
+ {
232
+ "token": ".β™”",
233
+ "token_bytes": [
234
+ 46,
235
+ 226,
236
+ 153,
237
+ 148
238
+ ],
239
+ "rank": 281
240
+ },
241
+ {
242
+ "token": "β™•",
243
+ "token_bytes": [
244
+ 226,
245
+ 153,
246
+ 149
247
+ ],
248
+ "rank": 282
249
+ },
250
+ {
251
+ "token": ".β™•",
252
+ "token_bytes": [
253
+ 46,
254
+ 226,
255
+ 153,
256
+ 149
257
+ ],
258
+ "rank": 283
259
+ },
260
+ {
261
+ "token": "β™›",
262
+ "token_bytes": [
263
+ 226,
264
+ 153,
265
+ 155
266
+ ],
267
+ "rank": 284
268
+ },
269
+ {
270
+ "token": ".β™›",
271
+ "token_bytes": [
272
+ 46,
273
+ 226,
274
+ 153,
275
+ 155
276
+ ],
277
+ "rank": 285
278
+ },
279
+ {
280
+ "token": "..+",
281
+ "token_bytes": [
282
+ 46,
283
+ 46,
284
+ 43
285
+ ],
286
+ "rank": 286
287
+ },
288
+ {
289
+ "token": "β™Ÿd",
290
+ "token_bytes": [
291
+ 226,
292
+ 153,
293
+ 159,
294
+ 100
295
+ ],
296
+ "rank": 287
297
+ },
298
+ {
299
+ "token": "β™™d",
300
+ "token_bytes": [
301
+ 226,
302
+ 153,
303
+ 153,
304
+ 100
305
+ ],
306
+ "rank": 288
307
+ },
308
+ {
309
+ "token": "β™™e",
310
+ "token_bytes": [
311
+ 226,
312
+ 153,
313
+ 153,
314
+ 101
315
+ ],
316
+ "rank": 289
317
+ },
318
+ {
319
+ "token": "β™Ÿe",
320
+ "token_bytes": [
321
+ 226,
322
+ 153,
323
+ 159,
324
+ 101
325
+ ],
326
+ "rank": 290
327
+ },
328
+ {
329
+ "token": "β™šg",
330
+ "token_bytes": [
331
+ 226,
332
+ 153,
333
+ 154,
334
+ 103
335
+ ],
336
+ "rank": 291
337
+ },
338
+ {
339
+ "token": "β™”g",
340
+ "token_bytes": [
341
+ 226,
342
+ 153,
343
+ 148,
344
+ 103
345
+ ],
346
+ "rank": 292
347
+ },
348
+ {
349
+ "token": "β™žf",
350
+ "token_bytes": [
351
+ 226,
352
+ 153,
353
+ 158,
354
+ 102
355
+ ],
356
+ "rank": 293
357
+ },
358
+ {
359
+ "token": "β™˜f",
360
+ "token_bytes": [
361
+ 226,
362
+ 153,
363
+ 152,
364
+ 102
365
+ ],
366
+ "rank": 294
367
+ },
368
+ {
369
+ "token": "β™Ÿc",
370
+ "token_bytes": [
371
+ 226,
372
+ 153,
373
+ 159,
374
+ 99
375
+ ],
376
+ "rank": 295
377
+ },
378
+ {
379
+ "token": "β™–d",
380
+ "token_bytes": [
381
+ 226,
382
+ 153,
383
+ 150,
384
+ 100
385
+ ],
386
+ "rank": 296
387
+ },
388
+ {
389
+ "token": "β™–f",
390
+ "token_bytes": [
391
+ 226,
392
+ 153,
393
+ 150,
394
+ 102
395
+ ],
396
+ "rank": 297
397
+ },
398
+ {
399
+ "token": "β™œf",
400
+ "token_bytes": [
401
+ 226,
402
+ 153,
403
+ 156,
404
+ 102
405
+ ],
406
+ "rank": 298
407
+ },
408
+ {
409
+ "token": "β™žd",
410
+ "token_bytes": [
411
+ 226,
412
+ 153,
413
+ 158,
414
+ 100
415
+ ],
416
+ "rank": 299
417
+ },
418
+ {
419
+ "token": "β™™c",
420
+ "token_bytes": [
421
+ 226,
422
+ 153,
423
+ 153,
424
+ 99
425
+ ],
426
+ "rank": 300
427
+ },
428
+ {
429
+ "token": "β™˜d",
430
+ "token_bytes": [
431
+ 226,
432
+ 153,
433
+ 152,
434
+ 100
435
+ ],
436
+ "rank": 301
437
+ },
438
+ {
439
+ "token": "β™˜c",
440
+ "token_bytes": [
441
+ 226,
442
+ 153,
443
+ 152,
444
+ 99
445
+ ],
446
+ "rank": 302
447
+ },
448
+ {
449
+ "token": "β™œd",
450
+ "token_bytes": [
451
+ 226,
452
+ 153,
453
+ 156,
454
+ 100
455
+ ],
456
+ "rank": 303
457
+ },
458
+ {
459
+ "token": ".+",
460
+ "token_bytes": [
461
+ 46,
462
+ 43
463
+ ],
464
+ "rank": 304
465
+ },
466
+ {
467
+ "token": "β™žc",
468
+ "token_bytes": [
469
+ 226,
470
+ 153,
471
+ 158,
472
+ 99
473
+ ],
474
+ "rank": 305
475
+ },
476
+ {
477
+ "token": "β™–e",
478
+ "token_bytes": [
479
+ 226,
480
+ 153,
481
+ 150,
482
+ 101
483
+ ],
484
+ "rank": 306
485
+ },
486
+ {
487
+ "token": "β™˜e",
488
+ "token_bytes": [
489
+ 226,
490
+ 153,
491
+ 152,
492
+ 101
493
+ ],
494
+ "rank": 307
495
+ },
496
+ {
497
+ "token": "β™—e",
498
+ "token_bytes": [
499
+ 226,
500
+ 153,
501
+ 151,
502
+ 101
503
+ ],
504
+ "rank": 308
505
+ },
506
+ {
507
+ "token": "β™že",
508
+ "token_bytes": [
509
+ 226,
510
+ 153,
511
+ 158,
512
+ 101
513
+ ],
514
+ "rank": 309
515
+ },
516
+ {
517
+ "token": "♝e",
518
+ "token_bytes": [
519
+ 226,
520
+ 153,
521
+ 157,
522
+ 101
523
+ ],
524
+ "rank": 310
525
+ },
526
+ {
527
+ "token": "β™™g",
528
+ "token_bytes": [
529
+ 226,
530
+ 153,
531
+ 153,
532
+ 103
533
+ ],
534
+ "rank": 311
535
+ },
536
+ {
537
+ "token": "β™™f",
538
+ "token_bytes": [
539
+ 226,
540
+ 153,
541
+ 153,
542
+ 102
543
+ ],
544
+ "rank": 312
545
+ },
546
+ {
547
+ "token": "β™œc",
548
+ "token_bytes": [
549
+ 226,
550
+ 153,
551
+ 156,
552
+ 99
553
+ ],
554
+ "rank": 313
555
+ },
556
+ {
557
+ "token": "β™Ÿg",
558
+ "token_bytes": [
559
+ 226,
560
+ 153,
561
+ 159,
562
+ 103
563
+ ],
564
+ "rank": 314
565
+ },
566
+ {
567
+ "token": "β™Ÿb",
568
+ "token_bytes": [
569
+ 226,
570
+ 153,
571
+ 159,
572
+ 98
573
+ ],
574
+ "rank": 315
575
+ },
576
+ {
577
+ "token": "β™œe",
578
+ "token_bytes": [
579
+ 226,
580
+ 153,
581
+ 156,
582
+ 101
583
+ ],
584
+ "rank": 316
585
+ },
586
+ {
587
+ "token": "β™œh",
588
+ "token_bytes": [
589
+ 226,
590
+ 153,
591
+ 156,
592
+ 104
593
+ ],
594
+ "rank": 317
595
+ },
596
+ {
597
+ "token": "β™–h",
598
+ "token_bytes": [
599
+ 226,
600
+ 153,
601
+ 150,
602
+ 104
603
+ ],
604
+ "rank": 318
605
+ },
606
+ {
607
+ "token": "β™—d",
608
+ "token_bytes": [
609
+ 226,
610
+ 153,
611
+ 151,
612
+ 100
613
+ ],
614
+ "rank": 319
615
+ },
616
+ {
617
+ "token": "β™–c",
618
+ "token_bytes": [
619
+ 226,
620
+ 153,
621
+ 150,
622
+ 99
623
+ ],
624
+ "rank": 320
625
+ },
626
+ {
627
+ "token": "β™™b",
628
+ "token_bytes": [
629
+ 226,
630
+ 153,
631
+ 153,
632
+ 98
633
+ ],
634
+ "rank": 321
635
+ },
636
+ {
637
+ "token": "β™Ÿf",
638
+ "token_bytes": [
639
+ 226,
640
+ 153,
641
+ 159,
642
+ 102
643
+ ],
644
+ "rank": 322
645
+ },
646
+ {
647
+ "token": "♝d",
648
+ "token_bytes": [
649
+ 226,
650
+ 153,
651
+ 157,
652
+ 100
653
+ ],
654
+ "rank": 323
655
+ },
656
+ {
657
+ "token": "β™™h",
658
+ "token_bytes": [
659
+ 226,
660
+ 153,
661
+ 153,
662
+ 104
663
+ ],
664
+ "rank": 324
665
+ },
666
+ {
667
+ "token": "β™Ÿa",
668
+ "token_bytes": [
669
+ 226,
670
+ 153,
671
+ 159,
672
+ 97
673
+ ],
674
+ "rank": 325
675
+ },
676
+ {
677
+ "token": "β™•d",
678
+ "token_bytes": [
679
+ 226,
680
+ 153,
681
+ 149,
682
+ 100
683
+ ],
684
+ "rank": 326
685
+ },
686
+ {
687
+ "token": "β™šf",
688
+ "token_bytes": [
689
+ 226,
690
+ 153,
691
+ 154,
692
+ 102
693
+ ],
694
+ "rank": 327
695
+ },
696
+ {
697
+ "token": "β™™a",
698
+ "token_bytes": [
699
+ 226,
700
+ 153,
701
+ 153,
702
+ 97
703
+ ],
704
+ "rank": 328
705
+ },
706
+ {
707
+ "token": "β™Ÿh",
708
+ "token_bytes": [
709
+ 226,
710
+ 153,
711
+ 159,
712
+ 104
713
+ ],
714
+ "rank": 329
715
+ },
716
+ {
717
+ "token": "β™”f",
718
+ "token_bytes": [
719
+ 226,
720
+ 153,
721
+ 148,
722
+ 102
723
+ ],
724
+ "rank": 330
725
+ },
726
+ {
727
+ "token": "β™—f",
728
+ "token_bytes": [
729
+ 226,
730
+ 153,
731
+ 151,
732
+ 102
733
+ ],
734
+ "rank": 331
735
+ },
736
+ {
737
+ "token": "β™•e",
738
+ "token_bytes": [
739
+ 226,
740
+ 153,
741
+ 149,
742
+ 101
743
+ ],
744
+ "rank": 332
745
+ },
746
+ {
747
+ "token": "β™—g",
748
+ "token_bytes": [
749
+ 226,
750
+ 153,
751
+ 151,
752
+ 103
753
+ ],
754
+ "rank": 333
755
+ },
756
+ {
757
+ "token": "β™›d",
758
+ "token_bytes": [
759
+ 226,
760
+ 153,
761
+ 155,
762
+ 100
763
+ ],
764
+ "rank": 334
765
+ },
766
+ {
767
+ "token": "♝g",
768
+ "token_bytes": [
769
+ 226,
770
+ 153,
771
+ 157,
772
+ 103
773
+ ],
774
+ "rank": 335
775
+ },
776
+ {
777
+ "token": "♝f",
778
+ "token_bytes": [
779
+ 226,
780
+ 153,
781
+ 157,
782
+ 102
783
+ ],
784
+ "rank": 336
785
+ },
786
+ {
787
+ "token": "β™›c",
788
+ "token_bytes": [
789
+ 226,
790
+ 153,
791
+ 155,
792
+ 99
793
+ ],
794
+ "rank": 337
795
+ },
796
+ {
797
+ "token": "β™œb",
798
+ "token_bytes": [
799
+ 226,
800
+ 153,
801
+ 156,
802
+ 98
803
+ ],
804
+ "rank": 338
805
+ },
806
+ {
807
+ "token": "β™—c",
808
+ "token_bytes": [
809
+ 226,
810
+ 153,
811
+ 151,
812
+ 99
813
+ ],
814
+ "rank": 339
815
+ },
816
+ {
817
+ "token": "β™–a",
818
+ "token_bytes": [
819
+ 226,
820
+ 153,
821
+ 150,
822
+ 97
823
+ ],
824
+ "rank": 340
825
+ },
826
+ {
827
+ "token": "β™œa",
828
+ "token_bytes": [
829
+ 226,
830
+ 153,
831
+ 156,
832
+ 97
833
+ ],
834
+ "rank": 341
835
+ },
836
+ {
837
+ "token": "β™›e",
838
+ "token_bytes": [
839
+ 226,
840
+ 153,
841
+ 155,
842
+ 101
843
+ ],
844
+ "rank": 342
845
+ },
846
+ {
847
+ "token": "β™–b",
848
+ "token_bytes": [
849
+ 226,
850
+ 153,
851
+ 150,
852
+ 98
853
+ ],
854
+ "rank": 343
855
+ },
856
+ {
857
+ "token": "♝c",
858
+ "token_bytes": [
859
+ 226,
860
+ 153,
861
+ 157,
862
+ 99
863
+ ],
864
+ "rank": 344
865
+ },
866
+ {
867
+ "token": "β™še",
868
+ "token_bytes": [
869
+ 226,
870
+ 153,
871
+ 154,
872
+ 101
873
+ ],
874
+ "rank": 345
875
+ },
876
+ {
877
+ "token": "♝b",
878
+ "token_bytes": [
879
+ 226,
880
+ 153,
881
+ 157,
882
+ 98
883
+ ],
884
+ "rank": 346
885
+ },
886
+ {
887
+ "token": "β™•c",
888
+ "token_bytes": [
889
+ 226,
890
+ 153,
891
+ 149,
892
+ 99
893
+ ],
894
+ "rank": 347
895
+ },
896
+ {
897
+ "token": "β™”e",
898
+ "token_bytes": [
899
+ 226,
900
+ 153,
901
+ 148,
902
+ 101
903
+ ],
904
+ "rank": 348
905
+ },
906
+ {
907
+ "token": "β™—b",
908
+ "token_bytes": [
909
+ 226,
910
+ 153,
911
+ 151,
912
+ 98
913
+ ],
914
+ "rank": 349
915
+ },
916
+ {
917
+ "token": "β™•f",
918
+ "token_bytes": [
919
+ 226,
920
+ 153,
921
+ 149,
922
+ 102
923
+ ],
924
+ "rank": 350
925
+ },
926
+ {
927
+ "token": "β™›b",
928
+ "token_bytes": [
929
+ 226,
930
+ 153,
931
+ 155,
932
+ 98
933
+ ],
934
+ "rank": 351
935
+ },
936
+ {
937
+ "token": "β™šh",
938
+ "token_bytes": [
939
+ 226,
940
+ 153,
941
+ 154,
942
+ 104
943
+ ],
944
+ "rank": 352
945
+ },
946
+ {
947
+ "token": "β™”h",
948
+ "token_bytes": [
949
+ 226,
950
+ 153,
951
+ 148,
952
+ 104
953
+ ],
954
+ "rank": 353
955
+ },
956
+ {
957
+ "token": "β™šd",
958
+ "token_bytes": [
959
+ 226,
960
+ 153,
961
+ 154,
962
+ 100
963
+ ],
964
+ "rank": 354
965
+ },
966
+ {
967
+ "token": "β™›f",
968
+ "token_bytes": [
969
+ 226,
970
+ 153,
971
+ 155,
972
+ 102
973
+ ],
974
+ "rank": 355
975
+ },
976
+ {
977
+ "token": "β™”d",
978
+ "token_bytes": [
979
+ 226,
980
+ 153,
981
+ 148,
982
+ 100
983
+ ],
984
+ "rank": 356
985
+ },
986
+ {
987
+ "token": "β™”c",
988
+ "token_bytes": [
989
+ 226,
990
+ 153,
991
+ 148,
992
+ 99
993
+ ],
994
+ "rank": 357
995
+ },
996
+ {
997
+ "token": "β™•b",
998
+ "token_bytes": [
999
+ 226,
1000
+ 153,
1001
+ 149,
1002
+ 98
1003
+ ],
1004
+ "rank": 358
1005
+ },
1006
+ {
1007
+ "token": "β™œg",
1008
+ "token_bytes": [
1009
+ 226,
1010
+ 153,
1011
+ 156,
1012
+ 103
1013
+ ],
1014
+ "rank": 359
1015
+ },
1016
+ {
1017
+ "token": "β™–g",
1018
+ "token_bytes": [
1019
+ 226,
1020
+ 153,
1021
+ 150,
1022
+ 103
1023
+ ],
1024
+ "rank": 360
1025
+ },
1026
+ {
1027
+ "token": "β™šc",
1028
+ "token_bytes": [
1029
+ 226,
1030
+ 153,
1031
+ 154,
1032
+ 99
1033
+ ],
1034
+ "rank": 361
1035
+ },
1036
+ {
1037
+ "token": "β™žb",
1038
+ "token_bytes": [
1039
+ 226,
1040
+ 153,
1041
+ 158,
1042
+ 98
1043
+ ],
1044
+ "rank": 362
1045
+ },
1046
+ {
1047
+ "token": "β™˜g",
1048
+ "token_bytes": [
1049
+ 226,
1050
+ 153,
1051
+ 152,
1052
+ 103
1053
+ ],
1054
+ "rank": 363
1055
+ },
1056
+ {
1057
+ "token": "β™•g",
1058
+ "token_bytes": [
1059
+ 226,
1060
+ 153,
1061
+ 149,
1062
+ 103
1063
+ ],
1064
+ "rank": 364
1065
+ },
1066
+ {
1067
+ "token": "β™˜b",
1068
+ "token_bytes": [
1069
+ 226,
1070
+ 153,
1071
+ 152,
1072
+ 98
1073
+ ],
1074
+ "rank": 365
1075
+ },
1076
+ {
1077
+ "token": "β™žg",
1078
+ "token_bytes": [
1079
+ 226,
1080
+ 153,
1081
+ 158,
1082
+ 103
1083
+ ],
1084
+ "rank": 366
1085
+ },
1086
+ {
1087
+ "token": "β™›a",
1088
+ "token_bytes": [
1089
+ 226,
1090
+ 153,
1091
+ 155,
1092
+ 97
1093
+ ],
1094
+ "rank": 367
1095
+ },
1096
+ {
1097
+ "token": "β™›g",
1098
+ "token_bytes": [
1099
+ 226,
1100
+ 153,
1101
+ 155,
1102
+ 103
1103
+ ],
1104
+ "rank": 368
1105
+ },
1106
+ {
1107
+ "token": "β™•h",
1108
+ "token_bytes": [
1109
+ 226,
1110
+ 153,
1111
+ 149,
1112
+ 104
1113
+ ],
1114
+ "rank": 369
1115
+ },
1116
+ {
1117
+ "token": "β™”b",
1118
+ "token_bytes": [
1119
+ 226,
1120
+ 153,
1121
+ 148,
1122
+ 98
1123
+ ],
1124
+ "rank": 370
1125
+ },
1126
+ {
1127
+ "token": "β™—h",
1128
+ "token_bytes": [
1129
+ 226,
1130
+ 153,
1131
+ 151,
1132
+ 104
1133
+ ],
1134
+ "rank": 371
1135
+ },
1136
+ {
1137
+ "token": "β™•a",
1138
+ "token_bytes": [
1139
+ 226,
1140
+ 153,
1141
+ 149,
1142
+ 97
1143
+ ],
1144
+ "rank": 372
1145
+ },
1146
+ {
1147
+ "token": "β™›h",
1148
+ "token_bytes": [
1149
+ 226,
1150
+ 153,
1151
+ 155,
1152
+ 104
1153
+ ],
1154
+ "rank": 373
1155
+ },
1156
+ {
1157
+ "token": "♝a",
1158
+ "token_bytes": [
1159
+ 226,
1160
+ 153,
1161
+ 157,
1162
+ 97
1163
+ ],
1164
+ "rank": 374
1165
+ },
1166
+ {
1167
+ "token": "β™ža",
1168
+ "token_bytes": [
1169
+ 226,
1170
+ 153,
1171
+ 158,
1172
+ 97
1173
+ ],
1174
+ "rank": 375
1175
+ },
1176
+ {
1177
+ "token": "β™šb",
1178
+ "token_bytes": [
1179
+ 226,
1180
+ 153,
1181
+ 154,
1182
+ 98
1183
+ ],
1184
+ "rank": 376
1185
+ },
1186
+ {
1187
+ "token": "β™—a",
1188
+ "token_bytes": [
1189
+ 226,
1190
+ 153,
1191
+ 151,
1192
+ 97
1193
+ ],
1194
+ "rank": 377
1195
+ },
1196
+ {
1197
+ "token": "♝h",
1198
+ "token_bytes": [
1199
+ 226,
1200
+ 153,
1201
+ 157,
1202
+ 104
1203
+ ],
1204
+ "rank": 378
1205
+ },
1206
+ {
1207
+ "token": "β™žh",
1208
+ "token_bytes": [
1209
+ 226,
1210
+ 153,
1211
+ 158,
1212
+ 104
1213
+ ],
1214
+ "rank": 379
1215
+ },
1216
+ {
1217
+ "token": "β™˜a",
1218
+ "token_bytes": [
1219
+ 226,
1220
+ 153,
1221
+ 152,
1222
+ 97
1223
+ ],
1224
+ "rank": 380
1225
+ },
1226
+ {
1227
+ "token": "β™˜h",
1228
+ "token_bytes": [
1229
+ 226,
1230
+ 153,
1231
+ 152,
1232
+ 104
1233
+ ],
1234
+ "rank": 381
1235
+ },
1236
+ {
1237
+ "token": "β™”a",
1238
+ "token_bytes": [
1239
+ 226,
1240
+ 153,
1241
+ 148,
1242
+ 97
1243
+ ],
1244
+ "rank": 382
1245
+ },
1246
+ {
1247
+ "token": "β™ša",
1248
+ "token_bytes": [
1249
+ 226,
1250
+ 153,
1251
+ 154,
1252
+ 97
1253
+ ],
1254
+ "rank": 383
1255
+ },
1256
+ {
1257
+ "token": "..+#",
1258
+ "token_bytes": [
1259
+ 46,
1260
+ 46,
1261
+ 43,
1262
+ 35
1263
+ ],
1264
+ "rank": 384
1265
+ },
1266
+ {
1267
+ "token": "*.",
1268
+ "token_bytes": [
1269
+ 42,
1270
+ 46
1271
+ ],
1272
+ "rank": 385
1273
+ },
1274
+ {
1275
+ "token": ".+#",
1276
+ "token_bytes": [
1277
+ 46,
1278
+ 43,
1279
+ 35
1280
+ ],
1281
+ "rank": 386
1282
+ },
1283
+ {
1284
+ "token": "*.+",
1285
+ "token_bytes": [
1286
+ 42,
1287
+ 46,
1288
+ 43
1289
+ ],
1290
+ "rank": 387
1291
+ }
1292
+ ]
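Each merges.json entry stores its token as a list of raw byte values, so the readable form can be recovered by decoding "token_bytes" as UTF-8. The sketch below is only an illustration using three entries copied from the diff above; the helper name decode_token is hypothetical and not part of the repository. It also shows why the rank 256 entry has no printable character: its two bytes are only the prefix of a three-byte sequence, so decoding falls back to the U+FFFD replacement character.

```python
def decode_token(token_bytes):
    """Decode a merges.json "token_bytes" list as UTF-8.

    Incomplete or invalid sequences are replaced with U+FFFD instead of raising.
    """
    return bytes(token_bytes).decode("utf-8", errors="replace")


# Three entries copied from the merges.json diff (only the fields used here).
entries = [
    {"token_bytes": [226, 153], "rank": 256},            # incomplete UTF-8 prefix
    {"token_bytes": [226, 153, 159], "rank": 264},       # U+265F, black pawn
    {"token_bytes": [226, 153, 153, 101], "rank": 289},  # U+2659 (white pawn) + "e"
]

for entry in entries:
    print(entry["rank"], repr(decode_token(entry["token_bytes"])))
```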
tokenizer_config.json CHANGED
@@ -1,6 +1,6 @@
  {
  "tokenizer_type": "BPE",
- "vocab_size": 256,
+ "vocab_size": 388,
  "pattern": "'(?i:[sdmt]|ll|ve|re)|[^\\r\\n\\p{L}\\p{N}]?+\\p{L}+|\\p{N}{1,3}| ?[^\\s\\p{L}\\p{N}]++[\\r\\n]*|\\s*[\\r\\n]|\\s+(?!\\S)|\\s+",
  "special_tokens": {},
  "training_config": {
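The new "vocab_size" value is consistent with the merges.json change: the previous size of 256 matches one token per byte value, and the merge ranks added in this commit run from 256 to 387. A small worked check, assuming (as the ranks suggest) that every merge adds exactly one new token id:

```python
BASE_BYTE_TOKENS = 256   # vocab_size before this commit (merges.json was [])
FIRST_MERGE_RANK = 256   # rank of the first entry added to merges.json
LAST_MERGE_RANK = 387    # rank of the last entry added to merges.json

# Ranks are contiguous, so the number of merges is the size of the rank range.
num_merges = LAST_MERGE_RANK - FIRST_MERGE_RANK + 1
vocab_size = BASE_BYTE_TOKENS + num_merges

print(num_merges, vocab_size)
```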
vocab.json CHANGED
@@ -127,5 +127,136 @@
127
  "}": 125,
128
  "~": 126,
129
  "": 127,
130
- "οΏ½": 255
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
131
  }
 
127
  "}": 125,
128
  "~": 126,
129
  "": 127,
130
+ "οΏ½": 256,
131
+ ".οΏ½": 257,
132
+ "..": 258,
133
+ " b": 259,
134
+ " w": 260,
135
+ ".x": 261,
136
+ ".β™Ÿ": 262,
137
+ ".β™™": 263,
138
+ "β™Ÿ": 264,
139
+ "β™™": 265,
140
+ "β™–": 266,
141
+ "β™œ": 267,
142
+ "β™˜": 268,
143
+ ".β™˜": 269,
144
+ "β™ž": 270,
145
+ ".β™ž": 271,
146
+ ".β™–": 272,
147
+ ".β™œ": 273,
148
+ "β™—": 274,
149
+ ".β™—": 275,
150
+ "♝": 276,
151
+ ".♝": 277,
152
+ "β™š": 278,
153
+ ".β™š": 279,
154
+ "β™”": 280,
155
+ ".β™”": 281,
156
+ "β™•": 282,
157
+ ".β™•": 283,
158
+ "β™›": 284,
159
+ ".β™›": 285,
160
+ "..+": 286,
161
+ "β™Ÿd": 287,
162
+ "β™™d": 288,
163
+ "β™™e": 289,
164
+ "β™Ÿe": 290,
165
+ "β™šg": 291,
166
+ "β™”g": 292,
167
+ "β™žf": 293,
168
+ "β™˜f": 294,
169
+ "β™Ÿc": 295,
170
+ "β™–d": 296,
171
+ "β™–f": 297,
172
+ "β™œf": 298,
173
+ "β™žd": 299,
174
+ "β™™c": 300,
175
+ "β™˜d": 301,
176
+ "β™˜c": 302,
177
+ "β™œd": 303,
178
+ ".+": 304,
179
+ "β™žc": 305,
180
+ "β™–e": 306,
181
+ "β™˜e": 307,
182
+ "β™—e": 308,
183
+ "β™že": 309,
184
+ "♝e": 310,
185
+ "β™™g": 311,
186
+ "β™™f": 312,
187
+ "β™œc": 313,
188
+ "β™Ÿg": 314,
189
+ "β™Ÿb": 315,
190
+ "β™œe": 316,
191
+ "β™œh": 317,
192
+ "β™–h": 318,
193
+ "β™—d": 319,
194
+ "β™–c": 320,
195
+ "β™™b": 321,
196
+ "β™Ÿf": 322,
197
+ "♝d": 323,
198
+ "β™™h": 324,
199
+ "β™Ÿa": 325,
200
+ "β™•d": 326,
201
+ "β™šf": 327,
202
+ "β™™a": 328,
203
+ "β™Ÿh": 329,
204
+ "β™”f": 330,
205
+ "β™—f": 331,
206
+ "β™•e": 332,
207
+ "β™—g": 333,
208
+ "β™›d": 334,
209
+ "♝g": 335,
210
+ "♝f": 336,
211
+ "β™›c": 337,
212
+ "β™œb": 338,
213
+ "β™—c": 339,
214
+ "β™–a": 340,
215
+ "β™œa": 341,
216
+ "β™›e": 342,
217
+ "β™–b": 343,
218
+ "♝c": 344,
219
+ "β™še": 345,
220
+ "♝b": 346,
221
+ "β™•c": 347,
222
+ "β™”e": 348,
223
+ "β™—b": 349,
224
+ "β™•f": 350,
225
+ "β™›b": 351,
226
+ "β™šh": 352,
227
+ "β™”h": 353,
228
+ "β™šd": 354,
229
+ "β™›f": 355,
230
+ "β™”d": 356,
231
+ "β™”c": 357,
232
+ "β™•b": 358,
233
+ "β™œg": 359,
234
+ "β™–g": 360,
235
+ "β™šc": 361,
236
+ "β™žb": 362,
237
+ "β™˜g": 363,
238
+ "β™•g": 364,
239
+ "β™˜b": 365,
240
+ "β™žg": 366,
241
+ "β™›a": 367,
242
+ "β™›g": 368,
243
+ "β™•h": 369,
244
+ "β™”b": 370,
245
+ "β™—h": 371,
246
+ "β™•a": 372,
247
+ "β™›h": 373,
248
+ "♝a": 374,
249
+ "β™ža": 375,
250
+ "β™šb": 376,
251
+ "β™—a": 377,
252
+ "♝h": 378,
253
+ "β™žh": 379,
254
+ "β™˜a": 380,
255
+ "β™˜h": 381,
256
+ "β™”a": 382,
257
+ "β™ša": 383,
258
+ "..+#": 384,
259
+ "*.": 385,
260
+ ".+#": 386,
261
+ "*.+": 387
262
  }
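To illustrate how ids like these could be produced, here is a minimal sketch of rank-based byte-pair merging over a handful of entries from this vocabulary. It is an assumption about how such a table is typically applied, not the repository's actual encoder: the function name bpe_encode is hypothetical, single bytes are assumed to map to their own value (consistent with "}": 125 and "~": 126 above), and the pre-tokenization "pattern" from tokenizer_config.json is skipped.

```python
def bpe_encode(ranks, data):
    """Greedily merge adjacent parts, always taking the lowest-ranked known pair."""
    parts = [bytes([b]) for b in data]
    while len(parts) > 1:
        best = None  # (index, rank) of the best mergeable pair found so far
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best is None or rank < best[1]):
                best = (i, rank)
        if best is None:
            break  # no adjacent pair forms a known token
        i = best[0]
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return [ranks[p] for p in parts]


# Assumed base vocabulary: one token per byte value, id == byte value.
ranks = {bytes([b]): b for b in range(256)}
# A few merged tokens copied from this commit (bytes -> id).
ranks.update({
    bytes([226, 153]): 256,
    bytes([226, 153, 159]): 264,
    bytes([226, 153, 153]): 265,
    bytes([226, 153, 159, 100]): 287,
    bytes([226, 153, 153, 101]): 289,
})

print(bpe_encode(ranks, "\u265fd".encode("utf-8")))  # black pawn + "d"
print(bpe_encode(ranks, "\u2659e".encode("utf-8")))  # white pawn + "e"
```

With only the five merges listed, each piece-plus-file string collapses into a single id; a real encoder would load every entry from merges.json and split the input with the configured pattern first.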