Endezyar committed on
Commit cabaf76 · verified · 1 parent: 0146566

Upload tokenizer

Files changed (4)
  1. README.md +199 -0
  2. special_tokens_map.json +1 -0
  3. tokenizer.json +956 -0
  4. tokenizer_config.json +16 -0
README.md ADDED
@@ -0,0 +1,199 @@
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {}
tokenizer.json ADDED
@@ -0,0 +1,956 @@
+ {
+   "version": "1.0",
+   "truncation": null,
+   "padding": null,
+   "added_tokens": [
+     { "id": 0, "content": "[UNK]", "single_word": false, "lstrip": false, "rstrip": false, "normalized": false, "special": true }
+   ],
+   "normalizer": null,
+   "pre_tokenizer": { "type": "Whitespace" },
+   "post_processor": null,
+   "decoder": null,
+   "model": {
+     "type": "BPE",
+     "dropout": null,
+     "unk_token": null,
+     "continuing_subword_prefix": null,
+     "end_of_word_suffix": null,
+     "fuse_unk": false,
+     "byte_fallback": false,
+     "ignore_merges": false,
+     "vocab": {
+       "[UNK]": 0, "\"": 1, "'": 2, "(": 3,
+       ")": 4, ",": 5, "-": 6, ".": 7,
+       "/": 8, "0": 9, "1": 10, "2": 11,
+       "3": 12, "4": 13, "5": 14, "6": 15,
+       "7": 16, "8": 17, "9": 18, ":": 19,
+       ";": 20, "A": 21, "B": 22, "C": 23,
+       "D": 24, "E": 25, "F": 26, "G": 27,
+       "H": 28, "I": 29, "J": 30, "K": 31,
+       "L": 32, "M": 33, "N": 34, "O": 35,
+       "P": 36, "Q": 37, "R": 38, "S": 39,
+       "T": 40, "U": 41, "V": 42, "X": 43,
+       "Y": 44, "Z": 45, "[": 46, "]": 47,
+       "a": 48, "b": 49, "c": 50, "d": 51,
+       "e": 52, "f": 53, "g": 54, "h": 55,
+       "i": 56, "j": 57, "k": 58, "l": 59,
+       "m": 60, "n": 61, "o": 62, "p": 63,
+       "q": 64, "r": 65, "s": 66, "t": 67,
+       "u": 68, "v": 69, "w": 70, "x": 71,
+       "y": 72, "z": 73, "Ç": 74, "Î": 75,
+       "â": 76, "ç": 77, "ê": 78, "ë": 79,
+       "î": 80, "ö": 81, "û": 82, "ü": 83,
+       "ı": 84, "Ş": 85, "ş": 86, "‘": 87,
+       "’": 88, "…": 89, "an": 90, "er": 91,
+       "ku": 92, "in": 93, "ên": 94, "rd": 95,
+       "man": 96, "ar": 97, "ir": 98, "ek": 99,
+       "li": 100, "bi": 101, "iy": 102, "kurd": 103,
+       "de": 104, "iman": 105, "în": 106, "di": 107,
+       "ziman": 108, "av": 109, "st": 110, "we": 111,
+       "he": 112, "ay": 113, "at": 114, "ji": 115,
+       "jî": 116, "xwe": 117, "el": 118, "kurdî": 119,
+       "tê": 120, "ne": 121, "Ku": 122, "or": 123,
+       "ist": 124, "ye": 125, "zimanê": 126, "ey": 127,
+       "ber": 128, "ro": 129, "ser": 130, "tin": 131,
+       "kir": 132, "gel": 133, "Kurd": 134, "lê": 135,
+       "pê": 136, "istan": 137, "kî": 138, "bû": 139,
+       "manc": 140, "iya": 141, "en": 142, "ekî": 143,
+       "ax": 144, "na": 145, "ba": 146, "din": 147,
+       "arav": 148, "ra": 149, "hat": 150, "ko": 151,
+       "kur": 152, "sa": 153, "vî": 154, "zarav": 155,
+       "anî": 156, "kirin": 157, "dî": 158, "nav": 159,
+       "ya": 160, "ih": 161, "Kurdistan": 162, "ew": 163,
+       "her": 164, "mî": 165, "iye": 166, "ma": 167,
+       "tî": 168, "zaravay": 169, "da": 170, "eb": 171,
+       "ni": 172, "Ji": 173, "ftin": 174, "me": 175,
+       "mancî": 176, "axa": 177, "Kurdistanê": 178, "em": 179,
+       "is": 180, "be": 181, "lat": 182, "ve": 183,
+       "axaftin": 184, "iş": 185, "kar": 186, "yê": 187,
+       "irk": 188, "Li": 189, "andin": 190, "bo": 191,
+       "ev": 192, "wek": 193, "yên": 194, "anîn": 195,
+       "re": 196, "roj": 197, "vîs": 198, "nivîs": 199,
+       "al": 200, "van": 201, "yek": 202, "arê": 203,
+       "hem": 204, "oranî": 205, "du": 206, "nî": 207,
+       "wan": 208, "êr": 209, "ûr": 210, "kurdan": 211,
+       "et": 212, "hin": 213, "vê": 214, "eyên": 215,
+       "gelek": 216, "pêş": 217, "nas": 218, "hatiye": 219,
+       "saz": 220, "ebî": 221, "ali": 222, "hi": 223,
+       "lîn": 224, "ok": 225, "ine": 226, "iyê": 227,
+       "dev": 228, "dik": 229, "ayê": 230, "lêko": 231,
+       "bûna": 232, "bakur": 233, "kurmancî": 234, "iha": 235,
+       "lêkolîn": 236, "aw": 237, "eh": 238, "far": 239,
+       "go": 240, "lî": 241, "mi": 242, "wê": 243,
+       "zê": 244, "îr": 245, "erebî": 246, "iyên": 247,
+       "îna": 248, "Kur": 249, "aliyê": 250, "hilat": 251,
+       "faris": 252, "Tirk": 253, "as": 254, "az": 255
+     },
+     "merges": [
+       ["a","n"], ["e","r"], ["k","u"], ["i","n"],
+       ["ê","n"], ["r","d"], ["m","an"], ["a","r"],
+       ["i","r"], ["e","k"], ["l","i"], ["b","i"],
+       ["i","y"], ["ku","rd"], ["d","e"], ["i","man"],
+       ["î","n"], ["d","i"], ["z","iman"], ["a","v"],
+       ["s","t"], ["w","e"], ["h","e"], ["a","y"],
+       ["a","t"], ["j","i"], ["j","î"], ["x","we"],
+       ["e","l"], ["kurd","î"], ["t","ê"], ["n","e"],
+       ["K","u"], ["o","r"], ["i","st"], ["y","e"],
+       ["ziman","ê"], ["e","y"], ["b","er"], ["r","o"],
+       ["s","er"], ["t","in"], ["k","ir"], ["g","el"],
+       ["Ku","rd"], ["l","ê"], ["p","ê"], ["ist","an"],
+       ["k","î"], ["b","û"], ["man","c"], ["iy","a"],
+       ["e","n"], ["ek","î"], ["a","x"], ["n","a"],
+       ["b","a"], ["d","in"], ["ar","av"], ["r","a"],
+       ["h","at"], ["k","o"], ["ku","r"], ["s","a"],
+       ["v","î"], ["z","arav"], ["an","î"], ["kir","in"],
+       ["d","î"], ["n","av"], ["y","a"], ["i","h"],
+       ["Kurd","istan"], ["e","w"], ["h","er"], ["m","î"],
+       ["iy","e"], ["m","a"], ["t","î"], ["zarav","ay"],
+       ["d","a"], ["e","b"], ["n","i"], ["J","i"],
+       ["f","tin"], ["m","e"], ["manc","î"], ["ax","a"],
+       ["Kurdistan","ê"], ["e","m"], ["i","s"], ["b","e"],
+       ["l","at"], ["v","e"], ["axa","ftin"], ["i","ş"],
+       ["k","ar"], ["y","ê"], ["ir","k"], ["L","i"],
+       ["an","din"], ["b","o"], ["e","v"], ["w","ek"],
+       ["y","ên"], ["an","în"], ["r","e"], ["ro","j"],
+       ["vî","s"], ["ni","vîs"], ["a","l"], ["v","an"],
+       ["y","ek"], ["ar","ê"], ["he","m"], ["or","anî"],
+       ["d","u"], ["n","î"], ["w","an"], ["ê","r"],
+       ["û","r"], ["kurd","an"], ["e","t"], ["h","in"],
+       ["v","ê"], ["ey","ên"], ["gel","ek"], ["pê","ş"],
+       ["na","s"], ["hat","iye"], ["sa","z"], ["eb","î"],
+       ["a","li"], ["h","i"], ["l","în"], ["o","k"],
+       ["in","e"], ["iy","ê"], ["de","v"], ["di","k"],
+       ["ay","ê"], ["lê","ko"], ["bû","na"], ["ba","kur"],
+       ["kur","mancî"], ["ih","a"], ["lêko","lîn"], ["a","w"],
+       ["e","h"], ["f","ar"], ["g","o"], ["l","î"],
+       ["m","i"], ["w","ê"], ["z","ê"], ["î","r"],
+       ["er","ebî"], ["iy","ên"], ["în","a"], ["Ku","r"],
+       ["ali","yê"], ["hi","lat"], ["far","is"], ["T","irk"],
+       ["a","s"], ["a","z"]
+     ]
+   }
+ }
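The uploaded tokenizer.json defines a Whitespace pre-tokenizer followed by a small BPE model (256-token vocabulary of Kurmanji Kurdish subwords such as `kurdî` and `ziman`). As a rough illustration of how such a file is applied at encode time, here is a minimal pure-Python sketch of whitespace splitting plus rank-ordered BPE merging; the merge subset below is taken from the merges list above, and this is a simplified stand-in, not the actual 🤗 `tokenizers` runtime.

```python
def bpe_encode(text, merges):
    # merges: ordered list of (left, right) pairs; earlier pairs merge first,
    # mirroring the rank order of the "merges" array in tokenizer.json.
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = []
    for word in text.split():          # the "Whitespace" pre-tokenizer step
        parts = list(word)             # start from single characters
        while len(parts) > 1:
            # pick the adjacent pair with the lowest merge rank (leftmost on ties)
            rank, i = min(
                (ranks.get((a, b), float("inf")), j)
                for j, (a, b) in enumerate(zip(parts, parts[1:]))
            )
            if rank == float("inf"):
                break                  # no applicable merge remains
            parts[i:i + 2] = [parts[i] + parts[i + 1]]
        tokens.extend(parts)
    return tokens

# A subset of the actual merges above, kept in their original relative order.
merges = [("a", "n"), ("k", "u"), ("r", "d"), ("m", "an"),
          ("ku", "rd"), ("i", "man"), ("z", "iman"), ("kurd", "î")]
print(bpe_encode("kurdî ziman", merges))  # ['kurdî', 'ziman']
```

Characters with no applicable merge simply remain as single-character tokens, which is why the real vocabulary enumerates every letter (including `ê`, `î`, `û`, `ş`) before any subwords.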
tokenizer_config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "extra_special_tokens": {},
+   "model_max_length": 1000000000000000019884624838656,
+   "tokenizer_class": "PreTrainedTokenizerFast"
+ }
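One detail in tokenizer_config.json worth noting: the enormous `model_max_length` is not corruption. It matches the sentinel transformers uses for "no explicit length limit" (`VERY_LARGE_INTEGER`), which is the float `1e30` rounded to the nearest double and cast to int:

```python
# transformers' "no length limit" sentinel: int(1e30) after float rounding.
# This reproduces the exact value stored in tokenizer_config.json above.
sentinel = int(1e30)
print(sentinel)  # 1000000000000000019884624838656
```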