Upload tokenizer

Files changed (4)

README.md +199 -0
special_tokens_map.json +7 -0
tokenizer.json +1064 -0
tokenizer_config.json +52 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "pad_token": "<pad>",
+  "unk_token": "<unk>"
+}

tokenizer.json ADDED

	@@ -0,0 +1,1064 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "<s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "<unk>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 3,
+      "content": "<pad>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 4,
+      "content": "<mask>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": null,
+  "pre_tokenizer": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      }
+    ],
+    "pair": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "B",
+          "type_id": 1
+        }
+      },
+      {
+        "SpecialToken": {
+          "id": "</s>",
+          "type_id": 1
+        }
+      }
+    ],
+    "special_tokens": {
+      "</s>": {
+        "id": "</s>",
+        "ids": [
+          1
+        ],
+        "tokens": [
+          "</s>"
+        ]
+      },
+      "<s>": {
+        "id": "<s>",
+        "ids": [
+          0
+        ],
+        "tokens": [
+          "<s>"
+        ]
+      }
+    }
+  },
+  "decoder": {
+    "type": "ByteLevel",
+    "add_prefix_space": true,
+    "trim_offsets": true,
+    "use_regex": true
+  },
+  "model": {
+    "type": "BPE",
+    "dropout": null,
+    "unk_token": "<unk>",
+    "continuing_subword_prefix": null,
+    "end_of_word_suffix": null,
+    "fuse_unk": false,
+    "byte_fallback": false,
+    "vocab": {
+      "<s>": 0,
+      "</s>": 1,
+      "<unk>": 2,
+      "<pad>": 3,
+      "<mask>": 4,
+      "-": 5,
+      "a": 6,
+      "b": 7,
+      "c": 8,
+      "d": 9,
+      "e": 10,
+      "f": 11,
+      "g": 12,
+      "h": 13,
+      "i": 14,
+      "j": 15,
+      "k": 16,
+      "l": 17,
+      "m": 18,
+      "n": 19,
+      "o": 20,
+      "p": 21,
+      "q": 22,
+      "r": 23,
+      "s": 24,
+      "t": 25,
+      "u": 26,
+      "v": 27,
+      "w": 28,
+      "x": 29,
+      "y": 30,
+      "z": 31,
+      "¡": 32,
+      "£": 33,
+      "¤": 34,
+      "¥": 35,
+      "¦": 36,
+      "§": 37,
+      "¨": 38,
+      "©": 39,
+      "«": 40,
+      "¬": 41,
+      "¯": 42,
+      "±": 43,
+      "³": 44,
+      "¶": 45,
+      "¸": 46,
+      "º": 47,
+      "¼": 48,
+      "½": 49,
+      "¾": 50,
+      "Ã": 51,
+      "Ä": 52,
+      "Å": 53,
+      "Ì": 54,
+      "â": 55,
+      "Ġ": 56,
+      "Ģ": 57,
+      "ģ": 58,
+      "Ĥ": 59,
+      "ĥ": 60,
+      "Ħ": 61,
+      "ĩ": 62,
+      "Ī": 63,
+      "į": 64,
+      "ı": 65,
+      "ĳ": 66,
+      "ĵ": 67,
+      "Ķ": 68,
+      "Ĺ": 69,
+      "Ļ": 70,
+      "Ľ": 71,
+      "ľ": 72,
+      "ŀ": 73,
+      "Ł": 74,
+      "ł": 75,
+      "Ń": 76,
+      "ÃŃ": 77,
+      "Ã¡": 78,
+      "Ġp": 79,
+      "Ġs": 80,
+      "ÄĽ": 81,
+      "Ġt": 82,
+      "Ġv": 83,
+      "Ġn": 84,
+      "ÅĻ": 85,
+      "Ġj": 86,
+      "nÃŃ": 87,
+      "Ã©": 88,
+      "st": 89,
+      "Å¾": 90,
+      "Ġz": 91,
+      "Ġd": 92,
+      "ro": 93,
+      "Ġa": 94,
+      "ch": 95,
+      "ov": 96,
+      "Äį": 97,
+      "Ġm": 98,
+      "Ġk": 99,
+      "Ġpo": 100,
+      "Ã½": 101,
+      "Ġo": 102,
+      "ed": 103,
+      "Å¡": 104,
+      "la": 105,
+      "en": 106,
+      "Ġb": 107,
+      "ra": 108,
+      "ou": 109,
+      "ak": 110,
+      "em": 111,
+      "li": 112,
+      "Å¯": 113,
+      "te": 114,
+      "le": 115,
+      "ho": 116,
+      "Ġna": 117,
+      "Ġto": 118,
+      "ĠpÅĻ": 119,
+      "nÄĽ": 120,
+      "Ġje": 121,
+      "Ġpro": 122,
+      "Ġse": 123,
+      "Ġne": 124,
+      "ce": 125,
+      "Å¾e": 126,
+      "to": 127,
+      "in": 128,
+      "an": 129,
+      "sk": 130,
+      "Ġdo": 131,
+      "at": 132,
+      "ÃŃm": 133,
+      "rÃ¡": 134,
+      "Ġby": 135,
+      "Ġza": 136,
+      "ĠÄį": 137,
+      "uj": 138,
+      "lo": 139,
+      "no": 140,
+      "it": 141,
+      "Ġst": 142,
+      "Ġu": 143,
+      "ÅĻe": 144,
+      "ĠÅ¾e": 145,
+      "Ġtak": 146,
+      "ni": 147,
+      "po": 148,
+      "ad": 149,
+      "ci": 150,
+      "al": 151,
+      "Ġko": 152,
+      "ko": 153,
+      "ĠnÃ¡": 154,
+      "nÃ¡": 155,
+      "na": 156,
+      "Ġro": 157,
+      "vo": 158,
+      "ru": 159,
+      "ku": 160,
+      "ti": 161,
+      "Ġvy": 162,
+      "va": 163,
+      "Ġh": 164,
+      "re": 165,
+      "Ã½ch": 166,
+      "de": 167,
+      "Ġjs": 168,
+      "ck": 169,
+      "nÃ©": 170,
+      "lÃ¡": 171,
+      "Ġve": 172,
+      "cÃŃ": 173,
+      "ĠzÃ¡": 174,
+      "Ġkte": 175,
+      "ne": 176,
+      "by": 177,
+      "ky": 178,
+      "ÅĻÃŃ": 179,
+      "vÄĽ": 180,
+      "Ġob": 181,
+      "ĠpÅĻed": 182,
+      "Ã¡t": 183,
+      "Ġf": 184,
+      "mi": 185,
+      "ka": 186,
+      "me": 187,
+      "Ġpos": 188,
+      "Ġpod": 189,
+      "dy": 190,
+      "Ãº": 191,
+      "Å¡e": 192,
+      "mÄĽ": 193,
+      "Ġpan": 194,
+      "Ġjak": 195,
+      "Ġjed": 196,
+      "ĠÃº": 197,
+      "ovÃ¡": 198,
+      "ĠpÅĻi": 199,
+      "Å¡ÃŃ": 200,
+      "mu": 201,
+      "jÃŃ": 202,
+      "skÃ©": 203,
+      "vr": 204,
+      "ĠvÃ½": 205,
+      "bo": 206,
+      "vÃ¡": 207,
+      "Ġod": 208,
+      "sti": 209,
+      "lu": 210,
+      "Ġe": 211,
+      "ze": 212,
+      "ÄĽk": 213,
+      "Ġposla": 214,
+      "Ġi": 215,
+      "ÅĻi": 216,
+      "pra": 217,
+      "Ġroz": 218,
+      "Ġkter": 219,
+      "tÄĽ": 220,
+      "da": 221,
+      "vrh": 222,
+      "ist": 223,
+      "ovÃ©": 224,
+      "dÄĽ": 225,
+      "Ġre": 226,
+      "du": 227,
+      "uji": 228,
+      "ar": 229,
+      "Ġbu": 230,
+      "ova": 231,
+      "ĠvÃ¡": 232,
+      "vnÃŃ": 233,
+      "Ġmo": 234,
+      "Ġch": 235,
+      "il": 236,
+      "or": 237,
+      "ĠmÄĽ": 238,
+      "er": 239,
+      "is": 240,
+      "ĠpÅĻÃŃ": 241,
+      "ovat": 242,
+      "uje": 243,
+      "ob": 244,
+      "sta": 245,
+      "ny": 246,
+      "vÃŃ": 247,
+      "sl": 248,
+      "ĠmÃ¡": 249,
+      "ĠvÄĽ": 250,
+      "ĠnÃ¡vrh": 251,
+      "ent": 252,
+      "Ġc": 253,
+      "ĠjÃ¡": 254,
+      "ĠnÄĽ": 255,
+      "am": 256,
+      "Ġale": 257,
+      "Ġsi": 258,
+      "ct": 259,
+      "Ġaby": 260,
+      "Ġbyl": 261,
+      "ÅĪ": 262,
+      "nÃŃm": 263,
+      "cho": 264,
+      "ĠpÅĻe": 265,
+      "Ġpr": 266,
+      "ckÃ©": 267,
+      "nu": 268,
+      "Ã¡l": 269,
+      "Ġmin": 270,
+      "nost": 271,
+      "je": 272,
+      "Ġsou": 273,
+      "Ã½m": 274,
+      "lÃ©": 275,
+      "nÃŃch": 276,
+      "Ġpoz": 277,
+      "ĠdÄĽk": 278,
+      "Äįe": 279,
+      "se": 280,
+      "ĠÅ¾": 281,
+      "Ġde": 282,
+      "eme": 283,
+      "Ġpra": 284,
+      "ji": 285,
+      "ady": 286,
+      "hod": 287,
+      "Ġjako": 288,
+      "kÃ¡": 289,
+      "Ġten": 290,
+      "tÃŃ": 291,
+      "Ġpa": 292,
+      "las": 293,
+      "ĠdÄĽkuji": 294,
+      "Ġjsou": 295,
+      "ĠprÃ¡": 296,
+      "sed": 297,
+      "ty": 298,
+      "Ġnej": 299,
+      "prav": 300,
+      "ĠdÅ¯": 301,
+      "tu": 302,
+      "pe": 303,
+      "nou": 304,
+      "Ġproto": 305,
+      "Ġle": 306,
+      "eno": 307,
+      "Ġjsem": 308,
+      "ĠÅ¡": 309,
+      "Ġposlan": 310,
+      "zi": 311,
+      "do": 312,
+      "ry": 313,
+      "Ġdva": 314,
+      "ĠtÃ©": 315,
+      "Ġspo": 316,
+      "Ġkon": 317,
+      "Ġin": 318,
+      "ovÃ¡nÃŃ": 319,
+      "Ġtady": 320,
+      "ĠÄįe": 321,
+      "lov": 322,
+      "Ġmy": 323,
+      "ve": 324,
+      "ĠkterÃ©": 325,
+      "ÅĻed": 326,
+      "dÃ¡": 327,
+      "Ġsv": 328,
+      "stu": 329,
+      "sÃŃm": 330,
+      "ÄįnÃŃ": 331,
+      "kla": 332,
+      "Ġmi": 333,
+      "Ġos": 334,
+      "Ġni": 335,
+      "Ġminist": 336,
+      "Ġco": 337,
+      "Ġvo": 338,
+      "Ġmu": 339,
+      "tel": 340,
+      "ĠzÃ¡ko": 341,
+      "nÃ½": 342,
+      "as": 343,
+      "Ġev": 344,
+      "Ġnem": 345,
+      "ri": 346,
+      "ĠtakÃ©": 347,
+      "rop": 348,
+      "Ġte": 349,
+      "Ġvel": 350,
+      "Ġbo": 351,
+      "ĠvlÃ¡": 352,
+      "ĠmÃŃ": 353,
+      "Å¡tÄĽ": 354,
+      "dnÃŃ": 355,
+      "ly": 356,
+      "Ġli": 357,
+      "Ġposlane": 358,
+      "ĠpÅĻedsed": 359,
+      "ĠtÄĽ": 360,
+      "Ġce": 361,
+      "led": 362,
+      "Ġkdy": 363,
+      "mÃŃ": 364,
+      "pad": 365,
+      "di": 366,
+      "ĠÅĻÃŃ": 367,
+      "Ġtoho": 368,
+      "Ġtom": 369,
+      "len": 370,
+      "pu": 371,
+      "bu": 372,
+      "ta": 373,
+      "ujÃŃ": 374,
+      "lou": 375,
+      "Ġevrop": 376,
+      "ĠstÃ¡t": 377,
+      "Ã¡d": 378,
+      "prÃ¡": 379,
+      "Ġtu": 380,
+      "vy": 381,
+      "sto": 382,
+      "sÃ¡t": 383,
+      "vi": 384,
+      "Ġty": 385,
+      "Ġjsme": 386,
+      "Å¾en": 387,
+      "ĠÅĻe": 388,
+      "Ġta": 389,
+      "ÅĻej": 390,
+      "ba": 391,
+      "nosti": 392,
+      "Ġhlas": 393,
+      "Ġnebo": 394,
+      "mo": 395,
+      "Ġji": 396,
+      "my": 397,
+      "ajÃŃ": 398,
+      "tÃ¡": 399,
+      "oval": 400,
+      "Ã©ho": 401,
+      "Ġbud": 402,
+      "leg": 403,
+      "Ġsta": 404,
+      "Ġpane": 405,
+      "isk": 406,
+      "Å¾ÃŃ": 407,
+      "Ġho": 408,
+      "ste": 409,
+      "ĠnenÃŃ": 410,
+      "stup": 411,
+      "vÄĽt": 412,
+      "ĠtÅĻi": 413,
+      "mov": 414,
+      "Ġdal": 415,
+      "Ġprost": 416,
+      "ez": 417,
+      "Ġkoleg": 418,
+      "Ġbude": 419,
+      "Ġka": 420,
+      "Ġvz": 421,
+      "lÃŃ": 422,
+      "ĠpanÃŃ": 423,
+      "nÃŃho": 424,
+      "cet": 425,
+      "za": 426,
+      "ĠkterÃ½": 427,
+      "ĠprotoÅ¾e": 428,
+      "Ġslov": 429,
+      "chÃ¡": 430,
+      "Ġdob": 431,
+      "men": 432,
+      "Ġpot": 433,
+      "ruh": 434,
+      "Å¾i": 435,
+      "sÃŃ": 436,
+      "Ġze": 437,
+      "Ġtomu": 438,
+      "ÄįnÄĽ": 439,
+      "Ġpoli": 440,
+      "ĠtÃŃm": 441,
+      "ĠvÅ¡": 442,
+      "rov": 443,
+      "ĠsnÄĽ": 444,
+      "ĠvÃ½bo": 445,
+      "ĠdÃ¡": 446,
+      "Ġbylo": 447,
+      "ÄĽt": 448,
+      "Ġsam": 449,
+      "Ġbych": 450,
+      "Ġbyla": 451,
+      "ĠsnÄĽmov": 452,
+      "isÃŃ": 453,
+      "Ġg": 454,
+      "ĠbÃ½": 455,
+      "ĠnÄĽk": 456,
+      "Ġsto": 457,
+      "dÃŃ": 458,
+      "kÅ¯": 459,
+      "ĠtakÅ¾e": 460,
+      "ÄįÃŃ": 461,
+      "sa": 462,
+      "Ġdne": 463,
+      "ma": 464,
+      "ĠprosÃŃm": 465,
+      "zÃŃ": 466,
+      "Ġjedno": 467,
+      "ter": 468,
+      "Ġdruh": 469,
+      "ĠvÅ¡e": 470,
+      "ĠuÅ¾": 471,
+      "Ġjeho": 472,
+      "nÃ½ch": 473,
+      "edy": 474,
+      "Ġprob": 475,
+      "ĠdalÅ¡ÃŃ": 476,
+      "chom": 477,
+      "Ġzd": 478,
+      "kou": 479,
+      "rÅ¯": 480,
+      "Ġtedy": 481,
+      "Ġsku": 482,
+      "Å¡ÃŃm": 483,
+      "Ġpou": 484,
+      "ÅĻad": 485,
+      "Ġpoku": 486,
+      "vnÄĽ": 487,
+      "Ġsed": 488,
+      "ovÄĽ": 489,
+      "Ġzem": 490,
+      "ĠtisÃŃ": 491,
+      "Ġsamo": 492,
+      "vod": 493,
+      "Å¾it": 494,
+      "bli": 495,
+      "Ã©m": 496,
+      "Ġstra": 497,
+      "tick": 498,
+      "ĠmoÅ¾": 499
+    },
+    "merges": [
+      "Ã Ń",
+      "Ã ¡",
+      "Ġ p",
+      "Ġ s",
+      "Ä Ľ",
+      "Ġ t",
+      "Ġ v",
+      "Ġ n",
+      "Å Ļ",
+      "Ġ j",
+      "n ÃŃ",
+      "Ã ©",
+      "s t",
+      "Å ¾",
+      "Ġ z",
+      "Ġ d",
+      "r o",
+      "Ġ a",
+      "c h",
+      "o v",
+      "Ä į",
+      "Ġ m",
+      "Ġ k",
+      "Ġp o",
+      "Ã ½",
+      "Ġ o",
+      "e d",
+      "Å ¡",
+      "l a",
+      "e n",
+      "Ġ b",
+      "r a",
+      "o u",
+      "a k",
+      "e m",
+      "l i",
+      "Å ¯",
+      "t e",
+      "l e",
+      "h o",
+      "Ġn a",
+      "Ġt o",
+      "Ġp ÅĻ",
+      "n ÄĽ",
+      "Ġj e",
+      "Ġp ro",
+      "Ġs e",
+      "Ġn e",
+      "c e",
+      "Å¾ e",
+      "t o",
+      "i n",
+      "a n",
+      "s k",
+      "Ġd o",
+      "a t",
+      "ÃŃ m",
+      "r Ã¡",
+      "Ġb y",
+      "Ġz a",
+      "Ġ Äį",
+      "u j",
+      "l o",
+      "n o",
+      "i t",
+      "Ġs t",
+      "Ġ u",
+      "ÅĻ e",
+      "Ġ Å¾e",
+      "Ġt ak",
+      "n i",
+      "p o",
+      "a d",
+      "c i",
+      "a l",
+      "Ġk o",
+      "k o",
+      "Ġn Ã¡",
+      "n Ã¡",
+      "n a",
+      "Ġ ro",
+      "v o",
+      "r u",
+      "k u",
+      "t i",
+      "Ġv y",
+      "v a",
+      "Ġ h",
+      "r e",
+      "Ã½ ch",
+      "d e",
+      "Ġj s",
+      "c k",
+      "n Ã©",
+      "l Ã¡",
+      "Ġv e",
+      "c ÃŃ",
+      "Ġz Ã¡",
+      "Ġk te",
+      "n e",
+      "b y",
+      "k y",
+      "ÅĻ ÃŃ",
+      "v ÄĽ",
+      "Ġo b",
+      "ĠpÅĻ ed",
+      "Ã¡ t",
+      "Ġ f",
+      "m i",
+      "k a",
+      "m e",
+      "Ġpo s",
+      "Ġpo d",
+      "d y",
+      "Ã º",
+      "Å¡ e",
+      "m ÄĽ",
+      "Ġp an",
+      "Ġj ak",
+      "Ġj ed",
+      "Ġ Ãº",
+      "ov Ã¡",
+      "ĠpÅĻ i",
+      "Å¡ ÃŃ",
+      "m u",
+      "j ÃŃ",
+      "sk Ã©",
+      "v r",
+      "Ġv Ã½",
+      "b o",
+      "v Ã¡",
+      "Ġo d",
+      "st i",
+      "l u",
+      "Ġ e",
+      "z e",
+      "ÄĽ k",
+      "Ġpos la",
+      "Ġ i",
+      "ÅĻ i",
+      "p ra",
+      "Ġro z",
+      "Ġkte r",
+      "t ÄĽ",
+      "d a",
+      "vr h",
+      "i st",
+      "ov Ã©",
+      "d ÄĽ",
+      "Ġ re",
+      "d u",
+      "uj i",
+      "a r",
+      "Ġb u",
+      "ov a",
+      "Ġv Ã¡",
+      "v nÃŃ",
+      "Ġm o",
+      "Ġ ch",
+      "i l",
+      "o r",
+      "Ġm ÄĽ",
+      "e r",
+      "i s",
+      "ĠpÅĻ ÃŃ",
+      "ov at",
+      "uj e",
+      "o b",
+      "st a",
+      "n y",
+      "v ÃŃ",
+      "s l",
+      "Ġm Ã¡",
+      "Ġv ÄĽ",
+      "ĠnÃ¡ vrh",
+      "en t",
+      "Ġ c",
+      "Ġj Ã¡",
+      "Ġn ÄĽ",
+      "a m",
+      "Ġa le",
+      "Ġs i",
+      "c t",
+      "Ġa by",
+      "Ġby l",
+      "Å Ī",
+      "nÃŃ m",
+      "ch o",
+      "ĠpÅĻ e",
+      "Ġp r",
+      "ck Ã©",
+      "n u",
+      "Ã¡ l",
+      "Ġm in",
+      "no st",
+      "j e",
+      "Ġs ou",
+      "Ã½ m",
+      "l Ã©",
+      "nÃŃ ch",
+      "Ġpo z",
+      "Ġd ÄĽk",
+      "Äį e",
+      "s e",
+      "Ġ Å¾",
+      "Ġd e",
+      "em e",
+      "Ġp ra",
+      "j i",
+      "ad y",
+      "ho d",
+      "Ġjak o",
+      "k Ã¡",
+      "Ġt en",
+      "t ÃŃ",
+      "Ġp a",
+      "la s",
+      "ĠdÄĽk uji",
+      "Ġjs ou",
+      "Ġp rÃ¡",
+      "s ed",
+      "t y",
+      "Ġne j",
+      "pra v",
+      "Ġd Å¯",
+      "t u",
+      "p e",
+      "n ou",
+      "Ġpro to",
+      "Ġ le",
+      "en o",
+      "Ġjs em",
+      "Ġ Å¡",
+      "Ġposla n",
+      "z i",
+      "d o",
+      "r y",
+      "Ġd va",
+      "Ġt Ã©",
+      "Ġs po",
+      "Ġko n",
+      "Ġ in",
+      "ovÃ¡ nÃŃ",
+      "Ġt ady",
+      "ĠÄį e",
+      "l ov",
+      "Ġm y",
+      "v e",
+      "Ġkter Ã©",
+      "ÅĻ ed",
+      "d Ã¡",
+      "Ġs v",
+      "st u",
+      "s ÃŃm",
+      "Äį nÃŃ",
+      "k la",
+      "Ġm i",
+      "Ġo s",
+      "Ġn i",
+      "Ġmin ist",
+      "Ġc o",
+      "Ġv o",
+      "Ġm u",
+      "te l",
+      "ĠzÃ¡ ko",
+      "n Ã½",
+      "a s",
+      "Ġe v",
+      "Ġn em",
+      "r i",
+      "Ġtak Ã©",
+      "ro p",
+      "Ġt e",
+      "Ġve l",
+      "Ġb o",
+      "Ġv lÃ¡",
+      "Ġm ÃŃ",
+      "Å¡ tÄĽ",
+      "d nÃŃ",
+      "l y",
+      "Ġ li",
+      "Ġposla ne",
+      "ĠpÅĻed sed",
+      "Ġt ÄĽ",
+      "Ġ ce",
+      "l ed",
+      "Ġk dy",
+      "m ÃŃ",
+      "p ad",
+      "d i",
+      "Ġ ÅĻÃŃ",
+      "Ġto ho",
+      "Ġto m",
+      "l en",
+      "p u",
+      "b u",
+      "t a",
+      "uj ÃŃ",
+      "l ou",
+      "Ġev rop",
+      "Ġst Ã¡t",
+      "Ã¡ d",
+      "p rÃ¡",
+      "Ġt u",
+      "v y",
+      "st o",
+      "s Ã¡t",
+      "v i",
+      "Ġt y",
+      "Ġjs me",
+      "Å¾ en",
+      "Ġ ÅĻe",
+      "Ġt a",
+      "ÅĻe j",
+      "b a",
+      "no sti",
+      "Ġh las",
+      "Ġne bo",
+      "m o",
+      "Ġj i",
+      "m y",
+      "a jÃŃ",
+      "t Ã¡",
+      "ov al",
+      "Ã© ho",
+      "Ġbu d",
+      "le g",
+      "Ġst a",
+      "Ġpan e",
+      "i sk",
+      "Å¾ ÃŃ",
+      "Ġ ho",
+      "st e",
+      "Ġne nÃŃ",
+      "stu p",
+      "vÄĽ t",
+      "Ġt ÅĻi",
+      "m ov",
+      "Ġd al",
+      "Ġpro st",
+      "e z",
+      "Ġko leg",
+      "Ġbu de",
+      "Ġk a",
+      "Ġv z",
+      "l ÃŃ",
+      "Ġpa nÃŃ",
+      "nÃŃ ho",
+      "ce t",
+      "z a",
+      "Ġkter Ã½",
+      "Ġproto Å¾e",
+      "Ġs lov",
+      "ch Ã¡",
+      "Ġdo b",
+      "m en",
+      "Ġpo t",
+      "ru h",
+      "Å¾ i",
+      "s ÃŃ",
+      "Ġz e",
+      "Ġto mu",
+      "Äį nÄĽ",
+      "Ġpo li",
+      "Ġt ÃŃm",
+      "Ġv Å¡",
+      "ro v",
+      "Ġs nÄĽ",
+      "ĠvÃ½ bo",
+      "Ġd Ã¡",
+      "Ġby lo",
+      "ÄĽ t",
+      "Ġs am",
+      "Ġby ch",
+      "Ġby la",
+      "ĠsnÄĽ mov",
+      "is ÃŃ",
+      "Ġ g",
+      "Ġb Ã½",
+      "Ġn ÄĽk",
+      "Ġs to",
+      "d ÃŃ",
+      "k Å¯",
+      "Ġtak Å¾e",
+      "Äį ÃŃ",
+      "s a",
+      "Ġd ne",
+      "m a",
+      "Ġpro sÃŃm",
+      "z ÃŃ",
+      "Ġjed no",
+      "te r",
+      "Ġd ruh",
+      "Ġv Å¡e",
+      "Ġu Å¾",
+      "Ġje ho",
+      "n Ã½ch",
+      "ed y",
+      "Ġpro b",
+      "Ġdal Å¡ÃŃ",
+      "cho m",
+      "Ġz d",
+      "k ou",
+      "r Å¯",
+      "Ġt edy",
+      "Ġs ku",
+      "Å¡ ÃŃm",
+      "Ġpo u",
+      "ÅĻ ad",
+      "Ġpo ku",
+      "v nÄĽ",
+      "Ġs ed",
+      "ov ÄĽ",
+      "Ġz em",
+      "Ġt isÃŃ",
+      "Ġsam o",
+      "vo d",
+      "Å¾ it",
+      "b li",
+      "Ã© m",
+      "Ġst ra",
+      "ti ck",
+      "Ġmo Å¾"
+    ]
+  }
+}
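
Note on reading tokenizer.json: entries such as "ĠdÄĽkuji" or "ÃŃ" in `vocab` and `merges` are not encoding damage. With a `ByteLevel` pre-tokenizer, every UTF-8 byte is written as one printable character, so a space shows up as "Ġ" and a two-byte Czech letter shows up as two symbols. The sketch below is a rough illustration of that idea, not the library's implementation: it rebuilds the byte table in the style published with GPT-2 (assumed here to be the mapping `ByteLevel` follows), applies a hand-copied subset of the merges from this commit, and decodes a vocabulary entry back to text. The helper names (`bytes_to_unicode`, `to_symbols`, `apply_merges`, `decode_token`) are chosen for this example only.

```python
def bytes_to_unicode():
    """Build a byte -> printable-character table in the GPT-2 byte-level style."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            # Bytes without a printable form (space, control bytes, ...) are
            # assigned code points starting at U+0100, in ascending byte order.
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_symbols(text):
    """Split text into one byte-level symbol per UTF-8 byte."""
    return [BYTE_TO_CHAR[b] for b in text.encode("utf-8")]


def decode_token(token):
    """Turn a byte-level vocabulary entry back into readable text."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    # A token may end in the middle of a multi-byte character, hence "replace".
    return raw.decode("utf-8", errors="replace")


def apply_merges(symbols, merges):
    """Repeatedly merge the adjacent pair that appears earliest in `merges`."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(symbols)
    while len(symbols) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Subset of the "merges" list from this commit, kept in file order.
MERGES = [
    ("Ġ", "p"),
    ("Ġp", "o"),
    ("l", "a"),
    ("Ġpo", "s"),
    ("Ġpos", "la"),
    ("Ġposla", "n"),
]

print(apply_merges(to_symbols(" poslan"), MERGES))  # ['Ġposlan']
print(repr(decode_token("ĠdÄĽkuji")))               # ' děkuji'
```

With only this subset of merges, " poslan" collapses to the single symbol "Ġposlan", which is the entry with id 310 in `vocab`. The real tokenizer uses the full merge list and may differ in details such as how ties and repeated pairs are handled.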

tokenizer_config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<mask>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "PreTrainedTokenizerFast",
+  "unk_token": "<unk>"
+}
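
Note on `model_max_length`: the value 1000000000000000019884624838656 is what `int(1e30)` evaluates to in Python, and it is commonly understood as a "no limit configured" placeholder rather than a real sequence length. The sketch below shows one way a consumer of tokenizer_config.json might treat it. The function name `effective_max_length` and the fallback of 512 are assumptions made for this example; the commit does not say which limit the eventual model supports.

```python
import json

# Excerpt of the tokenizer_config.json added in this commit.
CONFIG_TEXT = """
{
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<pad>",
  "tokenizer_class": "PreTrainedTokenizerFast"
}
"""

# int(1e30) is not exactly 10**30 because 1e30 is a binary floating-point value.
UNSET_MAX_LENGTH = int(1e30)


def effective_max_length(config, fallback):
    """Return a usable length limit, treating the placeholder as 'not set'."""
    value = config.get("model_max_length")
    if value is None or value >= UNSET_MAX_LENGTH:
        return fallback  # hypothetical default supplied by the caller
    return value


config = json.loads(CONFIG_TEXT)
print(config["model_max_length"] == UNSET_MAX_LENGTH)  # True
print(effective_max_length(config, fallback=512))       # 512
```

If a real limit is known for the model this tokenizer will be paired with, setting `model_max_length` explicitly in the config avoids relying on a fallback like this one.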