Upload tokenizer

Files changed (4)

README.md +199 -0
special_tokens_map.json +30 -0
tokenizer.json +404 -0
tokenizer_config.json +44 -0

README.md ADDED

	@@ -0,0 +1,199 @@

+---
+library_name: transformers
+tags: []
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]

special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": true,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": true,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
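
Note on the map above: `pad_token` and `eos_token` share the content `</s>`, so a padded position cannot be told apart from a real end-of-sequence by token id alone, and an explicit attention mask is needed. A minimal sketch in plain Python, assuming `</s>` has id 2 and that padding goes on the left (both values appear in the other files of this commit). `left_pad` is a hypothetical helper written for illustration, not a transformers API:

```python
PAD_ID = 2  # assumption: id of "</s>", which serves as both pad and eos


def left_pad(batch, pad_id=PAD_ID):
    """Left-pad lists of token ids to a common length and build attention masks."""
    width = max(len(ids) for ids in batch)
    padded, masks = [], []
    for ids in batch:
        n_pad = width - len(ids)
        # Padding goes in front; the mask marks real tokens with 1.
        padded.append([pad_id] * n_pad + list(ids))
        masks.append([0] * n_pad + [1] * len(ids))
    return padded, masks


# Two sequences that each end with a real "</s>" (id 2).
padded, masks = left_pad([[284, 268, 260, 2], [285, 2]])
print(padded)
print(masks)
```

In the second row the leading pad ids and the trailing end-of-sequence id are identical; only the mask distinguishes them.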

tokenizer.json ADDED

	@@ -0,0 +1,404 @@

+{
+  "version": "1.0",
+  "truncation": null,
+  "padding": null,
+  "added_tokens": [
+    {
+      "id": 0,
+      "content": "<unk>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 1,
+      "content": "<s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 2,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": true,
+      "normalized": false,
+      "special": true
+    }
+  ],
+  "normalizer": null,
+  "pre_tokenizer": {
+    "type": "Metaspace",
+    "replacement": "▁",
+    "prepend_scheme": "never",
+    "split": false
+  },
+  "post_processor": {
+    "type": "TemplateProcessing",
+    "single": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      }
+    ],
+    "pair": [
+      {
+        "Sequence": {
+          "id": "A",
+          "type_id": 0
+        }
+      },
+      {
+        "Sequence": {
+          "id": "B",
+          "type_id": 1
+        }
+      }
+    ],
+    "special_tokens": {}
+  },
+  "decoder": {
+    "type": "Sequence",
+    "decoders": [
+      {
+        "type": "Replace",
+        "pattern": {
+          "String": "▁"
+        },
+        "content": " "
+      },
+      {
+        "type": "ByteFallback"
+      },
+      {
+        "type": "Fuse"
+      }
+    ]
+  },
+  "model": {
+    "type": "BPE",
+    "dropout": null,
+    "unk_token": "<unk>",
+    "continuing_subword_prefix": null,
+    "end_of_word_suffix": null,
+    "fuse_unk": true,
+    "byte_fallback": true,
+    "ignore_merges": false,
+    "vocab": {
+      "<unk>": 0,
+      "<s>": 1,
+      "</s>": 2,
+      "<0x00>": 3,
+      "<0x01>": 4,
+      "<0x02>": 5,
+      "<0x03>": 6,
+      "<0x04>": 7,
+      "<0x05>": 8,
+      "<0x06>": 9,
+      "<0x07>": 10,
+      "<0x08>": 11,
+      "<0x09>": 12,
+      "<0x0A>": 13,
+      "<0x0B>": 14,
+      "<0x0C>": 15,
+      "<0x0D>": 16,
+      "<0x0E>": 17,
+      "<0x0F>": 18,
+      "<0x10>": 19,
+      "<0x11>": 20,
+      "<0x12>": 21,
+      "<0x13>": 22,
+      "<0x14>": 23,
+      "<0x15>": 24,
+      "<0x16>": 25,
+      "<0x17>": 26,
+      "<0x18>": 27,
+      "<0x19>": 28,
+      "<0x1A>": 29,
+      "<0x1B>": 30,
+      "<0x1C>": 31,
+      "<0x1D>": 32,
+      "<0x1E>": 33,
+      "<0x1F>": 34,
+      "<0x20>": 35,
+      "<0x21>": 36,
+      "<0x22>": 37,
+      "<0x23>": 38,
+      "<0x24>": 39,
+      "<0x25>": 40,
+      "<0x26>": 41,
+      "<0x27>": 42,
+      "<0x28>": 43,
+      "<0x29>": 44,
+      "<0x2A>": 45,
+      "<0x2B>": 46,
+      "<0x2C>": 47,
+      "<0x2D>": 48,
+      "<0x2E>": 49,
+      "<0x2F>": 50,
+      "<0x30>": 51,
+      "<0x31>": 52,
+      "<0x32>": 53,
+      "<0x33>": 54,
+      "<0x34>": 55,
+      "<0x35>": 56,
+      "<0x36>": 57,
+      "<0x37>": 58,
+      "<0x38>": 59,
+      "<0x39>": 60,
+      "<0x3A>": 61,
+      "<0x3B>": 62,
+      "<0x3C>": 63,
+      "<0x3D>": 64,
+      "<0x3E>": 65,
+      "<0x3F>": 66,
+      "<0x40>": 67,
+      "<0x41>": 68,
+      "<0x42>": 69,
+      "<0x43>": 70,
+      "<0x44>": 71,
+      "<0x45>": 72,
+      "<0x46>": 73,
+      "<0x47>": 74,
+      "<0x48>": 75,
+      "<0x49>": 76,
+      "<0x4A>": 77,
+      "<0x4B>": 78,
+      "<0x4C>": 79,
+      "<0x4D>": 80,
+      "<0x4E>": 81,
+      "<0x4F>": 82,
+      "<0x50>": 83,
+      "<0x51>": 84,
+      "<0x52>": 85,
+      "<0x53>": 86,
+      "<0x54>": 87,
+      "<0x55>": 88,
+      "<0x56>": 89,
+      "<0x57>": 90,
+      "<0x58>": 91,
+      "<0x59>": 92,
+      "<0x5A>": 93,
+      "<0x5B>": 94,
+      "<0x5C>": 95,
+      "<0x5D>": 96,
+      "<0x5E>": 97,
+      "<0x5F>": 98,
+      "<0x60>": 99,
+      "<0x61>": 100,
+      "<0x62>": 101,
+      "<0x63>": 102,
+      "<0x64>": 103,
+      "<0x65>": 104,
+      "<0x66>": 105,
+      "<0x67>": 106,
+      "<0x68>": 107,
+      "<0x69>": 108,
+      "<0x6A>": 109,
+      "<0x6B>": 110,
+      "<0x6C>": 111,
+      "<0x6D>": 112,
+      "<0x6E>": 113,
+      "<0x6F>": 114,
+      "<0x70>": 115,
+      "<0x71>": 116,
+      "<0x72>": 117,
+      "<0x73>": 118,
+      "<0x74>": 119,
+      "<0x75>": 120,
+      "<0x76>": 121,
+      "<0x77>": 122,
+      "<0x78>": 123,
+      "<0x79>": 124,
+      "<0x7A>": 125,
+      "<0x7B>": 126,
+      "<0x7C>": 127,
+      "<0x7D>": 128,
+      "<0x7E>": 129,
+      "<0x7F>": 130,
+      "<0x80>": 131,
+      "<0x81>": 132,
+      "<0x82>": 133,
+      "<0x83>": 134,
+      "<0x84>": 135,
+      "<0x85>": 136,
+      "<0x86>": 137,
+      "<0x87>": 138,
+      "<0x88>": 139,
+      "<0x89>": 140,
+      "<0x8A>": 141,
+      "<0x8B>": 142,
+      "<0x8C>": 143,
+      "<0x8D>": 144,
+      "<0x8E>": 145,
+      "<0x8F>": 146,
+      "<0x90>": 147,
+      "<0x91>": 148,
+      "<0x92>": 149,
+      "<0x93>": 150,
+      "<0x94>": 151,
+      "<0x95>": 152,
+      "<0x96>": 153,
+      "<0x97>": 154,
+      "<0x98>": 155,
+      "<0x99>": 156,
+      "<0x9A>": 157,
+      "<0x9B>": 158,
+      "<0x9C>": 159,
+      "<0x9D>": 160,
+      "<0x9E>": 161,
+      "<0x9F>": 162,
+      "<0xA0>": 163,
+      "<0xA1>": 164,
+      "<0xA2>": 165,
+      "<0xA3>": 166,
+      "<0xA4>": 167,
+      "<0xA5>": 168,
+      "<0xA6>": 169,
+      "<0xA7>": 170,
+      "<0xA8>": 171,
+      "<0xA9>": 172,
+      "<0xAA>": 173,
+      "<0xAB>": 174,
+      "<0xAC>": 175,
+      "<0xAD>": 176,
+      "<0xAE>": 177,
+      "<0xAF>": 178,
+      "<0xB0>": 179,
+      "<0xB1>": 180,
+      "<0xB2>": 181,
+      "<0xB3>": 182,
+      "<0xB4>": 183,
+      "<0xB5>": 184,
+      "<0xB6>": 185,
+      "<0xB7>": 186,
+      "<0xB8>": 187,
+      "<0xB9>": 188,
+      "<0xBA>": 189,
+      "<0xBB>": 190,
+      "<0xBC>": 191,
+      "<0xBD>": 192,
+      "<0xBE>": 193,
+      "<0xBF>": 194,
+      "<0xC0>": 195,
+      "<0xC1>": 196,
+      "<0xC2>": 197,
+      "<0xC3>": 198,
+      "<0xC4>": 199,
+      "<0xC5>": 200,
+      "<0xC6>": 201,
+      "<0xC7>": 202,
+      "<0xC8>": 203,
+      "<0xC9>": 204,
+      "<0xCA>": 205,
+      "<0xCB>": 206,
+      "<0xCC>": 207,
+      "<0xCD>": 208,
+      "<0xCE>": 209,
+      "<0xCF>": 210,
+      "<0xD0>": 211,
+      "<0xD1>": 212,
+      "<0xD2>": 213,
+      "<0xD3>": 214,
+      "<0xD4>": 215,
+      "<0xD5>": 216,
+      "<0xD6>": 217,
+      "<0xD7>": 218,
+      "<0xD8>": 219,
+      "<0xD9>": 220,
+      "<0xDA>": 221,
+      "<0xDB>": 222,
+      "<0xDC>": 223,
+      "<0xDD>": 224,
+      "<0xDE>": 225,
+      "<0xDF>": 226,
+      "<0xE0>": 227,
+      "<0xE1>": 228,
+      "<0xE2>": 229,
+      "<0xE3>": 230,
+      "<0xE4>": 231,
+      "<0xE5>": 232,
+      "<0xE6>": 233,
+      "<0xE7>": 234,
+      "<0xE8>": 235,
+      "<0xE9>": 236,
+      "<0xEA>": 237,
+      "<0xEB>": 238,
+      "<0xEC>": 239,
+      "<0xED>": 240,
+      "<0xEE>": 241,
+      "<0xEF>": 242,
+      "<0xF0>": 243,
+      "<0xF1>": 244,
+      "<0xF2>": 245,
+      "<0xF3>": 246,
+      "<0xF4>": 247,
+      "<0xF5>": 248,
+      "<0xF6>": 249,
+      "<0xF7>": 250,
+      "<0xF8>": 251,
+      "<0xF9>": 252,
+      "<0xFA>": 253,
+      "<0xFB>": 254,
+      "<0xFC>": 255,
+      "<0xFD>": 256,
+      "<0xFE>": 257,
+      "<0xFF>": 258,
+      "▁": 259,
+      "e": 260,
+      "t": 261,
+      "a": 262,
+      "o": 263,
+      "i": 264,
+      "n": 265,
+      "s": 266,
+      "r": 267,
+      "h": 268,
+      "l": 269,
+      "d": 270,
+      "c": 271,
+      "u": 272,
+      "m": 273,
+      "f": 274,
+      "p": 275,
+      "g": 276,
+      "y": 277,
+      "w": 278,
+      "b": 279,
+      ".": 280,
+      ",": 281,
+      "v": 282,
+      "k": 283,
+      "T": 284,
+      "I": 285,
+      "A": 286,
+      "S": 287,
+      "-": 288,
+      "1": 289,
+      "C": 290,
+      "0": 291,
+      "x": 292,
+      "’": 293,
+      "P": 294,
+      "M": 295,
+      "2": 296,
+      "B": 297,
+      "W": 298,
+      "E": 299,
+      "D": 300,
+      "H": 301,
+      ")": 302,
+      "(": 303,
+      "F": 304,
+      "O": 305
+    },
+    "merges": []
+  }
+}
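
Note on the file above: `merges` is empty, so this BPE model can only emit single-character tokens, and any character missing from the vocabulary (for example `q`, `j`, `z`, or non-ASCII letters) should go through byte fallback. Byte tokens `<0x00>` to `<0xFF>` occupy ids 3 to 258, so a byte `b` maps to id `3 + b`. The sketch below approximates that behavior in plain Python; it is a hand-written illustration under these assumptions, not the `tokenizers` implementation, and `encode_sketch` is a hypothetical name:

```python
# Single-character tokens from the vocab above, in id order starting at 259.
CHARS = "▁etaoinsrhldcumfpgywb.,vkTIAS-1C0x’PM2BWEDH)(FO"
CHAR_TO_ID = {ch: 259 + i for i, ch in enumerate(CHARS)}
BYTE_OFFSET = 3  # "<0x00>" has id 3, so byte b maps to id 3 + b


def encode_sketch(text):
    """Approximate encoding: Metaspace replacement, then per-character lookup
    with UTF-8 byte fallback for characters outside the vocabulary."""
    ids = []
    # Metaspace with prepend_scheme "never": spaces become "▁", nothing is prepended.
    for ch in text.replace(" ", "▁"):
        if ch in CHAR_TO_ID:
            ids.append(CHAR_TO_ID[ch])
        else:
            ids.extend(BYTE_OFFSET + b for b in ch.encode("utf-8"))
    return ids


print(encode_sketch("The cat"))
print(encode_sketch("é"))  # two UTF-8 bytes, 0xC3 and 0xA9
print(encode_sketch("q"))  # one byte, 0x71
```

Post-processing adds no special tokens here, since the `TemplateProcessing` templates contain only the sequences themselves.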

tokenizer_config.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "add_bos_token": false,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": true,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{'<|system|>\n' + message['content'] + '<|end|>\n'}}{% elif message['role'] == 'user' %}{{'<|user|>\n' + message['content'] + '<|end|>\n'}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>\n' + message['content'] + '<|end|>\n'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>\n' }}{% else %}{{ eos_token }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": false,
+  "model_max_length": 4096,
+  "pad_token": "</s>",
+  "padding_side": "left",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
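
Note on `chat_template` above: the template wraps each message as a role marker, a newline, the content, then `<|end|>` and a newline, and finishes with either an assistant marker (when `add_generation_prompt` is set) or the EOS token. The following plain-Python function is a hand-written mirror of that logic for illustration only; `render_chat` is a hypothetical name, and the real rendering is done by the Jinja template itself:

```python
EOS_TOKEN = "</s>"  # eos_token from the config above
ROLE_TAGS = {
    "system": "<|system|>",
    "user": "<|user|>",
    "assistant": "<|assistant|>",
}


def render_chat(messages, add_generation_prompt=False):
    """Mirror of the chat template: one tagged block per message, then a suffix."""
    parts = []
    for message in messages:
        tag = ROLE_TAGS.get(message["role"])
        if tag is None:
            continue  # the template has no else branch, so other roles render nothing
        parts.append(f"{tag}\n{message['content']}<|end|>\n")
    # Suffix: open an assistant turn, or close the conversation with EOS.
    parts.append("<|assistant|>\n" if add_generation_prompt else EOS_TOKEN)
    return "".join(parts)


chat = [{"role": "user", "content": "Hi"}]
print(repr(render_chat(chat, add_generation_prompt=True)))
print(repr(render_chat(chat)))
```

None of the markers `<|system|>`, `<|user|>`, `<|assistant|>`, or `<|end|>` appear among the added tokens in this commit, so they would presumably be tokenized as ordinary text rather than as single special tokens.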