Add trained tokenizer

Files changed (2)

tokenizer/tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer/tokenizer_config.json ADDED

+{
+  "vocab_size": 32000,
+  "max_seq_len": 128,
+  "pad_id": 0,
+  "cls_id": 2,
+  "sep_id": 3,
+  "pad_token": "[PAD]",
+  "cls_token": "[CLS]",
+  "sep_token": "[SEP]"
+}
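
For reviewers, here is a minimal sketch of how the fields in `tokenizer_config.json` might be consumed downstream. It assumes a BERT-style layout (`[CLS]` first, `[SEP]` last, right-padding up to `max_seq_len`), which this diff does not confirm. The helper `encode_with_specials` is hypothetical and is not part of this PR. The config values are copied from the diff above.

```python
import json

# Values copied verbatim from tokenizer/tokenizer_config.json in this diff.
CONFIG_JSON = """
{
  "vocab_size": 32000,
  "max_seq_len": 128,
  "pad_id": 0,
  "cls_id": 2,
  "sep_id": 3,
  "pad_token": "[PAD]",
  "cls_token": "[CLS]",
  "sep_token": "[SEP]"
}
"""


def encode_with_specials(token_ids, config):
    """Hypothetical helper: wrap ids as [CLS] ... [SEP], truncate, then right-pad.

    Assumes a BERT-style layout; the actual model code may differ.
    """
    max_len = config["max_seq_len"]
    # Leave room for the two special tokens.
    body = list(token_ids)[: max_len - 2]
    ids = [config["cls_id"]] + body + [config["sep_id"]]
    # 1 marks real tokens, 0 marks padding.
    attention_mask = [1] * len(ids)
    pad_count = max_len - len(ids)
    ids += [config["pad_id"]] * pad_count
    attention_mask += [0] * pad_count
    return ids, attention_mask


config = json.loads(CONFIG_JSON)
ids, mask = encode_with_specials([10, 11, 12], config)
```

With this layout every encoded sequence has exactly `max_seq_len` entries, and the attention mask lets the model ignore positions filled with `pad_id`.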